Spring Security ServerHttpBasicAuthenticationConverter uses platform's default charset

Expected Behavior

ServerHttpBasicAuthenticationConverter should use utf-8 (or a configurable charset) for String construction. My understanding is that the majority of HTTP clients, especially browsers, use utf-8 nowadays for basic auth encoding (citation needed).

Current Behavior

In org.springframework.security.web.server.ServerHttpBasicAuthenticationConverter the Authorization header is parsed and turned into a UsernamePasswordAuthenticationToken. The header content after "Basic" is base64-decoded with Base64.getDecoder().decode(value) to get the raw bytes. These bytes are then converted to a String:

new String(base64Decode(credentials))

This uses a String constructor without specifying a charset (which, in my opinion, is always bad). Without an explicit charset it will use the platform's default charset - so, for example, utf-8 on Linux and usually a legacy single-byte code page such as windows-1252 (a superset of iso-8859-1) on Windows.
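To make the difference concrete, here is a minimal, standard-library-only sketch. The class name DefaultCharsetDemo and the helper decode are made up for illustration and are not part of Spring Security. It decodes the same Base64 payload with two explicit charsets and prints the JVM default, which is what the no-argument-charset constructor would silently pick (on Java 18 and later that default is UTF-8 on every platform unless overridden).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class, not part of Spring Security.
class DefaultCharsetDemo {

    // Decodes a Base64 credentials payload using an explicit charset.
    static String decode(String base64Credentials, Charset charset) {
        byte[] raw = Base64.getDecoder().decode(base64Credentials);
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "user:p\u00e4ssword" (a-umlaut in the password), sent by a client that uses UTF-8.
        String payload = Base64.getEncoder()
                .encodeToString("user:p\u00e4ssword".getBytes(StandardCharsets.UTF_8));

        // This is the charset that new String(bytes) would use on this JVM.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Same bytes, two interpretations.
        System.out.println(decode(payload, StandardCharsets.UTF_8));
        System.out.println(decode(payload, StandardCharsets.ISO_8859_1));
    }
}
```

With UTF-8 the password comes back intact. With ISO-8859-1 the two bytes of the UTF-8 encoded ä are read as two separate characters (Ã and ¤), so the decoded password no longer matches what the user typed.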

Context

This is not the first time an issue with basic auth encoding has been brought up (see e.g. #2969, although, funnily enough, that one asks for BasicAuthenticationFilter to try iso-8859-1 first and then utf-8, instead of utf-8 only), but I haven't found one for ServerHttpBasicAuthenticationConverter.

I have a workaround (disabling httpBasic and adding my own AuthenticationWebFilter), but it is really ugly and lacks some features (the matcher / entryPoint stuff from org.springframework.security.config.web.server.ServerHttpSecurity.HttpBasicSpec#configure).

Notes:

org.springframework.security.web.authentication.www.BasicAuthenticationFilter uses utf-8 as a default. It is configurable via org.springframework.security.web.authentication.www.BasicAuthenticationFilter#setCredentialsCharset (although I don't know how or even if that is exposed for configuration).
org.springframework.security.web.server.ServerHttpBasicAuthenticationConverter uses the platform's default charset, unconfigurably.
org.springframework.http.HttpHeaders#setBasicAuth(java.lang.String, java.lang.String) uses iso-8859-1 as the default (in org.springframework.http.HttpHeaders#encodeBasicAuth), but there is org.springframework.http.HttpHeaders#setBasicAuth(java.lang.String, java.lang.String, java.nio.charset.Charset) to specify another charset.
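The practical effect of mixing these defaults can be sketched with the standard library alone. The encode and decode helpers below are hypothetical stand-ins for "a client that encodes with iso-8859-1" and "a server that decodes with utf-8". They are not the Spring implementations, and the class name is made up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; the helpers only imitate the client and server roles.
class BasicAuthMismatchDemo {

    // Client side: builds a Basic header value from credentials in the given charset.
    static String encode(String username, String password, Charset charset) {
        String credentials = username + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(credentials.getBytes(charset));
    }

    // Server side: turns the header value back into "username:password".
    static String decode(String headerValue, Charset charset) {
        byte[] raw = Base64.getDecoder().decode(headerValue.substring("Basic ".length()));
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Client encodes with ISO-8859-1, server decodes with UTF-8.
        String header = encode("user", "p\u00e4ssword", StandardCharsets.ISO_8859_1);
        String seenByServer = decode(header, StandardCharsets.UTF_8);

        // The lone 0xE4 byte is not valid UTF-8, so it becomes U+FFFD.
        System.out.println(seenByServer.equals("user:p\u00e4ssword")); // false
        System.out.println(seenByServer.contains("\uFFFD"));           // true
    }
}
```

When both sides use the same charset the credentials round-trip unchanged. The mismatch only shows up because the password contains a non-ASCII character.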

Am I wrong in wanting utf-8 as a default? Is there a consensus on what should be the default in Spring (spanning clients and servers)?

/cc @jgrandja

Comment From: sjohnr

Thanks for the report @frozenice. I don't think you're wrong that this would be a good improvement, at a minimum defaulting to the charset used in the specification. Would you be interested in submitting a PR?

@rwinch would this be best placed in 5.7 or (for any possible backwards compatibility reasons) better placed in 6.0?

Comment From: frozenice

the charset used in the specification

@sjohnr You mean like in RFC 7617? 😉

this specification continues to leave the default encoding undefined

I guess we first need to define what the change should be:

Only use utf-8? Trivial, just add it to the String constructor.
Try one charset, then the other? This was rejected previously and I also don't like it from a security perspective.
Make it configurable? I'm not really that familiar with internal Spring stuff (how everything is structured, where to expose this).
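For discussion, the first and third options could look roughly like the following standard-library-only sketch. BasicCredentialsParser is a hypothetical stand-in, not the Spring Security converter. The setter name is borrowed from BasicAuthenticationFilter#setCredentialsCharset, and where such a setting would be exposed in Spring's configuration is left open here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Objects;

// Hypothetical stand-in for a Basic credentials converter; not the Spring Security class.
class BasicCredentialsParser {

    private static final String PREFIX = "basic ";

    // Option 1: a fixed, explicit default instead of the platform charset.
    private Charset credentialsCharset = StandardCharsets.UTF_8;

    // Option 3: allow callers to override the default.
    void setCredentialsCharset(Charset credentialsCharset) {
        this.credentialsCharset = Objects.requireNonNull(credentialsCharset, "credentialsCharset");
    }

    // Returns {username, password}, or null if the header is not a well-formed Basic header.
    String[] parse(String authorizationHeader) {
        if (authorizationHeader == null
                || !authorizationHeader.regionMatches(true, 0, PREFIX, 0, PREFIX.length())) {
            return null;
        }
        byte[] raw;
        try {
            raw = Base64.getDecoder().decode(authorizationHeader.substring(PREFIX.length()).trim());
        } catch (IllegalArgumentException ex) {
            return null; // payload is not valid Base64
        }
        // The only charset-sensitive step: always pass the charset explicitly.
        String decoded = new String(raw, credentialsCharset);
        int colon = decoded.indexOf(':');
        if (colon < 0) {
            return null;
        }
        // Split on the first colon only, so passwords may contain colons.
        return new String[] { decoded.substring(0, colon), decoded.substring(colon + 1) };
    }
}
```

The second option (try one charset, then the other) is deliberately not sketched, since it was rejected previously.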

Comment From: sjohnr

@frozenice hah! Sorry, I thought the spec was clear, my mistake. I think for now, we would want to make this consistent with the servlet support, therefore use UTF-8 as the default. I'm not sure we need to make it configurable, so I'm thinking we would wait on that and open a separate request or wait for others in the community to tell us it's needed. Just because it was made configurable previously doesn't mean it was/is necessarily needed, but it could be.

Comment From: frozenice

UTF-8 as default would be my choice, too. I submitted a PR (#10911).

Comment From: sjohnr

Thanks @frozenice. I'll wait a bit to get a bit more feedback on this issue before merging. Also, I thought this bit of RFC 7617 was interesting:

B.3. Why not simply switch the default encoding to UTF-8?

There are sites in use today that default to a local character encoding scheme, such as ISO-8859-1 ([ISO-8859-1]), and expect user agents to use that encoding. Authentication on these sites will stop working if the user agent switches to a different encoding, such as UTF-8.

Note that sites might even inspect the User-Agent header field ([RFC7231], Section 5.5.3) to decide which character encoding scheme to expect from the client. Therefore, they might support UTF-8 for some user agents, but default to something else for others. User agents in the latter group will have to continue to do what they do today until the majority of these servers have been upgraded to always use UTF-8.

Based on that, it seems like this is definitely the right direction, and I'm doubting it is likely necessary to make it configurable (my opinion only at this moment). It seems this would move Windows-based systems to support user agents that want to use UTF-8.

The only other pause I have at the moment is based on your comment regarding HttpHeaders#setBasicAuth defaulting to iso-8859-1. That seems like it could cause some interoperability problems among Windows-based Spring applications using HTTP Basic for system-to-system authentication (e.g. oauth2 w/client_credentials). Any thoughts on that? If needed, we could reach out to the Framework team for feedback as well.

Comment From: sjohnr

Actually, scratch my last comment, probably. That really is the same situation as user agents not being upgraded.

Comment From: frozenice

Based on that, it seems like this is definitely the right direction, and I'm doubting it is likely necessary to make it configurable (my opinion only at this moment).

I think so, too.

Perhaps HttpHeaders#setBasicAuth should be switched to utf-8 as the default at the same time? I made a branch to see what the change would look like.

If someone has a resource on what clients / servers use what encoding for basic auth, I would be interested in that (the overwhelming majority probably uses utf-8, but it's only a guess).

Comment From: sjohnr

Thinking about this a bit, I believe the reason HttpHeaders#setBasicAuth in Framework defaults to iso-8859-1 is for the same reason as browsers. Since HttpHeaders will be used on the client side, changing it to UTF-8 is like a browser requiring the server to support UTF-8. If the server does not, the request will fail. However, if the client uses iso-8859-1 but the server supports UTF-8, nothing bad will really happen, it seems. The key from that snippet of the spec is:

Authentication on these sites will stop working if the user agent switches to a different encoding, such as UTF-8.

So the most important thing is making sure servers (Spring Security in this case is on the server side) support UTF-8 first.

Comment From: frozenice

on the client side, changing it to UTF-8 is like a browser requiring the server to support UTF-8. If the server does not, the request will fail.

Not necessarily.

However, if the client uses iso-8859-1 but the server supports UTF-8, nothing bad will really happen

Also not necessarily. (btw, replacing "supports" with "uses" is clearer, as it's either the one or the other, without fallback support - which was previously rejected, as I mentioned)

To set the scope more precisely: UTF-8 encoding is identical to ASCII and ISO-8859-1 encodings for character codes <= 127 (standard 7-bit ASCII). Everything above 127 (extended ASCII, e.g. in ISO-8859-1 or UTF-8) is encoded differently in those encodings. Most common characters (alphanumeric, most normal symbols) are contained in standard ASCII. To illustrate, the symbols of, for example, my QWERTZ keyboard which are extended ASCII are §°²³´µ€ (and of course stuff like äöüß) - everything else is standard ASCII.

Now, that means: As long as someone is using only standard ASCII characters for username and password, it doesn't matter if the client or the server uses ISO-8859-1 or UTF-8, it's mix and match and works regardless. So in this regard there is no difference in switching something to UTF-8 or to ISO-8859-1. I find statements like "it will fail" a bit misleading (that includes the spec quote, it's unfortunate wording, in my opinion).

Whatever change we make will only potentially cause problems if someone uses extended ASCII characters. I'd wager most people only use standard ASCII characters.
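The overlap argument can be checked directly with a small standard-library sketch. The class and method names are made up for illustration. It compares the bytes each charset produces for the same credentials string.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class for comparing the two encodings byte by byte.
class CharsetOverlapDemo {

    // True if ISO-8859-1 and UTF-8 produce identical bytes for the given text.
    static boolean sameBytes(String text) {
        return Arrays.equals(
                text.getBytes(StandardCharsets.ISO_8859_1),
                text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Only 7-bit ASCII characters: both charsets agree, so mixing them is harmless.
        System.out.println(sameBytes("user:password"));        // true

        // One character above 127: ISO-8859-1 uses one byte, UTF-8 uses two.
        System.out.println(sameBytes("user:p\u00e4ssword"));   // false
    }
}
```

Since the Base64 payload is computed from these bytes, identical bytes mean an identical Authorization header, whichever of the two charsets the client and server each use.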

Again, I'd be interested in some overview on which clients and servers use what encoding (I haven't really put much time into searching for one, yet).

Comment From: sjohnr

Thanks @frozenice. I agree that "uses" is better than "supports" and "will fail" is misleading. Apologies, I was trying to stick with similar language as the spec. And thanks for sharing that ISO-8859-1 is not binary compatible with UTF-8, that wasn't something I've thought about in quite some time.

Out of curiosity, as someone with an extended ASCII keyboard, do you typically avoid using those characters in passwords or would you use them when possible? (Hopefully that's not too personal of a question 😆)

Sadly, I don't know of a resource to research client/server compatibility either.

Comment From: frozenice

someone with an extended ASCII keyboard

Wow, do normal QWERTY keyboards not have any of those symbols I mentioned and also no Alt Gr key? I never knew. 😲 That fact makes the default character set change look even better (and also that longer is better than more complex, which everybody knows, right?).

As a programmer who is well aware of encoding issues, I tend to stick to standard ASCII for passwords and also usernames.