When I set a property in the application.properties file that contains a Unicode escape sequence, like this:
languages=English,Fran\u00e7ais
the property will be resolved at runtime (for example in a configuration class) as "English,Français". That is what I want.
When I set the same property with the same value as an environment variable (for example, in a Docker Compose file), the Unicode escape sequence is not resolved and the value at runtime is "English,Fran\u00e7ais".
If I set the environment variable SPRING_APPLICATION_JSON to the value {"languages":"English,Fran\u00e7ais"}, the Unicode escape sequence is resolved again, so that is at least a workaround.
I think a property should not be resolved differently just because it comes from an environment variable.
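The difference described above can be reproduced without Spring. The following is a minimal sketch, assuming that java.util.Properties is an acceptable stand-in for the .properties file format (Spring Boot uses its own loader, so this illustrates the format rather than Spring's implementation). The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class PropertiesEscapeDemo {

    // Parses text in .properties syntax and returns the value of the "languages" key.
    static String loadLanguages(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("languages");
    }

    public static void main(String[] args) {
        // The doubled backslash keeps the escape as six literal characters,
        // exactly as it would appear in the properties file.
        String line = "languages=English,Fran\\u00e7ais";
        String resolved = loadLanguages(line);

        // The properties parser has decoded the escape into the single character U+00E7.
        System.out.println(resolved.equals("English,Fran\u00e7ais"));
        System.out.println(resolved.length());

        // A plain string, as an environment variable would deliver it,
        // goes through no such parser and keeps the escape verbatim.
        String rawValue = "English,Fran\\u00e7ais";
        System.out.println(rawValue.length());
    }
}
```

The escape handling belongs to the file format's parser, which would explain why a value that never passes through that parser keeps the backslash sequence as-is.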
Comment From: wilkinsona
Why are you using unicode escape sequences in the environment variable's value? Unlike the name, there are no restrictions on the contents of an environment variable's value:
The values that the environment variables may be assigned are not restricted except that they are considered to end with a null byte and the total space used to store the environment and the arguments to the process is limited to {ARG_MAX} bytes.
Comment From: jhollmannk
"Plain text": the most complicated thing you can do on a computer :-)
I set the variable in the docker-compose.yml as
- languages=English,Français
The docker-compose.yml is in UTF-8 so 'ç' will be the byte sequence 0xC3 0xA7.
This creates the environment variable "languages" in the container, and when I run "echo $languages" in an interactive shell in the container I get the exact value above (with the ç). So bash in this container seems to handle the UTF-8 sequence all right.
But when the variable is used in the application, the UTF-8 sequence is not resolved. For example, if I log the current value, I get "Fran??ais" in the log.
I tried the same thing in a debugger (defining the environment variable in the run configuration) to see the value of the property, and the value is "FranÃ§ais", so the UTF-8 sequence is read as the two separate characters 0xC3 and 0xA7.
I don't expect Spring Boot to resolve UTF-8 characters in this case (don't get me wrong here). But the only solution I found was to use the Unicode escape sequence \u00E7 for the character instead, which works in application.properties. That is why I wanted to use the Unicode escape sequence.
So now I have no idea how to enter a value in the environment variable to get the desired result. What input will be interpreted as the desired text containing the unicode characters?
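Both corrupted forms mentioned above can be reproduced in isolation. This is a minimal sketch, assuming the raw UTF-8 bytes of the value get decoded with a non-UTF-8 charset somewhere between the environment and the application. The class and method names are hypothetical, and the ç is written as a Java source escape so the file compiles the same way under any source encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EnvDecodingDemo {

    // Decodes raw bytes with the given charset, standing in for whatever
    // charset is applied to the environment variable's bytes.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // The value as stored by a UTF-8 docker-compose.yml: the c-cedilla is 0xC3 0xA7.
        byte[] raw = "Fran\u00e7ais".getBytes(StandardCharsets.UTF_8);

        // Decoded as UTF-8, the two bytes become one character again.
        String asUtf8 = decode(raw, StandardCharsets.UTF_8);
        System.out.println(asUtf8.length());

        // Decoded as ISO-8859-1, each byte becomes its own character (U+00C3, U+00A7).
        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1);
        System.out.println(asLatin1.length());

        // Decoded as US-ASCII, each non-ASCII byte becomes the replacement character U+FFFD;
        // encoding that back to ASCII for output turns each one into '?'.
        String asAscii = decode(raw, StandardCharsets.US_ASCII);
        String logged = new String(asAscii.getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);
        System.out.println(logged);
    }
}
```

Under these assumptions, the two-character form seen in the debugger matches a single-byte charset such as ISO-8859-1, and the question marks in the log match an ASCII charset.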
Comment From: wilkinsona
Thanks for the additional information.
I've tried setting an environment variable with UTF-8 characters in its value and it works fine when consumed via both System.getenv and Spring's Environment. That's when the JVM's default charset is UTF-8 (per Charset.defaultCharset()). If I override that, using -Dfile.encoding=US-ASCII for example, the environment variable's value is then corrupted.
What encoding is your JVM using by default?
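One way to answer that from inside the affected environment is a small diagnostic like the sketch below. It assumes the variable is named "languages", as in the compose file above; the class and helper names are hypothetical. Code units are printed in hexadecimal so the result does not depend on the console's own encoding.

```java
import java.nio.charset.Charset;
import java.util.stream.Collectors;

class CharsetCheck {

    // True when the charset can encode every character of the text without loss.
    static boolean canRepresent(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    // Renders each UTF-16 code unit of the text as U+ followed by four hex digits.
    static String codeUnits(String text) {
        return text.chars()
                .mapToObj(c -> String.format("U+%04X", c))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        Charset defaultCharset = Charset.defaultCharset();
        System.out.println("default charset: " + defaultCharset);
        System.out.println("file.encoding: " + System.getProperty("file.encoding"));
        System.out.println("can represent U+00E7: " + canRepresent(defaultCharset, "\u00e7"));

        // Assumption: the variable name matches the one set in docker-compose.yml.
        String value = System.getenv("languages");
        if (value != null) {
            System.out.println("languages: " + codeUnits(value));
        }
    }
}
```

Running it in the same container and with the same JVM options as the application should show whether the value arrives with a single U+00E7 or in a corrupted form.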
Comment From: jhollmannk
Thank you so much for analysing this so deeply! I would never have guessed that file.encoding has anything to do with reading environment variables.
The problem came up in our Docker container environment. The base image is built on an Alpine image, and the file.encoding in the container is "ANSI_X3.4-1968" (an alias for US-ASCII).
That solves this issue. We can make sure that file.encoding suits our needs, either in our base image itself or by setting -Dfile.encoding=UTF-8 in the entrypoint for the application.
Thanks again!
Comment From: wilkinsona
Thanks for letting us know you've got things working.