Following #23827, default charset where defined for a base logback configuration in 5e26954, as ${CONSOLE_LOG_CHARSET:-default}
and ${FILE_LOG_CHARSET:-default}
.
Unfortunately, logback interprets that as Charset.forName("default")
, which taking a look at sun.nio.cs.StandardCharsets$Aliases
is hardcoded to return ASCII. I feel like Charset.defaultCharset()
would be a more natural choice.
Comment From: wilkinsona
Thanks for the report, @gbaso. Can you please provide a bit more information about when this has caused a problem? LoggingSystemProperties
should set the FILE_LOG_CHARSET
and CONSOLE_LOG_CHARSET
system properties so I'd like to understand exactly when you're seeing the fallback to default
occurring.
Comment From: gbaso
It seems I was mistaken, and my issue was actually related to https://github.com/openjdk/jdk/pull/4733#issuecomment-879856229.
I do, however, still think that spring boot should take a more opinionated approach than defaulting to the "default" charset (which, as I said, is ASCII), using UTF-8 instead.
Comment From: wilkinsona
@gbaso When the system properties are in effect, it should already default to UTF-8. To make sure that we fully understand the problem I'd like to know more about your situation and why that apparently wasn't happening. Can you please provide a minimal example that reproduces the problem that you're having?
Comment From: gbaso
@wilkinsona I apologize if I was not clear enough, spring boot is working correctly, it was a problem on my end (and in the mechanism the JVM sets the default charset).
However by reading the code, specifically https://github.com/spring-projects/spring-boot/blob/main/spring-boot-project/spring-boot/src/main/resources/org/springframework/boot/logging/logback/defaults.xml, it looks like if for some reasons FILE_LOG_CHARSET
and CONSOLE_LOG_CHARSET
are unset, they default to the string default
:
<property name="FILE_LOG_CHARSET" value="${FILE_LOG_CHARSET:-default}"/>
which is interpreted as Charset.forName("default")
and is an alias for ASCII.
Given that even OpenJDK is moving to UTF-8 by default, I think we should replace
value="${FILE_LOG_CHARSET:-default}"
with value="${FILE_LOG_CHARSET:-UTF-8}"
.
Comment From: wilkinsona
it looks like if for some reasons FILE_LOG_CHARSET and CONSOLE_LOG_CHARSET are unset, they default to the string default
Yes, that's right. What I'd like to understand is why, in your case, those system properties were unset. Can you please provide a minimal example that reproduces that behaviour?
Comment From: gbaso
What I'd like to understand is why, in your case, those system properties were unset
They weren't, I misattributed the source of my issue
Comment From: liuanxin
I am a developer from China, US-ASCII encoding will cause Chinese garbled, below is my code
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster
$ java -version
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment (build 11.0.11+9-post-Debian-1deb10u1)
OpenJDK 64-Bit Server VM (build 11.0.11+9-post-Debian-1deb10u1, mixed mode, sharing)
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<!-- This problem occurs from 2.4.0 ~ 2.5.6, version 2.3.12.RELEASE is normal -->
<version>2.5.6</version>
<relativePath/>
</parent>
<groupId>org.example</groupId>
<artifactId>test-charset</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<java.version>11</java.version>
</properties>
<dependencies>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
</dependencies>
</project>
logback.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<include resource="org/springframework/boot/logging/logback/defaults.xml" />
<property name="CONSOLE_LOG_PATTERN" value="[%d] [%t\\(%logger\\) : %p] %class.%method\\(%file:%line\\)%n%m%n"/>
<!--<property name="CONSOLE_LOG_CHARSET" value="UTF-8"/>-->
<include resource="org/springframework/boot/logging/logback/console-appender.xml" />
<root level="debug">
<appender-ref ref="CONSOLE"/>
</root>
</configuration>
Test
import lombok.extern.slf4j.Slf4j;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@Slf4j
public class Nil {
private static final Logger LOG = LoggerFactory.getLogger(Nil.class);
public static void main(String[] args) {
log.info("中文 abc");
LOG.info("abc 中文");
}
}
output
[xxxx-xx-xx xx:xx:xx,xxx] [main(com.example.Nil) : INFO] com.example.Nil.main(Nil.java:13)
?? abc
[xxxx-xx-xx xx:xx:xx,xxx] [main(com.example.Nil) : INFO] com.example.Nil.main(Nil.java:14)
abc ??
Delete logback.xml
, or unpack the commented content <property name="CONSOLE_LOG_CHARSET" value="UTF-8"/>
, the problem of garbled characters will disappear.
[xxxx-xx-xx xx:xx:xx,xxx] [main(com.example.Nil) : INFO] com.example.Nil.main(Nil.java:13)
中文 abc
[xxxx-xx-xx xx:xx:xx,xxx] [main(com.example.Nil) : INFO] com.example.Nil.main(Nil.java:14)
abc 中文
ch.qos.logback.core.encoder.LayoutWrappingEncoder#convertToByte
, when the value of charset is US-ASCII, it causes garbled characters, normal if it is not set or set to UTF8
Comment From: wilkinsona
@liuanxin You aren't using Spring Boot's logging system as your main method isn't doing anything with Spring Boot. As a result, CONSOLE_LOG_CHARSET
hasn't been set so ASCII is being used. This is the situation that @gbaso described above.
If you want to include Spring Boot's Logback configuration, you should make sure that you're also using Spring Boot's logging system.
Comment From: wilkinsona
Given that even OpenJDK is moving to UTF-8 by default, I think we should replace
value="${FILE_LOG_CHARSET:-default}"
withvalue="${FILE_LOG_CHARSET:-UTF-8}"
.
Thanks for the suggestion, @gbaso. The switch to UTF-8 is coming in Java 18. Spring Boot 3 will require Java 17 so it won't be until Spring Boot 4 when we raise the baseline beyond 17 that every supported version of Java defaults to UTF-8. As such, I think it's probably a bit early for us to make that switch. I'll flag this for team attention to see what everyone else thinks.
Comment From: wilkinsona
While I don't think we can use Charset.defaultCharset().name()
in our defaults.xml
, I think we can simulate its behaviour. The JDK uses the file.encoding
system property to determine the default charset and, just in case it's not set, it then falls back to UTF-8
as a last resort.