Following #23827, default charsets were defined for the base Logback configuration in 5e26954, as ${CONSOLE_LOG_CHARSET:-default} and ${FILE_LOG_CHARSET:-default}.

Unfortunately, Logback interprets that as Charset.forName("default"), which, judging by sun.nio.cs.StandardCharsets$Aliases, is hardcoded to return US-ASCII. I feel like Charset.defaultCharset() would be a more natural choice.
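To make the alias behaviour concrete, here is a minimal sketch that compares what the name "default" resolves to with what Charset.defaultCharset() reports. The class and method names are hypothetical and chosen only for illustration. On the JDK 11 setup discussed in this issue, I would expect the alias to resolve to US-ASCII; newer JDKs may no longer recognise the alias at all, so the sketch handles that case instead of assuming a result.

```java
import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper class, not part of Spring Boot or Logback.
public class DefaultAliasCheck {

    // Returns the canonical name that the alias "default" resolves to,
    // or "unsupported" if this JDK does not recognise the alias.
    static String resolveDefaultAlias() {
        try {
            return Charset.forName("default").name();
        } catch (UnsupportedCharsetException ex) {
            return "unsupported";
        }
    }

    public static void main(String[] args) {
        // What Logback ends up with when the placeholder falls back to "default".
        System.out.println("forName(\"default\") -> " + resolveDefaultAlias());
        // What the JVM itself considers its default charset.
        System.out.println("defaultCharset()   -> " + Charset.defaultCharset().name());
    }
}
```

If the two printed names differ, the fallback value in the configuration is not the JVM's default charset, which is the mismatch described above.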

Comment From: wilkinsona

Thanks for the report, @gbaso. Can you please provide a bit more information about when this has caused a problem? LoggingSystemProperties should set the FILE_LOG_CHARSET and CONSOLE_LOG_CHARSET system properties so I'd like to understand exactly when you're seeing the fallback to default occurring.

Comment From: gbaso

It seems I was mistaken, and my issue was actually related to https://github.com/openjdk/jdk/pull/4733#issuecomment-879856229.

I do, however, still think that Spring Boot should take a more opinionated approach than defaulting to the "default" charset (which, as I said, is ASCII), and use UTF-8 instead.

Comment From: wilkinsona

@gbaso When the system properties are in effect, it should already default to UTF-8. To make sure that we fully understand the problem I'd like to know more about your situation and why that apparently wasn't happening. Can you please provide a minimal example that reproduces the problem that you're having?

Comment From: gbaso

@wilkinsona I apologize if I was not clear enough. Spring Boot is working correctly; the problem was on my end (and in the mechanism the JVM uses to set the default charset).

However, from reading the code, specifically https://github.com/spring-projects/spring-boot/blob/main/spring-boot-project/spring-boot/src/main/resources/org/springframework/boot/logging/logback/defaults.xml, it looks like if for some reason FILE_LOG_CHARSET and CONSOLE_LOG_CHARSET are unset, they default to the string default:

<property name="FILE_LOG_CHARSET" value="${FILE_LOG_CHARSET:-default}"/>

which is interpreted as Charset.forName("default") and is an alias for ASCII.

Given that even OpenJDK is moving to UTF-8 by default, I think we should replace value="${FILE_LOG_CHARSET:-default}" with value="${FILE_LOG_CHARSET:-UTF-8}".

Comment From: wilkinsona

it looks like if for some reasons FILE_LOG_CHARSET and CONSOLE_LOG_CHARSET are unset, they default to the string default

Yes, that's right. What I'd like to understand is why, in your case, those system properties were unset. Can you please provide a minimal example that reproduces that behaviour?

Comment From: gbaso

What I'd like to understand is why, in your case, those system properties were unset

They weren't; I misattributed the source of my issue.

Comment From: liuanxin

I am a developer from China. With US-ASCII encoding, Chinese characters come out garbled. Below are my environment and code.

$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 10 (buster)
Release:    10
Codename:   buster

$ java -version
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment (build 11.0.11+9-post-Debian-1deb10u1)
OpenJDK 64-Bit Server VM (build 11.0.11+9-post-Debian-1deb10u1, mixed mode, sharing)

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <!-- This problem occurs from 2.4.0 ~ 2.5.6, version 2.3.12.RELEASE is normal -->
        <version>2.5.6</version>
        <relativePath/>
    </parent>

    <groupId>org.example</groupId>
    <artifactId>test-charset</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <java.version>11</java.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
    </dependencies>
</project>

logback.xml

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <include resource="org/springframework/boot/logging/logback/defaults.xml" />
    <property name="CONSOLE_LOG_PATTERN" value="[%d] [%t\\(%logger\\) : %p] %class.%method\\(%file:%line\\)%n%m%n"/>
    <!--<property name="CONSOLE_LOG_CHARSET" value="UTF-8"/>-->
    <include resource="org/springframework/boot/logging/logback/console-appender.xml" />

    <root level="debug">
        <appender-ref ref="CONSOLE"/>
    </root>
</configuration>

Test

import lombok.extern.slf4j.Slf4j;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Slf4j
public class Nil {

    private static final Logger LOG = LoggerFactory.getLogger(Nil.class);

    public static void main(String[] args) {
        log.info("中文 abc");
        LOG.info("abc 中文");
    }
}

output

[xxxx-xx-xx xx:xx:xx,xxx] [main(com.example.Nil) : INFO] com.example.Nil.main(Nil.java:13)
?? abc

[xxxx-xx-xx xx:xx:xx,xxx] [main(com.example.Nil) : INFO] com.example.Nil.main(Nil.java:14)
abc ??

If I delete logback.xml, or uncomment <property name="CONSOLE_LOG_CHARSET" value="UTF-8"/>, the garbled characters disappear:

[xxxx-xx-xx xx:xx:xx,xxx] [main(com.example.Nil) : INFO] com.example.Nil.main(Nil.java:13)
中文 abc

[xxxx-xx-xx xx:xx:xx,xxx] [main(com.example.Nil) : INFO] com.example.Nil.main(Nil.java:14)
abc 中文

In ch.qos.logback.core.encoder.LayoutWrappingEncoder#convertToByte, when the value of charset is US-ASCII, the output is garbled. It is normal if the charset is not set or is set to UTF-8.
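The effect can be reproduced without Logback. The following is a minimal sketch, assuming the encoder ultimately turns the formatted message into bytes with the configured charset; the class and method names are hypothetical. Characters that US-ASCII cannot represent are replaced with '?' during encoding, which matches the output shown above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only uses the JDK, not Logback.
public class AsciiRoundTrip {

    // Encodes the message with the given charset and decodes it again,
    // mimicking what a reader of the log output would see.
    static String roundTrip(String message, Charset charset) {
        byte[] bytes = message.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is the same two-character Chinese string used in the test class above,
        // written with escapes so the source file encoding does not matter.
        String message = "\u4e2d\u6587 abc";

        // US-ASCII cannot encode the two Chinese characters, so each becomes '?'.
        System.out.println(roundTrip(message, StandardCharsets.US_ASCII).equals("?? abc"));
        // UTF-8 can encode them, so the message survives unchanged.
        System.out.println(roundTrip(message, StandardCharsets.UTF_8).equals(message));
    }
}
```

Both comparisons print true, whatever encoding the console itself uses.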

Comment From: wilkinsona

@liuanxin You aren't using Spring Boot's logging system as your main method isn't doing anything with Spring Boot. As a result, CONSOLE_LOG_CHARSET hasn't been set so ASCII is being used. This is the situation that @gbaso described above.

If you want to include Spring Boot's Logback configuration, you should make sure that you're also using Spring Boot's logging system.

Comment From: wilkinsona

Given that even OpenJDK is moving to UTF-8 by default, I think we should replace value="${FILE_LOG_CHARSET:-default}" with value="${FILE_LOG_CHARSET:-UTF-8}".

Thanks for the suggestion, @gbaso. The switch to UTF-8 is coming in Java 18. Spring Boot 3 will require Java 17 so it won't be until Spring Boot 4 when we raise the baseline beyond 17 that every supported version of Java defaults to UTF-8. As such, I think it's probably a bit early for us to make that switch. I'll flag this for team attention to see what everyone else thinks.

Comment From: wilkinsona

While I don't think we can use Charset.defaultCharset().name() in our defaults.xml, I think we can simulate its behaviour. The JDK uses the file.encoding system property to determine the default charset and, just in case it's not set, it then falls back to UTF-8 as a last resort.
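The fallback order described here can be sketched in plain Java. This is only an illustration of the idea (use file.encoding when it names a usable charset, otherwise UTF-8), not the change that would go into defaults.xml and not the JDK's internal implementation. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the "file.encoding, then UTF-8" fallback.
public class FallbackCharset {

    // Resolves a charset from the given file.encoding value, falling back
    // to UTF-8 when the value is missing, malformed, or unsupported.
    static Charset resolve(String fileEncoding) {
        if (fileEncoding != null) {
            try {
                return Charset.forName(fileEncoding);
            } catch (IllegalArgumentException ex) {
                // Covers both IllegalCharsetNameException and
                // UnsupportedCharsetException; fall through to UTF-8.
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        // Resolve using the running JVM's file.encoding system property.
        System.out.println(resolve(System.getProperty("file.encoding")).name());
    }
}
```

In a Logback configuration the same order would presumably be expressed with nested default values in the property placeholder rather than with Java code.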