Overview

A very simple Spring Boot based application locks up on the very first HTTP request with java.lang.NoClassDefFoundError present in the logs. Killing the application also causes java.lang.NoClassDefFoundError in the logs.

The matrix below outlines when the locking happens by the version and the execution environment.

Spring Boot Linux Windows OpenShift
2.5.14 Works Works Works
2.6.8 Works Works Works
2.6.9+ Works Works Locks
2.7.0 Works Works Works
2.7.1+ Works Works Locks

Normal execution

Here is how a normal execution looks.

> oc run image-test --rm -it --image=registry/angular-starter:1.0.0-SNAPSHOT --image-pull-policy="Always" --command -- /bin/sh
If you don't see a command prompt, try pressing enter.
sh-4.4$ java -jar /app/lib/starter-angular-app-1.0.0-SNAPSHOT-exec.jar &
[1] 7
  .   ____          _            __ _ _
/\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
\\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
=========|_|==============|___/=/_/_/_/
:: Spring Boot ::                (v2.6.8)

2022-07-24 17:11:13.850  INFO 7 --- [           main] com.baml.starter.angular.web.WebApp      : Starting WebApp using Java 11.0.15 on image-test with PID 7 (/app/lib/starter-angular-app-1.0.0-SNAPSHOT-exec.jar started by ? in /home/app)
2022-07-24 17:11:13.853  INFO 7 --- [           main] com.baml.starter.angular.web.WebApp      : No active profile set, falling back to 1 default profile: "default"
2022-07-24 17:11:19.854  INFO 7 --- [           main] o.s.b.web.embedded.netty.NettyWebServer  : Netty started on port 8080
2022-07-24 17:11:19.863  INFO 7 --- [           main] com.baml.starter.angular.web.WebApp      : Started WebApp in 7.61 seconds (JVM running for 9.215)

sh-4.4$ curl -v http://localhost:8080/api/status
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8080 (#0)
> GET /api/status HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.61.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: application/hal+json
< Content-Length: 86
< 
* Connection #0 to host localhost left intact
{"_links":{"self":{"href":"/api/status"}},"apiVersion":"1.0.0-SNAPSHOT","status":"UP"}
sh-4.4$ kill %1
[1]+  Done(143)               java -jar /app/lib/starter-angular-app-1.0.0-SNAPSHOT-exec.jar

Stuck execution

Here is how the stuck request looks. I had to press ^C to abort the curl run.

> oc run image-test --rm -it --image=registry/angular-starter:1.0.0-SNAPSHOT --image-pull-policy="Always" --command -- /bin/sh
If you don't see a command prompt, try pressing enter.
sh-4.4$ java -jar /app/lib/starter-angular-app-1.0.0-SNAPSHOT-exec.jar &
[1] 7
  .   ____          _            __ _ _
/\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
\\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
=========|_|==============|___/=/_/_/_/
:: Spring Boot ::                (v2.6.9)

2022-07-24 16:53:02.884  INFO 7 --- [           main] com.baml.starter.angular.web.WebApp      : Starting WebApp using Java 11.0.15 on image-test with PID 7 (/app/lib/starter-angular-app-1.0.0-SNAPSHOT-exec.jar started by ? in /home/app)
2022-07-24 16:53:02.887  INFO 7 --- [           main] com.baml.starter.angular.web.WebApp      : No active profile set, falling back to 1 default profile: "default"
2022-07-24 16:53:09.586  INFO 7 --- [           main] o.s.b.web.embedded.netty.NettyWebServer  : Netty started on port 8080
2022-07-24 16:53:09.679  INFO 7 --- [           main] com.baml.starter.angular.web.WebApp      : Started WebApp in 8.394 seconds (JVM running for 9.951)

sh-4.4$ curl -v http://localhost:8080/api/status
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8080 (#0)
> GET /api/status HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.61.1
> Accept: */*
> 
Exception in thread "reactor-http-epoll-2" java.lang.NoClassDefFoundError: io/netty/buffer/PoolArena$1
        at io.netty.buffer.PoolArena.freeChunk(PoolArena.java:248)
        at io.netty.buffer.PoolThreadCache$MemoryRegionCache.freeEntry(PoolThreadCache.java:432)
        at io.netty.buffer.PoolThreadCache$MemoryRegionCache.free(PoolThreadCache.java:396)
        at io.netty.buffer.PoolThreadCache$MemoryRegionCache.free(PoolThreadCache.java:388)
        at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:254)
        at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:245)
        at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:218)
        at io.netty.buffer.PooledByteBufAllocator$PoolThreadLocalCache.onRemoval(PooledByteBufAllocator.java:542)
        at io.netty.buffer.PooledByteBufAllocator$PoolThreadLocalCache.onRemoval(PooledByteBufAllocator.java:503)
        at io.netty.util.concurrent.FastThreadLocal.remove(FastThreadLocal.java:257)
        at io.netty.util.concurrent.FastThreadLocal.removeAll(FastThreadLocal.java:67)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:1050)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: io.netty.buffer.PoolArena$1
        at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
        at org.springframework.boot.loader.LaunchedURLClassLoader.loadClass(LaunchedURLClassLoader.java:151)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
        ... 15 more
^C
sh-4.4$ kill %1
Exception in thread "SpringApplicationShutdownHook" java.lang.NoClassDefFoundError: ch/qos/logback/classic/spi/ThrowableProxy
        at ch.qos.logback.classic.spi.LoggingEvent.<init>(LoggingEvent.java:119)
        at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:419)
        at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:383)
        at ch.qos.logback.classic.Logger.log(Logger.java:765)
        at org.apache.commons.logging.LogAdapter$Slf4jLocationAwareLog.warn(LogAdapter.java:447)
        at org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1070)
        at org.springframework.boot.web.reactive.context.ReactiveWebServerApplicationContext.doClose(ReactiveWebServerApplicationContext.java:147)
        at org.springframework.context.support.AbstractApplicationContext.close(AbstractApplicationContext.java:1021)
        at org.springframework.boot.SpringApplicationShutdownHook.closeAndWait(SpringApplicationShutdownHook.java:145)
        at java.base/java.lang.Iterable.forEach(Iterable.java:75)
        at org.springframework.boot.SpringApplicationShutdownHook.run(SpringApplicationShutdownHook.java:114)
        at java.base/java.lang.Thread.run(Thread.java:829)

[1]+  Done(143)               java -jar /app/lib/starter-angular-app-1.0.0-SNAPSHOT-exec.jar

This particular application uses WebFlux to serve the request. I tried to convert it to WebMVC. It also gets stuck with java.lang.NoClassDefFoundError albeit in a different class.

I did a quick diff between 2.6.8 and 2.6.9. I see some changes in org.springframework.boot.loader.jar package that may be relevant here. I'm referring to this commit. For what it's worth.

Comment From: wilkinsona

Thanks for the report. Unfortunately, I think we'll need to be able to reproduce the problem, ideally without running on Openshift, to diagnose it. Can you please provide a complete yet minimal sample that does so? Unless this is a problem that's specific to OpenShift, it should be possible to reproduce the problem by running the same image locally.

Comment From: ayashkov

I'll see what I can do. This may take a few days due to the fact that in my workplace the source code can only move in one direction: in.

Comment From: MrWong99

Had the same issue after upgrading from 2.6.x to 2.7.2. After I downgraded to 2.7.0 it works fine again. I am however not running OpenShift but instead only on Docker (20.10.17).

Comment From: snicoll

@MrWong99 if you can share a sample that we can run that would be much appreciated.

Comment From: MrWong99

I tried to extract the crucial parts without leaking any company code. It is however not tested in that format. Our app has a REST interface, includes a static website and has a small websocket endpoint.

pom.xml

...
    <properties>
        <springboot.version>2.7.2</springboot.version>
    </properties>

    <dependencyManagement>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-dependencies</artifactId>
                <version>${springboot.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-logging</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-starter-vault-config</artifactId>
            <version>3.1.1</version>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-websocket</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-jdbc</artifactId>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <version>${springboot.version}</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>repackage</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <mainClass>my.main.Application</mainClass>
                    <layout>ZIP</layout>
                </configuration>
            </plugin>
        </plugins>
    </build>

Dockerfile

# Note: we don't use this image but a similar one
FROM adoptopenjdk:11.0.11_9-jdk-hotspot

ADD my-distribution-containing-app.tar.gz .
EXPOSE 8080

CMD [ "java", "-Xmx500m", "-Dloader.path=//custom/plugins/", "-jar", "app.jar", "--spring.config.import=application.yml" ]

The application.yml has nothing special but some settings for spring.cloud.vault and active profiles.

Comment From: bclozel

@MrWong99 We're going to need an actual sample for this. I don't see anything that could cause that behavior in the code snippets you've shared.

Starting from a sample application created from start.spring.io and adding relevant bits until you reproduce the problem should give us the information we need. Thanks!

Comment From: ayashkov

I'm doing essentially this: https://github.com/ayashkov/spring-boot-31853. Several things though:

  1. I could not make this specific code fail. Still trying to modify it to trigger the failure.
  2. The whole thing seems to be extremely sensitive to the composition of the final JAR. E.g. when I try to make certain changes to the code that fails, e.g. consolidating or renaming modules, removing HAL processing, it stops failing.

I will keep working on it for a few days more, hopefully with better outcome.

Comment From: ayashkov

Here is an additional piece of data that may be relevant here. I was able to stop the breakage of the original code by removing <layout>ZIP</layout> in the following spring-boot-maven-plugin plugin configuration:

<configuration>
    <mainClass>com.baml.starter.angular.web.WebApp</mainClass>
    <layout>ZIP</layout>
    <classifier>exec</classifier>
</configuration>

In other words, the following works just fine:

<configuration>
    <mainClass>com.baml.starter.angular.web.WebApp</mainClass>
    <classifier>exec</classifier>
</configuration>

Comment From: ayashkov

The issue is reproducible in Docker. Please see https://github.com/ayashkov/spring-boot-31853.

Comment From: philwebb

Digging into this a bit today and the fix for #29356 has caused the regression. The NoClassDefFoundError is actually caused by JarFile.ensureOpen() throwing an IllegalStateException.

It looks like java.util.zip.ZipFile$CleanableResource$FinalizableResource is actually closing the Jar which makes things quite hard to debug. I'm wondering if calling the close() on all the nestedJars is actually correct. It seems likely to me that an outer JarFile might be GC'd before the nested one is finished with.

Comment From: philwebb

@ayashkov Thanks so much for putting the sample application together, it was really helpful!

Comment From: philwebb

I've pushed something that I hope will fix this but I'm not totally confident. If it doesn't work we may well need to revert the fix for #29356.

We're planning a release on Thursday so for anyone watching this issue, once CI build 1110 finishes, please try the latest SNAPSHOT to see if you still have the same problem.

Comment From: philwebb

Unfortunately we're now facing file locking issues on Windows again. I'm going to revert the fix for #29356