Spring Boot 3.2.0 Java 17.0.8.1

Client aborting a StreamingResponseBody download in combination with an ExceptionHandler can cause an infinite loop of these exceptions (once every second):

java.lang.NullPointerException: Cannot invoke "org.apache.catalina.core.AsyncContextImpl.timeout()" because "asyncConImpl" is null
    at org.apache.catalina.connector.CoyoteAdapter.asyncDispatch(CoyoteAdapter.java:135) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.coyote.AbstractProcessor.dispatch(AbstractProcessor.java:243) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:57) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:896) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1744) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]

There seems to be some sort of race condition as this doesn't happen if you "wait" briefly after catching the ClientAbortException.

What should happen (or actually happens when we wait) ClientAbortException -> "clean up ot the request" -> request ends

What actually happens (when we don't wait) is something like this: ClientAbortException -> "clean up" starts -> ExceptionHandler get's called which forwards to the original URL again -> new request comes in -> "clean up" ends (=> asyncConImpl is set to null) -> request never ends -> timeout -> NPEs

Why is the ExceptionHandler called at all? Does this make sense when we use a StreamingResponseBody and encounter a ClientAbortEexception?

Steps to reproduce:

  • start the application (see code below)
  • call http://localhost:8080/download?wait=false
  • abort the download in the browser
  • wait 20 seconds (= spring.mvc.async.request-timeout) an watch the logs (see below)
  • (if you call http://localhost:8080/download?wait=true this does not happen)

Code:

package com.example.coyote;

import org.apache.catalina.connector.ClientAbortException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.HttpHeaders;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.servlet.mvc.method.annotation.StreamingResponseBody;

import java.nio.charset.StandardCharsets;

@SpringBootApplication
@RestController
@RestControllerAdvice
@Configuration
public class CoyoteApplication {

    private static final Logger LOG = LoggerFactory.getLogger(CoyoteApplication.class);

    public static void main(String[] args) {
        SpringApplication.run(CoyoteApplication.class, args);
    }

    @GetMapping(value = "/download")
    public ResponseEntity<StreamingResponseBody> download(@RequestParam(value = "wait", defaultValue = "false") boolean wait) {

        LOG.info("wait: {}", wait);

        var data = "test".getBytes(StandardCharsets.UTF_8);
        var times = 10;

        StreamingResponseBody responseBody = outputStream -> {
            LOG.info("start download");
            try {
                for (int i = 0; i < times; i++) {
                    outputStream.write(data);
                    outputStream.flush();
                    waitMillis(1000);
                }
            } catch (Exception e) {

                // this might already take too long to reproduce the behavior (see comment below)
                LOG.error("error");

                // if we rethrow the exception fast enough, the exception handler is called,
                // and we end up in a very strange state
                // (just wait 20 seconds - spring.mvc.async.request-timeout - to see it in the logs)
                // note: if not happening, remove the error log from above, as it might already take too much time
                if (!wait) {
                    throw e;
                }

                // if we wait, the exception handler is not called and everything is fine
                var before = Thread.currentThread().isInterrupted() ? "interrupted" : "running";
                waitMillis(100);
                var after = Thread.currentThread().isInterrupted() ? "interrupted" : "running";
                LOG.info("waited: {} -> {}", before, after);

                throw e;
            }
            LOG.info("finish download");
        };

        return ResponseEntity.ok()
                .header(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=file.txt")
                .header(HttpHeaders.CONTENT_LENGTH, String.valueOf(data.length * times))
                .header(HttpHeaders.CONTENT_TYPE, "plain/txt")
                .body(responseBody);
    }

    @ExceptionHandler({ClientAbortException.class})
    public void handleClientAbortException() {
        LOG.info("CLIENT ABORT");
    }

    private static void waitMillis(long millis) {
        long now = System.currentTimeMillis();
        while (System.currentTimeMillis() - now < millis) ;
    }
}

application.properties:

spring.mvc.async.request-timeout=20000ms

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.2.0</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.example</groupId>
    <artifactId>coyote</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>coyote</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>17</java.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <configuration>
                    <image>
                        <builder>paketobuildpacks/builder-jammy-base:latest</builder>
                    </image>
                </configuration>
            </plugin>
        </plugins>
    </build>

</project>

Logs when not waiting:

2024-01-02T10:37:32.455+01:00  INFO 12788 --- [nio-8080-exec-1] com.example.coyote.CoyoteApplication     : wait: false
2024-01-02T10:37:32.469+01:00  INFO 12788 --- [         task-1] com.example.coyote.CoyoteApplication     : start download
2024-01-02T10:37:36.471+01:00 ERROR 12788 --- [         task-1] com.example.coyote.CoyoteApplication     : error
2024-01-02T10:37:36.481+01:00  INFO 12788 --- [nio-8080-exec-2] com.example.coyote.CoyoteApplication     : CLIENT ABORT
2024-01-02T10:37:36.484+01:00  INFO 12788 --- [nio-8080-exec-2] com.example.coyote.CoyoteApplication     : wait: false
2024-01-02T10:37:36.485+01:00  INFO 12788 --- [         task-2] com.example.coyote.CoyoteApplication     : start download
2024-01-02T10:37:36.485+01:00 ERROR 12788 --- [         task-2] com.example.coyote.CoyoteApplication     : error
2024-01-02T10:37:57.090+01:00 ERROR 12788 --- [nio-8080-exec-3] o.a.catalina.connector.CoyoteAdapter     : Exception while processing an asynchronous request

java.lang.NullPointerException: Cannot invoke "org.apache.catalina.core.AsyncContextImpl.timeout()" because "asyncConImpl" is null
    at org.apache.catalina.connector.CoyoteAdapter.asyncDispatch(CoyoteAdapter.java:135) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.coyote.AbstractProcessor.dispatch(AbstractProcessor.java:243) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:57) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:896) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1744) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]

2024-01-02T10:37:58.100+01:00 ERROR 12788 --- [nio-8080-exec-4] o.a.catalina.connector.CoyoteAdapter     : Exception while processing an asynchronous request

java.lang.NullPointerException: Cannot invoke "org.apache.catalina.core.AsyncContextImpl.timeout()" because "asyncConImpl" is null
    at org.apache.catalina.connector.CoyoteAdapter.asyncDispatch(CoyoteAdapter.java:135) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.coyote.AbstractProcessor.dispatch(AbstractProcessor.java:243) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:57) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:896) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1744) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]

2024-01-02T10:37:59.094+01:00 ERROR 12788 --- [nio-8080-exec-5] o.a.catalina.connector.CoyoteAdapter     : Exception while processing an asynchronous request
...

Logs when waiting:

2024-01-02T10:41:04.131+01:00  INFO 7688 --- [nio-8080-exec-1] com.example.coyote.CoyoteApplication     : wait: true
2024-01-02T10:41:04.145+01:00  INFO 7688 --- [         task-1] com.example.coyote.CoyoteApplication     : start download
2024-01-02T10:41:08.146+01:00 ERROR 7688 --- [         task-1] com.example.coyote.CoyoteApplication     : error
2024-01-02T10:41:08.247+01:00  INFO 7688 --- [         task-1] com.example.coyote.CoyoteApplication     : waited: running -> interrupted

(Issue originally posted here https://github.com/spring-projects/spring-boot/issues/38950)

Comment From: KazzmanK

We have same issues with infinite loop of exceptions with StreamingResponse. Also, if there are more root issues, there will be more exceptions at 1 second rate. On our test site it looks like Spring ClientAbortException from StreamingResponseBody can cause infinite loop of dispatches

But more investigation bring us to conclusion what added extra waiting does not help in our case. Also, going back to Boot 3.1.3 helps, but to Boot 3.1.7 - exception at 1 second rate

Comment From: rstoyanchev

Thanks for the report. I am unable to reproduce the behavior, and therefore cannot narrow the exact cause, but I have an idea of what's going on.

For any async request, Spring MVC saves the result in WebAsyncManager#concurrentResult and performs an ASYNC dispatch to get back to a Servlet container thread to complete processing. If the result was an exception, then we do exception handling as for any request.

StreamingResponseBody is a bit different as an async handler in that it is more low level and writes directly to the response. When an IO error occurs, Spring MVC handles it as any other, but Tomcat is also aware of it and notifies AsyncListeners of which we have one in order to detect and handle IO errors. So there is a race between two threads trying to do exactly the same, but the method they call setConcurrentResultAndDispatch is protected so only one should perform an async dispatch.

What I think happens is, in some cases the controller thread performs the async dispatch quicker. After that when the Tomcat thread is done calling AsyncListeners, and if none did anything to complete the async request, it tries to take further action explained by Mark in this comment. However, I can't confirm what exactly happens at that point (an error dispatch, or something unexpected due to the ASYNC dispatch that already completed) because I can't reproduce the issue.

It's possible there may be room for Tomcat to handle this better, avoiding the NullPointerException and the continuous loop of dispatches, /cc @markt-asf.

That said I can make an improvement on our side so that an exception implies a disconnected client, we don't attempt to handle it any further, knowing the Servlet container will be doing the same. I'll let you know when there is a snapshot to try the change with in order to confirm whether it helps or if something more is needed.

Comment From: markt-asf

I'll see if I can re-create this as well.

Comment From: rstoyanchev

@straubp and @KazzmanK please give 6.1.4-SNAPSHOT a try to see what the behavior is now after this commit 9230a7db16e0e63507cd6ee7fcc88e26438363db.

Comment From: KazzmanK

Is it sufficient dependency? Couse boot 3.3.0-snapshot points to spring 6.1.3, and needs overrides.

[INFO] +- org.springframework:spring-web:jar:6.1.4-SNAPSHOT:compile
[INFO] |  +- org.springframework:spring-beans:jar:6.1.3:compile
[INFO] |  \- org.springframework:spring-core:jar:6.1.3:compile

Comment From: straubp

@rstoyanchev I can confirm that the fix is working. Thank you! :-)

Using this in my POM fixed the issue:

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework</groupId>
                <artifactId>spring-web</artifactId>
                <version>6.1.4-SNAPSHOT</version>
            </dependency>
        </dependencies>
    </dependencyManagement>

With the previous version the issue still exists:

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework</groupId>
                <artifactId>spring-web</artifactId>
                <version>6.1.3</version>
            </dependency>
        </dependencies>
    </dependencyManagement>

Logs using the fixed version:

2024-01-24T08:39:00.413+01:00  INFO 15828 --- [nio-8080-exec-1] com.example.coyote.CoyoteApplication     : wait: false
2024-01-24T08:39:00.429+01:00  INFO 15828 --- [         task-1] com.example.coyote.CoyoteApplication     : start download
2024-01-24T08:39:04.431+01:00 ERROR 15828 --- [         task-1] com.example.coyote.CoyoteApplication     : error
2024-01-24T08:39:04.444+01:00  INFO 15828 --- [nio-8080-exec-2] com.example.coyote.CoyoteApplication     : CLIENT ABORT

Comment From: rstoyanchev

Thanks for confirming @straubp.

Comment From: markt-asf

I am able to reproduce this issue sometimes and I have identified what is happening internally in Tomcat.

I agree that it is the concurrent error handling by both the app (on a non-container thread) and the container (on a container thread) where things start to go wrong.

Note that there is work ongoing in the Servlet project to clarify expectations around error handling but I don't think it is going to help in this case.

I'm currently looking at possible fixes to Tomcat to better handle the results of concurrent error handling but I did wonder about the following change at the framework level: If the framework intends to respond to an error in an non-container thread and that response includes calling complete() or dispatch() then the framework should not allow AsyncListener.onError() (which will be called from a different, container thread) to return to the container until complete() or dispatch() has been called in the non-container thread.

Comment From: rstoyanchev

Good that you were able to reproduce it.

If the framework intends to respond to an error in an non-container thread and that response includes calling complete() or dispatch() then the framework should not allow AsyncListener.onError() (which will be called from a different, container thread) to return to the container until complete() or dispatch() has been called in the non-container thread.

The case of a controller method returning StreamingResponseBody is a bit unique as mentioned earlier as it's doing low level streaming directly to the response, and propagates an exception that Tomcat is also aware of. In the typical case, however, we would get a result value from async processing first, then we would async dispatch and write to the response from there as we would for regular requests.

More generally though, regardless of how a controller method completes async handling, i.e. with success (producing a result value), or with an error of its own, there is an inherent race between us trying to then perform an async dispatch to continue and finish processing, and a potential error notification coming from the container side via AsyncListener.

We could introduce something to guard against this so that if the async dispatch gets there first, it is allowed to dispatch before our AsyncListener#onError returns. Or otherwise, if the onError gets there first, we don't proceed with the async dispatch.

That said I'm wondering if the Servlet container could help prevent this race from its side, or otherwise any other Servlet application (not using our support) would also need to guard against this. If I recall there is a similar rule for when an async dispatch occurs before the REQUEST dispatch that started async processing has returned, but I'm not sure if that is comparable or not.

Comment From: markt-asf

I've added some protection against the original error (will be in the February release round for all currently supported Tomcat versions) and I am looking to see if there is anything I can do more generally.

Comment From: markt-asf

I've applied a more general fix. It too will be in the February release round. The fix prevents application threads from calling methods on the AsyncContext once AsyncListener.onError() has returned to the container.

Comment From: rstoyanchev

Thanks for the changes, @markt-asf.

Comment From: KazzmanK

@straubp and @KazzmanK please give 6.1.4-SNAPSHOT a try to see what the behavior is now after this commit 9230a7d.

As of recently released spring-boot 3.2.3 based on spring 6.1.4 issue looks resolved.