Affects: 5.2.8.RELEASE
Spring setup
Consider the following WebFlux handler function:
```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.core.io.buffer.DataBuffer;
import org.springframework.stereotype.Component;
import org.springframework.web.reactive.function.BodyExtractors;
import org.springframework.web.reactive.function.server.ServerRequest;
import org.springframework.web.reactive.function.server.ServerResponse;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

@Component
public class Handler {

    private static final Logger LOGGER = LoggerFactory.getLogger(Handler.class);

    public Mono<ServerResponse> handle(ServerRequest serverRequest) {
        Flux<DataBuffer> inputBuffers = serverRequest.body(BodyExtractors.toDataBuffers())
                .cache()
                .map(d -> d.slice(0, d.readableByteCount()));

        // Fork off processing using inputBuffers.
        inputBuffers. ...

        return inputBuffers
                .doOnNext(dataBuffer -> LOGGER.info("Received input data buffer: {}", dataBuffer))
                .doOnCancel(() -> LOGGER.info("Pipeline cancelled"))
                .then(ServerResponse.accepted().build());
    }
}
```
The idea is that a 202 Accepted response is returned as soon as the input body is fully received, but subsequent processing is forked off in parallel reactive pipelines. The `cache` and `slice` calls are simply there to allow reuse of the input buffers.
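For illustration, such a fork might look along the following lines (a minimal sketch of my own, not part of the original sample, here just counting the body bytes): thanks to `cache()` a late subscriber sees every buffer, and thanks to `slice()` each subscriber gets an independent read position on the shared memory.

```java
// Hypothetical fork: a second subscription to the cached buffers.
inputBuffers
        .map(DataBuffer::readableByteCount)
        .reduce(0, Integer::sum)
        .subscribe(total -> LOGGER.info("Fork saw {} body bytes", total));
```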
Observations
I'm investigating the behaviour when input client requests are malformed or incomplete, for example:
* the connection is terminated prematurely, with the number of received bytes smaller than the `Content-Length` header value.
* the connection is terminated prematurely, with no final `0` chunk sent when performing a `Transfer-Encoding: chunked` request.
When one of the above happens, a number of data buffers are received, but the main reactive pipeline in the `handle` method is then cancelled, without any error.
From what I can tell, this behaviour has several downsides:
* it is quite unintuitive: when something goes wrong, errors are generally raised rather than processing being silently stopped. It took me quite a bit of time to realise that a cancellation event was being issued.
* there is no indication as to why the cancellation occurred, as `doOnCancel` takes a simple `Runnable` without any exception or other object provided. I'm guessing cancellation could occur for a number of reasons, not necessarily the truncated client requests described above. This makes it challenging to build any custom logging or error handling in an application.
* unless I'm missing something, no metrics are reported, in particular through Micrometer's built-in `http_server_requests` meter. Defining custom metrics on the reactive pipelines within the `handle` method wouldn't work either, as I believe Micrometer only reports on successful completion or on error, not on cancellation. This lack of reporting means that a rogue client could be hammering an application without any visibility from the outside world, which is not great in terms of security.
* application code will likely be more complex. In the use case presented above, where `inputBuffers` are processed asynchronously, the application needs to keep track of all created subscriptions and cancel them in a `doOnCancel` callback in the main `handle` pipeline (see the sketch after this list). Any other pipeline based off `inputBuffers` would otherwise receive a successful `onComplete` event, without realising that the input is actually incomplete.
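To illustrate that last bullet, here is a minimal sketch of the bookkeeping this would require inside `handle` (the `process` call is a hypothetical processing step): every forked subscription is tracked in a composite `Disposable` and explicitly disposed when the main pipeline is cancelled.

```java
import reactor.core.Disposable;
import reactor.core.Disposables;

// Inside handle(), after building inputBuffers: track every forked
// subscription so that it can be torn down on cancellation.
Disposable.Composite forks = Disposables.composite();
forks.add(inputBuffers
        .doOnNext(buffer -> process(buffer)) // hypothetical processing step
        .subscribe());

return inputBuffers
        .doOnNext(dataBuffer -> LOGGER.info("Received input data buffer: {}", dataBuffer))
        .doOnCancel(() -> {
            LOGGER.info("Pipeline cancelled, disposing forked subscriptions");
            forks.dispose();
        })
        .then(ServerResponse.accepted().build());
```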
I'm not sure what it would involve and whether it's even possible, but it seems like issuing an error in the pipeline rather than cancelling it would solve much of the above.
What are people's thoughts? Is there anything I've misunderstood with the current behaviour?
Thanks for reading!
Comment From: poutsma
It looks like you have some sample code that reproduces the problem. Could you make that available to us, i.e. something that we can unzip or `git clone`, build, and run?
Comment From: PyvesB
Here you go: https://github.com/PyvesB/spring-content-length-sample
The client code to hit the Spring server is written in Ruby. If that's inconvenient, you can also use the Telnet command line utility (`telnet 127.0.0.1 8080` to start the session) and craft a partial HTTP request, for example along the following lines:
```
POST /data HTTP/1.1
Transfer-Encoding: chunked

10
abcde
```
Then terminate the Telnet session (generally with the key combination Ctrl + ] and typing `quit`). Note that the chunk size `10` is hexadecimal, so the announced 16 bytes are never fully sent and no terminating `0` chunk follows.
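Alternatively, a plain socket client along these lines should produce the same truncated request (a sketch, assuming the server listens on 127.0.0.1:8080):

```java
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class TruncatedChunkedClient {

    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("127.0.0.1", 8080)) {
            OutputStream out = socket.getOutputStream();
            // Announce a 0x10 (16-byte) chunk but send only 5 bytes,
            // and never send the terminating "0" chunk.
            String partialRequest = "POST /data HTTP/1.1\r\n"
                    + "Host: 127.0.0.1:8080\r\n"
                    + "Transfer-Encoding: chunked\r\n"
                    + "\r\n"
                    + "10\r\n"
                    + "abcde";
            out.write(partialRequest.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            Thread.sleep(100); // give the server a moment to read the partial body
        } // closing the socket terminates the request prematurely
    }
}
```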
Let me know whether you can get things to run!
Comment From: poutsma
The cancellation you are seeing is the consequence of the connection being closed, which happens because of Netty's `ReadTimeoutHandler` timing out and closing the connection. From Spring's perspective, it's not possible for us to distinguish between this way of closing a connection and any other, "normal" way of closing a connection. Not unless we add fragile, and possibly exploitable, heuristics that count content length, and I am not sure that is worth it.
Comment From: PyvesB
> The cancellation you are seeing is the consequence of the connection being closed, which happens because of Netty's `ReadTimeoutHandler` timing out and closing the connection.

This is surprising to me. If the client is closing the connection, why would this cause a read timeout on the server side? Additionally, in the linked example the closure happens within a second, which is significantly shorter than any common default timeout. I had a go at adding a Netty channel handler that logs on `TimeoutException`, and I also debugged the application with breakpoints in the `ReadTimeoutHandler` class; I don't see any evidence of a Netty timeout happening. Am I missing something?
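For reference, such a diagnostic handler might look roughly like this (a sketch; how it gets wired into the Reactor Netty channel pipeline is omitted):

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.handler.timeout.TimeoutException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TimeoutLoggingHandler extends ChannelInboundHandlerAdapter {

    private static final Logger LOGGER = LoggerFactory.getLogger(TimeoutLoggingHandler.class);

    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
        if (cause instanceof TimeoutException) {
            // Would fire if a Netty timeout handler had closed the connection.
            LOGGER.info("Netty timeout on channel {}", ctx.channel(), cause);
        }
        ctx.fireExceptionCaught(cause);
    }
}
```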
Comment From: rstoyanchev
@PyvesB thanks for the sample and apologies for the delayed response. I've created https://github.com/reactor/reactor-netty/issues/1512 since this would have to be addressed at that level.