Affects: Spring Framework 6.0.2, Spring Boot 3.0.0

I have noticed that in a spring-webflux application with an endpoint that returns Mono&lt;T&gt;, a WebFilter sometimes receives an already canceled Mono. I was able to observe this behaviour only when the controller performs some async operation such as IO or delayElement. I was also able to reproduce it only when the controller returns Mono&lt;T&gt;, but not Flux&lt;T&gt;.

I have found an issue that describes a similar problem, though it is marked as fixed: https://github.com/spring-projects/spring-framework/issues/22952

I have created a demo app that helps to detect this behaviour: https://github.com/zygisx/webflux-canceled-mono. It contains a simple WebFilter that logs whenever it gets an already canceled Mono, and a test that performs many requests against a running server (the demo application has to be started before running the test). The test shows that, from the client's perspective, all requests completed successfully and were never canceled.
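
For illustration, a filter of that kind could look roughly like the following sketch (this is not the actual demo code; the class name is made up): it hooks doOnCancel on the filter chain's Mono and logs a stack trace when a cancellation signal arrives.

```java
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import org.springframework.web.server.WebFilter;
import org.springframework.web.server.WebFilterChain;
import reactor.core.publisher.Mono;

@Component
public class CancellationLoggingFilter implements WebFilter {

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, WebFilterChain chain) {
        return chain.filter(exchange)
                // Fires when the subscription to the filter chain is canceled,
                // e.g. when the HTTP layer closes the connection.
                .doOnCancel(() ->
                        new Exception("Request canceled: " + exchange.getRequest().getPath())
                                .printStackTrace());
    }
}
```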

In production we are using a WebFilter to record request metrics, and roughly 10% of our requests fall into this trap where the Mono is canceled. This leaves us unable to tell whether a request was actually canceled by the client or whether the cancellation is caused by an underlying issue in spring-framework.

Comment From: bclozel

This looks quite similar to spring-projects/spring-boot#33300. Did you get the stacktrace when this happens (see this comment)?

Comment From: zygisx

Yes, I have a stacktrace:

java.lang.Exception
    at com.example.demo.Filter.lambda$filterInternal$0(Filter.java:26)
    at reactor.core.publisher.FluxPeek$PeekSubscriber.cancel(FluxPeek.java:146)
    at reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber.cancel(FluxOnAssembly.java:654)
    at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drainLoop(Operators.java:2362)
    at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drain(Operators.java:2330)
    at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.cancel(Operators.java:2142)
    at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drainLoop(Operators.java:2362)
    at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drain(Operators.java:2330)
    at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.cancel(Operators.java:2142)
    at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drainLoop(Operators.java:2362)
    at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drain(Operators.java:2330)
    at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.cancel(Operators.java:2142)
    at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.cancel(MonoPeekTerminal.java:144)
    at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drainLoop(Operators.java:2362)
    at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drain(Operators.java:2330)
    at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.cancel(Operators.java:2142)
    at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.cancel(MonoIgnoreThen.java:143)
    at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.cancel(MonoPeekTerminal.java:144)
    at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.cancel(MonoPeekTerminal.java:144)
    at reactor.core.publisher.Operators.terminate(Operators.java:1240)
    at reactor.netty.channel.ChannelOperations.terminate(ChannelOperations.java:476)
    at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:833)

Comment From: bclozel

Sorry for the delayed response.

Relying on a WebFilter for observation purposes can be quite challenging. When a cancellation happens at this level, the web framework should not be held responsible. Depending on how other filters are implemented and ordered, you are most likely seeing network-level behavior.

In this case, the response is probably written to the client, but the client may be quicker in closing the connection, meaning we'll get a cancellation signal from the HTTP layer (here, Reactor Netty). The "complete" and "cancel" signals might race here. Unlike other linked issues, the cancellation signal does not come from the Framework.

In this case, we think the WebFilter implementation should account for this and perhaps not rely entirely on the cancel signal: if a cancel signal is received, it's probably fine to check the ServerWebExchange to see whether the response was committed. If the response is not large or not streaming, there is a high chance that the cancellation is a network artifact and that the response reached the client. If the response is large, this is probably still a good tradeoff, and it's up to you to decide which way it's safer to default.
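
As an illustration of that suggestion, a metrics filter could treat a cancel signal as a likely success when the response has already been committed. This is only a sketch under those assumptions; the class name and the record method are placeholders for your actual metrics code.

```java
import org.springframework.web.server.ServerWebExchange;
import org.springframework.web.server.WebFilter;
import org.springframework.web.server.WebFilterChain;
import reactor.core.publisher.Mono;

public class MetricsWebFilter implements WebFilter {

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, WebFilterChain chain) {
        long start = System.nanoTime();
        return chain.filter(exchange)
                .doOnSuccess(v -> record(exchange, start, "SUCCESS"))
                .doOnError(ex -> record(exchange, start, "ERROR"))
                .doOnCancel(() -> {
                    // On cancel, check whether the response was already committed.
                    // If it was, the response most likely reached the client and the
                    // cancel is a network artifact (e.g. the client closed the
                    // connection early), so record it as a success instead.
                    String outcome = exchange.getResponse().isCommitted() ? "SUCCESS" : "CANCELED";
                    record(exchange, start, outcome);
                });
    }

    // Hypothetical metrics hook; replace with a call to your actual registry.
    private void record(ServerWebExchange exchange, long startNanos, String outcome) {
        long durationMs = (System.nanoTime() - startNanos) / 1_000_000;
        System.out.printf("%s %s -> %s (%d ms)%n",
                exchange.getRequest().getMethod(),
                exchange.getRequest().getPath(),
                outcome, durationMs);
    }
}
```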

I'm closing this issue as there is nothing actionable from a WebFilter perspective, and we are not seeing a case where a cancellation signal happens without a reason.