Affects: Spring: 6.0.2 spring-boot: 3.0.0
I have noticed that spring-webflux
application with endpoint that returns Mono<T>
on WebFilter
invocation sometimes receives already canceled Mono
. I was able to notice this behaviour only when controller performs some async operation like IO
or delayElement
. Also I was able to reproduce only when controller returns Mono<T>
but not Flux<T>
.
I have found issue that express similar problem though it is marked as fixed : https://github.com/spring-projects/spring-framework/issues/22952
I have created demo app that help to detect this behaviour:
https://github.com/zygisx/webflux-canceled-mono
Simple WebFilter logs whenever it gets already canceled Mono
Test that performs many requests to running server (have to start up Demo application before running test). Test shows that all requests from client perspective completed successfully and was not canceled prior.
In production we are using WebFiler to record request metrics, and ±10% of our request falls into this trap, where mono is canceled. This makes us blind whether request was actually canceled by client or its caused by underlying issue in spring-framework ?
Comment From: bclozel
This looks quite similar to spring-projects/spring-boot#33300 Did you get the stacktrace when this happens (see this comment)?
Comment From: zygisx
Yes, I have stacktrace:
java.lang.Exception
at com.example.demo.Filter.lambda$filterInternal$0(Filter.java:26)
at reactor.core.publisher.FluxPeek$PeekSubscriber.cancel(FluxPeek.java:146)
at reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber.cancel(FluxOnAssembly.java:654)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drainLoop(Operators.java:2362)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drain(Operators.java:2330)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.cancel(Operators.java:2142)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drainLoop(Operators.java:2362)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drain(Operators.java:2330)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.cancel(Operators.java:2142)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drainLoop(Operators.java:2362)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drain(Operators.java:2330)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.cancel(Operators.java:2142)
at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.cancel(MonoPeekTerminal.java:144)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drainLoop(Operators.java:2362)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drain(Operators.java:2330)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.cancel(Operators.java:2142)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.cancel(MonoIgnoreThen.java:143)
at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.cancel(MonoPeekTerminal.java:144)
at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.cancel(MonoPeekTerminal.java:144)
at reactor.core.publisher.Operators.terminate(Operators.java:1240)
at reactor.netty.channel.ChannelOperations.terminate(ChannelOperations.java:476)
at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:833)
Comment From: bclozel
Sorry for the delayed response.
Relying on a WebFilter
for observation purposes can be quite challenging. When a cancellation happens at this level, the web framework should not be the responsible here. Depending on how other filters are implemented and ordered, you are most likely facing a network behavior.
In this case, the response is probably written to the client but the client may be quicker in closing the connection - meaning we'll get a cancellation signal from the HTTP layer (here, Reactor Netty). Both "complete" and "cancel" signals might race here. Unlike other linked issues, the cancellation signal does not come from the Framework.
In this case, we think that the WebFilter
implementation should account for this and maybe not rely entirely on the cancel signal: if a cancel signal is received, it's probably fine to check the ServerWebExchange
to see if the response was committed. If the response is not large or not streaming, there is a high chance that this is a network artifact and that the response reached the client. If the response is large, this is probably still a good tradeoff and it's up to you to decide on which side it's safer to default.
I'm closing this issue as there is nothing actionable from a WebFilter
perspective and that we are not seeing a case where a cancellation signal happens without a reason.