Affects: 5.3.17
Hello,
I created a WebFlux project based on Spring Boot 2.6.5 with reactive MongoDB.
During a performance test, cancel() is called after onComplete() even under a load of only 10 TPS. The condition is difficult to predict: it shows up occasionally at around 10 TPS and more often under heavier traffic.
All requests were 100% successful, so I find it strange that cancel() is called at all.
Here is an excerpt from the trace log:
2022-04-01 14:07:02.296 INFO 1 --- [ntLoopGroup-3-8] reactor.Mono.Peek.603 : onComplete()
2022-04-01 14:07:02.297 DEBUG 1 --- [or-http-epoll-1] r.n.http.server.HttpServerOperations : [6b365b84-95, L:/10.240.5.189:9010 - R:/10.240.2.53:33018] Decreasing pending responses, now 0
2022-04-01 14:07:02.298 DEBUG 1 --- [or-http-epoll-1] r.n.http.server.HttpServerOperations : [6b365b84-95, L:/10.240.5.189:9010 - R:/10.240.2.53:33018] Last HTTP packet was sent, terminating the channel
2022-04-01 14:07:02.298 TRACE 1 --- [or-http-epoll-1] reactor.netty.channel.ChannelOperations : [6b365b84, L:/10.240.5.189:9010 - R:/10.240.2.53:33018] Disposing ChannelOperation from a channel
java.lang.Exception: ChannelOperation terminal stack
at reactor.netty.channel.ChannelOperations.terminate(ChannelOperations.java:465)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:833)
2022-04-01 14:07:02.300 INFO 1 --- [or-http-epoll-1] reactor.Mono.Peek.603 : cancel()
- The API above queries MongoDB data.
- In the case of an API that simply returns Mono.just("XXX"), the issue did not occur at all (see the sketch below).
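For illustration, the two endpoints look roughly like this (a simplified sketch; Item, the paths, and the field names are placeholders, not the real application code):

```java
// Simplified sketch of the two endpoints; Item, the paths, and the field names
// are placeholders, not the real application code.
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.ReactiveMongoTemplate;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;

class Item {
    @Id
    String id;
    String name;
}

@RestController
class SampleController {

    private final ReactiveMongoTemplate template;

    SampleController(ReactiveMongoTemplate template) {
        this.template = template;
    }

    // Endpoint backed by a reactive MongoDB query: this is where cancel()
    // after onComplete() was observed under load.
    @GetMapping("/items/{id}")
    Mono<Item> item(@PathVariable String id) {
        return template.findById(id, Item.class);
    }

    // Endpoint that simply returns a constant: the issue did not occur here.
    @GetMapping("/ping")
    Mono<String> ping() {
        return Mono.just("XXX");
    }
}
```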
Can I get some help or hints related to this?
Comment From: poutsma
@mp911de Does this ring a bell to you?
Comment From: mp911de
That doesn't sound proper. Do you have a minimal reproducer for us? The log output doesn't indicate the origin of the cancel signal, and without further details we won't be able to diagnose what's happening here.
Comment From: morenice
@mp911de @poutsma Thanks for the reply. I have confirmed that it sometimes occurs with plain API calls as well as with MongoDB queries.
I also added a stack-trace print in the doOnCancel handler:
.doOnCancel(() -> new Exception().printStackTrace())
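For reference, this is roughly where it sits in the handler chain (a simplified sketch; Item and the query are placeholders for my actual code):

```java
// Simplified sketch of the instrumented chain; Item and the query are placeholders.
import org.springframework.data.mongodb.core.ReactiveMongoTemplate;
import reactor.core.publisher.Mono;

class Item {} // placeholder document type

class InstrumentedQuery {

    private final ReactiveMongoTemplate template;

    InstrumentedQuery(ReactiveMongoTemplate template) {
        this.template = template;
    }

    Mono<Item> findItem(String id) {
        return template.findById(id, Item.class)
                // runs when the query completes normally with a value
                .doOnSuccess(item -> System.out.println("completed: " + item))
                // runs if a cancel signal propagates up; the stack trace shows its origin
                .doOnCancel(() -> new Exception().printStackTrace());
    }
}
```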
Backtrace with trace log:
2022-04-05 15:15:42.145 INFO 1 --- [ntLoopGroup-3-2] reactor.Mono.PeekFuseable.436 : | onComplete()
2022-04-05 15:15:42.146 DEBUG 1 --- [or-http-epoll-4] r.n.http.server.HttpServerOperations : [aa6cb7c3-18, L:/10.240.6.232:9010 - R:/10.240.2.53:58428] Decreasing pending responses, now 0
2022-04-05 15:15:42.146 DEBUG 1 --- [or-http-epoll-4] r.n.http.server.HttpServerOperations : [aa6cb7c3-18, L:/10.240.6.232:9010 - R:/10.240.2.53:58428] Last HTTP packet was sent, terminating the channel
2022-04-05 15:15:42.146 TRACE 1 --- [or-http-epoll-4] reactor.netty.channel.ChannelOperations : [aa6cb7c3, L:/10.240.6.232:9010 - R:/10.240.2.53:58428] Disposing ChannelOperation from a channel
java.lang.Exception: ChannelOperation terminal stack
at reactor.netty.channel.ChannelOperations.terminate(ChannelOperations.java:465)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:833)
2022-04-05 15:15:42.146 INFO 1 --- [or-http-epoll-4] reactor.Mono.PeekFuseable.436 : | cancel()
java.lang.Exception
at com.XXXXXXXXXXXXXXXXXX -----> printStackTrace()
at reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.cancel(FluxPeekFuseable.java:152)
at reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.cancel(FluxPeekFuseable.java:159)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.cancel(FluxMapFuseable.java:174)
at reactor.core.publisher.MonoSingle$SingleSubscriber.doOnCancel(MonoSingle.java:108)
at reactor.core.publisher.Operators$MonoInnerProducerBase.cancel(Operators.java:2784)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drainLoop(Operators.java:2252)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drain(Operators.java:2220)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.cancel(Operators.java:2032)
at reactor.core.publisher.Operators.terminate(Operators.java:1240)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.cancel(MonoFlatMap.java:187)
at reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.cancel(FluxContextWrite.java:141)
at reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.cancel(FluxPeekFuseable.java:159)
at reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.cancel(FluxHide.java:157)
at reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber.cancel(FluxOnAssembly.java:654)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drainLoop(Operators.java:2252)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drain(Operators.java:2220)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.cancel(Operators.java:2032)
at reactor.core.publisher.Operators.terminate(Operators.java:1240)
at reactor.core.publisher.MonoFlatMap$FlatMapInner.cancel(MonoFlatMap.java:272)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.cancel(MonoFlatMap.java:188)
at reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.cancel(FluxPeekFuseable.java:159)
at reactor.core.publisher.FluxDoFinally$DoFinallySubscriber.cancel(FluxDoFinally.java:151)
at reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber.cancel(FluxOnAssembly.java:654)
at reactor.core.publisher.FluxDoOnEach$DoOnEachSubscriber.cancel(FluxDoOnEach.java:113)
at reactor.core.publisher.FluxPeek$PeekSubscriber.cancel(FluxPeek.java:153)
at reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber.cancel(FluxOnAssembly.java:654)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drainLoop(Operators.java:2252)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drain(Operators.java:2220)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.cancel(Operators.java:2032)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drainLoop(Operators.java:2252)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drain(Operators.java:2220)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.cancel(Operators.java:2032)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drainLoop(Operators.java:2252)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drain(Operators.java:2220)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.cancel(Operators.java:2032)
at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.cancel(MonoPeekTerminal.java:144)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drainLoop(Operators.java:2252)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drain(Operators.java:2220)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.cancel(Operators.java:2032)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.cancel(MonoIgnoreThen.java:143)
at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.cancel(MonoPeekTerminal.java:144)
at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.cancel(MonoPeekTerminal.java:144)
at reactor.core.publisher.Operators.terminate(Operators.java:1240)
at reactor.netty.channel.ChannelOperations.terminate(ChannelOperations.java:469)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:833)
2022-04-05 15:15:42.149 DEBUG 1 --- [or-http-epoll-4] o.s.w.r.f.client.ExchangeFunctions : [3a7b3160] Cancel signal (to close connection)
Comment From: mp911de
Thanks a lot. The stack trace indicates that the connection (transport channel) was closed and therefore the cancel signal is being propagated across all operators. Since there's no preceding exception, it looks like a regular reaction when closing a channel via HttpServerOperations. As the terminate operation runs on an executor thread, it might be a race condition but my insights in that area are very shallow.
In any case, the MongoDB driver isn't issuing any cancel signals here.
Comment From: morenice
@mp911de Thanks for your answer.
As you said, it seems to be a normal channel-closure process. When the client connection is actually disconnected, the "inactive" keyword is shown in the backtrace.
However, I found that the cancel-after-onComplete() issue only occurs on VMs. For example:
1. cancel did not appear with OpenJDK installed via brew on macOS.
2. cancel appeared in a virtual machine on macOS.
3. cancel did not appear on a physical-machine worker node on Kubernetes.
4. cancel appeared on a virtual-machine worker node on Kubernetes.
In my opinion, it seems to be affected by virtualization. Have there been any similar cases?
Comment From: Ninglin
Any development on this? I'm observing the same behaviour in my service with Spring WebFlux 6.0.9.
Comment From: bclozel
There was a similar case in #31417 - the Reactive Streams spec does not guarantee that you can't get a CANCEL signal after a COMPLETE signal. Those signals can race when the connection is closed quickly by the client. If a feature relies on those signals, it must take this case into account (as was done in #31417).
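To make a feature robust against this race, one option is to record the terminal signal and ignore a cancel that arrives afterwards. A minimal sketch (illustrative only; guard and onRealCancel are hypothetical names, and this is not the actual change made in #31417):

```java
// Illustrative sketch only; not the actual change made in #31417.
import java.util.concurrent.atomic.AtomicBoolean;
import reactor.core.publisher.Mono;

final class CancelAfterCompleteGuard {

    private CancelAfterCompleteGuard() {
    }

    // Wraps a Mono so that onRealCancel only runs for a cancel that arrives
    // before any terminal signal; a cancel racing in after COMPLETE is ignored.
    static <T> Mono<T> guard(Mono<T> source, Runnable onRealCancel) {
        AtomicBoolean terminated = new AtomicBoolean();
        return source
                .doOnTerminate(() -> terminated.set(true)) // onComplete or onError seen
                .doOnCancel(() -> {
                    if (!terminated.get()) {
                        onRealCancel.run(); // genuine client disconnect mid-processing
                    }
                });
    }
}
```

A cleanup or metrics hook attached this way still fires for genuine client disconnects but stays quiet when the cancel is just the channel being torn down after a completed exchange.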