Affects: spring-web 5.2.3.RELEASE
Runtime Environment: AWS Corretto JDK 11.0.5.10
I am running a web server (serving an API) with Spring Webflux. Sporadically (once a day) I get an error log by the org.springframework.web.server.adapter.HttpWebHandlerAdapter class that looks like the following:
[ff862e23] Error [reactor.netty.ReactorNetty$InternalNettyException: java.nio.channels.ClosedChannelException] for HTTP GET "/some/path", but ServerHttpResponse already committed (200 OK)
After some digging around I could reproduce this error locally in the following way:
- Put a breakpoint in ReactorServerHttpResponse.applyStatusCode
- Do a curl request against one of my endpoints
- With the debugger, step over the line in the breakpoint from 1) and rather fast abort the curl request with ctrl+c
- Resume execution of the program
Not 100% of the time but often enough this will trigger the following line in netty: AbstractChannel.write.
This in turn will end up in HttpWebHandlerAdapter.handleUnresolvedError but not be detected as a client disconnected error in the isDisconnectedClientError method (since neither the message or exception class match what is there).
From the test setup it looks like this should be handled as a client disconnect error.
Since there is no easy way to suppress these error message (either by registering a handler for them or configuring the logging level separately from the generic HTTP 500 error logging) I am sadly getting alerts in my setup for errors that I can do nothing about.
I don't have a suggestion on how the isDisconnectedClientError method could be changed - including InternalNettyException in the DISCONNECTED_CLIENT_EXCEPTIONS seems dangerous. NestedExceptionUtils.getMostSpecificCause(ex).getMessage() returns null so no message check available. Potentially NestedExceptionUtils.getMostSpecificCause(ex) could be checked for java.nio.channels.ClosedChannelException?
Comment From: rstoyanchev
@violetagg we're currently detecting AbortedException for disconnected clients when running on Reactor Netty at the server. However, based on this report I now see a few places where ClosedChannelException is wrapped with InternalNettyException. Should some or all of those be AbortedException?
Comment From: rstoyanchev
@violetagg I tried after the changes for https://github.com/reactor/reactor-netty/pull/1018, following the instructions above but it doesn't make a difference.
In this sequence, the commit action that calls applyStatusCode() is deferred and so the call to response.send happens earlier as part of the write function. This should be okay since the send itself should be (and is) deferred.
Of course the channel active check could pushed further to subscription time when the headers are marked sent, but again that seems incomplete and only catches situations where the connection is closed before writing starts. The problem would remain for any time after.
This is the stack trace where the ClosedChannelException is wrapped:
"reactor-http-epoll-3@8628" daemon prio=5 tid=0x23 nid=NA runnable
java.lang.Thread.State: RUNNABLE
at reactor.netty.FutureMono$FutureSubscription.wrapError(FutureMono.java:186)
at reactor.netty.FutureMono$FutureSubscription.operationComplete(FutureMono.java:176)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:993)
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865)
at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:715)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:707)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:700)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.write(CombinedChannelDuplexHandler.java:526)
at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:112)
at io.netty.channel.CombinedChannelDuplexHandler.write(CombinedChannelDuplexHandler.java:346)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:715)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:707)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:700)
at reactor.netty.http.server.HttpTrafficHandler.write(HttpTrafficHandler.java:309)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:715)
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:762)
at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1089)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute$$$capture(AbstractEventExecutor.java:164)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:-1)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
This seems to be the result of writing. It looks to me like that should be an AbortedException. What do you think?
Comment From: violetagg
@rstoyanchev let's track Reactor Netty issue https://github.com/reactor/reactor-netty/issues/1039
Comment From: rstoyanchev
Ok, closing this for now.