Affects: 5.3.7
After upgrading spring-boot from 2.3.x to 2.4.x and reactor-netty-http from 0.9.20 to 1.0.7, client-disconnected errors cause HTTP 500 responses to be produced (and therefore reported to logs and metrics).
My scenario is a simple client -> server call where the server is a Spring Boot based app with WebFlux and Reactor. After the upgrade I noticed an increase in HTTP 500 responses. Logs report 500 Server Error for HTTP POST "/endpoint"
from org.springframework.web.server.adapter.HttpWebHandlerAdapter
with stack trace
reactor.netty.channel.AbortedException: Connection has been closed
at reactor.netty.http.server.HttpServerOperations.onInboundClose(HttpServerOperations.java:568)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
...
I think the culprit is a change in reactor-netty which started raising AbortedException on client connection close. However, despite AbortedException being on the list of known exceptions caused by a closed connection, it results in an HTTP 500 response (which, obviously, the client won't even consume) because in HttpWebHandlerAdapter
    if (response.setStatusCode(HttpStatus.INTERNAL_SERVER_ERROR)) {
        logger.error(logPrefix + "500 Server Error for " + formatRequest(request), ex);
        return Mono.empty();
    }
    else if (isDisconnectedClientError(ex)) {
        if (lostClientLogger.isTraceEnabled()) {
            lostClientLogger.trace(logPrefix + "Client went away", ex);
        }
        else if (lostClientLogger.isDebugEnabled()) {
            lostClientLogger.debug(logPrefix + "Client went away: " + ex +
                    " (stacktrace at TRACE level for '" + DISCONNECTED_CLIENT_LOG_CATEGORY + "')");
        }
        return Mono.empty();
    }
the first if (setStatusCode) returns true.
It looks like the root cause is the integration point (I couldn't track it down) between HttpResponse and Reactor Netty, as it allows setting a status code on a response that won't be returned to the client (I guess the state is not updated).
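To make the branch order concrete, here is a minimal, self-contained sketch in plain Java. It is not the Spring code: FakeResponse, the local AbortedException, the one-entry name set, and both handle methods are hypothetical stand-ins that only model the behavior described above. handleAsReported follows the order quoted from HttpWebHandlerAdapter; handleDisconnectFirst shows one possible alternative order for comparison, not something the framework does.

```java
import java.util.Set;

public class BranchOrderSketch {

    /** Hypothetical stand-in for reactor.netty.channel.AbortedException. */
    static class AbortedException extends RuntimeException {
        AbortedException(String message) {
            super(message);
        }
    }

    /** Hypothetical response that has not been committed, so the status can still be set. */
    static class FakeResponse {
        private boolean committed = false;
        private Integer status;

        // Returns true when the status could be set, false once committed.
        boolean setStatusCode(int code) {
            if (committed) {
                return false;
            }
            status = code;
            return true;
        }

        Integer getStatus() {
            return status;
        }
    }

    // Illustrative only: a simplified name-based check with a single entry.
    static final Set<String> DISCONNECTED_CLIENT_EXCEPTIONS = Set.of("AbortedException");

    static boolean isDisconnectedClientError(Throwable ex) {
        return DISCONNECTED_CLIENT_EXCEPTIONS.contains(ex.getClass().getSimpleName());
    }

    // Same order as the quoted snippet: the status check comes first.
    static String handleAsReported(FakeResponse response, Throwable ex) {
        if (response.setStatusCode(500)) {
            return "500 Server Error";
        }
        else if (isDisconnectedClientError(ex)) {
            return "Client went away";
        }
        return "Response already committed";
    }

    // Assumed alternative order: the disconnected-client check comes first.
    static String handleDisconnectFirst(FakeResponse response, Throwable ex) {
        if (isDisconnectedClientError(ex)) {
            return "Client went away";
        }
        else if (response.setStatusCode(500)) {
            return "500 Server Error";
        }
        return "Response already committed";
    }

    public static void main(String[] args) {
        Throwable ex = new AbortedException("Connection has been closed");
        System.out.println(handleAsReported(new FakeResponse(), ex));      // 500 Server Error
        System.out.println(handleDisconnectFirst(new FakeResponse(), ex)); // Client went away
    }
}
```

With the reported order, the disconnected-client branch is only reachable when setStatusCode returns false, which is why the state of the response matters here.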
Comment From: rstoyanchev
To clarify, it sounds like the scenario is WebClient called from a WebFlux server as part of handling a request and the call to the remote server does not complete?
There was this change 4edc7196fb172cabe454dfc0377d322678b7ea7f which might help to explain. From a WebFlux server perspective we consider a "disconnected client" error to be where the connection to the original client is lost. Any other connection failure from a call to a remote server is a request handling error. In other words this might be expected behavior.
Comment From: tmszdmsk
Thanks for the answer, @rstoyanchev!
To clarify, it sounds like the scenario is WebClient called from a WebFlux server as part of handling a request and the call to the remote server does not complete?
No, the problem is observed on the server (spring-boot + webflux) in a client -> server scenario. The client closes the connection to the server due to a timeout or for whatever other reason. AbortedException is raised in the server's reactor-netty stack. The server returns 500 because response.setStatusCode(HttpStatus.INTERNAL_SERVER_ERROR) returns true (and sets the status code). The client doesn't wait for the answer, so that's not the worst problem, but this HTTP 500 pollutes logs and metrics.
My hypothesis is that the HttpResponse should change its state internally when AbortedException is raised by the reactor stack, and it doesn't.
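A rough sketch of that hypothesis, in plain Java with made-up names (TrackingResponse, onConnectionClosed, and the local AbortedException are all hypothetical, not Spring or Reactor Netty API): if the response recorded the connection close, setStatusCode would return false and the existing branch order would fall through to the "client went away" path.

```java
public class AbortedStateSketch {

    /** Hypothetical stand-in for reactor.netty.channel.AbortedException. */
    static class AbortedException extends RuntimeException {
        AbortedException(String message) {
            super(message);
        }
    }

    /** Hypothetical response that remembers whether the client connection was closed. */
    static class TrackingResponse {
        private boolean connectionClosed = false;
        private int status = 200;

        // Assumed hook: called when the inbound connection is closed by the client.
        void onConnectionClosed() {
            connectionClosed = true;
        }

        // Refuses to change the status once the connection is known to be gone.
        boolean setStatusCode(int code) {
            if (connectionClosed) {
                return false;
            }
            status = code;
            return true;
        }

        int getStatus() {
            return status;
        }
    }

    // Same branch order as in the report: status check first, disconnect check second.
    static String handle(TrackingResponse response, Throwable ex) {
        if (response.setStatusCode(500)) {
            return "500 Server Error";
        }
        else if (ex instanceof AbortedException) {
            return "Client went away";
        }
        return "Error after response committed";
    }

    public static void main(String[] args) {
        Throwable ex = new AbortedException("Connection has been closed");

        // State not updated on close: this models what I observe today.
        TrackingResponse notUpdated = new TrackingResponse();
        System.out.println(handle(notUpdated, ex)); // 500 Server Error

        // State updated on close: this models what the hypothesis proposes.
        TrackingResponse updated = new TrackingResponse();
        updated.onConnectionClosed();
        System.out.println(handle(updated, ex));    // Client went away
    }
}
```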
Comment From: russellyou
Seconding this, I've got the same problem. AbortedException is a client-side exception and should not be categorised as a 5xx error.
Comment From: Fetsivalen
I've faced the same issue with Spring Cloud Gateway; sometimes it happens when the client disconnects from the server. It also happens during maintenance in the DC on switches or other proxies between the client and the server. It would be nice to treat AbortedException as an exception that does not need to produce tons of error logs and 5xx-related metrics, since it is not a server error. Or make it configurable, in case someone wants to see it at error level.
Comment From: snicoll
Unfortunately, I don't see this issue being actionable anymore. If someone can share a small sample we can run ourselves that demonstrates a faulty behavior with a supported version, and taking into account the comment above, we can reconsider.