Version: Spring Boot 2.3.4.RELEASE with Webflux & Metrics
Scenario: 1. Client aborts the connection before server sends a response. 2. Server request processing takes forever (server configured with read/write timeout). Server closes the connection
In both the above cases, one can see a log generated due to connection abort (client/server). However, no metrics are generated.
Project dependencies
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-webflux</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
</dependencies>
@SpringBootApplication
@RestController
public class Application {
public static void main(String[] args) {
SpringApplication.run(Application.class, args);
}
@GetMapping("/five")
public Mono<String> five() {
LocalDateTime start = LocalDateTime.now();
return Mono.just("start")
.delayElement(Duration.ofSeconds(5))
.then(Mono.defer(() -> Mono.just("Call took " + Duration.between(start, LocalDateTime.now()).getSeconds() + " seconds")));
}
@GetMapping("/twenty")
public Mono<String> forever() {
LocalDateTime start = LocalDateTime.now();
return Mono.just("start")
.delayElement(Duration.ofSeconds(20))
.then(Mono.defer(() -> Mono.just("Call took " + Duration.between(start, LocalDateTime.now()).getSeconds() + " seconds")));
}
@GetMapping("/fifty")
public Mono<String> loopy() {
LocalDateTime start = LocalDateTime.now();
return Mono.fromRunnable(() -> {
for (int i = 0; i < 10; i++) {
System.out.println("Iteration " + i);
try {
Thread.sleep(5_000);
} catch (InterruptedException e) {
e.printStackTrace();
System.out.println("Thread was interrupted");
}
}
}
).then(Mono.defer(() -> Mono.just("Call took " + Duration.between(start, LocalDateTime.now()).getSeconds() + " seconds")));
}
@Bean
public WebServerFactoryCustomizer serverFactoryCustomizer() {
return new NettyTimeoutCustomizer();
}
class NettyTimeoutCustomizer implements WebServerFactoryCustomizer<NettyReactiveWebServerFactory> {
@Override
public void customize(NettyReactiveWebServerFactory factory) {
int timeout = 10_000;
factory.addServerCustomizers(server -> server.tcpConfiguration(tcp ->
tcp.option(ChannelOption.CONNECT_TIMEOUT_MILLIS, timeout)
.doOnConnection(connection ->
connection
.addHandlerLast(new WriteTimeoutHandler(timeout, MILLISECONDS))
.addHandlerLast(new ReadTimeoutHandler(timeout, MILLISECONDS))
)));
}
}
}
Use JMeter for the tests below
-
Call the /five endpoint. Abort the call before 5 seconds. Result - No metrics are generated
-
Call the /twenty endpoint. Result - After 10 seconds, Client receives connection closed - org.apache.http.NoHttpResponseException: 127.0.0.1:8080 failed to respond. No metrics are generated.
-
Call the /fifty endpoint Result - Strangely enough, after 50 seconds, the call succeeds with response "Call took 50 seconds" Metric is generated. But why is the connection not interrupted by the server ? Read/Write timeout is configured for 10 seconds.
-
Call the /fifty endpoint. Abort the call in 10 seconds. Result - The iterating/sleeping thread does not get interrupted, interal loop succeeds. Exception is noticed in log after the entire loop. Metrics look like http_server_requests_seconds_max{exception="IOException",method="GET",outcome="SUCCESS",status="200",uri="/fifty",} 50.157466
Comment From: philwebb
@cdmatta Are you able to share this code as a complete project that we can run? Either a link to a GitHub project or an attached zip would be ideal.
Comment From: cdmatta
Thanks @philwebb . I added sample code here https://github.com/cdmatta/sb-conn-interruption-issues
Added 2 modules mvc and reactive to allow comparison + JMeter script. For one test case when client closes the connection before server can reply, one needs to start the jmeter request and "Stop" the execution.
Compared against mvc stack.
When client interrupts the call
- Reactive stack - metrics are missing
- MVC Stack
http_server_requests_seconds_sum{exception="ClientAbortException",method="GET",outcome="SUCCESS",status="200",uri="/five",} 5.057541428
When server interrupts the request (processing took too long)
- Reactive Stack - when netty read/write timeout is configured. Request is processing for too long - Metrics are missing
- Reactive Stack - when using timeout on the pipeline. for ex
java Mono.just("result").delayElement(Duration.ofSeconds(20)).timeout(Duration.ofSeconds(5));
http_server_requests_seconds_sum{exception="TimeoutException",method="GET",outcome="SERVER_ERROR",status="500",uri="/twenty",} 5.127515163
- MVC Stack - with DeferredResult
http_server_requests_seconds_sum{exception="AsyncRequestTimeoutException",method="GET",outcome="SERVER_ERROR",status="503",uri="/twenty",} 11.04647488
In the Reactive stack when I add a WebFilter, I can see the cancel signal. This is not caught in the org.springframework.boot.actuate.metrics.web.reactive.server.MetricsWebFilter. Thus the metric is missing. Though, even if one registers for that signal, one does not know if it was the client cancel or server read/write timeout.
Comment From: wilkinsona
See https://github.com/spring-projects/spring-boot/issues/18444 which improved things on the client side. @bclozel Do we need to make some similar changes on the server side?