Viswanathan Manickam opened SPR-17610 and commented
I am getting readtimeout exception when I try to load test 1000 requests per second for 30 mins. Also, when the server sits idle for sometime and when I send request, same error occurs the very first time but further requests process fine.
The response time for some requests is around 5 sec but other requests are in 250ms.
Please find the configuration below:
@Bean
public WebClient webClient() {
TcpClient tcpClient = TcpClient.create()
.option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 30000)
.option(ChannelOption.SO_KEEPALIVE, true)
.doOnConnected { connection ->
connection.addHandlerLast(new ReadTimeoutHandler(5))
.addHandlerLast(new WriteTimeoutHandler(30))
}
ReactorClientHttpConnector httpConnector = new ReactorClientHttpConnector(HttpClient.from(tcpClient));
return WebClient.builder()
.clientConnector(httpConnector)
.build();
}
public Mono<String> retryPost(String url, JSONObject body) {
Mono<String> result = webClient.post().uri(url)
.contentType(MediaType.APPLICATION_JSON)
.accept(MediaType.APPLICATION_JSON_UTF8)
.body(BodyInserters.fromObject(body))
.exchange()
.log()
.flatMap { clientResponse ->
return handleResponse(clientResponse)
}
return result;
}
private Mono<String> handleResponse(ClientResponse clientResponse) {
if (clientResponse.statusCode().is4xxClientError() || clientResponse.statusCode().is5xxServerError()) {
return clientResponse.bodyToMono(String.class)
.flatMap { errorBody ->
return Mono.error(new CustomException(errorBody, clientResponse.statusCode().value()))
}
} else {
return clientResponse.bodyToMono(String.class).log();
}
}
Please find the log I received:
14:39:33.092 [http-nio-8080-exec-4] INFO reactor.Mono.SwitchIfEmpty.11 - onSubscribe(FluxSwitchIfEmpty.SwitchIfEmptySubscriber)
14:39:33.093 [http-nio-8080-exec-4] INFO reactor.Mono.SwitchIfEmpty.11 - request(unbounded)
14:39:38.128 [reactor-http-nio-4] ERROR r.n.http.client.HttpClientConnect - [id: 0x05f49b52, L::60850 - R:] The connection observed an error
io.netty.handler.timeout.ReadTimeoutException: null
14:39:38.131 [reactor-http-nio-4] ERROR reactor.Mono.SwitchIfEmpty.11 - onError(io.netty.handler.timeout.ReadTimeoutException)
14:39:38.131 [reactor-http-nio-4] ERROR reactor.Mono.SwitchIfEmpty.11 -
io.netty.handler.timeout.ReadTimeoutException: null
14:39:38.179 [http-nio-8080-exec-5] ERROR c.f.l.a.GlobalControllerExceptionHandler - null
io.netty.handler.timeout.ReadTimeoutException: null
Let me know if I miss anything
Affects: 5.1.3
Reference URL: https://stackoverflow.com/questions/53786424/springboot-webclient-throws-readtimeout-errors
Comment From: spring-projects-issues
Brian Clozel commented
Thanks for raising this issue - the logs you've shared are somewhat different, since now it seems the timeout doesn't happen right away but is triggered 5 seconds after the HTTP call. Let me track down where that timeout comes from and how we can override it.
Comment From: spring-projects-issues
Viswanathan Manickam commented
Brian Clozel As I mentioned, I have 2 issues:
1) When I run load test 1000 request per second for 30 mins, I see the following read timeouts where I don't see that 5 seconds different
13:55:50.287 [reactor-http-nio-6] INFO reactor.Mono.MapFuseable.49127 - | request(unbounded)
13:55:50.288 [reactor-http-nio-6] INFO reactor.Mono.SwitchIfEmpty.48910 - onComplete()
13:55:50.310 [reactor-http-nio-6] ERROR r.n.r.PooledConnectionProvider - [id: 0xa1c67a8f, L:/10.253.8.138:55671 - REMOTEhOST:PORT] Pooled connection observed an error
io.netty.handler.timeout.ReadTimeoutException: null
13:55:50.312 [reactor-http-nio-6] ERROR reactor.Mono.SwitchIfEmpty.48759 - onError(io.netty.handler.timeout.ReadTimeoutException)
13:55:50.312 [reactor-http-nio-6] ERROR reactor.Mono.SwitchIfEmpty.48759 -
io.netty.handler.timeout.ReadTimeoutException: null
2) When the server sits idle for sometime and when they process the first request, I get the read time out with 5 seconds difference
14:39:33.092 [http-nio-8080-exec-4] INFO reactor.Mono.SwitchIfEmpty.11 - onSubscribe(FluxSwitchIfEmpty.SwitchIfEmptySubscriber)
14:39:33.093 [http-nio-8080-exec-4] INFO reactor.Mono.SwitchIfEmpty.11 - request(unbounded)
14:39:38.128 [reactor-http-nio-4] ERROR r.n.http.client.HttpClientConnect - [id: 0x05f49b52, L::60850 - R:] The connection observed an error
io.netty.handler.timeout.ReadTimeoutException: null
14:39:38.131 [reactor-http-nio-4] ERROR reactor.Mono.SwitchIfEmpty.11 - onError(io.netty.handler.timeout.ReadTimeoutException)
14:39:38.131 [reactor-http-nio-4] ERROR reactor.Mono.SwitchIfEmpty.11 -
io.netty.handler.timeout.ReadTimeoutException: null
14:39:38.179 [http-nio-8080-exec-5] ERROR c.f.l.a.GlobalControllerExceptionHandler - null
io.netty.handler.timeout.ReadTimeoutException: null
Comment From: spring-projects-issues
Brian Clozel commented
If I remember correctly, the suffix number you're seeing in the logs like reactor.Mono.SwitchIfEmpty.48759
is tied to the subscriber.
In your case, this means that
13:55:50.287 [reactor-http-nio-6] INFO reactor.Mono.MapFuseable.49127 - | request(unbounded)
and
13:55:50.312 [reactor-http-nio-6] ERROR reactor.Mono.SwitchIfEmpty.48759 - onError(
are not about the same HTTP requests. Reactor is scheduling tasks on worker threads, so when you're seeing things happening sequentially on the same thread, it doesn't mean those units of work are being done for the same subscriber.
If you follow the subscriber id in the logs, you should see that the amount of time between requesting the data and not getting is about 5 seconds.
Not on that last point, I've created a Reactor HTTP server that sends the response body after 10 seconds:
HttpServer.create().host("localhost").port(8080).handle((req, res) -> {
return res.sendString(Flux.just("Hello").delayElements(Duration.ofSeconds(10)), StandardCharsets.UTF_8);
}).bindNow().onDispose().block();
And a WebClient that fetches this data:
String response = WebClient.create().get().uri("http://localhost:8080/")
.retrieve().bodyToMono(String.class).log().block();
System.out.println(response);
I'm not getting any read timeout. So something in your application must be configuring such a timeout.
I'm closing this issue for now, don't hesitate to reopen it if you manage to reproduce it and provide me with a sample project that I can run on my computer (something that I could clone from github, or even a zip file I could download).
Thanks!
Comment From: spring-projects-issues
Viswanathan Manickam commented
If you look at the second issue logs, the suffix number is 11 for all logs.
Comment From: spring-projects-issues
Brian Clozel commented
Yes, but in this case this is consistent with your configuration. You're configuring a read timeout of 5 seconds in your WebClient and the timeout is triggered after 5 seconds. Am i missing something here?
Comment From: spring-projects-issues
Viswanathan Manickam commented
Server always return the response with in 500ms, why I am getting read timeouts? Is there any block? I am trying to understand where the issue occurs.
Thanks for your time.
Comment From: spring-projects-issues
Brian Clozel commented
In the sample you're showing, this is not the case:
14:39:33.092 [http-nio-8080-exec-4] INFO reactor.Mono.SwitchIfEmpty.11 - onSubscribe(FluxSwitchIfEmpty.SwitchIfEmptySubscriber)
14:39:33.093 [http-nio-8080-exec-4] INFO reactor.Mono.SwitchIfEmpty.11 - request(unbounded)
14:39:38.128 [reactor-http-nio-4] ERROR r.n.http.client.HttpClientConnect - [id: 0x05f49b52, L::60850 - R:] The connection observed an error
io.netty.handler.timeout.ReadTimeoutException: null
There are more than 5 seconds between 14:39:33.093
and 14:39:38.128
.
Comment From: spring-projects-issues
Viswanathan Manickam commented
Thank you for your time Brian Clozel Really appreciate your help.
Finally, What would you recommend to avoid read timeouts
1) Increase read time out seconds?
2) Retries?
Comment From: spring-projects-issues
Brian Clozel commented
It really depends on your use case and constraints. I don't know if you're running a benchmark to optimize for throughput, if expecting this load on a regular basis, or if your app is doing some kind of batch processing, if the load is generated by hits on your own app, etc.
If fetching the responses in a timely manner is not really important and you're running some kind of batch processing, In this case, you probably have a Flux<Something>
and you want to do HTTP calls for each element. In this case, you could use the limitRate
operator and operator variants that accept a concurrency
parameter. This will prevent your application from flooding the remote service and hitting those high latencies.
Adding retries will work for the odd case, but if your app is consistently flooding the remote service, then you're not really solving the issue. But in a regular case, retries are pretty useful.
Changing the timeout configuration at the HTTP level or with timeout
operators can work, but you have to operate within your constraints. Your service should probably respond and not create too much latency for others.
In the end, you're also limited by the performance of the remote system you're querying.
Comment From: spring-projects-issues
Viswanathan Manickam commented
Brian Clozel Thank you for your comments.
For the issue (When the server sits idle for sometime and when they process the first request, I get the read time out with 5 seconds difference), I did try increase the timeout up to 30 seconds and still getting read timeout. Is thread block on reading the response and throw readtimeout exception when timeout occurs? Can you suggest anything to resolve this?
Also, I tried hitting our remote server using jmeter with same load and not getting any timeouts/exception. All requests processed successfully and return response.
Please advice.
I really appreciate your time.
Comment From: bijuvnair
I am getting same error . @Viswanathan Manickan : Is this issue resolved? If yes , how it got resolved? Appreciate any reference for the same
Comment From: phantomedc
I am getting same error .
2019-09-05 23:14:09.389 [http-nio-80-exec-2] DEBUG o.s.web.reactive.function.client.ExchangeFunctions - [e75bf24] HTTP GET http://***.com/api/v1/query?time=1567439463&query)
2019-09-05 23:14:09.389 [reactor-http-kqueue-4] DEBUG reactor.netty.resources.PooledConnectionProvider - [id: 0x5d539d8f, L:/10.49.3.9:54769 - R:*******/10.104.232.131:9801] Channel acquired, now 1 active connections and 0 inactive connections
2019-09-05 23:14:09.389 [reactor-http-kqueue-4] DEBUG reactor.netty.ReactorNetty - [id: 0x5d539d8f, L:/10.49.3.9:54769 - R:*******/10.104.232.131:9801] Added decoder [ReadTimeoutHandler] at the end of the user pipeline, full pipeline: [reactor.left.httpCodec, ReadTimeoutHandler, reactor.right.reactiveBridge, DefaultChannelPipeline$TailContext#0]
2019-09-05 23:14:09.389 [reactor-http-kqueue-4] DEBUG reactor.netty.http.client.HttpClientConnect - [id: 0x5d539d8f, L:/10.49.3.9:54769 - R:********/10.104.232.131:9801] Handler is being applied: {uri=http://******/api/v1/query), method=GET}
2019-09-05 23:14:09.390 [reactor-http-kqueue-4] DEBUG reactor.netty.resources.PooledConnectionProvider - [id: 0x5d539d8f, L:/10.49.3.9:54769 - R:*****.com/****:****] onStateChange(GET{uri=/api/v1/query?time=1567439463&query), connection=PooledConnection{channel=[id: 0x5d539d8f, L:/10.49.3.9:54769 - R:********/10.104.232.131:9801]}}, [request_sent])
2019-09-05 23:14:24.401 [reactor-http-kqueue-4] WARN reactor.netty.http.client.HttpClientConnect - [id: 0x5d539d8f, L:/10.49.3.9:54769 - R:****.com/****:**] The connection observed an error
io.netty.handler.timeout.ReadTimeoutException: null
Comment From: phantomedc
@bclozel
i found this issue may relates to the Reactor Connection Pool, i reproduced this problem in my dev envrionment,I did 4 requests at 2019-09-06 00:10:14.739
(t1)、2019-09-06 00:19:51.274
(t2)、2019-09-06 00:30:37.992
(t3)、2019-09-06 00:33:13.314
(t4),
the request1(at t1), request2(at t2) webclient didn't idle too long time, the requests were processed fine, but request3(at t3), webclient was idle 10minutes, webclient got Read Timeout, then i did request4 , it was processed fine as well.
and i noticed that request1、request2、request3,they used same local port: 59838
but request4 used local port:60928
so i guess that: the connection(or channel in netty) will be cached in the connection pool, reactor netty will manage and reuse it, but when the connection idle too long, the connection may be closed or cut down by remote side(F5 loadbalance or Nginx?) or some other reason, it became disconnected on the link layer, but the connection pool did not realize what happened actually, the connection pool still thinks this connection is connected and provide it to webclient, and boom: read time out.
HttpClient httpClient = HttpClient.create()
.tcpConfiguration(tcpClient -> {
tcpClient = tcpClient.option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 5000);
tcpClient = tcpClient.doOnConnected(conn -> conn
.addHandlerLast(new ReadTimeoutHandler(15000, TimeUnit.MILLISECONDS)));
return tcpClient;
});
request1:
2019-09-06 00:10:14.768 [reactor-http-kqueue-5] DEBUG reactor.netty.http.client.HttpClientOperations - [id: 0x0759376f, L:/10.49.3.9:59838 - R:****.com/****:****] Received response (auto-read:false) : [Server=nginx, Date=Thu, 05 Sep 2019 16:10:15 GMT, Content-Type=application/json, Content-Length=64, Connection=keep-alive, Access-Control-Allow-Headers=Accept, Accept-Encoding, Authorization, Content-Type, Origin, Access-Control-Allow-Methods=GET, OPTIONS, Access-Control-Expose-Headers=Date, Vary=Accept-Encoding, Access-Control-Allow-Credentials=true]
2019-09-06 00:10:14.768 [reactor-http-kqueue-5] DEBUG reactor.netty.resources.PooledConnectionProvider - [id: 0x0759376f, L:/10.49.3.9:59838 - R:****.com/****:****] onStateChange(GET{uri=/api/v1/query?time=1567439463&query****), connection=PooledConnection{channel=[id: 0x0759376f, L:/10.49.3.9:59838 - R:****/****:****]}}, [response_received])
2019-09-06 00:10:14.768 [reactor-http-kqueue-5] DEBUG o.s.web.reactive.function.client.ExchangeFunctions - [1807c047] Response 200 OK
request2:
2019-09-06 00:19:51.272 [reactor-http-kqueue-5] DEBUG reactor.netty.http.client.HttpClientOperations - [id: 0x0759376f, L:/10.49.3.9:59838 - R:****.com/****:****] Received response (auto-read:false) : [Server=nginx, Date=Thu, 05 Sep 2019 16:19:51 GMT, Content-Type=application/json, Content-Length=64, Connection=keep-alive, Access-Control-Allow-Headers=Accept, Accept-Encoding, Authorization, Content-Type, Origin, Access-Control-Allow-Methods=GET, OPTIONS, Access-Control-Expose-Headers=Date, Vary=Accept-Encoding, Access-Control-Allow-Credentials=true]
2019-09-06 00:19:51.272 [reactor-http-kqueue-5] DEBUG reactor.netty.resources.PooledConnectionProvider - [id: 0x0759376f, L:/10.49.3.9:59838 - R:****.com/****:****] onStateChange(GET{uri=/api/v1/query?time=1567439463&query=****), connection=PooledConnection{channel=[id: 0x0759376f, L:/10.49.3.9:59838 - R:****.com/***:***]}}, [response_received])
2019-09-06 00:19:51.272 [reactor-http-kqueue-5] DEBUG o.s.web.reactive.function.client.ExchangeFunctions - [1a576a1] Response 200 OK
request3:
2019-09-06 00:30:52.998 [reactor-http-kqueue-5] WARN reactor.netty.http.client.HttpClientConnect - [id: 0x0759376f, L:/10.49.3.9:59838 - R:njyh.data.promes.cloudytrace.com/10.104.232.131:9801] The connection observed an error
io.netty.handler.timeout.ReadTimeoutException: null
Suppressed: java.lang.Exception: #block terminated with an error
request4:
2019-09-06 00:33:13.338 [reactor-http-kqueue-3] DEBUG reactor.netty.http.client.HttpClientOperations - [id: 0x47c1f6c3, L:/10.49.3.9:60928 - R:****.com/****:****] Received response (auto-read:false) : [Server=nginx, Date=Thu, 05 Sep 2019 16:33:13 GMT, Content-Type=application/json, Content-Length=64, Connection=keep-alive, Access-Control-Allow-Headers=Accept, Accept-Encoding, Authorization, Content-Type, Origin, Access-Control-Allow-Methods=GET, OPTIONS, Access-Control-Expose-Headers=Date, Vary=Accept-Encoding, Access-Control-Allow-Credentials=true]
2019-09-06 00:33:13.338 [reactor-http-kqueue-3] DEBUG reactor.netty.resources.PooledConnectionProvider - [id: 0x47c1f6c3, L:/10.49.3.9:60928 - R:****.com/****:***] onStateChange(GET{uri=/api/v1/query?time=1567439463&query=****), connection=PooledConnection{channel=[id: 0x47c1f6c3, L:/10.49.3.9:60928 - R:****.com/***:****]}}, [response_received])
2019-09-06 00:33:13.338 [reactor-http-kqueue-3] DEBUG o.s.web.reactive.function.client.ExchangeFunctions - [11c0862d] Response 200 OK
sorry for my poor english
Comment From: benweet
Using kotlin coroutines with spring boot 2.2, I had this typical issue because the netty server and the webclient were sharing the same event loop, which caused the server to hang under heavy load as all the workers were used by one or the other (only 4 threads by default if server cpu <= 4).
The fix was to use another event loop for all the webclients, keeping the main one dedicated to handling the server requests :
ReactorClientHttpConnector sharedConnector = new ReactorClientHttpConnector(
HttpClient.create().runOn(LoopResources.create("reactor-webclient")
)
Comment From: sreeraj2408
I found somewhat similar issue from spring-webflux 5.3.16 in my project. Is the root cause confirmed and fixed?
Comment From: JevgenijZubovskij-UnlikelyAI
I have encountered the same issue in Spring 6, fix from @benweet worked like a charm