Affects:
spring-boot-starter-webflux:2.4.6, spring-boot-starter-reactor-netty:2.4.6
Description
I have a small WebFlux app that, for the purposes of this issue, just logs when it receives a request on an HTTP endpoint. The endpoint is hit by a Gatling load test running on my laptop. The app runs in a Docker container (limited to 1 CPU and 2 GB of RAM) with 150 Netty I/O workers (see the sketch after the results below for how they are configured). With SSL off (plain HTTP), the container can handle 100 requests per second:
================================================================================
---- Global Information --------------------------------------------------------
> request count                                      90000 (OK=90000  KO=0     )
> min response time                                    110 (OK=110    KO=-     )
> max response time                                  81107 (OK=81107  KO=-     )
> mean response time                                  3562 (OK=3562   KO=-     )
> std deviation                                       8615 (OK=8615   KO=-     )
> response time 50th percentile                        199 (OK=199    KO=-     )
> response time 75th percentile                        338 (OK=338    KO=-     )
> response time 95th percentile                      21932 (OK=21905  KO=-     )
> response time 99th percentile                      42839 (OK=42841  KO=-     )
> mean requests/sec                                 99.889 (OK=99.889 KO=-     )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                         73446 ( 82%)
> 800 ms < t < 1200 ms                                 650 (  1%)
> t > 1200 ms                                        15904 ( 18%)
> failed                                                 0 (  0%)
================================================================================
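For reference, the 150 I/O workers are configured roughly like this (a minimal sketch; I'm using the standard Reactor Netty LoopResources mechanism, and the bean name is just illustrative):

import org.springframework.boot.web.embedded.netty.NettyServerCustomizer;
import org.springframework.context.annotation.Bean;
import reactor.netty.resources.LoopResources;

@Bean
public NettyServerCustomizer workerCountCustomizer() {
    // run the server on a dedicated event loop group with 150 I/O worker threads
    // (daemon threads; the same effect can be had with -Dreactor.netty.ioWorkerCount=150)
    return httpServer -> httpServer.runOn(LoopResources.create("http", 150, true));
}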
If I add SSL, the load test at 100 req/s fails with a bunch of "SSLException: handshake timed out" errors. Note that I see these errors in the load-test output, not in the container logs. I have to decrease the load to just 15 req/s to avoid the handshake errors. I know SSL imposes some overhead, but I wasn't expecting this much degradation. By the way, this is my Netty server customizer:
import java.time.Duration;
import io.netty.channel.ChannelOption;
import org.springframework.boot.web.embedded.netty.NettyServerCustomizer;
import org.springframework.context.annotation.Bean;

@Bean
public NettyServerCustomizer nettyServerCustomizer() {
    return httpServer -> httpServer
            .idleTimeout(Duration.ofMinutes(10))            // close connections idle for 10 minutes
            .option(ChannelOption.SO_BACKLOG, 65535)        // server socket: accept queue size
            .option(ChannelOption.SO_REUSEADDR, true)       // server socket: allow address reuse
            .childOption(ChannelOption.TCP_NODELAY, true)   // per connection: disable Nagle's algorithm
            .childOption(ChannelOption.SO_KEEPALIVE, true); // per connection: TCP keep-alive
}
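SSL itself is enabled the standard Spring Boot way; roughly this configuration (the keystore path, type, and password below are placeholders, not my real values):

server.ssl.key-store=classpath:keystore.p12
server.ssl.key-store-type=PKCS12
server.ssl.key-store-password=changeit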
Please let me know if I should raise this directly on the reactor-netty project or if more details are needed.
Thanks!
Comment From: bclozel
I'm afraid we can't really improve the situation from a Spring Framework perspective.
While I don't expect a huge performance difference between the two, a lot of things could be happening here. The environment (1 CPU for a JVM application is really constrained) and your configuration (why configure 150 workers here?) might have a big impact.
Also, a lot depends on how the Gatling scenario is set up (specifically, the number of "users" and how many requests each sends per session). This really matters: if each user creates a new connection for a single request, you're most likely benchmarking the TLS handshake and the network layer more than anything else.
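For illustration, here is a minimal sketch of a scenario that reuses connections instead of opening one per request (assuming the Gatling Java DSL; the URL, path, and class names are placeholders):

import static io.gatling.javaapi.core.CoreDsl.*;
import static io.gatling.javaapi.http.HttpDsl.*;

import java.time.Duration;

import io.gatling.javaapi.core.ScenarioBuilder;
import io.gatling.javaapi.core.Simulation;
import io.gatling.javaapi.http.HttpProtocolBuilder;

public class KeepAliveSimulation extends Simulation {

    // share a connection pool across virtual users, so each request does not
    // pay for a fresh TCP connection plus TLS handshake
    HttpProtocolBuilder httpProtocol = http
            .baseUrl("https://localhost:8443") // placeholder
            .shareConnections();

    ScenarioBuilder scn = scenario("keep-alive load")
            .exec(http("request").get("/endpoint")); // placeholder path

    {
        setUp(scn.injectOpen(constantUsersPerSec(100).during(Duration.ofMinutes(15))))
                .protocols(httpProtocol);
    }
}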
Finally, there are also other aspects at play here: the source of random numbers used by your container, the amount of memory available vs. the memory used by TLS sessions and how they expire, etc.
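On the random-number point, for example, a container short on entropy can block during handshakes; a common mitigation (assuming that is what's happening here) is to point the JVM at the non-blocking source:

java -Djava.security.egd=file:/dev/./urandom -jar app.jar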
You could also ensure that your application is using native OpenSSL (netty-tcnative) for maximum efficiency (is the library available and detected by Reactor Netty?).
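A quick way to check is Netty's own API (a minimal sketch; how you log the result is up to you):

import io.netty.handler.ssl.OpenSsl;

public class OpenSslCheck {
    public static void main(String[] args) {
        // true when netty-tcnative (native OpenSSL/BoringSSL) is on the classpath and loadable
        System.out.println("Native OpenSSL available: " + OpenSsl.isAvailable());
        if (!OpenSsl.isAvailable()) {
            // explains why Netty fell back to the JDK TLS provider
            System.out.println("Cause: " + OpenSsl.unavailabilityCause());
        }
    }
}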
I'd suggest joining the Reactor Netty Gitter channel to ask the community about their experience. I'm closing this issue as a result. Thanks!