Affects:

spring-boot-starter-webflux:2.4.6, spring-boot-starter-reactor-netty:2.4.6

Description

I have a small WebFlux app that, for the sake of this issue, just logs a message when it receives a request on an HTTP endpoint. This endpoint is being hit by a load test run with Gatling on my laptop. The app runs in a Docker container (limited to 1 CPU and 2 GB of RAM) and uses 150 Netty IO workers. With SSL off (plain HTTP), the container can handle 100 requests per second:

================================================================================
---- Global Information --------------------------------------------------------
> request count                                      90000 (OK=90000  KO=0     )
> min response time                                    110 (OK=110    KO=-     )
> max response time                                  81107 (OK=81107  KO=-     )
> mean response time                                  3562 (OK=3562   KO=-     )
> std deviation                                       8615 (OK=8615   KO=-     )
> response time 50th percentile                        199 (OK=199    KO=-     )
> response time 75th percentile                        338 (OK=338    KO=-     )
> response time 95th percentile                      21932 (OK=21905  KO=-     )
> response time 99th percentile                      42839 (OK=42841  KO=-     )
> mean requests/sec                                 99.889 (OK=99.889 KO=-     )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                         73446 ( 82%)
> 800 ms < t < 1200 ms                                 650 (  1%)
> t > 1200 ms                                        15904 ( 18%)
> failed                                                 0 (  0%)
================================================================================

If I add SSL, the load test at 100 req/s fails with a bunch of "SSLException: handshake timed out" errors. Note that I see this error in the load test output, not in the container. I have to decrease the load to just 15 req/s to avoid the handshake errors. I know that SSL imposes some overhead, but I wasn't expecting this much degradation. By the way, this is my Netty server customizer:

    @Bean
    public NettyServerCustomizer nettyServerCustomizer()
    {
        return httpServer -> httpServer.idleTimeout(Duration.ofMinutes(10))
                                   .option(ChannelOption.SO_BACKLOG, 65535)
                                   .option(ChannelOption.SO_REUSEADDR, true)
                                   .childOption(ChannelOption.TCP_NODELAY, true)
                                   .childOption(ChannelOption.SO_KEEPALIVE, true);
    }

Please let me know if I should raise this directly on the reactor-netty project or if more details are needed.

Thanks!

Comment From: bclozel

I'm afraid we can't really improve the situation from a Spring Framework perspective.

While I don't expect a huge performance difference between the two, a lot of things could be happening here. The environment (1 CPU for a JVM application is really constrained) and your configuration (why configure 150 workers here?) might have a big impact.
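
To put the worker count in perspective, here is a minimal sketch that compares the CPUs the JVM sees with the number of IO workers. It assumes Reactor Netty's default of max(available processors, 4) when the `reactor.netty.ioWorkerCount` system property is not set; the class and method names are hypothetical.

```java
public class IoWorkerSizing {

    // Assumption: mirrors Reactor Netty's default IO worker count,
    // max(available processors, 4), used when no override is configured.
    static int defaultIoWorkerCount(int availableProcessors) {
        return Math.max(availableProcessors, 4);
    }

    public static void main(String[] args) {
        // Inside a container limited to 1 CPU, a container-aware JVM reports 1 here.
        int cpus = Runtime.getRuntime().availableProcessors();

        // Reads the override if present, otherwise falls back to the assumed default.
        int workers = Integer.getInteger("reactor.netty.ioWorkerCount",
                defaultIoWorkerCount(cpus));

        System.out.println("CPUs visible to the JVM: " + cpus);
        System.out.println("IO workers:              " + workers);
        System.out.println("Workers per CPU:         " + (double) workers / cpus);
    }
}
```

With 150 workers on a single CPU, all of those threads compete for the same core, so the extra workers add scheduling overhead rather than throughput.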

Also, a lot depends on how the Gatling scenario is set up (here specifically, the number of "users" and how many requests they send per session). This really matters: if each user creates a new connection for a single request, you're most likely benchmarking the TLS handshake and the network layer more than anything else.
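
A rough back-of-the-envelope sketch of why connection reuse matters. It is simple arithmetic, not a measurement, and the figure of 50 requests per connection is a hypothetical value chosen for illustration.

```java
public class HandshakeLoad {

    // TLS handshakes per second the server must complete, assuming every
    // connection carries exactly requestsPerConnection requests before closing.
    static double handshakesPerSecond(double requestsPerSecond, int requestsPerConnection) {
        if (requestsPerConnection <= 0) {
            throw new IllegalArgumentException("requestsPerConnection must be positive");
        }
        return requestsPerSecond / requestsPerConnection;
    }

    public static void main(String[] args) {
        // One request per connection: every request pays for a full handshake.
        System.out.println("No reuse:   " + handshakesPerSecond(100, 1) + " handshakes/s");
        // Hypothetical keep-alive scenario: 50 requests share one handshake.
        System.out.println("With reuse: " + handshakesPerSecond(100, 50) + " handshakes/s");
    }
}
```

At 100 req/s without reuse, the single CPU has to complete 100 handshakes every second on top of serving the requests themselves.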

Finally, there are also other aspects at play here: the source of random numbers used by your container, the amount of memory available vs. the memory used by TLS sessions and how they expire, etc.
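
One way to check the random number source from inside the container is a small probe like the one below. This is only a sketch: it prints the JVM's configured entropy source and times a batch of `SecureRandom` reads, and the batch size of 1000 is an arbitrary choice. The class and method names are hypothetical.

```java
import java.security.SecureRandom;
import java.security.Security;

public class EntropyCheck {

    // Returns the elapsed nanoseconds for drawing `count` batches of 32 random bytes.
    static long timeRandomBytesNanos(SecureRandom random, int count) {
        byte[] buffer = new byte[32];
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            random.nextBytes(buffer);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        SecureRandom random = new SecureRandom();

        // Source configured in the JVM's java.security file.
        System.out.println("securerandom.source: " + Security.getProperty("securerandom.source"));
        // Optional command-line override (null if not set).
        System.out.println("java.security.egd:   " + System.getProperty("java.security.egd"));
        System.out.println("algorithm:           " + random.getAlgorithm());

        long nanos = timeRandomBytesNanos(random, 1000);
        System.out.println("1000 x 32 bytes took " + nanos / 1_000_000.0 + " ms");
    }
}
```

If the timing is dramatically slower inside the container than on the host, the entropy source is worth a closer look.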

You could also ensure that your application is using native openssl for maximum efficiency (is the library available and detected by Reactor Netty?).
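
A quick way to check this at startup is to ask Netty's `io.netty.handler.ssl.OpenSsl` class whether the native library was loaded. The sketch below calls it through reflection only so that it compiles without Netty on the classpath; in the application itself you could call `OpenSsl.isAvailable()` directly. The `OpenSslCheck` class name is hypothetical.

```java
public class OpenSslCheck {

    // Describes whether the given class (expected: io.netty.handler.ssl.OpenSsl)
    // reports native OpenSSL support as available.
    static String describe(String className) {
        try {
            Class<?> openSsl = Class.forName(className);
            boolean available = (Boolean) openSsl.getMethod("isAvailable").invoke(null);
            if (available) {
                return "available";
            }
            // Netty records why the native library could not be loaded.
            Object cause = openSsl.getMethod("unavailabilityCause").invoke(null);
            return "unavailable: " + cause;
        } catch (ClassNotFoundException e) {
            return "class not found: " + className;
        } catch (ReflectiveOperationException | LinkageError e) {
            return "check failed: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println("Native OpenSSL: " + describe("io.netty.handler.ssl.OpenSsl"));
    }
}
```

If the result is "unavailable", the server is falling back to the JDK's TLS implementation, which is typically slower for handshakes.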

I'd suggest joining the Reactor Netty Gitter channel to ask the community about their experience. As a result, I'm closing this issue. Thanks!