Spring Investigate suboptimal score on TechEmpower benchmarks

Hi 👋

I'm not a particularly huge fan of the TechEmpower benchmarks, because they barely reflect real use-cases. Still, (novice) programmers looking for resources to help them choose a framework for their next project might look at these benchmarks and base their decision (partly) on these numbers.

I've compared Quarkus and Spring(-Boot) here, as two popular frameworks that have similar goals and deliver similar functionality. There is also a comparison of the two over at Baeldung: https://www.baeldung.com/spring-boot-vs-quarkus

Imho, this particular comparison isn't exactly great news for the Spring ecosystem. While some benchmarks are on par with Quarkus, the majority are "won" by Quarkus - and by a greater margin than one would expect. I recently overheard a conversation about why one should use Spring if Quarkus delivers everything as well, but with better performance. Now, we all know that there is more to this story, and application A might be better suited for Quarkus while application B might be better suited for Spring - for whatever reasons. There is no silver bullet, after all. But this conversation got me thinking. Obviously, I'm biased towards Spring and I love the ecosystem, but I'm wondering if we can do something about the performance gap (in these benchmarks).

I know, for example, that recent changes to the Quarkus benchmarks have been made by people involved with Quarkus. The Spring benchmarks are updated rather infrequently by community members (I think I did something on these ages ago myself as well).

I'm also happy to help with this and at least update the Spring(-Boot) one to the latest versions, but I wonder if you have comparable machines at hand that you could run these benchmarks against to find out why these numbers are so comparatively "bad". (I do have the suspicion that the numbers are actually much closer in reality.) I'm also happy to help with the investigations, if you want me to 😉

Cheers, Christoph

P.S.: The point of this should be not to overtake Quarkus or establish a Spring vs Quarkus mentality, but to simply improve the Spring numbers. Quarkus is just a good comparison data point imho.

P.P.S.: I'm reporting this here, because the majority of logic relevant to the benchmarks is inside Spring-Framework and not in Spring-Boot.

Comment From: dreis2211

Addendum 1:

There aren't exactly many benchmarks based on Tomcat (the default for SB apps), but comparing the JSON serialization with e.g. a vanilla Undertow reveals that there seems to be quite a lot of framework overhead. And it's not Jackson (because that is used in both Undertow and Spring; I haven't checked Quarkus).

[image]

Comment From: sdeleuze

I'm not a particular huge fan of the TechEmpower benchmarks, because they barely reflect real use-cases

@dreis2211 Same, but as I am myself working on some benchmarks to compare Boot 3 JVM and native apps, we can maybe use this opportunity to:

- Compare how Spring Framework performs in a default Spring Boot 3 application versus Spring Boot 2 and see what the evolution is, if any
- Compare Undertow versus Tomcat on recent versions
- Check if we can find low hanging fruits to optimize

Let's maybe start by focusing on the https://github.com/TechEmpower/FrameworkBenchmarks/tree/master/frameworks/Java/spring. Do you have the bandwidth to create a branch with a Spring Boot 3.0.0-SNAPSHOT (we are close to GA so should be pretty stable) version of it, upgrading to Java 17, Jakarta EE, etc. and compare the result you have locally with the Spring Boot 2.6.9 version used currently? And to compare Undertow versus Tomcat on both?

Comment From: dreis2211

I will have some time next week or earliest this weekend to do some preliminary tests, I guess.

Comment From: dreis2211

Fun fact, apparently the spring benchmarks already run with Undertow and Round 21 doesn't include these changes yet.

Comment From: dreis2211

So, I had a bit of time tonight:

Local specs:

- Model: MacBook Pro
- Processor: 8-Core Intel Core i9, 2.3 GHz
- Memory: 32 GB

NOTE: The results should be taken with a grain of salt. After all, local benchmarks are not really isolated and might be influenced by things running on my machine.

Spring-Boot 2.6.9 (Undertow): [image]

Spring-Boot 2.6.9 (Tomcat): [image]

Spring-Boot 3.0.0-SNAPSHOT (Undertow): [image]

Spring-Boot 3.0.0-SNAPSHOT (Undertow) (with produces on GetMapping): [image]

Spring-Boot 3.0.0-SNAPSHOT (Tomcat) (with produces on GetMapping): [image]

Spring-Boot 2.3.1 (Tomcat) (Round 20): [image]

Quarkus: [image]

I'm focusing on the Plaintext & JSON ones here, as the others are somewhat volatile and, due to the database involvement, include more flakiness that is not under any framework's control. What I found is that there seems to be a substantial overhead if the application needs to determine the producible media types. E.g. the changes between 2.3.1 & 2.6.9 include the produces field on the GetMapping annotations for plaintext. As one can see, the different runs for 3.0.0 also differ largely for JSON, and there the only change is again the manual produces on the endpoint. Maybe the following plays a role here: https://github.com/spring-projects/spring-framework/blob/b72ee5f0345efd281716ce95b63bfb492df1ddbd/spring-webmvc/src/main/java/org/springframework/web/servlet/mvc/method/annotation/AbstractMessageConverterMethodProcessor.java#L385-L395 .

Funny enough I came across this method already when playing around with the type pollution agent from @franz1981 / RedHat, which yields the following output:

--------------------------
Type Pollution Statistics:
--------------------------
1:      org.springframework.http.converter.json.MappingJackson2HttpMessageConverter
Count:  58366
Types:
        org.springframework.http.converter.HttpMessageConverter
        org.springframework.http.converter.GenericHttpMessageConverter
Traces:
        org.springframework.web.servlet.mvc.method.annotation.AbstractMessageConverterMethodProcessor.getProducibleMediaTypes(AbstractMessageConverterMethodProcessor.java:385)
                class: org.springframework.http.converter.HttpMessageConverter
                count: 29208
        org.springframework.web.servlet.mvc.method.annotation.AbstractMessageConverterMethodProcessor.getProducibleMediaTypes(AbstractMessageConverterMethodProcessor.java:386)
                class: org.springframework.http.converter.GenericHttpMessageConverter
                count: 29115
        org.springframework.web.servlet.mvc.method.annotation.AbstractMessageConverterMethodProcessor.getProducibleMediaTypes(AbstractMessageConverterMethodProcessor.java:387)
                class: org.springframework.http.converter.GenericHttpMessageConverter
                count: 43

If true, this would maybe indicate a scaling/performance problem caused by https://bugs.openjdk.org/browse/JDK-8180450 . Maybe it's not that, but just the fact that it iterates over 8 message converters on each request to get their producible media types. I haven't really had the time tonight to look deeper into this.
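To make the suspected pattern a bit more concrete, here is a minimal sketch of the shape of code the agent output points at. It is not Spring's code: `Converter`, `GenericConverter`, `JsonConverter` and `StringConverter` are hypothetical stand-ins for HttpMessageConverter, GenericHttpMessageConverter and the concrete converters. It measures nothing; it only shows one concrete class being type-checked against two different interfaces in the same loop, which is the situation JDK-8180450 describes as scaling poorly when many threads do it concurrently.

```java
import java.util.List;

// Hypothetical stand-ins for HttpMessageConverter / GenericHttpMessageConverter.
interface Converter {
    String mediaType();
}

interface GenericConverter extends Converter {
    boolean canWriteGeneric(Class<?> valueType);
}

// One concrete class that is seen through both interfaces.
final class JsonConverter implements GenericConverter {
    @Override public String mediaType() { return "application/json"; }
    @Override public boolean canWriteGeneric(Class<?> valueType) { return true; }
}

// A converter that only implements the base interface.
final class StringConverter implements Converter {
    @Override public String mediaType() { return "text/plain"; }
}

final class TypePollutionSketch {

    // Counts converters that implement GenericConverter and accept the value type.
    static int countGenericWritable(List<Converter> converters, Class<?> valueType) {
        int writable = 0;
        // Iterating a generic List implies a cast of each element to Converter (first interface).
        for (Converter converter : converters) {
            // Explicit type check against a second interface on the same object.
            if (converter instanceof GenericConverter) {
                if (((GenericConverter) converter).canWriteGeneric(valueType)) {
                    writable++;
                }
            }
        }
        return writable;
    }

    public static void main(String[] args) {
        List<Converter> converters = List.of(new JsonConverter(), new StringConverter());
        System.out.println(countGenericWritable(converters, String.class));
    }
}
```

Whether this pattern or the plain per-request iteration dominates the cost is exactly the open question here; the sketch cannot answer that.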

In terms of Tomcat vs Undertow, there seems to be a huge difference for plain text responses (e.g. Tomcat only reaching half of what Undertow achieves).

Cheers, Christoph

Comment From: sdeleuze

Interesting, better to focus initially on JSON and plaintext indeed.

It would be interesting to do for example a new round of JSON test with only MappingJackson2HttpMessageConverter and remove the other default converters.

I can also provide a branch that removes this instanceof check, I will try to create that tomorrow and share it with you.

Comment From: dreis2211

I think AbstractMessageConverterMethodProcessor.writeWithMessageConverters is indeed something to look at.

E.g. if I force the content-type to be preselected (a.k.a skipping even more logic than just the producible media types) via

    @GetMapping(value = "/plaintext")
    String plaintext(HttpServletResponse response) {
        response.setContentType("text/plain");
        return "Hello, World!";
    }

    @GetMapping(value = "/json")
    Message json(HttpServletResponse response) {
        response.setContentType("application/json");
        return new Message("Hello, World!");
    }

it yields the following numbers:

Spring-Boot 2.6.9 (Undertow):

[image]

Remember - the status quo of that was:

[image]
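As a rough mental model of why a preset content type helps, the following sketch shows a resolver that only asks the converters when nothing was chosen up front. It is a simplified, hypothetical model and not the code in AbstractMessageConverterMethodProcessor: `SimpleConverter`, `resolveContentType`, the `queries` counter and the fallback type are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class ContentTypeSketch {

    // Hypothetical converter: supports one value type and a fixed list of media types.
    static final class SimpleConverter {
        final Class<?> supported;
        final List<String> mediaTypes;
        int queries = 0; // how often this converter was asked

        SimpleConverter(Class<?> supported, String... mediaTypes) {
            this.supported = supported;
            this.mediaTypes = List.of(mediaTypes);
        }

        boolean canWrite(Class<?> valueType) {
            queries++;
            return supported.isAssignableFrom(valueType);
        }
    }

    // Returns the preset type if there is one; otherwise walks all converters.
    static String resolveContentType(String preset, Class<?> valueType,
                                     List<SimpleConverter> converters) {
        if (preset != null) {
            return preset; // fast path: no converter is consulted
        }
        List<String> producible = new ArrayList<>();
        for (SimpleConverter converter : converters) {
            if (converter.canWrite(valueType)) {
                producible.addAll(converter.mediaTypes);
            }
        }
        // Assumed fallback for this sketch only.
        return producible.isEmpty() ? "application/octet-stream" : producible.get(0);
    }

    public static void main(String[] args) {
        List<SimpleConverter> converters = List.of(
                new SimpleConverter(String.class, "text/plain"),
                new SimpleConverter(Object.class, "application/json"));
        System.out.println(resolveContentType("text/plain", String.class, converters));
        System.out.println(resolveContentType(null, Integer.class, converters));
    }
}
```

With the preset, the per-request work drops to a single null check in this model; the real negotiation also involves the Accept header and sorting, which the sketch leaves out.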

Comment From: dreis2211

Another finding: the current tests run inside a Debian Buster image. With Debian Bullseye I can reach a substantially better score on both JSON and Plaintext with 3.0.0 and the forced Content-Type: [image]

Comment From: sdeleuze

@dreis2211 Pretty good results on skipping instanceof GenericHttpMessageConverter checks, see #29438. I was able to confirm the gains on 2 different benchmarks, but if you can check on your side as well, please share the results in the PR.

Comment From: GeorgeSalu

Is it worth testing with Jetty?

Comment From: dreis2211

@GeorgeSalu Generally, I wouldn't worry too much about Jetty or Undertow. Tomcat is usually the one that falls behind a bit.

But I've compared it just for you - with latest snapshots, bullseye and enforced content-type (throughout all benchmark modes and thus having a better weighted score overall):

Undertow: [image]

Jetty: [image]

From personal experience, the margin between them is not as huge as one might read from these results. In fact, I'm a little surprised by the numbers in general. I have done some memory improvements, but they don't explain these results. I need to take a deeper look at what changed in the snapshots in the meantime. What has changed is that my MacBook runs on Ventura now. Maybe there have been major improvements to the host OS that generally boost the numbers. Don't forget there is a bit of flakiness involved in doing these on a local setup.

Anyhow: Back to your question. Undertow usually wins, also in internal loadtests that we do. There is also a dated benchmark comparison over at https://www.baeldung.com/spring-boot-servlet-containers that hasn't drastically changed in my opinion over recent years.

Cheers, Christoph

Comment From: dreis2211

I've run a new round of benchmarks to have comparable results between Spring-Boot 3.0.0 RC1 and RC2

RC1: [image]

RC2: [image]

There is an upgrade from Undertow 2.2.x to 2.3.x - I wouldn't be surprised if this brings some improvements, next to several tiny improvements in Spring itself.

Comment From: sdeleuze

As you may have seen, more benchmarks on https://github.com/spring-projects/spring-framework/pull/29438 did not reliably reproduce the gains I initially saw. I think they would maybe be more visible with realistic workloads involving different converters. But this kind of instanceof check is used in various places in the Framework, and it would be better to fix the root issue rather than artificially write bad code to work around it in specific places.

No promise yet, but the fix for https://bugs.openjdk.org/browse/JDK-8180450 may be backported to Java 17, so at least with this version we should be able to avoid this painful instanceof issue on both the Spring Boot 2.7.x and 3.x lines. We will push for that shortly and try to make it happen asap.

@dreis2211 Except for this instanceof issue, have you identified other low hanging fruits we could optimize to increase the throughput/latency?

Comment From: dreis2211

I've contributed some low hanging fruits already in https://github.com/spring-projects/spring-framework/pull/29428 & https://github.com/spring-projects/spring-framework/pull/29412

As noted earlier, the media-type/content-type dance is quite a big chunk - regardless of the instanceof and secondary supers cache problem - but in my opinion unfortunately not really low-hanging. [image]

I think with the update to 3.x plus bullseye in the framework benchmarks, we should already be able to achieve quite some uplift. I think I will contribute this in the next days. While at it, I will probably also set the content-type to avoid the costly computation.

Comment From: sdeleuze

Great, thanks a lot for your various contributions. It would be great if the TechEmpower benchmark could be updated with Spring Boot 3.0.x and an explicit content type, and compared with the previous figures. Do you plan to do it?

@rstoyanchev I am wondering if you could have a look and see whether there is some reasonable way to skip part of the content type resolution for the more common use cases, or whether we should add a note to our reference documentation advising to configure the content type explicitly for better performance, etc.

Comment From: dreis2211

Yeah, I will likely do that sometime this week. Will do another comparison then as well on my local machine. But this week is busy, so it might be something for the weekend.

Comment From: sdeleuze

No rush, thanks for your help on this, much appreciated.

Comment From: dreis2211

There you go: https://github.com/TechEmpower/FrameworkBenchmarks/pull/7749

2.6.9 (Before): [image]

3.0.0 (After): [image]

Compared to previous results, I updated my Mac, Docker etc. The results were already better & worse before, but I think the head-to-head comparison from this evening is already a good step in the right direction.

Comment From: rstoyanchev

We evaluate conditions in RequestMappingInfo#getMatchingCondition in a specific order and return as soon as one returns false. Patterns has always been at the bottom with String path matching, but with parsed PathPattern in place, I wonder if we should move the PathPatternsRequestCondition near the top, right after the HTTP method check. We could compare some more involved pattern vs consumes or produces.

That said, the ProducesRequestCondition does cache the accepted media types in a request attribute, so at least that should be done only once per request within RequestMappingHandlerMapping. The same is not true for ConsumesRequestCondition, so there is an optimization to be made there.

None of this would impact the benchmark which now sets the response content-type directly, but there might be some new hotspots to be optimized in its present form, perhaps in AbstractMessageConverterMethodProcessor that was discussed earlier.
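The per-request caching described above for ProducesRequestCondition can be pictured roughly as follows. This is a minimal sketch with a plain map standing in for request attributes; `FakeRequest`, the attribute name, the naive header parser and the `parseCount` counter are hypothetical and not taken from Spring.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AcceptCacheSketch {

    // Hypothetical attribute name used as the cache key.
    static final String ATTR = "sketch.acceptedMediaTypes";

    // Counts how often the header was actually parsed.
    static int parseCount = 0;

    // Stand-in for a servlet request: one header plus an attribute map.
    static final class FakeRequest {
        final String acceptHeader;
        final Map<String, Object> attributes = new HashMap<>();

        FakeRequest(String acceptHeader) {
            this.acceptHeader = acceptHeader;
        }
    }

    // Parses the Accept header once per request and reuses the result afterwards.
    @SuppressWarnings("unchecked")
    static List<String> acceptedMediaTypes(FakeRequest request) {
        Object cached = request.attributes.get(ATTR);
        if (cached != null) {
            return (List<String>) cached;
        }
        parseCount++;
        List<String> parsed = new ArrayList<>();
        for (String part : request.acceptHeader.split(",")) {
            String type = part.trim();
            int semicolon = type.indexOf(';');
            if (semicolon >= 0) {
                type = type.substring(0, semicolon).trim(); // drop parameters such as q=0.8
            }
            if (!type.isEmpty()) {
                parsed.add(type);
            }
        }
        request.attributes.put(ATTR, parsed);
        return parsed;
    }

    public static void main(String[] args) {
        FakeRequest request = new FakeRequest("application/json, text/plain;q=0.8");
        System.out.println(acceptedMediaTypes(request));
        System.out.println(acceptedMediaTypes(request));
        System.out.println(parseCount);
    }
}
```

If many candidate mappings are evaluated for one request, each of them hits the cached list instead of re-parsing the header. A ConsumesRequestCondition-style check could presumably use the same trick for the request's Content-Type.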

Comment From: sdeleuze

Thanks for creating this follow-up issue @rstoyanchev.

@dreis2211 I am closing this issue as I think we have reached a reasonable state; feel free to open a more focused one if you identify some interesting optimizations. On my side, I will make sure that a fix for https://bugs.openjdk.org/browse/JDK-8180450 is available in Java 17 with our partner Bellsoft.

Comment From: dreis2211

Same here - we improved a few things along the road. Great stuff. Thanks 🙏

Comment From: cristianorvf

Maybe check the flux versions too, do they use the same infrastructure for media type?

Comment From: sdeleuze

Functional web APIs for both Spring MVC and WebFlux skip the infrastructure for media type by design, so that is maybe what should be tried. I would expect annotation-based WebFlux controllers to use a similar kind of media type infrastructure.

Comment From: stillya

It would be great to add a benchmark with a virtual-thread-based executor, by the way.

Comment From: dreis2211

More of an FYI. Round 22 results have been published. The screenshot below shows the composite scores (with fullstack + java filter applied).

- Quarkus ranks at #38
- Spring ranks at #88

But more important than the rank: the plaintext & JSON benchmarks show that Quarkus still has a throughput ~4 times "better" than Spring.

Nonetheless still some good news. The absolute results are better than last time apparently, despite the same machine. So the few things we've done here seem to have had at least a positive impact.

[screenshot]

Comment From: 7fantasy7

From what I've seen in a plain text use case (not techempower benchmark, but can be the same), it uses StreamUtils#copy(String in, Charset charset, OutputStream out) through the StringHttpMessageConverter

This does an unnecessary buffer allocation (8 KB) inside the OutputStreamWriter constructor.

Maybe it can be avoided, so instead of:

Writer writer = new OutputStreamWriter(out, charset);
writer.write(in);
writer.flush();

write to the stream directly.

out.write(in.getBytes(charset));
out.flush();

Here's approximate difference in time

Benchmark                        (stringLength)  Mode  Cnt    Score     Error  Units
StreamUtilsBenchmark.copy                    10  avgt   10    4.769 ±   0.110  ns/op
StreamUtilsBenchmark.copy                   100  avgt   10   10.368 ±   0.569  ns/op
StreamUtilsBenchmark.copy                  1000  avgt   10   42.393 ±   0.905  ns/op
StreamUtilsBenchmark.copyOld                 10  avgt   10  637.235 ±  51.005  ns/op
StreamUtilsBenchmark.copyOld                100  avgt   10  602.219 ± 188.381  ns/op
StreamUtilsBenchmark.copyOld               1000  avgt   10  898.469 ± 160.039  ns/op
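For anyone who wants to check that the two variants are interchangeable, here is a small self-contained sketch that runs both against an in-memory stream and compares the bytes. The class and method names (`StringCopySketch`, `copyViaWriter`, `copyDirect`) are chosen here for illustration and loosely mirror copyOld/copy from the benchmark; this is not the StreamUtils source.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class StringCopySketch {

    // Variant that goes through an OutputStreamWriter (and its internal encoder buffer).
    static void copyViaWriter(String in, Charset charset, OutputStream out) throws IOException {
        Writer writer = new OutputStreamWriter(out, charset);
        writer.write(in);
        writer.flush();
    }

    // Variant that encodes the string in one step and writes the bytes directly.
    static void copyDirect(String in, Charset charset, OutputStream out) throws IOException {
        out.write(in.getBytes(charset));
        out.flush();
    }

    // Helper: runs one of the variants against an in-memory stream.
    static byte[] run(String in, Charset charset, boolean direct) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (direct) {
                copyDirect(in, charset, out);
            } else {
                copyViaWriter(in, charset, out);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] viaWriter = run("Hello, World!", StandardCharsets.UTF_8, false);
        byte[] direct = run("Hello, World!", StandardCharsets.UTF_8, true);
        System.out.println(Arrays.equals(viaWriter, direct));
    }
}
```

One trade-off to keep in mind: the direct variant materializes the whole encoded string as a single byte array, which is cheap for small bodies like the plaintext benchmark but may be less attractive for very large strings.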

Comment From: sdeleuze

I plan to give another look to the benchmark.