Hi 👋
I'm not a particular huge fan of the TechEmpower benchmarks, because they barely reflect real use-cases, but (novice) programmers who maybe look for some resources that might help them with their questions of what framework to choose for their next projects might look into these benchmarks and make their decision (partly) based on these numbers.
I've compared Quarkus and Spring(-Boot) here to compare two popular frameworks that have similar goals and deliver similar functionality. There is also a comparison over at Baeldung https://www.baeldung.com/spring-boot-vs-quarkus that compares these two.
Imho, this particular comparison isn't exactly great news for the Spring ecosystem. While there are some benchmarks on par with Quarkus, the majority is "won" by Quarkus - and by a greater margin one would expect. I've recently overheard a conversation why one should use Spring if Quarkus delivers everything as well, but with better performance. Now, we all know that there is more to this story and application A might be better suited for Quarkus while application B might be better suited for Spring - for whatever reasons. There is no silver bullet, after all. But this conversation kept me thinking. Obviously, I'm biased towards Spring and I love the ecosystem, but I'm wondering if we can do something about the performance gap (in these benchmarks).
I know for example that recent changes to the Quarkus benchmarks have been done by people involved with Quarkus. The Spring benchmarks are somewhat driven infrequently by community members (I think I have done something on these ages ago myself as well).
I'm also happy to help with this and at least update the Spring(-Boot) one to latest versions but I wonder if you have comparable machines at hand that you could run these benchmarks against and find out why these numbers are so comparably "bad". (I do have the suspicion that the numbers are actually much closer in reality). I'm also happy to help with the investigations, if you want me to 😉
Cheers, Christoph
P.S.: The point of this should be not to overtake Quarkus or establish a Spring vs Quarkus mentality, but to simply improve the Spring numbers. Quarkus is just a good comparison data point imho.
P.S.S: I'm reporting this here, because the majority of logic relevant to the benchmarks is inside Spring-Framework and not in Spring-Boot.
Comment From: dreis2211
Addendum 1:
There aren't exactly many benchmarks based on Tomcat (the default for SB apps), but comparing the JSON serialization with e.g. a vanilla undertow reveals that there seems to be a quite a lot of framework overhead. And it's not Jackson (because that is used in both undertow
and spring
(haven't checked Quarkus))
Comment From: sdeleuze
I'm not a particular huge fan of the TechEmpower benchmarks, because they barely reflect real use-cases
@dreis2211 Same, but as I am myself working on some benchmarks to compare Boot 3 JVM and native apps, we can maybe use this opportunity to compare: - How Spring Framework performs in a default Spring Boot 3 application versus Spring Boot 2 and see what is the evolution if any - Compare Undertow versus Tomcat on recent versions - Check if we can find low hanging fruits to optimize
Let's maybe start by focusing on the https://github.com/TechEmpower/FrameworkBenchmarks/tree/master/frameworks/Java/spring. Do you have the bandwidth to create a branch with a Spring Boot 3.0.0-SNAPSHOT
(we are close to GA so should be pretty stable) version of it, upgrading to Java 17, Jakarta EE, etc. and compare the result you have locally with the Spring Boot 2.6.9
version used currently? And to compare Undertow versus Tomcat on both?
Comment From: dreis2211
I will have some time next week or earliest this weekend to do some preliminary tests, I guess.
Comment From: dreis2211
Fun fact, apparently the spring
benchmarks already run with Undertow and Round 21 doesn't include these changes yet.
Comment From: dreis2211
So, I had a bit of time tonight:
Local specs: Model: MacBookPro: Processor: 8-Core Intel Core i9 2,3 GHz Memory: 32 GB
NOTE: The results should be taken with a grain of salt. After all, local benchmarks are not really isolated and might be influenced by things running on my machine.
Spring-Boot 2.6.9 (Undertow):
Spring-Boot 2.6.9 (Tomcat):
Spring-Boot 3.0.0-SNAPSHOT (Undertow):
Spring-Boot 3.0.0-SNAPSHOT (Undertow) (with produces
on GetMapping
):
Spring-Boot 3.0.0-SNAPSHOT (Tomcat) (with produces
on GetMapping
):
Spring-Boot 2.3.1 (Tomcat) (Round 20):
Quarkus:
I'm sort of focussing on the Plaintext & JSON ones here as the others are somewhat volatile and due to the database involvement include some more flakyness not under any frameworks control. What I found was that there seems to be a substantial overhead if the application needs to determine the producible media types. E.g. the changes that moved from 2.3.1 & 2.6.9 include the produces
field on the GetMapping
annotations for plaintext. As one can see the different runs for 3.0.0 also differ largely for JSON. And there the only change is again the manual produces
on the endpoint. Maybe the following plays a role here: https://github.com/spring-projects/spring-framework/blob/b72ee5f0345efd281716ce95b63bfb492df1ddbd/spring-webmvc/src/main/java/org/springframework/web/servlet/mvc/method/annotation/AbstractMessageConverterMethodProcessor.java#L385-L395 .
Funny enough I came across this method already when playing around with the type pollution agent from @franz1981 / RedHat, which yields the following output:
--------------------------
Type Pollution Statistics:
--------------------------
1: org.springframework.http.converter.json.MappingJackson2HttpMessageConverter
Count: 58366
Types:
org.springframework.http.converter.HttpMessageConverter
org.springframework.http.converter.GenericHttpMessageConverter
Traces:
org.springframework.web.servlet.mvc.method.annotation.AbstractMessageConverterMethodProcessor.getProducibleMediaTypes(AbstractMessageConverterMethodProcessor.java:385)
class: org.springframework.http.converter.HttpMessageConverter
count: 29208
org.springframework.web.servlet.mvc.method.annotation.AbstractMessageConverterMethodProcessor.getProducibleMediaTypes(AbstractMessageConverterMethodProcessor.java:386)
class: org.springframework.http.converter.GenericHttpMessageConverter
count: 29115
org.springframework.web.servlet.mvc.method.annotation.AbstractMessageConverterMethodProcessor.getProducible
MediaTypes(AbstractMessageConverterMethodProcessor.java:387)
class: org.springframework.http.converter.GenericHttpMessageConverter
count: 43
If true, this would maybe indicate a scaling/performance problem caused by https://bugs.openjdk.org/browse/JDK-8180450 . Maybe it's not that, but just the fact, that it iterates over 8 message converters for each request to get their producible media types. I haven't really had the time tonight to look deeper into this.
In terms of Tomcat vs Undertow, there seems to be a huge difference for plain text responses (e.g. Tomcat only reaching half of what Undertow achieves).
Cheers, Christoph
Comment From: sdeleuze
Interesting, better to focus initially on JSON and plaintext indeed.
It would be interesting to do for example a new round of JSON test with only MappingJackson2HttpMessageConverter
and remove the other default converters.
I can also provide a branch that removes this instanceof
check, I will try to create that tomorrow and share it with you.
Comment From: dreis2211
I think AbstractMessageConverterMethodProcessor.writeWithMessageConverters
is indeed something to look at.
E.g. if I force the content-type to be preselected (a.k.a skipping even more logic than just the producible media types) via
@GetMapping(value = "/plaintext")
String plaintext(HttpServletResponse response) {
response.setContentType("text/plain");
return "Hello, World!";
}
@GetMapping(value = "/json")
Message json(HttpServletResponse response) {
response.setContentType("application/json");
return new Message("Hello, World!");
}
it yields the following numbers:
Spring-Boot 2.6.9 (Undertow):
Remember - the status quo of that was:
Comment From: dreis2211
Another finding: the current tests run inside a Debian Buster image. With Debian Bullseye I can reach a substantially better score on both JSON and Plaintext with 3.0.0 and the forced Content-Type:
Comment From: sdeleuze
@dreis2211 Pretty good results on skipping instanceof GenericHttpMessageConverter
checks, see #29438. I was able to confirm the gains on 2 different benchmarks, but if you can check on your side as well, please share the results in the PR.
Comment From: GeorgeSalu
is it worth testing with jetty ?
Comment From: dreis2211
@GeorgeSalu Generally, I wouldn't worry too much about Jetty or Undertow. Tomcat is usually the one that falls behind a bit.
But I've compared it just for you - with latest snapshots, bullseye and enforced content-type (throughout all benchmark modes and thus having a better weighted score overall):
Undertow
Jetty
From personal experience, the margin between them is not as huge as one might read from this results. I'm in fact a little bit surprised about the numbers generally. I have done some memory improvements that don't explain these results, though. I need to take a deeper look what changed in the snapshots in the meanwhile. What has changed is that my Macbook runs on Ventura now. Maybe there have been major improvements to the host OS that generally boost the numbers. Don't forget there is a bit of flakyness involved doing these on a local setup.
Anyhow: Back to your question. Undertow usually wins, also in internal loadtests that we do. There is also a dated benchmark comparison over at https://www.baeldung.com/spring-boot-servlet-containers that hasn't drastically changed in my opinion over recent years.
Cheers, Christoph
Comment From: dreis2211
I've run a new round of benchmarks to have comparable results between Spring-Boot 3.0.0 RC1 and RC2
RC1
RC2
There is an upgrade from Undertow 2.2.x to 2.3.x - I wouldn't be surprised if this brings some improvements. Next to several tiny improvements in Spring itself.
Comment From: sdeleuze
As you may have seen, more benchmarks on https://github.com/spring-projects/spring-framework/pull/29438 did not achieve to reproduce in a reliable way the gains I initially saw, I think it would maybe more visible with realistic workloads involving different converters. But this kind of instanceof
checks are used in various places in the Framework, and it would be better to fix the root issue rather than artificially write bad code to workaround it in specific places.
No promise yet, but https://bugs.openjdk.org/browse/JDK-8180450 may be backported to Java 17, so at least with this version we should be able to avoid this painful instanceof
issue on both Spring Boot 2.7.x and 3.x lines. We will push for that short and try to make that happen asap.
@dreis2211 Except this instanceof
issue, have you identify other low hanging fruits we could optimize to increase the throughput//latency?
Comment From: dreis2211
I've contributed some low hanging fruits already in https://github.com/spring-projects/spring-framework/pull/29428 & https://github.com/spring-projects/spring-framework/pull/29412
As noted earlier that the media-type/content-type dance is quite a big chunk - regardless of the instanceof
and secondary supers cache problematic - but in my opinion not really low-hanging unfortunately.
I think with the update to 3.x plus bullseye in the framework benchmarks, we should be able to achieve already quite some uplift. I think I will contribute this in the next days. And while being at it, probably also setting the content-type to avoid the costly computation.
Comment From: sdeleuze
Great, thanks a lot for your various contributions. Could be great if TechEmpower benchmark could be updated with Spring Boot 3.0.x, explicit content type and compare with the previous figures. Do you plan to do it?
@rstoyanchev I am wondering if you could have a look on if you see some reasonable way to skip some content type resolution part for more common use cases, if we should drop a note in our reference documentation to advise configuring explicitly the content type for better performances, etc.
Comment From: dreis2211
Yeah, I will likely do that somewhen this week. Fill do another comparison then as well on my local machine. But this week is busy, so might be something for the weekend.
Comment From: sdeleuze
No rush, thanks for your help on this, much appreciated.
Comment From: dreis2211
There you go: https://github.com/TechEmpower/FrameworkBenchmarks/pull/7749
2.6.9 (Before)
3.0.0 (After)
Compared to previous results, I updated my Mac, Docker etc.. The results were already better & worse before, but I think the head-to-head comparison from this evening is already a good step into the right direction
Comment From: rstoyanchev
We evaluate conditions in RequestMappingInfo#getMatchingCondition
in a specific order and return as soon as one returns false
. Patterns has always been at the bottom with String path matching, but with parsed PathPattern
in place, I wonder if we should move the PathPatternsRequestCondition
near the top, right after the HTTP method check. We could compare some more involved pattern vs consumes or produces.
That said, the ProducesRequestCondition
does cache the accepted media types in a request attribute, so at least that should be done only once per request withing RequestMappingHandlerMapping
. The same is not true for ConsumesRequestCondition
so there is an optimization to be made there.
None of this would impact the benchmark which now sets the response content-type directly, but there might be some new hotspots to be optimized in its present form, perhaps in AbstractMessageConverterMethodProcessor
that was discussed earlier.
Comment From: sdeleuze
Thanks for creating this follow-up issue @rstoyanchev.
@dreis2211 I close this issue as I think we have reached a reasonable state, feel free to open more focused one if you identify some interesting optimizations. I will on my side make sure that a fix for https://bugs.openjdk.org/browse/JDK-8180450 is available in Java 17 with our partner Bellsoft.
Comment From: dreis2211
Same here - we improved a few things along the road. Great stuff. Thanks 🙏
Comment From: cristianorvf
Maybe check the flux versions too, do they use the same infrastructure for media type?
Comment From: sdeleuze
Functional web APIs for both Spring MVC and WebFlux skip the infrastructure for media type by design, so that's maybe this that should be tried. I would expected annotation-based WebFlux controllers to use a similar kind of media type infrastructure.
Comment From: stillya
It will be great to add benchmark with virtual thread based executor, by the way.
Comment From: dreis2211
More an FYI. Round 22 results have been published. The screenshot below shows the composite scores ( with fullstack + java filter applied).
Quarkus ranks at #38
Spring ranks at #88
But more important than the rank the plaintext & json benchmark shows that Quarkus has still a throughput ~4 times "better" than Spring.
Nonetheless still some good news. The absolute results are better than last time apparently, despite the same machine. So the few things we've done here seem to have had at least a positive impact.
Comment From: 7fantasy7
From what I've seen in a plain text use case (not techempower benchmark, but can be the same),
it uses StreamUtils#copy(String in, Charset charset, OutputStream out)
through the StringHttpMessageConverter
Which does unnecessary buffer allocation (8kb) inside an OutputStreamWriter
consturctor.
Maybe it can be avoided, so instead of:
Writer writer = new OutputStreamWriter(out, charset);
writer.write(in);
writer.flush();
write to the stream directly.
out.write(in.getBytes(charset));
out.flush();
Here's approximate difference in time
Benchmark (stringLength) Mode Cnt Score Error Units
StreamUtilsBenchmark.copy 10 avgt 10 4.769 ± 0.110 ns/op
StreamUtilsBenchmark.copy 100 avgt 10 10.368 ± 0.569 ns/op
StreamUtilsBenchmark.copy 1000 avgt 10 42.393 ± 0.905 ns/op
StreamUtilsBenchmark.copyOld 10 avgt 10 637.235 ± 51.005 ns/op
StreamUtilsBenchmark.copyOld 100 avgt 10 602.219 ± 188.381 ns/op
StreamUtilsBenchmark.copyOld 1000 avgt 10 898.469 ± 160.039 ns/op
Comment From: sdeleuze
I plan to give another look to the benchmark.