Hello, all!
I am running a lot of fairly complex Spring Boot applications in a high-traffic environment orchestrated by Kubernetes.
I am having issues with automatic scaling due to the latency and extra resource consumption during the early stages of application startup.
It seems that this is a common problem, as there is a lot of discussion scattered around the internet going back many years. I have tried various strategies with varying degrees of success.
I was really hoping that the Spring Boot maintainers could provide some clarification on what the current best practice is.
I'm currently exploring using ApplicationReadyEvent to prewarm the application with artificial load before the readiness state (as exposed by the actuator) reports that it is ready to serve traffic. However, because my application requires external authentication and knowledge of the data in the database to successfully call most of its endpoints, it is difficult to warm most of the application simply by calling its own endpoints. I was thinking about running some of the mocked-up unit tests after the event has been received, but this seems "hacky" and I'm not even sure it would solve the problem.
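For concreteness, here's a minimal sketch of what I'm experimenting with. The port, endpoint, and iteration count are placeholders for my real warm-up calls, not something I'd ship as-is:

```java
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;

@Component
class WarmUpListener {

    private final RestTemplate rest = new RestTemplate();

    // Spring Boot publishes ReadinessState.ACCEPTING_TRAFFIC only after all
    // ApplicationReadyEvent listeners have returned, so blocking here keeps
    // the readiness probe reporting "down" until the warm-up finishes.
    @EventListener(ApplicationReadyEvent.class)
    public void warmUp() {
        for (int i = 0; i < 200; i++) {
            try {
                // Placeholder endpoint and port; the embedded server is already
                // listening on localhost even though Kubernetes isn't routing
                // service traffic to the pod yet.
                rest.getForObject("http://localhost:8080/actuator/health/liveness", String.class);
            }
            catch (Exception ex) {
                // Ignore failures during warm-up; the point is just to exercise
                // the request-handling path and the JIT, not to assert success.
            }
        }
    }
}
```

The problem, as described above, is that the endpoints I can usefully hit without authentication and seeded data only warm a small slice of the application.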
After scouring the internet and discovering lots of recommendations that may no longer apply to Spring Boot 2.3.0 and later, I'm left thinking the best option is to simply ask the Elder Wizards for their advice: in 2021, what's the best way to ensure a warm JVM before allowing traffic to be sent to it?
Comment From: wilkinsona
Thanks for raising this, but I don't think we're particularly well-placed to document this. It's one of those questions where the answer is "it depends" and in this case it depends on lots of different variables.
https://start.spring.io runs on Kubernetes and we do not warm up the JVM when a new version is deployed as we've never identified a need to do so. Among other things, that could be due to the nature of the traffic it receives, the nature of the application, the nature of the Kubernetes environment to which it's deployed, etc.
You may be interested in this blog post where someone had a real-world problem with slow response times after a deploy. They describe how they tackled warming up the JVM. It may or may not be applicable to your application as every application will be different in this regard. Interestingly, for their application, warming up the JVM didn't solve the problem. Instead, they turned to Kubernetes' burstable QoS with much better results.
FWIW, using the burstable QoS feels like a better solution to me. It isn't application-specific and doesn't rely upon synthesising load, which may be difficult to maintain as your application evolves. Faced with your problem, it's what I would try first.
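For illustration, a pod gets the Burstable QoS class whenever its resource requests are set lower than its limits, which leaves headroom for the CPU-hungry JIT compilation at startup. A sketch of such a container spec, with placeholder names and values you'd tune for your own workload:

```yaml
# Placeholder pod spec: requests < limits puts the pod in the Burstable QoS
# class, so it can use spare node CPU during JIT-heavy startup while only
# reserving the smaller request from the scheduler.
apiVersion: v1
kind: Pod
metadata:
  name: my-spring-app        # placeholder name
spec:
  containers:
    - name: app
      image: example/my-spring-app:latest   # placeholder image
      resources:
        requests:
          cpu: "500m"        # what the scheduler reserves for steady state
          memory: "1Gi"
        limits:
          cpu: "2"           # extra headroom the JVM can burst into at startup
          memory: "1Gi"
```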