Spring Boot 2.4.1
I've set up new Readiness and Liveness probes, which work nicely under normal conditions.
However, under heavy load, they return "Liveness probe failed: Get http://<ip>:<port>/health/liveness: dial tcp <ip>:<port>: connect: connection refused" to Kubernetes. Obviously, this is bad, as it will then restart strained-but-generally-working containers.
I would really like a way to give the management endpoints priority over normal traffic, so that liveness/readiness checks don't fail under heavy load.
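For reference, the probes are exposed with Spring Boot configuration roughly along these lines (a sketch, not my exact config; the base-path override is an inference from the /health/liveness URL above):

```yaml
# application.yaml (sketch)
management:
  endpoint:
    health:
      probes:
        enabled: true   # expose the liveness and readiness health groups
  endpoints:
    web:
      base-path: /      # serve /health/liveness rather than the default /actuator/health/liveness
```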
Comment From: wilkinsona
Are you running the Actuator on the same port as your application, or have you configured management.server.port to make the Actuator available on a separate port?
Comment From: richvim
It's on a separate port.
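Something along these lines, for reference (the port value here is a placeholder, not our real one):

```yaml
# application.yaml (sketch)
management:
  server:
    port: 9090   # actuator endpoints run on their own embedded server on this port
```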
Comment From: wilkinsona
That’s odd. Running on a separate port, they should be unaffected by load on the main application (which can actually be a bad thing as they can falsely indicate that the app is alive and ready). Can you describe the “heavy load” and your management server in some more detail please?
Comment From: richvim
I am using load-testing tools against a dev cluster, at a few hundred requests per second.
The error sometimes manifests as "net/http: request canceled (Client.Timeout exceeded while awaiting headers)" or "HTTP probe failed with statuscode: 503", but mostly it's "connect: connection refused". There is a delay period before Kubernetes starts sending the probes, and multiple retries, but it still happens often enough that quite a few pods restart. It occurs on multiple different applications.
Comment From: wilkinsona
Are you load testing the application's endpoints or the actuator?
Comment From: richvim
Web traffic is going to the application endpoints; the actuator endpoints are being called periodically by Kubernetes, which is using the HPA.
Comment From: wilkinsona
Thanks. From what you've described thus far, I am skeptical that this is a Spring Boot problem. With the management server running on a separate port, its ability to accept connections will be unaffected by the load on the application endpoints as it uses a separate embedded server with a separate thread pool, etc. It may be possible for it to become unresponsive if the CPU is saturated to the extent that a thread in the management server doesn't get scheduled to accept the connection. However, in that case, I think the behaviour you're seeing is desirable as it would be just as likely to happen with a request to an application endpoint so the instance really isn't alive and ready to handle requests.
I would really like a way to give the management endpoints priority over normal traffic, so that liveness/readiness checks don't fail under heavy load.
I'm not sure that it would be possible for us to implement this, but even if it were, I think it would be a mistake to do so. If traffic to the management endpoints were prioritised, you might get into a situation where an instance reports that it is alive and ready when it is, in fact, unable to handle any application traffic. It's for this reason that we caution against configuring a separate management port, as doing so can cause the same problem.
Comment From: richvim
Yeah, I see your point about why that would be bad.
I have somewhat mitigated the problem by specifying a larger timeoutSeconds for the probes in the pod specification. The failures still occur, but less frequently, and so are less damaging.
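For anyone comparing notes, the change amounts to roughly this in the probe definitions (the numbers are illustrative, not our exact values):

```yaml
livenessProbe:
  httpGet:
    path: /health/liveness
    port: 9090           # the separate management port (placeholder)
  timeoutSeconds: 5      # raised from the Kubernetes default of 1 second
readinessProbe:
  httpGet:
    path: /health/readiness
    port: 9090
  timeoutSeconds: 5
```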
Comment From: bclozel
Thanks for getting back to us, @richvim.
You've mentioned the Horizontal Pod Autoscaler and I was wondering about a few things:
- What do the pod resource metrics look like when this problem is happening? Is the HPA failing to detect that the pods are too busy?
- Is it possible that your load-testing tool is not using persistent connections and is opening/closing a lot of connections, thus saturating your host with TIME_WAIT TCP connections and preventing new ones?
- Is it possible that your load-testing scenario is ramping up faster than the HPA checks (see the sketch below)? By the time new application instances are started, we might already be at capacity.
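For reference on that last point, a minimal CPU-based HPA that the timing would be compared against looks roughly like this (names and numbers are placeholders):

```yaml
apiVersion: autoscaling/v2beta2    # autoscaling/v2 on newer clusters
kind: HorizontalPodAutoscaler
metadata:
  name: my-app                     # placeholder
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                   # placeholder
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # scale out when average CPU crosses this threshold
```

If the load ramps to its peak faster than the HPA's sync period plus pod start-up time, the existing pods can be saturated before new ones are ready.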
Comment From: richvim
I think this ticket can be closed, but for others who may stumble upon this, I solved my autoscaling woes with a combination of things:
- using the G1GC garbage collector,
- using Undertow rather than Tomcat,
- setting slightly higher limits than requests in the pod spec,
- using app-specific initialDelaySeconds, periodSeconds, failureThreshold and timeoutSeconds for the liveness and readiness probes (running on a separate and unexposed admin port), sketched below,
- replacing the auto-configuration with specific @Imports,
- warming the application's internal cache on ApplicationReadyEvent,
- and shrinking the Docker image size.

With those changes, the application appears to scale out very quickly and handles all of the traffic we can throw at it without any service degradation or interruption.