SpringBoot Support caching for Actuator endpoint read operations with parameters

Hi. I'm using Spring Boot 2.6.6 and trying to use the time-to-live property for health groups.

The management.endpoint.health.cache.time-to-live=120s property in application.properties works fine for caching health check results when I call /actuator/health, but if I call /actuator/health/deep (where deep is a group), the results are not cached. The management.endpoint.health.group.deep.cache.time-to-live=120s property doesn't work either.

I found a question about this problem on Stack Overflow, but it has no answer.

Comment From: snicoll

@shufflik this feature does not exist. If you are using an IDE with Spring Boot support, it should tell you that management.endpoint.health.group.deep.cache.time-to-live is not a valid property. There's no such reference in the documentation I can see either.

Flagging for team attention to check if that's a feature we'd be considering.

Comment From: mhalbritter

I'd be wary of introducing caching on the health endpoints. They are usually used in Kubernetes to take unhealthy pods out of the load balancer rotation, and that should happen as fast as possible so that customers are not routed to broken pods. A cache with an expiry time delays that detection, and it is another knob to consider on top of the health probe settings in the Kubernetes YAML.

Comment From: shufflik

@snicoll the ability to use the management.endpoint.health.group.deep.show-details: always property for custom groups is also not mentioned in the documentation, but this property works correctly for custom groups, so I assumed that the management.endpoint.health.group.deep.cache.time-to-live property should also work. If this feature is currently missing, are there any plans to add it?

Comment From: snicoll

> the ability to use the management.endpoint.health.group.deep.show-details: always property for custom groups is also not mentioned in the documentation

It is mentioned in the documentation.

> this property works correctly for custom groups, so I assumed that the management.endpoint.health.group.deep.cache.time-to-live property should also work.

That's an odd reasoning IMO.

> If this feature is currently missing, are there any plans to add it?

I think you can answer that question by yourself. I already told you this feature does not exist and I've flagged the issue for team attention to see if "that's a feature we'd be considering".

Comment From: wilkinsona

The current caching implementation does not cache responses when there are parameters involved:

> Endpoints automatically cache responses to read operations that do not take any parameters

This is why /actuator/health/deep isn't cached. deep is a parameter that's passed into the health endpoint.
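The restriction described above can be illustrated with a minimal plain-Java sketch. It is a simplification, not Spring Boot's actual implementation: `NoArgCachingInvoker` and its `invoke` method are hypothetical names, and an operation is modelled as a function from an argument map to a result.

```java
import java.time.Duration;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a caching wrapper around an endpoint read operation.
// Not a Spring Boot type; it only illustrates "cache only when there are no parameters".
final class NoArgCachingInvoker {

    private final Function<Map<String, Object>, Object> target;
    private final long ttlNanos;

    private Object cachedValue;
    private long cachedAt;
    private boolean hasValue;

    NoArgCachingInvoker(Function<Map<String, Object>, Object> target, Duration timeToLive) {
        this.target = target;
        this.ttlNanos = timeToLive.toNanos();
    }

    synchronized Object invoke(Map<String, Object> arguments) {
        // Any argument (for example the group name "deep") bypasses the cache entirely.
        if (!arguments.isEmpty()) {
            return target.apply(arguments);
        }
        long now = System.nanoTime();
        // Refresh the single cached response once the time-to-live has elapsed.
        if (!hasValue || now - cachedAt >= ttlNanos) {
            cachedValue = target.apply(arguments);
            cachedAt = now;
            hasValue = true;
        }
        return cachedValue;
    }
}
```

In this model, a call with an empty argument map corresponds to /actuator/health, and a call carrying a path argument corresponds to /actuator/health/deep, which always reaches the underlying operation.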

To implement this consistently, we'd have to remove this caching restriction across all endpoints. To do it only for the health endpoint would, in my opinion, be confusing and counter-productive.

If the restriction was removed across all endpoints, I think there's a risk that a poorly implemented or malicious client could cause excessive memory usage. For example, making a request to /actuator/env and then subsequent requests to /actuator/env/{property} for every property in the environment would result in a significant number of entries in the cache. This could be mitigated by limiting the number of entries in each cache but it adds complexity.
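As a rough illustration of the mitigation mentioned above, here is one way a per-endpoint cache could be capped, using only the JDK. `BoundedCache` and `maxEntries` are hypothetical names introduced for this sketch; nothing like this exists in Spring Boot's caching code as far as this thread indicates.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical size-limited cache: once maxEntries is exceeded, the least recently
// used entry is evicted. Not thread-safe; callers would need external synchronization.
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {

    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion; returning true evicts the eldest entry.
        return size() > maxEntries;
    }
}
```

With a cap like this, a client requesting /actuator/env/{property} for every property would keep replacing old entries rather than growing the cache, at the cost of the extra configuration and eviction logic.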

It's not clear to me at this point that there's sufficient demand to justify this increased complexity. I'm not entirely opposed to implementing this should the demand for it increase. As such, I don't think the issue should be closed but it probably belongs in the general backlog.

Comment From: philwebb

Health is an interesting endpoint because it's generally public and we've been using the groups feature for Kubernetes. I'd like us to be consistent if possible, but I also think it would be helpful to have caching for health groups sooner rather than later. One thing that a cache provides is a certain amount of protection against DoS attacks.

Comment From: wilkinsona

That's a good point about health generally being public and the DoS prevention that caching could give us. If we're considering a health-specific solution, https://github.com/spring-projects/spring-boot/issues/25459 is another option.

I wanted to try and quantify the increase in complexity from considering an operation's arguments in the cache key. It's not as bad as I had anticipated. I haven't limited the size of the cache but, from some manual testing, it appears that it wouldn't take much more than something like these changes to get things working with parameters. Apart from limiting the size of the cache, the hardest part is probably evolving the APIs so that the caching invoker can resolve the arguments to create the cache key.
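To make the idea of an argument-aware cache key concrete, here is a minimal plain-Java sketch. It is not the set of changes referred to above: `ArgumentKeyedCachingInvoker` is a hypothetical name, an operation is modelled as a function over an argument map, and the cache is deliberately unbounded, as in the manual testing described.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical caching wrapper that uses the resolved arguments as the cache key.
final class ArgumentKeyedCachingInvoker {

    // A cached response together with the time it was created.
    private static final class Entry {
        final Object value;
        final long createdAt;

        Entry(Object value, long createdAt) {
            this.value = value;
            this.createdAt = createdAt;
        }
    }

    private final Function<Map<String, Object>, Object> target;
    private final long ttlNanos;
    private final Map<Map<String, Object>, Entry> cache = new ConcurrentHashMap<>();

    ArgumentKeyedCachingInvoker(Function<Map<String, Object>, Object> target, Duration timeToLive) {
        this.target = target;
        this.ttlNanos = timeToLive.toNanos();
    }

    Object invoke(Map<String, Object> arguments) {
        // Defensive copy so later changes to the caller's map cannot corrupt the key.
        // Caveat: array-valued arguments compare by identity, so they would need
        // converting (for example to a List) before being used in a key.
        Map<String, Object> key = Collections.unmodifiableMap(new HashMap<>(arguments));
        long now = System.nanoTime();
        // Reuse the entry for this exact argument set unless its time-to-live has elapsed.
        Entry entry = cache.compute(key, (k, existing) ->
                (existing == null || now - existing.createdAt >= ttlNanos)
                        ? new Entry(target.apply(arguments), now)
                        : existing);
        return entry.value;
    }
}
```

Here an empty map and a map containing a group name such as "deep" produce separate cache entries, each with its own expiry. Resolving the arguments into such a map before the caching layer runs is the part that, per the comment above, would require API changes.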

Comment From: 14ZOli

I just felt the same pain too.

I created a group as a parameter of the "health" endpoint (in order to select the components I want to consider for my readiness probe), and now I can't configure caching for it to avoid DoS (intentional or unintentional).

This would really come in handy!