At the moment, I can add this to my application.yaml:

management.endpoint.health.group.readiness.include: foo

without any health indicator named foo on my classpath and:

  • the service starts up
  • no error is logged
  • no error occurs if I visit /actuator/health/readiness

This means that subtle issues, such as a typo in the name of a health indicator, can cause that indicator to silently fall out of a group. In the case of the K8s health groups, readiness and liveness, this can reduce the reliability of your services. The problem is more acute if show-components is not switched on because, as far as I can see, there is then no way to tell that the health indicator is missing from the group.

My feeling is that attempting to reference a health indicator that does not exist is always going to be a configuration issue and therefore this should be detected at startup and cause the startup to fail with an appropriate message. This will give clear feedback, fast, and allow the problem to be identified and fixed quickly.

If doing that would be a problem for backwards compatibility, my suggestion would be to instead only log an error and provide another configuration option that allows failure on startup, defaulting to false for now, but then defaulting to true in a future release.
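
To make the two behaviours concrete, here is a minimal sketch in plain Java. It is not Spring Boot code: the class name, the method names and the failOnMissing flag are all hypothetical, and the set of known indicator names is assumed to be supplied by the caller.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch, not Spring Boot code: compares the names listed in a
 * health group's "include" setting with the indicators that actually exist.
 */
public class HealthGroupMembershipCheck {

    /** Returns the included names that match no known indicator, in declaration order. */
    public static Set<String> missingMembers(List<String> included, Set<String> knownIndicators) {
        Set<String> missing = new LinkedHashSet<>();
        for (String name : included) {
            if (!knownIndicators.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    /**
     * Throws when failOnMissing is true (fail at startup); otherwise only
     * writes an error line (the backwards-compatible fallback).
     */
    public static void validate(String group, List<String> included,
            Set<String> knownIndicators, boolean failOnMissing) {
        Set<String> missing = missingMembers(included, knownIndicators);
        if (missing.isEmpty()) {
            return;
        }
        String message = "Health group '" + group
                + "' references unknown health indicators: " + missing;
        if (failOnMissing) {
            throw new IllegalStateException(message);
        }
        System.err.println("ERROR: " + message);
    }

    public static void main(String[] args) {
        // Assumed for illustration: only "ping" is registered.
        Set<String> known = Set.of("ping");
        List<String> readiness = List.of("ping", "foo");

        // Log-only mode: the problem is reported and execution continues.
        validate("readiness", readiness, known, false);

        // Fail-fast mode: the same configuration now stops the application.
        try {
            validate("readiness", readiness, known, true);
        } catch (IllegalStateException ex) {
            System.err.println("Startup would fail: " + ex.getMessage());
        }
    }
}
```

In a real application the known names would have to come from whatever registry holds the health indicators, and the check would need to run once during startup, after all indicators are registered.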

Here is a minimal application demonstrating the issue: https://github.com/rupert-madden-abbott/spring-boot-missing-health-indicator

The application.yaml configures the readiness group with both an existing (ping) and a non-existent (foo) health indicator: https://github.com/rupert-madden-abbott/spring-boot-missing-health-indicator/blob/main/src/main/resources/application.yaml#L7

When visiting /actuator/health/readiness, you can see that ping is included and foo is not, but no error is reported at the endpoint or in the logs.

Comment From: wilkinsona

Flagging for discussion in a team meeting so that we can decide how to introduce the check. We need to make a decision about this:

> If doing that would be a problem for backwards compatibility, my suggestion would be to instead only log an error and provide another configuration option that allows failure on startup, defaulting to false for now, but then defaulting to true in a future release.

The current behaviour almost feels like a usability bug. We could introduce the logging in a maintenance release and then switch to an error in 3.1, with a config option to only log the problem?

Comment From: philwebb

We discussed this today and we're going to do this in 3.1 only. We'll default to failing but have a property to restore the old behavior.