Spring has the nice actuator feature of showing the current health status of the application (/health).
It would be nice if that same value were also available as a metric in the /metrics endpoint. That way I could use the same protocol (such as Prometheus) to monitor the health without extra configuration.
Example:
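import java.util.function.ToDoubleFunction;

import javax.annotation.PostConstruct; // jakarta.annotation.PostConstruct on Spring Boot 3+

import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.boot.actuate.health.Status;
import org.springframework.stereotype.Component;

@Component
public class HealthMetricsConfigurer {

    @Autowired
    private HealthIndicator globalHealthIndicator; // Maybe do it per instance? (from HealthIndicatorRegistry)

    @Autowired
    private MeterRegistry meterRegistry;

    // Register one gauge per known status code once the beans are injected.
    @PostConstruct
    public void init() {
        trackStatus(Status.UP.getCode());
        trackStatus(Status.DOWN.getCode());
        trackStatus(Status.UNKNOWN.getCode());
        trackStatus(Status.OUT_OF_SERVICE.getCode());
        // TODO: Track other status codes
    }

    private void trackStatus(String code) {
        Gauge.builder("up", globalHealthIndicator, healthTrackerForStatus(code))
                .tag("status", code)
                .tag("indicator", "global")
                .description("The number of instances that have the health " + code)
                .baseUnit("instance")
                .register(meterRegistry);
    }

    // Reports 1 while the indicator's current status matches the code, else 0.
    // health() is called every time the gauge is sampled.
    private ToDoubleFunction<HealthIndicator> healthTrackerForStatus(String code) {
        return health -> code.equals(health.health().getStatus().getCode()) ? 1d : 0d;
    }
}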
Feel free to use or modify this code snippet as you see fit.
Comment From: philwebb
See also #12348 which talks about exposing info under metrics
Comment From: philwebb
After discussing this a little we don't think it's a great idea for us to surface health information under metrics as they are really designed for different purposes.
Comment From: dimovelev
Prometheus seems to be the norm for monitoring in OpenShift/Kubernetes deployments. Prometheus has out-of-the-box alerting based on the metrics that it has scraped (https://www.prometheus.io/docs/prometheus/latest/configuration/alerting_rules/).
It would have been nice if a spring-boot application automatically exposed the health state as metrics. For those metrics to be up-to-date, spring-boot would also require a built-in capability to perform the health checks on a regular basis (how regular would be configurable) so that those metrics are updated without any additional system calling the actuator health endpoint.
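A minimal sketch of that idea in plain Java, with no Spring Boot or Micrometer types involved. The class name `ScheduledHealthSampler` and its methods are hypothetical, the health check is modelled as a `Supplier<String>` returning a status code, and treating a check that throws as `DOWN` is an assumption. In a real application the supplier would presumably wrap a health indicator, and `valueFor` would back a gauge like the one in the snippet above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical helper (not a Spring Boot or Micrometer class): runs a health
// check on a schedule and caches the most recent status code, so that reading
// the metric never triggers the check itself.
final class ScheduledHealthSampler implements AutoCloseable {

    private final Supplier<String> healthCheck;
    private final AtomicReference<String> lastStatus = new AtomicReference<>("UNKNOWN");
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "health-sampler");
                thread.setDaemon(true); // do not keep the JVM alive for sampling
                return thread;
            });

    ScheduledHealthSampler(Supplier<String> healthCheck) {
        this.healthCheck = healthCheck;
    }

    // Start refreshing immediately, then again after each configured delay.
    void start(long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(this::refresh, 0, period, unit);
    }

    // Run the check once and cache the result.
    void refresh() {
        try {
            lastStatus.set(healthCheck.get());
        } catch (RuntimeException ex) {
            lastStatus.set("DOWN"); // assumption: a failing check counts as DOWN
        }
    }

    // Gauge-style value: 1 if the cached status matches the code, else 0.
    double valueFor(String code) {
        return code.equals(lastStatus.get()) ? 1d : 0d;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Calling `start(30, TimeUnit.SECONDS)` would refresh the cached status every 30 seconds regardless of how often the metrics are scraped, which is the decoupling described above.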
An additional benefit in those situations would be that the user would also get a history of the health state of their application without any additional effort. Having a background task that performs the health checks on a regular basis would also mean that we could publish application context events about the availability of health checks. A concrete use case that I have implemented is to pause the Kafka consumption if the database (where the Kafka messages would ultimately be persisted) becomes unavailable and to resume it when the database is available again. Doing so allows other consumers to take over the partitions and reduces the logging caused by us not being able to process the messages.
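The pause/resume use case could look roughly like this. It is a sketch in plain Java, not the implementation mentioned above: `HealthChangePublisher` and `PausableConsumer` are hypothetical stand-ins, and in a Spring application the publisher role would more likely be played by `ApplicationEventPublisher` with an event listener on the consumer side.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// Hypothetical publisher: notifies listeners only when the status changes.
final class HealthChangePublisher {

    private final List<BiConsumer<String, String>> listeners = new CopyOnWriteArrayList<>();
    private String lastStatus = "UNKNOWN";

    // Listeners receive (previousStatus, newStatus).
    void addListener(BiConsumer<String, String> listener) {
        listeners.add(listener);
    }

    // Called by the background task after every health check.
    synchronized void report(String newStatus) {
        String previous = lastStatus;
        if (previous.equals(newStatus)) {
            return; // no transition, so no event
        }
        lastStatus = newStatus;
        for (BiConsumer<String, String> listener : listeners) {
            listener.accept(previous, newStatus);
        }
    }
}

// Hypothetical stand-in for a message consumer that can be paused.
final class PausableConsumer {

    private volatile boolean paused;

    // Consume only while the dependency is UP; pause on any other status.
    void onHealthChange(String previous, String current) {
        paused = !"UP".equals(current);
    }

    boolean isPaused() {
        return paused;
    }
}
```

Wiring is a single call, `publisher.addListener(consumer::onHealthChange)`. Because events fire only on transitions, repeated `DOWN` results do not pause the consumer again or add log noise.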
An alternative that I have seen in some projects (in a slightly different form than I describe here) is exposing another actuator endpoint that only contains Prometheus metrics for the health indicators. These metrics call the health indicator's health() method when sampled. As a result, the health indicators are refreshed automatically when Prometheus samples this endpoint, and the user can configure two scrape configurations with different frequencies: one for the regular metrics, which can be very frequent, and another one for the health, which is less frequent.
Comment From: philwebb
I've reopened this to get some feedback from the Micrometer team. We probably won't be able to get to that until the new year.
Comment From: ckoutsouridis
I believe this would be a great addition if it were supported natively. Considering the typical setup of Kubernetes + Prometheus + usage of readiness/liveness probes, there is no one left to call actuator/health
...
The only workaround is to set up a blackbox exporter to monitor the above path, which in my opinion is much less convenient than exposing the metrics directly from within the app (in the latter case we can also have granularity over the failed health indicators, compared with a simple UP/DOWN).
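To illustrate the granularity point, here is a small sketch in plain Java. `PerIndicatorHealth` is a hypothetical class and each indicator is modelled as a `Supplier<String>` returning a status code; with Micrometer, each entry would presumably become its own gauge tagged with the indicator name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical registry of named health checks, sampled one by one.
final class PerIndicatorHealth {

    private final Map<String, Supplier<String>> indicators = new LinkedHashMap<>();

    void register(String name, Supplier<String> statusCheck) {
        indicators.put(name, statusCheck);
    }

    // One value per indicator: 1.0 when its status is UP, 0.0 otherwise.
    Map<String, Double> sample() {
        Map<String, Double> values = new LinkedHashMap<>();
        indicators.forEach((name, check) ->
                values.put(name, "UP".equals(check.get()) ? 1d : 0d));
        return values;
    }

    // What a single UP/DOWN probe would see: 1.0 only if every indicator is UP.
    double aggregate() {
        return sample().values().stream().allMatch(value -> value == 1d) ? 1d : 0d;
    }
}
```

With a `db` indicator reporting `DOWN` and a `diskSpace` indicator reporting `UP`, `aggregate()` only says that something is wrong, while `sample()` shows that `db` is the failing indicator.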
Comment From: shakuzen
Sorry for the long delay on giving feedback from the Micrometer team perspective.
I made a #health-metrics channel in the Micrometer Slack (https://slack.micrometer.io) to discuss this topic. I think we need a better understanding of the problem we are trying to solve before any concrete proposals for changes in Spring Boot.
There are different implementations that have been suggested throughout the history of this topic in both the Micrometer issues and here in Spring Boot, and which implementation is most useful (if at all) varies among users. Let's make sure we're solving the right problem for a general audience first. If a proposed implementation works well for your situation in the meantime, you can add the configuration to your app (or a small auto-configuration library to share among apps) easily enough. See https://github.com/micrometer-metrics/micrometer/issues/416 or the Spring Boot docs for example implementations.
I think this issue can be closed for now at least until we reach some consensus in the discussion in Micrometer.
Comment From: snicoll
Thanks Tommy.