Hello!
We upgraded from Spring Boot 3.2.3 to 3.3.1 (and later to 3.3.2) and noticed that the quantile tags have disappeared from multiple metrics. Our configuration was the following:
management:
  endpoints:
    web.exposure.include: "health,info,metrics,configprops,prometheus"
  metrics:
    distribution:
      percentiles-histogram:
        http.client.requests: true
        http.server.requests: false
      percentiles:
        http.client.requests: 0.50,0.95,0.99,0.999
        http.server.requests: 0.50,0.95,0.99,0.999
Weeks after the upgrade, we saw that our Grafana dashboards for the HTTP client 99th, 95th and 50th percentiles had stopped working. However, our histogram heatmap was still functional. After some head-scratching, we decided to upgrade to 3.3.2 because a Micrometer warning log had appeared that seemed to be connected to a bug. However, this did not resolve the issue.
After some checking and testing, we noticed that the HTTP server metrics were still functional, so after some experimenting, we changed the configuration to the following:
management:
  endpoints:
    web.exposure.include: "health,info,metrics,configprops,prometheus"
  metrics:
    distribution:
      percentiles-histogram:
        http.server.requests: false
      percentiles:
        http.client.requests: 0.50,0.95,0.99,0.999
        http.server.requests: 0.50,0.95,0.99,0.999
And voilà! Metrics with the quantile tag show up again! We also still get bucket metrics, even though percentiles-histogram is no longer set to true for http.client.requests.
Am I missing something or is this a bug?
Thank you!
Comment From: jonatan-ivanov
I'm assuming you are using Prometheus.
I think this is because upgrading Boot 3.2 -> 3.3 also upgrades Micrometer 1.12 -> 1.13. In Micrometer 1.13 we upgraded the Prometheus client from 0.x to 1.x; see the Upgrade section of the Migration Guide. Having quantiles and a histogram in the same metric family was possible with the 0.x client, but it is not possible with the new client, and it is also invalid according to the Prometheus and OpenMetrics specs. Because of this, Micrometer favors the histogram over quantiles and will ignore the quantiles if a histogram is also requested. See the Histogram vs. Summary section of the Migration Guide.
I'm not sure if you are aware of this or have a different use case, but quantiles can be calculated on the Prometheus Server side (histogram_quantile()), so using both does not make much sense. In addition, Micrometer percentiles are approximated on the application side (they can be less accurate) and are not aggregatable, while histograms can be more accurate and are aggregatable.
So to summarize your options:
1. Use histograms (no percentiles) and calculate quantiles on the Prometheus Server side (recommended)
2. The Prometheus Java client 0.x is still available in micrometer-registry-prometheus-simpleclient (but deprecated), see Migration Guide (not recommended but temporarily might be ok if you need more time to migrate)
3. Downgrade Micrometer to 1.12 (not recommended)
Please see the Migration Guide for more details and this comment https://github.com/micrometer-metrics/micrometer/issues/5150#issuecomment-2274913226 where I explain what is happening on the Prometheus output level.
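For illustration, option 1 applied to the configuration from the original report might look like the sketch below. It only reuses the property keys already shown in this issue. Enabling the histogram for http.server.requests as well is an assumption (the original configuration had it set to false); it would only be needed if server-side quantiles are also wanted.

```yaml
management:
  endpoints:
    web.exposure.include: "health,info,metrics,configprops,prometheus"
  metrics:
    distribution:
      # Publish histogram buckets only; quantiles are then computed in
      # Prometheus with histogram_quantile().
      percentiles-histogram:
        http.client.requests: true
        # Assumption: enabled here too so that server quantiles can be
        # derived the same way. The original report had this set to false.
        http.server.requests: true
      # No `percentiles` entries: with the Prometheus 1.x client they are
      # ignored for any meter that also publishes a histogram.
```

With this in place, dashboards that filtered on the quantile tag would need to switch to histogram_quantile() queries over the _bucket series.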
Comment From: jonatan-ivanov
Fyi, here is an example Grafana dashboard / Prometheus query that calculates quantiles from the histogram of http.server.requests:
histogram_quantile(0.99, sum(rate(http_server_requests_seconds_bucket{...}[$__rate_interval])) by (le))
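If the same quantile is used by several dashboards or alerts, it could also be precomputed on the Prometheus side. The following is a minimal sketch of a Prometheus recording rule for the HTTP client p99, not something taken from this thread. It assumes the client timer is exported as http_client_requests_seconds_bucket; the group name, the recorded series name, and the fixed 5m window are hypothetical choices ($__rate_interval is a Grafana variable and is not available in rule files).

```yaml
groups:
  - name: http_client_quantiles        # hypothetical group name
    rules:
      # Precompute the 99th percentile from the histogram buckets.
      # The recorded series name below is illustrative only.
      - record: http_client_requests_seconds:p99_5m
        expr: >
          histogram_quantile(
            0.99,
            sum by (le) (rate(http_client_requests_seconds_bucket[5m]))
          )
```

A Grafana panel could then query the recorded series directly instead of evaluating histogram_quantile() on every refresh.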
Comment From: wilkinsona
Thanks, @jonatan-ivanov. I've just checked the release notes for Boot 3.3 and we already have a section on Prometheus 1.x with a link to the Micrometer migration guide. Given that, I'm going to close this one. We can re-open if it transpires that Prometheus isn't the cause and that there's more to do.