Hi all. When I upgraded Spring Boot from version 2.2.13.RELEASE to version 3.1.2, a problem appeared with the metrics.
application.yml:
management:
  endpoint:
    health:
      show-details: "ALWAYS"
    metrics:
      enabled: true
    prometheus:
      enabled: true
  endpoints:
    web:
      exposure:
        include: "*"
  metrics:
    export:
      prometheus:
        enabled: true
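(As an aside, this configuration still uses the Boot 2.x property names. If I remember the Spring Boot 3.x property migration correctly, the per-registry export switch moved, so the equivalent 3.x form would be something like:

management:
  prometheus:
    metrics:
      export:
        enabled: true

This is unrelated to the UNKNOWN status below, but it avoids the deprecated property after the upgrade.)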
Specifically, an UNKNOWN status appeared in some metrics, even though all requests were in fact successful and should have a status of 200:
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 9993.0
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 80.860505828
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="UNKNOWN",status="UNKNOWN",uri="/actuator/prometheus",} 10.0
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="UNKNOWN",status="UNKNOWN",uri="/actuator/prometheus",} 0.170536004
This can be easily reproduced by sending 10,000 parallel requests (10 threads, 1,000 requests each) to the /actuator/prometheus endpoint:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public static void main(String[] args) {
    int numThreads = 10;
    int numRequestsPerThread = 1000;
    ExecutorService executorService = Executors.newFixedThreadPool(numThreads);
    HttpClient httpClient = HttpClient.newHttpClient();
    for (int i = 0; i < numThreads; i++) {
        executorService.submit(() -> {
            for (int j = 0; j < numRequestsPerThread; j++) {
                sendHttpRequest(httpClient, "http://localhost:8080/actuator/prometheus");
            }
        });
    }
    executorService.shutdown();
}

private static void sendHttpRequest(HttpClient httpClient, String url) {
    HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(url))
            .GET()
            .build();
    try {
        HttpResponse<String> response = httpClient.send(request, HttpResponse.BodyHandlers.ofString());
        // Only log unexpected status codes; every request in this run returned 200.
        if (response.statusCode() != 200) {
            System.out.println("Response Code: " + response.statusCode());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
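To inspect the resulting counts after the run, here is a small self-contained sketch (the class name is mine, and it assumes the app is running on localhost:8080) that scrapes the endpoint once and prints only the http_server_requests_seconds series:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ScrapeMetrics {
    public static void main(String[] args) throws Exception {
        HttpClient httpClient = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/actuator/prometheus"))
                .GET()
                .build();
        HttpResponse<String> response = httpClient.send(request, HttpResponse.BodyHandlers.ofString());
        // Keep only the http_server_requests_seconds series from the scrape.
        response.body().lines()
                .filter(line -> line.startsWith("http_server_requests_seconds"))
                .forEach(System.out::println);
    }
}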
What could be causing this problem, and how can it be fixed? Thank you!
Comment From: mhalbritter
Hello!
I can't reproduce this:
> curlie :8080/actuator/prometheus | grep -i http_server_requests_seconds
HTTP/1.1 200
Content-Type: text/plain;version=0.0.4;charset=utf-8
Content-Length: 12198
Date: Thu, 10 Aug 2023 12:01:02 GMT
# HELP http_server_requests_seconds
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 20006.0
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 18.324825225
# HELP http_server_requests_seconds_max
# TYPE http_server_requests_seconds_max gauge
http_server_requests_seconds_max{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 0.0143335
Maybe it has something to do with dropped requests? Do you have a sample application where this happens all the time?
Comment From: dimon8829
Thank you for the response! Yes, I'm running a Spring Boot app:
Reproducer: untitled.zip
And then I get this result:
# HELP http_server_requests_seconds
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 996.0
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 143.624052605
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="UNKNOWN",status="UNKNOWN",uri="/actuator/prometheus",} 5.0
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="UNKNOWN",status="UNKNOWN",uri="/actuator/prometheus",} 0.855936009
No one dropped these requests. Thank you!
Comment From: snicoll
Please move all of this into a project that we can run ourselves. We'd need that to confirm your report anyway, and recreating it ourselves risks missing a step.
Comment From: dimon8829
Comment From: mhalbritter
Those are the "successful" requests:
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 996.0
Those are the "unsuccessful" ones:
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="UNKNOWN",status="UNKNOWN",uri="/actuator/prometheus",} 5.0
So something is different with these requests. Let's see if we can find this when we get the sample.
Comment From: mhalbritter
Nope, sorry, I can't reproduce this:
curlie :8080/actuator/prometheus | grep -i http_server_requests_seconds
HTTP/1.1 200 OK
Content-Type: text/plain;version=0.0.4;charset=utf-8
Content-Length: 11984
# HELP http_server_requests_seconds
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 1000.0
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 22.91518926
# HELP http_server_requests_seconds_max
# TYPE http_server_requests_seconds_max gauge
http_server_requests_seconds_max{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 0.197442417
Comment From: dimon8829
And if you increase the number of requests per thread from 1000 to 10000?
int numRequestsPerThread = 10000;
Comment From: mhalbritter
After running it multiple times with 10000 requests, I now got it:
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 19995.0
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 570.818983592
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="UNKNOWN",status="UNKNOWN",uri="/actuator/prometheus",} 9.0
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="UNKNOWN",status="UNKNOWN",uri="/actuator/prometheus",} 0.353837208
# HELP http_server_requests_seconds_max
# TYPE http_server_requests_seconds_max gauge
http_server_requests_seconds_max{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 0.282042959
http_server_requests_seconds_max{error="none",exception="none",method="GET",outcome="UNKNOWN",status="UNKNOWN",uri="/actuator/prometheus",} 0.163144375
Comment From: wilkinsona
There's some overlap between this issue and https://github.com/spring-projects/spring-boot/issues/33300. However, the concerns about the benchmarking tool closing connections prematurely (from the server's perspective) don't apply here unless the HTTP client is doing something similar.
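To illustrate what "closing connections prematurely" looks like from the server's perspective, here is a hypothetical sketch (mine, not from this issue): a raw client that sends a complete request and then closes its socket without reading the response, which the server may record as an aborted connection:

import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class PrematureClose {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("localhost", 8080)) {
            OutputStream out = socket.getOutputStream();
            // Send a complete, valid request...
            out.write(("GET /actuator/prometheus HTTP/1.1\r\n"
                    + "Host: localhost\r\n"
                    + "Connection: close\r\n\r\n").getBytes(StandardCharsets.US_ASCII));
            out.flush();
        } // ...but close the socket without ever reading the response.
    }
}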
Comment From: mhalbritter
It looks like the culprit is org.springframework.http.server.reactive.observation.DefaultServerRequestObservationConvention#status, which is in Spring Framework, not in Boot:
if (context.isConnectionAborted()) {
    return HTTP_OUTCOME_UNKNOWN;
}
Not sure if this can be triggered in any other case.
When setting a breakpoint in org.springframework.web.filter.reactive.ServerHttpObservationFilter#filter(org.springframework.web.server.ServerWebExchange, org.springframework.http.server.reactive.observation.ServerRequestObservationContext, reactor.core.publisher.Mono<java.lang.Void>), you'll see that observationContext.setConnectionAborted(true) is called when the Mono is cancelled.
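A sketch of how to trigger that cancellation deliberately (mine, not from this issue; it relies on the fact that on recent JDKs, 16+ if I recall correctly, cancelling the future returned by sendAsync attempts to abort the in-flight exchange):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

public class CancelInFlight {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/actuator/prometheus"))
                .GET()
                .build();
        CompletableFuture<HttpResponse<String>> future =
                client.sendAsync(request, HttpResponse.BodyHandlers.ofString());
        // Give the request a moment to get on the wire, then cancel. The race is
        // intentional: if the exchange is still in flight, the server side should
        // observe an aborted connection and record outcome=UNKNOWN.
        Thread.sleep(5);
        future.cancel(true);
    }
}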
Comment From: mhalbritter
Maybe it's a duplicate of https://github.com/spring-projects/spring-framework/issues/29720
Comment From: dimon8829
@mhalbritter This problem did not exist in version 2.2.13.RELEASE, but it appeared in version 3.1.2.
Therefore, I'm not sure that this issue is a duplicate of #29720
Comment From: rohinisb
We are seeing the same issue in production. When hitting the API through Postman, I can see a 200 response code, but the same request is captured as UNKNOWN in /actuator/prometheus. We are using Spring Boot 3.1.0.
Comment From: bclozel
Closing because of https://github.com/spring-projects/spring-framework/issues/29720#issuecomment-1745213523