Hi all. After upgrading Spring Boot from version 2.2.13.RELEASE to version 3.1.2, I ran into a problem with the metrics.

application.yml:

management:
  endpoint:
    health:
      show-details: "ALWAYS"
    endpoint:
      metrics:
        enabled: true
      prometheus:
        enabled: true
  endpoints:
    web:
      exposure:
        include: "*"
  metrics:
    export:
      prometheus:
        enabled: true

Specifically, some metrics now report the status UNKNOWN, although all requests were in fact successful and should have a status of 200:

http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 9993.0
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 80.860505828
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="UNKNOWN",status="UNKNOWN",uri="/actuator/prometheus",} 10.0
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="UNKNOWN",status="UNKNOWN",uri="/actuator/prometheus",} 0.170536004

This can be easily reproduced by sending 10000 requests to the /actuator/prometheus endpoint from 10 parallel threads (1000 requests per thread):

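import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// The class name is arbitrary; it only wraps the two methods so they compile.
public class Reproducer {

    public static void main(String[] args) {
        int numThreads = 10;
        int numRequestsPerThread = 1000;

        ExecutorService executorService = Executors.newFixedThreadPool(numThreads);
        // A single HttpClient is shared by all worker threads.
        HttpClient httpClient = HttpClient.newHttpClient();

        // Each worker sends its requests sequentially; the workers run in parallel.
        for (int i = 0; i < numThreads; i++) {
            executorService.submit(() -> {
                for (int j = 0; j < numRequestsPerThread; j++) {
                    sendHttpRequest(httpClient, "http://localhost:8080/actuator/prometheus");
                }
            });
        }

        // Stop accepting new tasks; already submitted tasks run to completion.
        executorService.shutdown();
    }

    private static void sendHttpRequest(HttpClient httpClient, String url) {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .GET()
                .build();

        try {
            // Blocking call: the full response body is read before returning.
            HttpResponse<String> response = httpClient.send(request, HttpResponse.BodyHandlers.ofString());
            // Log only unexpected status codes as seen by the client.
            if (response.statusCode() != 200) {
                System.out.println("Response Code: " + response.statusCode());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}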

What could be causing this problem, and how can I fix it? Thank you!

Comment From: mhalbritter

Hello!

I can't reproduce this:

> curlie :8080/actuator/prometheus | grep -i http_server_requests_seconds
HTTP/1.1 200
Content-Type: text/plain;version=0.0.4;charset=utf-8
Content-Length: 12198
Date: Thu, 10 Aug 2023 12:01:02 GMT

# HELP http_server_requests_seconds
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 20006.0
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 18.324825225
# HELP http_server_requests_seconds_max
# TYPE http_server_requests_seconds_max gauge
http_server_requests_seconds_max{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 0.0143335

Maybe it has something to do with dropped requests? Do you have a sample application where this happens all the time?

Comment From: dimon8829

Thank you for the response! Yes, I am running this Spring Boot app:

Reproducer: untitled.zip

And then I get this result:

# HELP http_server_requests_seconds  
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 996.0
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 143.624052605
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="UNKNOWN",status="UNKNOWN",uri="/actuator/prometheus",} 5.0
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="UNKNOWN",status="UNKNOWN",uri="/actuator/prometheus",} 0.855936009

Nobody dropped these requests. Thank you!

Comment From: snicoll

Please move all of this into a project that we can run ourselves. We'd have to do that to confirm your report anyway, and there's a chance we'd miss a step in doing so.

Comment From: dimon8829

untitled.zip

Comment From: mhalbritter

Those are the "successful" requests:

http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 996.0

Those are the "unsuccessful" ones:

http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="UNKNOWN",status="UNKNOWN",uri="/actuator/prometheus",} 5.0

So something is different with these requests. Let's see if we can find this when we get the sample.
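To compare the two series from run to run, the counts can be tallied per `status` label. This is only a minimal sketch, assuming the scrape output has been saved as plain text; `StatusTally` and `tallyByStatus` are hypothetical names and not part of Spring Boot or Micrometer:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Spring Boot or Micrometer.
public class StatusTally {

    // Matches lines such as:
    // http_server_requests_seconds_count{...,status="200",...} 996.0
    // Group 1 is the value of the status label, group 2 is the sample value.
    private static final Pattern COUNT_LINE = Pattern.compile(
            "^http_server_requests_seconds_count\\{.*\\bstatus=\"([^\"]*)\".*\\}\\s+(\\S+)$");

    // Sums the request counts per value of the status label.
    static Map<String, Double> tallyByStatus(String scrape) {
        Map<String, Double> totals = new LinkedHashMap<>();
        for (String line : scrape.split("\\R")) {
            Matcher matcher = COUNT_LINE.matcher(line.trim());
            if (matcher.matches()) {
                totals.merge(matcher.group(1), Double.parseDouble(matcher.group(2)), Double::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // The two count lines quoted above.
        String scrape = String.join("\n",
                "http_server_requests_seconds_count{error=\"none\",exception=\"none\",method=\"GET\",outcome=\"SUCCESS\",status=\"200\",uri=\"/actuator/prometheus\",} 996.0",
                "http_server_requests_seconds_count{error=\"none\",exception=\"none\",method=\"GET\",outcome=\"UNKNOWN\",status=\"UNKNOWN\",uri=\"/actuator/prometheus\",} 5.0");
        System.out.println(tallyByStatus(scrape)); // prints {200=996.0, UNKNOWN=5.0}
    }
}
```

Lines that are not `http_server_requests_seconds_count` samples (comments, `_sum`, `_max`) are skipped, so the whole scrape can be passed in unchanged.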

Comment From: mhalbritter

Nope, sorry, I can't reproduce this:

curlie :8080/actuator/prometheus | grep -i http_server_requests_seconds
HTTP/1.1 200 OK
Content-Type: text/plain;version=0.0.4;charset=utf-8
Content-Length: 11984

# HELP http_server_requests_seconds
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 1000.0
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 22.91518926
# HELP http_server_requests_seconds_max
# TYPE http_server_requests_seconds_max gauge
http_server_requests_seconds_max{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 0.197442417

Comment From: dimon8829

What happens if you increase the number of requests from 1000 to 10000?

 int numRequests = 1000;

Comment From: mhalbritter

After running it multiple times with 10000 requests, I now got it:

# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 19995.0
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 570.818983592
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="UNKNOWN",status="UNKNOWN",uri="/actuator/prometheus",} 9.0
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="UNKNOWN",status="UNKNOWN",uri="/actuator/prometheus",} 0.353837208
# HELP http_server_requests_seconds_max
# TYPE http_server_requests_seconds_max gauge
http_server_requests_seconds_max{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 0.282042959
http_server_requests_seconds_max{error="none",exception="none",method="GET",outcome="UNKNOWN",status="UNKNOWN",uri="/actuator/prometheus",} 0.163144375

Comment From: wilkinsona

There's some overlap between this issue and https://github.com/spring-projects/spring-boot/issues/33300. However, the concerns about the benchmarking tool closing connections prematurely (from the server's perspective) don't apply here unless the HTTP client is doing something similar.

Comment From: mhalbritter

It looks like the culprit is org.springframework.http.server.reactive.observation.DefaultServerRequestObservationConvention#status, which is in Spring Framework, not in Boot:

if (context.isConnectionAborted()) {
  return HTTP_OUTCOME_UNKNOWN;
}

Not sure if this can be triggered in any other case.

When setting a breakpoint in org.springframework.web.filter.reactive.ServerHttpObservationFilter#filter(org.springframework.web.server.ServerWebExchange, org.springframework.http.server.reactive.observation.ServerRequestObservationContext, reactor.core.publisher.Mono<java.lang.Void>), you'll see that observationContext.setConnectionAborted(true) is called when the Mono call is cancelled.
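As a rough illustration of that interaction, here is a simplified, standard-library-only model. It is not Spring Framework's code: `AbortedStatusSketch`, `Context` and `statusTag` are hypothetical stand-ins, and a `CompletableFuture` plays the role of the cancelled `Mono`. It assumes, as the check quoted above suggests, that the aborted flag is consulted before the response status:

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;

// Simplified model for illustration only; NOT Spring Framework's implementation.
public class AbortedStatusSketch {

    // Hypothetical stand-in for the observation context.
    static class Context {
        volatile boolean connectionAborted;
        volatile Integer responseStatus;
    }

    // Models the quoted check: the aborted flag wins over the response status.
    static String statusTag(Context context) {
        if (context.connectionAborted) {
            return "UNKNOWN";
        }
        return context.responseStatus != null
                ? String.valueOf(context.responseStatus)
                : "UNKNOWN";
    }

    public static void main(String[] args) {
        // Normal case: the pipeline completes and the status is reported.
        Context completed = new Context();
        completed.responseStatus = 200;
        System.out.println(statusTag(completed)); // prints 200

        // Cancelled case: the status is already 200, but cancellation
        // marks the context as aborted before the tag is computed.
        Context cancelled = new Context();
        cancelled.responseStatus = 200;
        CompletableFuture<Void> pipeline = new CompletableFuture<>();
        pipeline.whenComplete((value, error) -> {
            if (error instanceof CancellationException) {
                cancelled.connectionAborted = true;
            }
        });
        pipeline.cancel(true);
        System.out.println(statusTag(cancelled)); // prints UNKNOWN
    }
}
```

Under this model, a request whose response was already written with status 200 is still tagged UNKNOWN if the cancellation signal arrives before the observation is stopped, which would match the numbers reported above.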

Comment From: mhalbritter

Maybe it's a duplicate of https://github.com/spring-projects/spring-framework/issues/29720

Comment From: dimon8829

@mhalbritter This problem did not occur in version 2.2.13.RELEASE, but it appeared in version 3.1.2.
Therefore, I'm not sure that this issue is a duplicate of #29720.

Comment From: rohinisb

We are seeing the same issue in production. When hitting the API through Postman, I can see a 200 response code, but the same request is recorded as UNKNOWN in /actuator/prometheus. We are using Spring Boot 3.1.0.

Comment From: bclozel

Closing because of https://github.com/spring-projects/spring-framework/issues/29720#issuecomment-1745213523