It seems that issue https://github.com/spring-projects/spring-boot/issues/33070 has reappeared.
We use the following dependencies:
[INFO] +- org.springframework.boot:spring-boot-starter-actuator:jar:3.2.4:compile
[INFO] | +- org.springframework.boot:spring-boot-actuator-autoconfigure:jar:3.2.4:compile
[INFO] | +- io.micrometer:micrometer-observation:jar:1.12.4:compile
[INFO] | | \- io.micrometer:micrometer-commons:jar:1.12.4:compile
[INFO] | \- io.micrometer:micrometer-jakarta9:jar:1.12.4:compile
and it seems that https://github.com/spring-projects/spring-boot/blob/v3.2.4/spring-boot-project/spring-boot-actuator-autoconfigure/src/main/java/org/springframework/boot/actuate/autoconfigure/tracing/prometheus/PrometheusExemplarsAutoConfiguration.java causes a deadlock.
Please refer to the attached thread dump: thread_dump_with_deadlock.txt
Comment From: wilkinsona
@anvo1115 Thanks for the report, but, as far as I can tell from the thread dump, that doesn't look deadlocked to me. reactor-http-epoll-3 is waiting to lock 0x00000000cf5c3e78 which is held by main. However, main is waiting to lock 0x00000000f19e0530 which isn't held by any other thread in the dump. In other words, judging by the thread dump, once the request that's being made by MicroserviceWebClient has completed, processing can proceed.
Comment From: anvo1115
The issue is that the request never completes. After we exclude PrometheusExemplarsAutoConfiguration, however, the request succeeds.
Comment From: wilkinsona
Unfortunately, the thread dump doesn't explain why the request did not complete. None of the threads appear to be in the process of making an HTTP request so it's not clear why the thread that's waiting for one to complete is stuck. If you would like us to spend some more time investigating, please spend some time providing a complete yet minimal sample that reproduces the problem. You can share it with us by pushing it to a separate repository on GitHub or by zipping it up and attaching it to this issue.
Comment From: spring-projects-issues
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.
Comment From: wilkinsona
Having looked again at https://github.com/spring-projects/spring-framework/issues/32996, I think I now understand what's happening here.
The main thread is making an HTTP request during bean creation, while Framework's singleton lock is held. Although the request is reactive and uses WebClient, block() is being called, so the main thread cannot proceed until the request has completed. reactor-http-epoll-3 is the thread that's performing the HTTP request. WebClient has been instrumented and the observation for the request is being stopped. This results in an attempt to update the last exemplar. That attempt gets stuck because it tries to use LazyTracingSpanContextSupplier, which needs to retrieve the Tracer. Doing so requires Framework's singleton lock, which cannot be obtained because it's held by main.
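The cycle described above can be reproduced in miniature without Spring. The following is only a sketch under stated assumptions: SINGLETON_LOCK is a hypothetical stand-in for Framework's singleton lock, the CompletableFuture stands in for the WebClient call, and a timeout is used in place of block() so that the program terminates instead of hanging.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SingletonLockDemo {

    // Hypothetical stand-in for the container's singleton lock.
    private static final Object SINGLETON_LOCK = new Object();

    /**
     * Simulates blocking on a request while holding the lock.
     * Returns true if the simulated request completed within the timeout.
     */
    public static boolean requestCompletes(long timeoutMillis) {
        synchronized (SINGLETON_LOCK) { // "main" is in the middle of bean creation
            CompletableFuture<String> response = CompletableFuture.supplyAsync(() -> {
                // The request thread needs the same lock to finish its work
                // (in the issue: a lazy lookup of the Tracer).
                synchronized (SINGLETON_LOCK) {
                    return "response";
                }
            });
            try {
                // Stand-in for block(); a real block() without a timeout would wait forever.
                response.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException ex) {
                return false;
            } catch (InterruptedException | ExecutionException ex) {
                throw new IllegalStateException(ex);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("request completed: " + requestCompletes(300));
    }
}
```

Neither thread is deadlocked in the classic two-lock sense, which matches the thread dump: one thread holds the lock and waits for a result, and the thread that would produce the result waits for the lock.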
This should already be fixed in Framework 6.2.0-M3 but we need to work something out for earlier releases.
@anvo1115, you could avoid the problem by not doing things on multiple threads while also blocking. That would either mean that you stop calling block or that you use an imperative HTTP client.
On our side, it's becoming increasingly apparent that we need a better way of breaking the MeterRegistry <-> Tracer cycle that exemplars cause. We'll discuss this with the observability team.
Comment From: wilkinsona
@anvo1115 it would be interesting to know if defining the following bean works around the problem for you:
@Bean
TracerSpanContextSupplier spanContextSuppler(Tracer tracer) {
    return new TracerSpanContextSupplier(tracer);
}

static class TracerSpanContextSupplier implements SpanContextSupplier {

    private final Tracer tracer;

    TracerSpanContextSupplier(Tracer tracer) {
        this.tracer = tracer;
    }

    @Override
    public String getTraceId() {
        Span currentSpan = currentSpan();
        return (currentSpan != null) ? currentSpan.context().traceId() : null;
    }

    @Override
    public String getSpanId() {
        Span currentSpan = currentSpan();
        return (currentSpan != null) ? currentSpan.context().spanId() : null;
    }

    @Override
    public boolean isSampled() {
        Span currentSpan = currentSpan();
        if (currentSpan == null) {
            return false;
        }
        Boolean sampled = currentSpan.context().sampled();
        return sampled != null && sampled;
    }

    private Span currentSpan() {
        return this.tracer.currentSpan();
    }

}
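Assuming the point of this bean is that the Tracer is resolved when the bean is created rather than on first use from the request thread, the difference between the two approaches can be sketched in plain Java. Everything here is hypothetical and for illustration only: SINGLETON_LOCK and lookupTracer() stand in for the container and its lock, and the suppliers stand in for the lazy and eager SpanContextSupplier variants.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class EagerVsLazyLookupDemo {

    // Hypothetical stand-in for the container's singleton lock.
    private static final Object SINGLETON_LOCK = new Object();

    // Hypothetical container lookup: it must take the singleton lock.
    public static String lookupTracer() {
        synchronized (SINGLETON_LOCK) {
            return "tracer";
        }
    }

    /**
     * Calls the supplier on another thread while the calling thread holds the lock.
     * Returns true if the supplier returned within one second.
     */
    public static boolean usableFromOtherThread(Supplier<String> supplier) {
        synchronized (SINGLETON_LOCK) {
            CompletableFuture<String> result = CompletableFuture.supplyAsync(supplier);
            try {
                result.get(1, TimeUnit.SECONDS);
                return true;
            } catch (TimeoutException ex) {
                return false;
            } catch (InterruptedException | ExecutionException ex) {
                throw new IllegalStateException(ex);
            }
        }
    }

    public static void main(String[] args) {
        // Lazy: the lookup happens on the other thread and needs the lock.
        Supplier<String> lazy = EagerVsLazyLookupDemo::lookupTracer;
        // Eager: the lookup happens up front; later calls only read a field.
        String tracer = lookupTracer();
        Supplier<String> eager = () -> tracer;

        System.out.println("lazy usable:  " + usableFromOtherThread(lazy));
        System.out.println("eager usable: " + usableFromOtherThread(eager));
    }
}
```

Whether eager injection is safe in a real application still depends on the MeterRegistry <-> Tracer cycle mentioned earlier in the thread; this sketch only shows why an already-resolved reference does not need the lock.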
Comment From: spring-projects-issues
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.
Comment From: spring-projects-issues
Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open the issue.