Spring Missing tracing information after upgrading to Spring Boot 3 with Micrometer tracing

Context

After upgrading to Spring Boot 3, we had to switch our tracing library from spring cloud sleuth to micrometer tracing (referring to the migration guide https://github.com/micrometer-metrics/tracing/wiki/Spring-Cloud-Sleuth-3.1-Migration-Guide).

Expected Behavior

We expect that the migration from sleuth to micrometer should be smooth, providing good backward compatibility. In particular, as the project uses reactive programming extensively, tracing information such as traceId should be preserved in SL4J logs along the reactive chain.

Actual Behavior

Despite already calling Hooks.enableAutomaticContextPropagation(), reactive chain involving reactive web client fails to preserve tracing information such as traceId in SLF4J logs when the chain is invoked using .subscribe. The migration guide clearly suggests that calling Hooks.enableAutomaticContextPropagation() is enough.

Reproductor

I have prepared a reproducer (spring boot test) to demonstrate the potential bug. Please refer to https://github.com/jhengy/my-simple-springboot-app/tree/demo-context-propagation

Follow up questions

To solve the issue, it seems that we have to add .contextCapture() before .subscribe explicitly in all the reactive chains. This seems too repetitive and it was expected that migration from sleuth to micrometer should involve minimall changes to the business code. Also, there seems to be a limitation to using .contextCapture(): it only captures context for logs in the upstream, but not downstream. Is there a better and more convenient way (perhaps some kind of global setting) go about doing this?
It was mentioned in the blog that there is a plan for Spring Boot to configure context propagation out of the box. May we know how soon it can be?
What would be the performance impact(if any) in using Hooks.enableAutomaticContextPropagation() with .contextCapture()?

Thank you.

Comment From: bclozel

Thanks for reaching out with a detailed sample.

We expect that the migration from sleuth to micrometer should be smooth, providing good backward compatibility. In particular, as the project uses reactive programming extensively, tracing information such as traceId should be preserved in SL4J logs along the reactive chain.

We've certainly tried to make the upgrade as easy as possible. The intent was also to solve all the underlying issues with Sleuth's approach: overhead, over-instrumentation, etc.

Despite already calling Hooks.enableAutomaticContextPropagation(), reactive chain involving reactive web client fails to preserve tracing information such as traceId in SLF4J logs when the chain is invoked using .subscribe. The migration guide clearly suggests that calling Hooks.enableAutomaticContextPropagation() is enough.

I don't think this is as simple as that. Maybe the guide can be improved as a result of our conversation here.

Calling subscribe from within a reactive chain means that it decouples completely both reactive streams. In your sample's case, the WebClient will be performing its task completely independently of the controller method where it's been called. If an error occurs with the client, it won't be handled with the server response. In a non-reactive world, this is equivalent to starting a new Thread and giving it a Runnable. By default, new Thread instances don't inherit thread locals so I guess this behavior aligns here. In general, we don't advise calling subscribe manually for this reason.

When automatic context propagation is enabled, Project Reactor automatically captures the context for blocking operators:

In the automatic mode, blocking operators, such as Flux#blockFirst(), Flux#blockLast(), Flux#toIterable(), Mono#block(), Mono#blockOptional(), and relevant overloads, all perform contextCapture() transparently, so in most cases it is not necessary to add it.

I think this makes sense because unlike subscribe, blocking means that the result and errors are directly returned where it's called. To answer your question, I don't know if there's a more convenient way to do this and whether this would be a good thing. Maybe create a reactor issue for this?

It was mentioned in the blog that there is a plan for Spring Boot to configure context propagation out of the box. May we know how soon it can be?

This is already done (see spring-projects/spring-boot/issues/34201). You can opt in with the following configuration property in your sample application:

spring:
  reactor:
    context-propagation: auto

Note that the setup in the sample application is not ideal, using a @PostConstruct annotation to enable this feature. As mentioned in the Reactor docs, this must be done as soon as possible because it only applies for new subscribers. Instead, if you choose not to use the configuration property, you should call the Hook in the main method right before SpringApplication.run.

What would be the performance impact(if any) in using Hooks.enableAutomaticContextPropagation() with .contextCapture()?

I'm not sure I understand. If you enable automatic propagation it will be taken care of for you in most cases. If you wonder about capturing the context for those subscribe calls, I don't think there would be duplicated efforts there. As for performance impact in general, Dariusz shared his findings here.

I don't think there is anything actionable in Spring Framework right now, so I'd suggest getting in touch with the Micrometer team if you think the migration guide could use some updates. You can also reach out to the Reactor team about the block vs. subscribe situation we've discussed here.

Thanks!

Comment From: jhengy

@bclozel thanks for the detailed reply, will follow up with the respective teams

Comment From: chemicL

Thank you @bclozel for covering so many aspects. I added a few points in the reactor-core issue to further clarify the motivation for the current state of affairs.