SpringBoot Provide a setting for automatic context propagation with reactor-core

reactor-core 3.5.3 introduced a way to automatically propagate ThreadLocal values registered with the context-propagation library. Combined with the use of micrometer-tracing it can allow users to log anywhere in their reactive chains with tracing information present, without the need to use handle/tap operators.

https://github.com/reactor/reactor-core/pull/3335 adds a static method: Hooks.enableAutomaticContextPropagation() that can be triggered at the start of an application to enable this feature.

The call to this method can be handled by Spring Boot and users would just interact with a Spring Boot property to enable/disable this feature.

Comment From: chemicL

Important note is that the above feature is dependant on updating the dependencies for micrometer and context-propagation as listed in Micrometer docs:

Micrometer Context-Propagation 1.0.2
Micrometer 1.10.4
Micrometer Tracing 1.0.2

Comment From: bclozel

@chemicL If I understand correctly, without this setting context propagation is only managed with the tap/handle operators, or by using contextCapture. Using this hook provided by reactor turns it on for all operators.

I'm wondering if we should create a spring.reactor.context-propagation accepting enum values such as AUTO (default) and MANUAL. This would leave us some room for possible new modes in the future. What do you think?

Comment From: chemicL

Currently, as long as context-propagation library is on the classpath, reactor-core does the following (and there's no way to turn it off unless context-propagation is excluded from the runtime):

for tap and handle it restores the ThreadLocals from Reactor Context
contextCapture writes all ThreadLocal values to Reactor Context under the same keys for which ThreadLocalAccessors are registered

When Hooks.enableAutomaticContextPropagation() is called, the following changes:

all operators upstream from a contextWrite() or contextCapture() will have ThreadLocal values associated with keys present in Reactor Context propagated
handle and tap won't do anything special for that as the values are already present

In both scenarios, contextCapture is required to store current ThreadLocals into Reactor Context if any were written to. That won't be necessary e.g. when returning a Flux or Mono to which Spring subscribes where it attaches the Observation before calling an endpoint in case Micrometer is used (and the Observation is not even in current ThreadLocal).

I agree the enum approach is flexible and can let us introduce different mechanisms if we come up with any. For the naming, I'd go with AUTO and LIMITED instead of MANUAL. Intuitively, I'd expect MANUAL to mean direct use of the context-propagation library API to restore ThreadLocals.

Comment From: kzander91

@bclozel Could it make sense to call Hooks.disableAutomaticContextPropagation() on context shutdown and/or when ContextPropagationMode != AUTO? I was wondering if this particular solution could otherwise impact other Spring Context's running in the same JVM, for example during tests.

Comment From: bclozel

Could it make sense to call Hooks.disableAutomaticContextPropagation() on context shutdown and/or when ContextPropagationMode != AUTO? I was wondering if this particular solution could otherwise impact other Spring Context's running in the same JVM, for example during tests.

As far as I understand, this is a static call, so by definition this applies to the entire JVM. Removing this queue wrapper during context shutdown or when the mode is not set to "AUTO" is just likely to disable the feature for other contexts that are still running and require it.

This feature is additive, the performance cost has been measured and should not impact applications significantly. Unless tests are relying on the fact that ThreadLocal values are not restored within reactor operators, this configuration change should not influence other behaviors.

Comment From: wilkinsona

Given the Micrometer requirement for the hook to have an effect, I wonder if this should be auto-configured in spring-boot-actuator-autoconfigure rather than spring-boot-autoconfigure. I also wonder if the property prefix should also reflect that it is related to tracing. spring.reactor.tracing.context-propagation perhaps?

Comment From: wilkinsona

It looks like we're currently relying on context-propagation being in testCompileClasspath through spring-integration-core. When https://github.com/spring-projects/spring-integration/pull/8556 is merged, that'll no longer be the case and the tests won't compile.

Comment From: wilkinsona

I've pushed cae8c14ac776e5538c73e55522fa70679737de4c to work around the compile failure following the merge of the PR in Spring Integration. We should see if we can remove the need for it as part of finishing off this issue.

Comment From: wilkinsona

We discussed this today and decided that we need more time to make sure we've got this right. We're going to revert the changes for now and revisit this for a subsequent 3.0.x release. In the meantime, a manual call to Hooks.enableAutomaticContextPropagation() can be made in apps that are affected by the problem described in #33372.

Comment From: wilkinsona

The changes have been reverted in 88de3cc.

Comment From: davidmelia

@wilkinsona I am prototyping Hooks.enableAutomaticContextPropagation() and have noticed that integrations such as Spring Data Mongo do not propagate the trace ID - I assume this is because it hands off to a different even loop?

Comment From: rhanton

@wilkinsona Darn, was waiting for this to be released to help fix some trace propagation issues I was seeing when using micrometer for internal tracing (I presume this will fix my internal @Observed traces not picking up the external trace id when using spring-reactive). I will keep waiting with bated breath!

Comment From: wilkinsona

No need to wait. Just add a call to Hooks.enableAutomaticContextPropagation() somewhere in your app.

Comment From: rhanton

Sweet - I'll try that - thanks @wilkinsona

Comment From: VinishKumar1

Still not able to see the traceid and spanid propagate to calls inside methods like .doOnError() or .onErrorResume(). Were those cases be handled as well?

Comment From: chemicL

I am prototyping Hooks.enableAutomaticContextPropagation() and have noticed that integrations such as Spring Data Mongo do not propagate the trace ID - I assume this is because it hands off to a different even loop?

@davidmelia this issue should get automatically fixed if we introduce this improvement in reactor-core: https://github.com/reactor/reactor-core/issues/3366. However, for the time being, you can follow any interaction with external sources with a .contextWrite(Function.identity()) immediately after the Flux/Mono obtained from such integrations. This boundary will restore ThreadLocals in your code. For the library code to log with trace-id when using a foreign Publisher (e.g. spring-data-mongodb's use of reactive mongoDB driver), they'd need to do the same. It's unfeasible to do everywhere, so we're considering adding it to reactor-core.

Comment From: chemicL

Still not able to see the traceid and spanid propagate to calls inside methods like .doOnError() or .onErrorResume(). Were those cases be handled as well?

@VinishKumar1 please provide a reproducer as an issue in reactor-core, we'll have a look.

Comment From: rafallezanko

I am prototyping Hooks.enableAutomaticContextPropagation() and have noticed that integrations such as Spring Data Mongo do not propagate the trace ID - I assume this is because it hands off to a different even loop?

@davidmelia this issue should get automatically fixed if we introduce this improvement in reactor-core: reactor/reactor-core#3366. However, for the time being, you can follow any interaction with external sources with a .contextWrite(Function.identity()) immediately after the Flux/Mono obtained from such integrations. This boundary will restore ThreadLocals in your code. For the library code to log with trace-id when using a foreign Publisher (e.g. spring-data-mongodb's use of reactive mongoDB driver), they'd need to do the same. It's unfeasible to do everywhere, so we're considering adding it to reactor-core.

Your suggestions with a .contextWrite(Function.identity()) works for me, so I wait for the fix reactor/reactor-core#3366. Thank you!

Comment From: Stillglade

After upgrading to 3.0.4 (which has the correct dependency versions), adding context-propagation, and adding the Hooks.enableAutomaticContextPropagation() call into my main method, I still was not seeing any tracing.

I was able to make it work by updating a Spring Security annotation @EnableReactiveMethodSecurity(useAuthorizationManager = false) by setting the useAuthorizationManager to false and then it started working. I guess the new Authorization API (see here) which works with the Reactor Context must be causing some issue. I am hopping that the fix in Reactor Core mentioned above fixes this as well.

Comment From: bclozel

Given the Micrometer requirement for the hook to have an effect, I wonder if this should be auto-configured in spring-boot-actuator-autoconfigure rather than spring-boot-autoconfigure. I also wonder if the property prefix should also reflect that it is related to tracing. spring.reactor.tracing.context-propagation perhaps?

While this feature is highly requested by developers using tracing support, this option is really about enabling a context propagation feature in Reactor. The context-propagation project lives in the micrometer organization, but it can be used for its main purpose without involving tracing. For example, this is a mandatory dependency for Spring for GraphQL; it is required for propagating the GraphQL context between asynchronous and reactive processing. Many applications are using the MDC logging feature or ThreadLocals and could use this support outside of tracing.

For these reasons, I'm not in favor of moving this feature to the actuator module or changing its configuration property name. Even if the tracing support has still some limitations, I think that making this feature available (and the default) as soon as possible is required if we want our community to adopt Spring Boot 3.0.

Comment From: pgovindan-cue

@EnableReactiveMethodSecurity(useAuthorizationManager = false)

Is there any way to work around this issue?

Comment From: wilkinsona

@pgovindan-cue Please direct questions about Spring Security to the Spring Security team. If you believe you've found a bug, please open an issue.

Comment From: janekbettinger

Still not able to see the traceid and spanid propagate to calls inside methods like .doOnError() or .onErrorResume(). Were those cases be handled as well?

@VinishKumar1 we observed this as well in a construct like this:

return webClient.get()
    .uri("https://httpbin.org/status/400")
    .retrieve()
    .onStatus(HttpStatusCode::isError, clientResponse -> Mono.error(new RuntimeException("ERROR")))
    .bodyToMono(Object.class)
    .doOnError(throwable -> LOGGER.error("Error: {}", throwable.getMessage()) /* lacks tracing information in the MDC */)
    .onErrorResume(throwable -> Mono.just(throwable.getMessage()));

and could fix it by not creating a new Mono in the onStatus step:

return webClient.get()
    .uri("https://httpbin.org/status/400")
    .retrieve()
    .onStatus(HttpStatusCode::isError, ClientResponse::createException)
    .bodyToMono(Object.class)
    .doOnError(throwable -> LOGGER.error("Error: {}", throwable.getMessage()) /* includes tracing information in the MDC */)
    .onErrorResume(throwable -> Mono.just(throwable.getMessage()));

Your scenario might be different but I hope that this points you into the right direction.

Comment From: jonathannaguin

@janekbettinger Have you tried with latest reactor version? https://github.com/reactor/reactor-core/releases/tag/v3.5.5

It includes the following:

Propagate ThreadLocals for non-Reactor upstream sources by @chemicL in https://github.com/reactor/reactor-core/pull/3418

Comment From: janekbettinger

@janekbettinger Have you tried with latest reactor version? https://github.com/reactor/reactor-core/releases/tag/v3.5.5

Yes, 3.5.5 worked well (i.e. fixed tracing) for the majority of log statements, but not for the scenario described in my previous comment. I have a minimum working example that I could share later today or tomorrow if you're interested.

Comment From: janekbettinger

@jonathannaguin you can find the example that shows the behavior here: * https://github.com/janekbettinger/webflux-micrometer-tracing

Comment From: jonathannaguin

@janekbettinger yeah, I am also able to reproduce the problem. And actually, another fix would be to do the following

return clientResponse.bodyToMono(byte[].class).map(i->new RuntimeException("ERROR"));

The above seems to retain the context whereas creating a new Mono does not inside onStatus.

Comment From: aksh1618

@janekbettinger @jonathannaguin For me just adding .contextCapture() to the end of the chain worked to propagate the context in all instances of doOnNext, doOneError and implicit (such as in elapsed) & explicit creation of new Flux/Mono. (Using 3.5.5 via Spring Boot 3.0.6). However, I'm calling block on this chain, so can't say for sure if this will work in all cases.

So in conclusion the steps I took: - Upgraded to Spring Boot 3.0.6 - Added spring-boot-starter-actuator dependency (Still not 100% sure why this is needed to trigger start of tracing) along with io.micrometer:micrometer-tracing & io.micrometer:micrometer-tracing-bridge-brave - Called Hooks.enableAutomaticContextPropagation(); - Called .contextCapture() on the reactive chain just before calling .block()

Comment From: mikhilsanghvi

@chemicL I've tried with reactor-core 3.5.5 and I see similar issue spring-data-r2dbc where the trace ID is no propagated back. As a result I am forced to add .contextWrite(Function.identity()) after the repository method like

repository.findAll().contextWrite(Function.identity());

I was hoping the issue https://github.com/reactor/reactor-core/issues/3366 would have resolved this but unfortunately not.

Appreciate any help. Thanks

Comment From: chemicL

@mikhilsanghvi I'm always eager to look into an isolated reproducer. If you prepare one, please feel free to open an issue in reactor-core repository so I can analyse and see what can be done about the problem.

Comment From: chemicL

@janekbettinger Please report the problem you observe as a new issue in reactor-core. I'll try to analyse it and if necessary will redirect to the appropriate repository if the issue lies elsewhere.

Comment From: singhbaljit

I'm seeing an issue with Spring Security where using Hooks.enableAutomaticContextPropagation() with micrometer-tracing-bridge-* library causes the OAuth2 resource servers to break. See https://github.com/spring-projects/spring-security/issues/13431.

Comment From: mhalbritter

After some discussion with the Reactor team, we decided:

We add the property spring.reactor.context-propagation with the values AUTO and LIMITED. It defaults to LIMITED.

Reasoning:

having a boot properly already is a big win for us and gives users the visibility that such a feature exists

if the user launches an app expecting to see trace ids in logs, they can look up boot docs and see the setting, win

if we enable that by default for users who have not relied on that functionality, we automatically degrade their performance - investigating the reasons for that can take more time as they don’t know what is at fault, so that’s an argument against turning that on by default

We need to add something to the tracing docs that users have to set this to AUTO to get trace and span IDs in the logs.

Comment From: oleh-hudyma

@mhalbritter where can we read about performance degradation when using context propagation?

Comment From: Abahafart

there is any solution to print span and trace id in webclient methods? I've the same behavior with Spring boot version 3.1.3

Comment From: mhalbritter

where can we read about performance degradation when using context propagation?

The reactor team is doing some benchmarks and I guess will publish them when they are done.

Comment From: chemicL

We have been evaluating the configurations using some selected scenarios, but sharing any numbers would be counter-productive. The performance impact is a fact, as any such solution interacts with ThreadLocal values, accessing which is quite costly when thinking about squeezing the most out of the hardware. However, it is an accepted compromise, for which you get production grade observability features without having to do manual work. What we worked on is first and foremost presenting a solution which is not as impactful as restoring ThreadLocals for each operator (which preserves correctness) but is not causing ThreadLocal leaks like some approaches taken with Sleuth. We worked on providing a solution which is safe to use and comfortable. However, it comes with non-trivial cost, which can only be evaluated on a per-use-case scenario. So best to take your app and measure the impact with load tests - what happens with the default (limited) mode, what happens without any observability, and what happens if you enable the automatic mode. In some cases the differences will not be noticeable if the app is doing something more significant per request than those TL accesses. On another occasion it can happen that the automatic mode is cheaper than the default mode (e.g. when aggregating a stream of data), while in other cases it can be more costly. So the only recommendation we can give is to test for yourself if this is the price you're ok to pay.