Current behavior:

WebClient long task timer task active metrics does not show "destination host" in "client_name" label, instead always shows "none".

http_client_requests_active_seconds_active_count{client_name="none",exception="none",method="none",outcome="UNKNOWN",status="CLIENT_ERROR",uri="none",} 7.0

In service which has more than 1 webClient instance to different remote host all long task timer active metrics are aggregated and metrics cannot detect the number of pending requests per destination host. This decrease the value of the metrics as breakdown per destination host calls cannot be done.

When MVC is used instead of webflux then "client_name" label contains correct destination host".

http_client_requests_active_seconds_active_count{client_name="serviceB.local",exception="none",method="none",outcome="UNKNOWN",status="CLIENT_ERROR",uri="none",} 5.0
http_client_requests_active_seconds_active_count{client_name="serviceC.local",exception="none",method="none",outcome="UNKNOWN",status="CLIENT_ERROR",uri="none",} 2.0

Expected behavior: webClient should report "client_name" label correctly matching remote destination address shown in normal metrics.

Potential root cause:

DefaultWebClient exchange() store the requestsBuilder() to carrier() before observation start, but "request" is stored to context only after observation is started just before applying the filter.

DefaultClientRequestObservationConvention/clientName() reads the "client_name" from context when long task timer is started and get "null" because context is not set. Thus "client_name" is not available for long task timer active start but only for normal metrics stop.

MVC works because "client_name" is read from carrier.

There were some other changes in DefaultWebClient blame but I am not sure if those change or broke the the "client_name" because context is still set after start.

public class DefaultClientRequestObservationConvention implements ClientRequestObservationConvention {
    protected KeyValue clientName(ClientRequestObservationContext context) {
        if (context.getRequest() != null && context.getRequest().url().getHost() != null) {
            return KeyValue.of(LowCardinalityKeyNames.CLIENT_NAME, context.getRequest().url().getHost());
        }
        return CLIENT_NAME_NONE;
    }
}
final class DefaultWebClient implements WebClient {
        @SuppressWarnings("deprecation")
        @Override
        public Mono<ClientResponse> exchange() {
            ClientRequestObservationContext observationContext = new ClientRequestObservationContext();
            ClientRequest.Builder requestBuilder = initRequestBuilder();
            return Mono.deferContextual(contextView -> {
                Observation observation = ClientHttpObservationDocumentation.HTTP_REACTIVE_CLIENT_EXCHANGES.observation(observationConvention,
                        DEFAULT_OBSERVATION_CONVENTION, () -> observationContext, observationRegistry);
                observationContext.setCarrier(requestBuilder); // carrier available in start
                observation
                        .parentObservation(contextView.getOrDefault(ObservationThreadLocalAccessor.KEY, null))
                        .start();
                ExchangeFilterFunction filterFunction = new ObservationFilterFunction(observationContext);
                if (filterFunctions != null) {
                    filterFunction = filterFunctions.andThen(filterFunction);
                }
                ClientRequest request = requestBuilder.build();
                observationContext.setUriTemplate((String) request.attribute(URI_TEMPLATE_ATTRIBUTE).orElse(null));
                observationContext.setRequest(request); // context saved after start
                Mono<ClientResponse> responseMono = filterFunction.apply(exchangeFunction)
                        .exchange(request)
                        .checkpoint("Request to " + this.httpMethod.name() + " " + this.uri + " [DefaultWebClient]")
                        .switchIfEmpty(NO_HTTP_CLIENT_RESPONSE_ERROR);
                if (this.contextModifier != null) {
                    responseMono = responseMono.contextWrite(this.contextModifier);
                }
                final AtomicBoolean responseReceived = new AtomicBoolean();
                return responseMono
                        .doOnNext(response -> responseReceived.set(true))
                        .doOnError(observationContext::setError)
                        .doFinally(signalType -> {
                            if (signalType == SignalType.CANCEL && !responseReceived.get()) {
                                observationContext.setAborted(true);
                            }
                            observation.stop();
                        })
                        .contextWrite(context -> context.put(ObservationThreadLocalAccessor.KEY, observation));
            });
        }
}

Possible customization workaround is reading the clientName from context.getCarrier().build() instead from context. But not clear if this is safe way in case webClient implementation change again.

@Configuration
class NdfClientRequestObservationConvention extends DefaultClientRequestObservationConvention {
    private static final KeyValue CLIENT_NAME_NONE = KeyValue.of(ClientHttpObservationDocumentation.LowCardinalityKeyNames.CLIENT_NAME, KeyValue.NONE_VALUE);

    @Override
    protected KeyValue clientName(ClientRequestObservationContext context) {
        if (context.getRequest() != null && context.getRequest().url().getHost() != null) {
            return KeyValue.of(ClientHttpObservationDocumentation.LowCardinalityKeyNames.CLIENT_NAME, context.getRequest().url().getHost());
        }
        if (context.getCarrier() != null && context.getCarrier().build().url().getHost() != null) {
            return KeyValue.of(ClientHttpObservationDocumentation.LowCardinalityKeyNames.CLIENT_NAME, context.getCarrier().build().url().getHost());
        }
        return CLIENT_NAME_NONE;
    }
}

Environment: Java: 17/21 SpringBoot 3.1.5, same issue in 3.1.6 and 3.2.0

Comment From: bclozel

I think this is due to a lifecycle issue in general, but we can probably improve the situation.

We cannot give to the ClientRequestObservationContext a fully built ClientRequest, because such instances are immutable and the tracing obervation handler needs to add a propagation HTTP header to the request before it's sent out. As a result, we're creating the context with a ClientRequest.Builder.

When the observation starts, the propagation header is added and the ".active" metric is started. Because we haven't set the actual request on the context and given the current convention implementation, we're seeing this unfortunate behavior.

I think we can change the ClientRequestObservationContext to tighten up a bit the lifecycle there and possibly build a temporary request for this. It would certainly miss the current observation context and the propagation header for obvious lifecycle reasons, but I don't think this would be a problem for most conventions.