Affects: 5.2.5.RELEASE

This feature request arises from this StackOverflow post. It could arguably fit either with spring-framework or reactor-netty, posting in spring-framework to start the discussion.


Context

Some content delivery networks recommend to re-resolve DNS on certain types of status codes (e.g. 500 internal server error). To achieve this, I've added a custom Netty DnsNameResolver and DnsCache, but I also need to close the connection, otherwise it will be released back to the pool and DNS will not be re-resolved.

Two approaches have come to light so far:

Approach 1: using Webclient alone

This was suggested by Violeta Georgieva in this answer:

return this.webClient
           .get()
           .uri("/500")
           .retrieve()
           .onStatus(status -> status.equals(HttpStatus.INTERNAL_SERVER_ERROR), clientResponse -> {
                clientResponse.bodyToFlux(DataBuffer.class)
                              .subscribe(new BaseSubscriber<DataBuffer>() {
                                  @Override
                                  protected void hookOnSubscribe(Subscription subscription) {
                                      subscription.cancel();
                                  }
                              });
                return Mono.error(new IllegalStateException("..."));
           })
           .bodyToMono(String.class);

Basically, subscribing to the response body and cancelling it immediately seems to close the connection. However, this relies on Spring's undocumented internal behaviour and causes bodies of error responses to be lost.

Approach 2: using Reactor Netty's TcpClient

This approach was my initial attempt, configuring the underlying TcpClient used by WebClient:

TcpClient tcpClient = TcpClient.create()
        .observe((connection, newState) -> {
            if (newState == State.RELEASED && connection instanceof HttpClientResponse) {
                HttpResponseStatus status = ((HttpClientResponse) connection).status();
                if (status.codeClass() != HttpStatusClass.SUCCESS) {
                    connection.dispose();
                }
            }
        });

This approach also feels clunky and leads to a potential race condition: if the connection is released back to the pool after an error status code, and the observer is notified to close that connection, a new request could acquire that pooled connection in parallel.

Other approaches?

Are there any other approaches that one could implement to close the connection on error status codes? Could some more convenient interfaces be added to Spring or Reactor Netty's APIs?

Comment From: bclozel

Out of curiosity @PyvesB , could you point us to the CDN guidelines you're talking about?

We need to carefully consider all implications and combinations here (HTTP protocol versions, many connections to a host in the pool, avoiding memory leaks, etc).

Thanks!

Comment From: rstoyanchev

Also a couple more questions.

1) Can you clarify how the DNS re-resolving works, e.g. does the error need to be propagated downstream which would then retry, or is it transparent and the downstream never sees the error?

2) Would there be a need to also clean up other connections to the same server that may already be in the pool?

In addition, on the SO thread @violetagg had suggested doOnResponse or doAfterResponseSuccess but you mentioned that doesn't work with WebClient? I'd like to take a closer look at that to see if it is expected behavior or not. Do you have a sample handy?

Comment From: PyvesB

Out of curiosity @PyvesB , could you point us to the CDN guidelines you're talking about?

I don't want to name-drop too much, but if you search for "DNS lookups, upload retries, and timeouts" in your favourite search engine, you may get lucky! 🙄

  1. Can you clarify how the DNS re-resolving works, e.g. does the error need to be propagated downstream which would then retry, or is it transparent and the downstream never sees the error?

The reactive chain currently looks similar to the following:

Mono.fromSupplier(() -> webClient.
            . ...
            .retrieve())
        .flatMap(ResponseSpec::toBodilessEntity)
        .doOnError(t -> dnsCache.clear(someHost))
        .retryWhen(...)
        . ...

In other words, the entry is cleared from the DNS cache and the WebClient operation is retried on error. I would like the WebClient to acquire a new connection and therefore re-resolve DNS as entries for that host are no longer cached.

  1. Would there be a need to also clean up other connections to the same server that may already be in the pool?

I was currently working with a single connection in mind to simplify the problem. But you're right that a useful refinement would be to close other pooled connections to that same server.

In addition, on the SO thread @violetagg had suggested doOnResponse or doAfterResponseSuccess but you mentioned that doesn't work with WebClient? I'd like to take a closer look at that to see if it is expected behavior or not. Do you have a sample handy?

Doing it in doAfterResponseSuccess seems to hang the reactive chain. I've setup a sample project that hopefully illustrates that problem (one of the unit tests times out): https://github.com/PyvesB/spring-webclient-connection-close

Comment From: rstoyanchev

@violetagg from what I can see with the sample, for doAfterResponseSuccess, the WebClient subscribes for the response body, then connection.dispose() calls FluxReceive#cancel and so the connection is closed but the subscriber doesn't receive any further signals.

Currently ClientResponse has releaseBody() to consume and release data buffers. It also supports bodyToMono(Void.class) expecting an empty response but cancelling otherwise. Maybe we could add cancelBody() which would be pretty close to bodyToMono(Void.class) but essentially closing the connection unconditionally.

This could then be used as follows:

webClient.method(HttpMethod.POST)
    .uri(uri)
    .body(...)
    .retrieve()
    .onStatus(status -> {...}, resp -> resp.cancelBody().then(DNS_CACHE_ERROR))
    .toBodilessEntity()
    .retry(Retry.maxInARow(3).filter(DnsCacheException.class));

Comment From: PyvesB

Maybe we could add cancelBody() which would be pretty close to bodyToMono(Void.class) but essentially closing the connection unconditionally.

Would that cause the body to be lost though? The bodies of error status codes may provide valuable monitoring information as to why the response was an error in the first place.

Comment From: rstoyanchev

Yes, it would mean the body is ignored.

I don't see any good options at the WebClient level I'm afraid. We'd have to allow all the usual ways to read the body but there is no way to expose the connection at the end of that. Moreover a Connection abstraction over Netty, Jetty, (as well as Apache HttpClient, and JDK 11 HttpClient to come in 5.3) is unlikely to work out well. It's not something we want to pursue. This level of control is best at the HTTP client library level.

@violetagg what are your thoughts on doAfterResponseSuccess? As this is meant to be after HttpClientState.RESPONSE_COMPLETED I wonder if this here shouldn't be in reverse order? I think the closing of the connection prevents the onComplete signal to the subscriber.

Comment From: violetagg

@PyvesB @rstoyanchev Can you check and test https://github.com/reactor/reactor-netty/pull/1056

Comment From: rstoyanchev

Thanks @violetagg that works great.

Comment From: PyvesB

Thanks a lot, great collaboration! :+1: