Affects: 5.3.25 (Spring Boot 2.7.8)


When the first call made with WebClient is a POST or PUT with a body, that body seems to be retained in memory. When the body is significantly big, this means an unnecessarily large chunk of memory stays in use.
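
In short, the call pattern looks roughly like this (a condensed sketch; the base URL and endpoint are placeholders, see the sample project for the real setup):

```java
import java.util.stream.IntStream;

import org.springframework.http.MediaType;
import org.springframework.web.reactive.function.client.WebClient;

public class Repro {

  public static void main(String[] args) {
    // Placeholder base URL; the sample project wires this up via Spring configuration.
    WebClient client = WebClient.create("http://localhost:8080");

    // First call made by the application: a PUT with a large JSON body.
    int[] bigBody = IntStream.range(1, 10_000_000).toArray();

    client.put()
        .uri("/dead-letter")
        .contentType(MediaType.APPLICATION_JSON)
        .bodyValue(bigBody)
        .retrieve()
        .bodyToMono(String.class)
        .block();

    // Observation: even after triggering GC, heap usage stays well above the initial
    // level, whereas a first GET or a PUT with a small body does not show this.
  }
}
```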

Sample project containing some tests that show this, triggering garbage collects and showing the memory use after each call: https://github.com/BertScholten/spring-webflux-memory-issue

When the first method is a PUT with a smaller body, or a GET, memory use stays at roughly the initial level. When the first method is a PUT with a significantly big body, memory use goes up. In the test case used, it triples the initial memory used and does not seem to drop again.

Subsequent calls, no matter the body size, do not increase the memory used any further.

**Comment From: edyda99**

Hi, correct me if I say anything wrong. I checked your repo, and I believe the tests are not reliable: they call the same method several times, which may cause compiler optimization and code caching, and running `gc()` won't necessarily result in a garbage collection. Finally, if I run a profiler it is hard to analyse, as most of the memory usage is caused by `IntStream.range(1, 10000000).toArray();`.

So what I did is split your project into two modules, each running on a separate port, converted the big and small methods into three (big(), medium() and small()), and made some changes.

Before:

 @GetMapping(value = "/small")
  public ResponseEntity<String> small() {
    poster.writeJson(IntStream.range(1, 10).toArray());
    return ResponseEntity.ok("Small done");
  }

  @GetMapping(value = "/big")
  public ResponseEntity<String> big() {
    poster.writeJson(IntStream.range(1, 10000000).toArray());
    return ResponseEntity.ok("Big done");
  }

After:

@GetMapping(value = "/small")
  public Mono<String> small() {
    return poster.writeJson(IntStream.range(1, 10).toArray());
  }

  @GetMapping(value = "/medium")
  public Mono<String> medium() {
    return poster.writeJson(IntStream.range(1, 10000000/2).toArray());
  }

  @GetMapping(value = "/big")
  public Mono<String> big() {
    return poster.writeJson(IntStream.range(1, 10000000).toArray());
  }

 public Mono<String> writeJson(final Object object) {
    return fileServerWebClient.put()
            .uri("/dead-letter")
            .contentType(MediaType.APPLICATION_JSON)
            .bodyValue(object)
            .retrieve()
            .bodyToMono(String.class);
  }

Before:

 @GetMapping(value = "/dead-letter")
  public ResponseEntity<String> deadLetter() {
    return ResponseEntity.ok("Ignored get");
  }

  @PutMapping(value = "/dead-letter")
  public ResponseEntity<String> deadLetter(@RequestBody final String body) {
    return ResponseEntity.ok("Ignored put");
  }
````

After:

@GetMapping(value = "/dead-letter") public Mono deadLetter() { return Mono.just("empty letter" + new Random().nextInt()); }

@PutMapping(value = "/dead-letter") public Mono deadLetter(@RequestBody final String body) { return Mono.just("empty letter" + new Random().nextInt());


And I ran the IntelliJ profiler twice on the module containing the dead-letter endpoint:

First run: called big(), then medium(), then small().
Second run: called small(), then medium(), then big().

The result was that the second run showed the higher memory allocation.

**Comment From: BertScholten**

Hi, thanks for looking into this.

I created the example application like this so you could either:

- run the application, call the `small` and `big` endpoints manually, and then use VisualVM or something similar to trigger/inspect a heap dump.
- or run the unit tests (though you have to run those class-by-class) and look at the logging.

Sure, there are no guarantees with calling `gc()`, but I didn't know of another way to show this in a unit test, and I did get the same results as with the heap dumps, so I thought it might help with looking into this. Your approach would be the same however, and is more like how our actual setup works (2 separate VMs).
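
For reference, the memory logging in the tests boils down to something like this (a sketch, the helper name is mine; `System.gc()` is only a hint):

```java
// Log the approximate used heap after requesting a GC. The sleep gives the collector
// a moment to run, but there is still no guarantee that a collection happened.
static long usedHeapAfterGc() throws InterruptedException {
  System.gc();
  Thread.sleep(500);
  Runtime runtime = Runtime.getRuntime();
  return runtime.totalMemory() - runtime.freeMemory();
}
```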

I'm not that familiar with IntelliJ's profiler. Is the higher memory allocation a statistic over the whole runtime, or does it tell you what memory you end up with if you were to garbage collect at a specific moment?

**Comment From: bclozel**

Sorry it took us so long to reply here. I think there are several things to consider here.

First, if you believe that there is a memory leak, you can verify that using the memory leak detector provided by Netty. Spring Boot has a dedicated configuration property for that: [`spring.netty.leak-detection=PARANOID`](https://docs.spring.io/spring-boot/docs/current/reference/html/application-properties.html#application-properties.web.spring.netty.leak-detection). I ran your sample with this and did not see any warning about leaks.
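
For completeness, outside of Spring Boot the same switch can be flipped through Netty's own API; roughly:

```java
import io.netty.util.ResourceLeakDetector;

// Raise Netty's leak detection level, which is roughly what the Boot property does.
// PARANOID samples every buffer allocation, so keep it to debugging sessions only.
ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);
```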

Also, I believe the sample is invalid in the first place, as the `WebClient` calls block the server exchanges. I changed the sample like this:

```java
  @GetMapping(value = "/big")
  public Mono<ResponseEntity<String>> big() {
    return poster.writeJson(IntStream.range(1, 10000000).toArray())
        .thenReturn(ResponseEntity.ok("Big done"));
  }

  public Mono<Void> writeJson(final Object object) {
    return fileServerWebClient.put()
            .uri("/dead-letter")
            .contentType(MediaType.APPLICATION_JSON)
            .bodyValue(object)
            .retrieve()
            .toBodilessEntity()
            .then();
  }
```

Finally, I think this issue is more about the memory consumption model, which is different with Netty. It uses a mix of heap and direct memory (which is pooled). I think what you're really seeing is the Netty `ByteBuf` pooled memory increasing depending on usage. Here, requesting a large amount of memory increases the reserved direct memory, but this doesn't mean that this is a leak, nor that it is a problem for the application.
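
One way to see that reserved pool directly, independent of the metrics described below, is Netty's own allocator metric (a sketch, assuming the default pooled allocator is in use, which is the case with Reactor Netty out of the box):

```java
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.buffer.PooledByteBufAllocatorMetric;

// "Used" memory here is what the pool has reserved, not what is actively
// held by live ByteBuf instances.
PooledByteBufAllocatorMetric metric = PooledByteBufAllocator.DEFAULT.metric();
System.out.println("used direct memory: " + metric.usedDirectMemory());
System.out.println("used heap memory:   " + metric.usedHeapMemory());
System.out.println("chunk size:         " + metric.chunkSize());
```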

You can dig into this by looking at the Reactor Netty metrics for the HTTP client. You can get started by adding the actuator dependency:

    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>

then exposing the metrics on the actuator endpoint (application.properties):

    management.endpoints.web.exposure.include=metrics

And then look at the used direct memory (vs "active" direct memory) by checking the dedicated endpoint as well as other gauges:

    $ http localhost:8080/actuator/metrics/reactor.netty.bytebuf.allocator.used.direct.memory
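
The same gauge can also be read programmatically from the `MeterRegistry`, for example in a test (a sketch; the meter only exists once the client has actually been used):

```java
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;

// Sum the gauge across allocators (there is one time series per allocator id/type tag).
static double usedDirectMemory(MeterRegistry registry) {
  return registry.find("reactor.netty.bytebuf.allocator.used.direct.memory")
      .gauges()
      .stream()
      .mapToDouble(Gauge::value)
      .sum();
}
```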

I'll close this issue now, as there is nothing actionable on the Spring Framework side. We will reopen this issue if it turns out that there is a memory leak in our codecs. Thanks!