I found #21862 which is pretty close to my request but closed.

I am currently using Spring WebClient with Spring Boot 2.2.6 and Spring Framework 5.2.5 writing a service that sits in front of a number of other upstream services and transforms their response for public consumption. Some of these services respond with very large JSON payloads that are little more than an array of entities wrapped in a JSON document, usually with no other properties:

{
    "responseRoot": {
        "entities": [
            { "id": "1" },
            { "id": "2" },
            { "id": "n" }
        ]
    }
}

There could be many thousands of entities in this nested array and the entire payload can be tens of MBs. I want to be able to read in these entities through a Flux<T> so that I can transform them individually and write them out to the client without having to deserialize all of them into memory. This doesn't appear to be something that Spring WebFlux supports out of the box.

I'm currently exploring writing my own BodyExtractor which reuses some of the code in Jackson2Tokenizer to try to support this. My plan is to accept a JsonPointer to the location of the array and then parse asynchronously until I find that array, then to buffer the tokens for each array element to deserialize them.

var flux = client.get()
    .uri(uri)
    .exchange()
    .flatMapMany(r ->
        r.body(new StreamingBodyExtractor(JsonPointer.compile("/responseRoot/entities")))
    );
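To make the "buffer the tokens for each array element" idea concrete, here is a minimal, dependency-free sketch. It is not the Gist's code and does not use Jackson2Tokenizer. `ArrayElementSplitter` is a hypothetical name, and a hand-rolled character scanner stands in for Jackson's non-blocking parser. It assumes well-formed JSON and a pointer made only of object keys, with no `~0`/`~1` escapes and no escaped characters in the keys. Input arrives in arbitrary chunks, and each completed element is returned as compact JSON text that could then be handed to a deserializer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: splits out the elements of one nested JSON array, chunk by chunk. */
final class ArrayElementSplitter {
    private final List<String> target;                      // object keys leading to the array
    private final List<String> stack = new ArrayList<>();   // current key per open container (null for arrays)
    private final StringBuilder key = new StringBuilder();  // most recent string seen outside the array
    private final StringBuilder element = new StringBuilder();
    private boolean inString, escaped, done;
    private int arrayDepth = -1;                             // stack size inside the target array, -1 until found

    ArrayElementSplitter(String pointer) {
        List<String> parts = Arrays.asList(pointer.split("/", -1));
        this.target = parts.subList(1, parts.size());        // drop the empty segment before the first '/'
    }

    /** Feeds one chunk and returns the elements that were completed by it. */
    List<String> feed(String chunk) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < chunk.length() && !done; i++) {
            char c = chunk.charAt(i);
            boolean capturing = arrayDepth >= 0;
            if (inString) {                                   // string state survives chunk boundaries
                if (capturing) element.append(c);
                if (escaped) escaped = false;
                else if (c == '\\') escaped = true;
                else if (c == '"') inString = false;
                else if (!capturing) key.append(c);
                continue;
            }
            switch (c) {
                case '"':
                    inString = true;
                    if (capturing) element.append(c); else key.setLength(0);
                    break;
                case ':':                                     // the string just read was an object key
                    if (capturing) element.append(c);
                    else if (!stack.isEmpty()) stack.set(stack.size() - 1, key.toString());
                    break;
                case '{': case '[':
                    boolean isTarget = !capturing && c == '[' && stack.equals(target);
                    stack.add(null);
                    if (capturing) element.append(c);
                    else if (isTarget) arrayDepth = stack.size();
                    break;
                case '}': case ']':
                    if (!stack.isEmpty()) stack.remove(stack.size() - 1);
                    if (!capturing) break;
                    if (stack.size() < arrayDepth) { flush(out); done = true; } // target array closed
                    else element.append(c);
                    break;
                case ',':
                    if (capturing && stack.size() == arrayDepth) flush(out);    // element boundary
                    else if (capturing) element.append(c);
                    break;
                default:                                      // whitespace outside strings is dropped
                    if (capturing && !Character.isWhitespace(c)) element.append(c);
            }
        }
        return out;
    }

    private void flush(List<String> out) {
        if (element.length() > 0) out.add(element.toString());
        element.setLength(0);
    }
}
```

Only one element is buffered at a time, so memory use is bounded by the largest entity rather than by the whole payload. A real BodyExtractor would presumably feed DataBuffer contents into a proper tokenizer instead of scanning characters.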

Before I go too far down this path I was curious if this was functionality that Spring would be interested in supporting out of the box.

Similarly, I was curious about the functionality of being able to stream out a response from a WebFlux controller via a Flux<T> where the streamed response would be wrapped in a JSON array and possibly in a root JSON document as well?
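As a rough illustration of the outbound direction, the sketch below writes already-serialized elements one at a time between a fixed prefix and suffix. `WrappedArrayWriter` is a hypothetical name, and the sketch uses a plain `Iterator` and `Consumer` rather than `Flux` so that it has no dependencies. It assumes each element is already valid JSON text and that the prefix and suffix together form the enclosing document.

```java
import java.util.Iterator;
import java.util.function.Consumer;

/** Hypothetical helper: emits prefix, comma-separated elements, then suffix, without buffering. */
final class WrappedArrayWriter {
    static void write(String prefix, Iterator<String> elements, String suffix, Consumer<String> sink) {
        sink.accept(prefix);                 // e.g. {"responseRoot":{"entities":[
        boolean first = true;
        while (elements.hasNext()) {
            if (!first) sink.accept(",");    // separator goes before every element except the first
            sink.accept(elements.next());
            first = false;
        }
        sink.accept(suffix);                 // e.g. ]}}
    }
}
```

Each fragment is handed to the sink as soon as it is available, so the full response never has to exist in memory at once.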

Comment From: HaloFour

Here's a very quick-and-dirty implementation of the BodyExtractor:

https://gist.github.com/HaloFour/ce3063d4e693b495e3c194cbb2f66686

The actual token parsing could certainly be cleaned up but it gets the job done at least to the extent that existing integration tests in the project are passing.

Comment From: HaloFour

Also, not to pile up additional requests in a single issue, but I didn't see a way to use a BodyExtractor with retrieve(), which forces me to interpret the HTTP error status codes manually. Is there a reason WebClient.ResponseSpec doesn't include a method that accepts a BodyExtractor?

Comment From: rstoyanchev

@HaloFour thanks for the proposal. This looks feasible and probably worth doing, but mainly I'm wondering what a more general solution looks like and how much more general it needs to be.

For example, consider the case of multiple arrays, as in #21862. We could accept multiple JSON pointers, but it's less obvious how to represent the output, which logically is Flux<T1>, Flux<T2>, etc. but needs to be exposed sequentially, i.e. as Flux<Flux<?>>, which is not great for generics. It might as well be Flux<Object>, where the application has to check the Object type and downcast accordingly. An even more challenging question is what to do if you want to extract the surrounding Object structure, as in #25472.

Comment From: fransflippo

Thanks for this, @HaloFour! Looks like something I was looking for (hence #25472). I'll give your Gist a try.

@rstoyanchev (just reiterating from #25472) I think it makes sense to focus on the most common case: a single array of a single type of object in the JSON response. The semantics of anything else, as you explain, become very hairy very quickly, and the applicability seems low for most real-world scenarios (imho).

Comment From: HaloFour

Thanks for taking a look! Here's a newer Gist based on the code that we're currently using in production.

Comment From: rstoyanchev

Yes, it makes sense to do something that would solve many cases. That said, other possible cases are not hard to find. Take #21862 for example, or even Elasticsearch: isn't it sometimes necessary to access something besides the hits, like "search_after"?

Comment From: joedevgee

Going back to the original question: with the new API, how exactly do we extract the entities under responseRoot?

Comment From: nilsga

toEntityFlux(streamingBodyExtractor.toFlux(MyClass.class, JsonPointer.compile("/pathToArray"))) worked for me. This seems very useful. Any chance this BodyExtractor can be added to Spring?

Comment From: simonbasle

For the original use case of JSON-pointing to an array in order to stream-parse it, I think it would be better to delegate that responsibility to Jackson and probably just offer a lightweight BodyExtractor adapter in Framework.

Unfortunately, even though in Jackson-Core there is a FilteringParserDelegate which can accept a JsonPointerBasedFilter, this doesn't work for async parsers for now (see https://github.com/FasterXML/jackson-core/issues/1144)...

@HaloFour maybe there's an opportunity to contribute something there?

Comment From: HaloFour

Sure, I can take a look at that.