Bug description In DefaultChatClient#DefaultCallResponseSpec (and likewise in the streaming variant), calling chatResponse and then content triggers the remote call twice.

Environment Spring AI version M5; confirmed also on main

Steps to reproduce Simply call chatResponse and content on the same response spec; the remote service is invoked twice.

Expected behavior Per "DefaultCallResponseSpec", both accessors should return the same (memoized) response.

Minimal Complete Reproducible example

var response = chatClient.prompt()
        .advisors(new ReReadingAdvisor())
        .call();
response.content();
response.chatResponse();

Comment From: ThomasVitale

Thanks for raising this issue. The API is designed to support only one terminal call operation via the CallResponseSpec: content(), chatResponse(), entity(), or responseEntity(). I would say that it's by design that if the call action is called multiple times, then the model is called on each call.

That's the same behaviour present in other Client APIs in the Spring portfolio, such as RestClient.

If you need full access to the response (e.g. metadata), I suggest using the chatResponse() terminal operation and then extracting the information you need from there (including the content).

Comment From: Grogdunn

Ok, but maybe, like the Java Stream API, a terminal operation called a second time should throw.
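For illustration, a minimal sketch of that Stream-like alternative in plain Java (the class name, the Supplier stand-in for the LLM call, and the accessor are all hypothetical, not Spring AI API): the spec performs the remote call once, and any further terminal invocation throws, mirroring java.util.stream's "stream has already been operated upon or closed".

```java
import java.util.function.Supplier;

// Hypothetical sketch: a single-use response spec that throws on a second
// terminal invocation, like a java.util.stream.Stream terminal operation.
final class SingleUseResponseSpec {

    private final Supplier<String> remoteCall; // stands in for the LLM invocation
    private boolean consumed = false;

    SingleUseResponseSpec(Supplier<String> remoteCall) {
        this.remoteCall = remoteCall;
    }

    String content() {
        if (consumed) {
            throw new IllegalStateException("response has already been consumed");
        }
        consumed = true;
        return remoteCall.get(); // exactly one remote invocation per spec
    }
}

public class SingleUseDemo {
    public static void main(String[] args) {
        SingleUseResponseSpec spec = new SingleUseResponseSpec(() -> "hello");
        System.out.println(spec.content()); // first terminal call: ok
        try {
            spec.content();                 // second terminal call: throws
        } catch (IllegalStateException e) {
            System.out.println("second call rejected");
        }
    }
}
```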

At the moment the impression is that the "content" call is a shortcut for invoking "chatResponse().getResult().getOutput().getText()" on the SAME response.

My last two cents: "DefaultResponseSpec" in DefaultRestClient seems to save the ClientHttpResponse in clientResponse and perform its operations on that, rather than invoking the remote service each time I invoke "getBody" (and it may throw if the body has already been consumed).
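The memoization behavior described above can be sketched in plain Java (again, the class name and the Supplier stand-in for the remote call are hypothetical, not actual Spring AI or RestClient code): the remote call runs once, its result is cached, and every accessor reads from the cached value.

```java
import java.util.function.Supplier;

// Hypothetical sketch: a response spec that performs the remote call once,
// caches the result, and lets every accessor read the cached value --
// the behavior the reporter expects from CallResponseSpec.
final class MemoizedResponseSpec {

    private final Supplier<String> remoteCall; // stands in for the LLM invocation
    private String cached;                     // null until the first accessor runs

    MemoizedResponseSpec(Supplier<String> remoteCall) {
        this.remoteCall = remoteCall;
    }

    private String response() {
        if (cached == null) {
            cached = remoteCall.get(); // single remote invocation
        }
        return cached;
    }

    String content() {        // shortcut accessor
        return response();
    }

    String chatResponse() {   // full-response accessor, same cached value
        return response();
    }
}

public class MemoizedDemo {
    public static void main(String[] args) {
        int[] calls = {0};
        MemoizedResponseSpec spec = new MemoizedResponseSpec(() -> {
            calls[0]++;
            return "hello";
        });
        spec.content();
        spec.chatResponse();
        System.out.println(calls[0]); // prints 1: the remote call ran once
    }
}
```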

Comment From: tzolov

Hi @Grogdunn,

I agree with @ThomasVitale - we should call the AI model endpoint for each call()/stream() invocation. AI endpoints aren't idempotent - identical inputs will return different responses, so caching previous responses isn't appropriate.

Could you explain how this impacts your use case? If you need caching, you could implement it as a custom advisor but be aware of the side effects of applying it.

Comment From: Grogdunn

Hi @tzolov, I agree with you about the call and stream methods: each call() invocation should invoke the LLM. The non-intuitive API is CallResponseSpec / DefaultCallResponseSpec (the return value of call()).

As in my first post:

CallResponseSpec response = chatClient.prompt()
        .advisors(new ReReadingAdvisor())
        .call(); // only one call() method invocation
response.content(); // this really invokes the LLM
response.chatResponse(); // this also invokes the LLM (second remote call)

The content and chatResponse methods are not simply "accessors" on the response; they actually perform the LLM invocation.

Comment From: Grogdunn

Also this will trigger the remote call twice:

CallResponseSpec response = chatClient.prompt()
        .advisors(new ReReadingAdvisor())
        .call(); // only one call() method invocation
response.content(); // this really invokes the LLM
response.content(); // this also invokes the LLM (twice)