Bug description
High time-to-first-token (TTFT) of 20-30s when streaming responses from Claude 3 Sonnet with large (~85K token) prompts, which sometimes causes a timeout on the Spring AI side.

We are working with context windows of around 80-90K tokens and have observed a TTFT on the order of 20-30s, which sometimes causes a timeout on the Spring AI side.

Has anyone experienced this before?

We are using streaming, and the actual output generation rate in tokens per second is acceptable and stable; the problem seems to be specifically related to the TTFT.

Environment
Spring AI 1.0.0-M1
Java 21
Spring Boot 3.3.0

Steps to reproduce
Send a prompt of ~85K tokens to Claude 3 Sonnet using streaming mode. Observe the time it takes to generate the first token (20-30s, often ending in a timeout).

Expected behavior
Ideally the framework's timeout would be longer (or configurable) and/or the first token would arrive faster. I wonder whether this latency is expected even when using streaming.
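As a possible workaround while this is investigated, the response timeout of the underlying HTTP client can be raised above the observed 20-30s TTFT. The sketch below uses the standard Spring WebFlux and Reactor Netty APIs (`WebClient.Builder`, `HttpClient.responseTimeout`); whether the Spring AI 1.0.0-M1 Anthropic client actually picks up a customized `WebClient.Builder` bean is an assumption on my part, not something I have verified against the milestone.

```java
import java.time.Duration;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.reactive.function.client.WebClient;

import reactor.netty.http.client.HttpClient;

// Sketch only: assumes the streaming client is wired through a
// customizable WebClient.Builder bean (unverified for Spring AI 1.0.0-M1).
@Configuration
class WebClientTimeoutConfig {

    @Bean
    WebClient.Builder webClientBuilder() {
        // Raise the response timeout well above the observed 20-30s TTFT,
        // so the first streamed token has time to arrive.
        HttpClient httpClient = HttpClient.create()
                .responseTimeout(Duration.ofSeconds(90));
        return WebClient.builder()
                .clientConnector(new ReactorClientHttpConnector(httpClient));
    }
}
```

Note that `responseTimeout` bounds the wait for the first response signal, which is the relevant limit here since the per-token streaming rate is fine once generation starts.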

Minimal Complete Reproducible example
A prompt containing a long list of names, for example, sized to reach ~85K input tokens.
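A prompt like that can be generated with a few lines of plain Java. This is an illustrative sketch (the class name and the rough ~4-characters-per-token heuristic are my own assumptions, not anything from Spring AI), producing a name list large enough to approximate the 85K-token input:

```java
// Generates a long list-of-names prompt sized to roughly hit a target
// token count, using the common ~4 characters-per-token heuristic.
public class LongPromptGenerator {

    static final int CHARS_PER_TOKEN = 4; // rough heuristic for English text

    public static String buildPrompt(int targetTokens) {
        int targetChars = targetTokens * CHARS_PER_TOKEN;
        StringBuilder sb = new StringBuilder("Summarize the following list of names:\n");
        int i = 0;
        while (sb.length() < targetChars) {
            sb.append("Person-").append(i++).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String prompt = buildPrompt(85_000);
        // Approximate token count under the same heuristic.
        System.out.println("approx tokens: " + prompt.length() / CHARS_PER_TOKEN);
    }
}
```

Sending the resulting string as the user message in a streaming chat call should reproduce the 20-30s wait before the first token arrives.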