Why are all the usage values 0 when using the stream method, but non-zero when using the call method?
Comment From: ThomasVitale
Could you share some information about your setup? What version of Spring AI are you using? And which chat model have you experienced the issue with? Thanks!
Comment From: sjh021356
M2 version
openai:
  api-key:
  base-url:
  chat:
    enabled: true
    options:
      stream-usage: true
Comment From: habuma
At least for OpenAI and Ollama (with which I've had some recent streaming and usage experience)...
The underlying APIs do not return usage data on each completion chunk in the stream. It isn't there in the response from the API at all. So Spring AI puts in an EmptyUsage, which is all zeroes. With OpenAI, you can specify a streaming option to include usage (OpenAiChatOptions.builder(). ... .withStreamUsage(true)) and it will include an extra chunk at the end of the stream with the usage for the entire stream.
I've observed that with Ollama, you don't get that extra chunk from the API and there's no way to ask for it. But it seems that Spring AI is (somehow...I've not had the opportunity to figure out how) adding usage to the final chunk in the Flux. So it will be zero until that last chunk.
I believe the thinking behind this behavior from the APIs (no usage until the last chunk) is that you won't really know what the usage is until the stream has completed. I suppose there were other options, like keeping a running tally on each chunk or reporting just the count for that chunk (which, I think, would always be 1 for the generation tokens). But they decided to keep it zero until the end, and even then (for OpenAI) only if you ask for it.
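As a minimal sketch of the behavior described above (zero usage on every chunk until a final one carries the totals), the following plain-Java example picks the usage out of a finished stream. The Usage and Chunk records and the finalUsage helper are hypothetical stand-ins invented for illustration; they are not Spring AI or OpenAI types, and the token counts are made up.

```java
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Spring AI types.
class StreamUsageSketch {

    // Token counts attached to a chunk; all zeroes plays the role of an "empty" usage.
    record Usage(int promptTokens, int generationTokens) {
        static final Usage EMPTY = new Usage(0, 0);

        int totalTokens() {
            return promptTokens + generationTokens;
        }
    }

    // One streamed chunk: a piece of text plus whatever usage was reported with it.
    record Chunk(String text, Usage usage) {}

    // Returns the last non-zero usage seen in the stream, or EMPTY if none was reported.
    static Usage finalUsage(List<Chunk> chunks) {
        Usage result = Usage.EMPTY;
        for (Chunk chunk : chunks) {
            if (chunk.usage().totalTokens() > 0) {
                result = chunk.usage();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed shape of a stream: content chunks with empty usage,
        // then a trailing chunk that carries usage for the whole stream.
        List<Chunk> stream = List.of(
                new Chunk("Hel", Usage.EMPTY),
                new Chunk("lo", Usage.EMPTY),
                new Chunk("", new Usage(9, 2)));
        System.out.println(finalUsage(stream).totalTokens()); // prints 11
    }
}
```

With this shape, reading usage from any chunk other than the last yields zeroes, which would match what the original question reports.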
Comment From: 1Jack2
Maybe it's the same issue as #814
Comment From: tzolov
I believe this is resolved with #1848? @sjh021356, can you confirm please?
Comment From: markpollack
Closing the issue. If there are any problems, please get back to us, @sjh021356.