Why are all the usage values 0 when using the stream method, but not 0 when using the call method?

Comment From: ThomasVitale

Could you share some information about your setup? What version of Spring AI are you using? And which chat model have you experienced the issue with? Thanks!

Comment From: sjh021356

M2 version

    openai:
      api-key:
      base-url:
      chat:
        enabled: true
        options:
          stream-usage: true

Comment From: habuma

At least for OpenAI and Ollama (with which I've had some recent streaming and usage experience)...

The underlying APIs do not return usage data on each completion chunk in the stream; it is absent from the API response entirely. So Spring AI puts in an EmptyUsage, which is all zeroes. With OpenAI, you can specify a streaming option to include usage ( OpenAiChatOptions.builder(). ... .withStreamUsage(true) ), and the API will then include an extra chunk at the end of the stream with the usage for the entire stream.

I've observed that with Ollama, you don't get that extra chunk from the API and there's no way to ask for it. But it seems that Spring AI is somehow adding usage to the final chunk in the Flux (I've not had the opportunity to figure out how). So usage will be zero until that last chunk.

I believe the thinking behind this behavior from the APIs (no usage until the last chunk) is that you won't really know what the usage is until the stream has completed. I suppose there were other options, like keeping a running tally on each chunk or reporting just the count for that chunk (which, I think, would always be 1 for the generation tokens). But they decided to keep it zero until the end, and even then (for OpenAI) only if you ask for it.
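To make the "zero until the last chunk" pattern concrete, here is a minimal, dependency-free sketch. It does not use Spring AI: `Usage`, `Chunk`, `simulatedStream`, and `finalUsage` are hypothetical stand-ins invented for illustration, and the token counts are made up. The idea is simply that a consumer should ignore the all-zero usage on the content chunks and keep the last non-zero usage it sees.

```java
import java.util.Arrays;
import java.util.List;

public class StreamUsageSketch {

    /** Hypothetical stand-in for a usage record; not a Spring AI type. */
    public static final class Usage {
        public static final Usage EMPTY = new Usage(0, 0);
        public final long promptTokens;
        public final long generationTokens;

        public Usage(long promptTokens, long generationTokens) {
            this.promptTokens = promptTokens;
            this.generationTokens = generationTokens;
        }

        public long totalTokens() {
            return promptTokens + generationTokens;
        }
    }

    /** Hypothetical stand-in for one streamed chunk. */
    public static final class Chunk {
        public final String content;
        public final Usage usage;

        public Chunk(String content, Usage usage) {
            this.content = content;
            this.usage = usage;
        }
    }

    /** Simulated stream: content chunks carry zero usage, a trailing chunk carries the totals. */
    public static List<Chunk> simulatedStream() {
        return Arrays.asList(
                new Chunk("Hello", Usage.EMPTY),
                new Chunk(", ", Usage.EMPTY),
                new Chunk("world", Usage.EMPTY),
                new Chunk("", new Usage(9, 3))); // made-up token counts
    }

    /** Returns the last non-zero usage seen, or EMPTY if the stream never reported one. */
    public static Usage finalUsage(List<Chunk> chunks) {
        Usage result = Usage.EMPTY;
        for (Chunk chunk : chunks) {
            if (chunk.usage.totalTokens() > 0) {
                result = chunk.usage;
            }
        }
        return result;
    }

    /** Concatenates the content of every chunk. */
    public static String fullText(List<Chunk> chunks) {
        StringBuilder sb = new StringBuilder();
        for (Chunk chunk : chunks) {
            sb.append(chunk.content);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Chunk> chunks = simulatedStream();
        Usage usage = finalUsage(chunks);
        System.out.println(fullText(chunks));
        System.out.println("prompt=" + usage.promptTokens
                + ", generation=" + usage.generationTokens
                + ", total=" + usage.totalTokens());
    }
}
```

With a real reactive stream the same idea would presumably apply: read usage from the final element rather than from each chunk, and expect zeroes if the provider never sends a usage chunk.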

Comment From: 1Jack2

Maybe it's the same issue as #814

Comment From: tzolov

I believe this is resolved with #1848. @sjh021356, can you confirm, please?

Comment From: markpollack

Closing the issue. If there are any problems, please get back to us, @sjh021356.