Fixing an error in getting token usage for the spring-ai-anthropic model when using streaming.

The token usage for the generated tokens arrives near the end of the generation with the `message_delta` event, but the current implementation incorrectly assumes that the `usage` property is on the `delta` object. It also misses the fact that only `output_tokens` are present in this event:
```
event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn", "stop_sequence":null}, "usage": {"output_tokens": 15}}
```
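A minimal sketch of the fix described above (the class and record names here are illustrative, not Spring AI's actual types): in a `message_delta` event the `usage` object is a sibling of `delta` at the top level of the event, so the reader must take it from the event itself rather than from the `delta` object.

```java
import java.util.Map;

public class MessageDeltaUsage {
    // Simplified stand-ins for a parsed message_delta event; in a
    // message_delta, usage carries only output_tokens.
    record Usage(Integer inputTokens, Integer outputTokens) {}
    record MessageDeltaEvent(Map<String, Object> delta, Usage usage) {}

    static Integer outputTokens(MessageDeltaEvent event) {
        // Read usage from the event itself, NOT from event.delta().
        return event.usage() == null ? null : event.usage().outputTokens();
    }

    public static void main(String[] args) {
        // Mirrors the message_delta payload shown above.
        var event = new MessageDeltaEvent(
                Map.of("stop_reason", "end_turn"),
                new Usage(null, 15));
        System.out.println(outputTokens(event)); // prints 15
    }
}
```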
Comment From: tzolov
Good catch @didalgolab
It seems that the `message_start` event can also produce `output_tokens`:
```
event: message_start
data: {"type": "message_start", "message": {"id": "msg_1nZdL29xx5MUA1yADyHTEsnR8uuvGzszyY", "type": "message", "role": "assistant", "content": [], "model": "claude-3-5-sonnet-20240620", "stop_reason": null, "stop_sequence": null, "usage": {"input_tokens": 25, "output_tokens": 1}}}
```
Shouldn't we sum the `output_tokens` from both the `message_start` and the last `message_delta` events? Or perhaps `message_start`'s `output_tokens` is an intermediate count while `message_delta`'s is the total count?
Comment From: didalgolab
According to my experiments, `message_delta` reports the total output tokens and `message_start` reports an intermediate count, so we shouldn't sum them. In fact, if you do sum them, the integration test, which compares the reported token usage against the same non-streaming completion, will fail.
Comment From: tzolov
Thanks for the clarification @didalgolab
Comment From: tzolov
Thanks @didalgolab! Rebased, squashed, and merged at b82210758631851a9e84a7d8e006b13c6b3b7514