Fixing an error in getting token usage for the spring-ai-anthropic model when using streaming.

The token usage for the generated tokens arrives near the end of the generation, with the message_delta event, but the current implementation incorrectly assumes that the usage property is on the delta object. It also overlooks that only output_tokens is present in this event:

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn", "stop_sequence":null}, "usage": {"output_tokens": 15}}
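A minimal sketch of the shape of this payload (the record names here are illustrative, not the actual Spring AI or Anthropic SDK types): usage is a sibling of delta on the event itself, so reading it off the delta object yields nothing.

```java
// Illustrative model of the message_delta event payload above.
// Note: "usage" sits at the top level of the event, NOT inside "delta".
record Usage(Integer inputTokens, Integer outputTokens) {}

record MessageDelta(String stopReason, String stopSequence) {}

record MessageDeltaEvent(String type, MessageDelta delta, Usage usage) {}

public class UsageLocationDemo {
    public static void main(String[] args) {
        // Mirrors: {"type": "message_delta",
        //           "delta": {"stop_reason": "end_turn", "stop_sequence": null},
        //           "usage": {"output_tokens": 15}}
        var event = new MessageDeltaEvent(
                "message_delta",
                new MessageDelta("end_turn", null),
                new Usage(null, 15)); // only output_tokens is present here

        // The fix described above: read usage from the event, not from event.delta()
        System.out.println(event.usage().outputTokens()); // 15
    }
}
```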

Comment From: tzolov

Good catch @didalgolab

It seems that the message_start event can also report output_tokens:

event: message_start
data: {"type": "message_start", "message": {"id": "msg_1nZdL29xx5MUA1yADyHTEsnR8uuvGzszyY", "type": "message", "role": "assistant", "content": [], "model": "claude-3-5-sonnet-20240620", "stop_reason": null, "stop_sequence": null, "usage": {"input_tokens": 25, "output_tokens": 1}}}

Shouldn't we sum the output_tokens from both the message_start and the last message_delta event? Or is the message_start's output_tokens an intermediate count while the message_delta's is the total?

Comment From: didalgolab

According to my experiments, message_delta reports the total output token count and message_start reports an intermediate count, so we shouldn't sum them. In fact, if you do sum them, the integration test that compares the reported token usage against that of the same non-streaming completion fails.
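The merge described here can be sketched as follows (a simplified illustration, not the actual Spring AI implementation): take input_tokens from message_start, and let the final message_delta's output_tokens overwrite, rather than add to, the intermediate count.

```java
// Sketch of accumulating usage across the two streaming events from this
// thread. The numbers mirror the example payloads above.
record Usage(int inputTokens, int outputTokens) {}

public class StreamingUsageMerge {
    public static void main(String[] args) {
        int inputTokens = 0;
        int outputTokens = 0;

        // message_start: "usage": {"input_tokens": 25, "output_tokens": 1}
        inputTokens = 25;
        outputTokens = 1; // intermediate count only

        // message_delta: "usage": {"output_tokens": 15}
        outputTokens = 15; // total count: overwrite, don't sum

        Usage total = new Usage(inputTokens, outputTokens);
        System.out.println(total.inputTokens() + " " + total.outputTokens()); // 25 15
    }
}
```

Summing instead (1 + 15 = 16) is exactly the mistake that makes the streaming total disagree with the non-streaming completion's reported usage.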

Comment From: tzolov

Thanks for the clarification @didalgolab

Comment From: tzolov

Thanks @didalgolab! Rebased, squashed, and merged at b82210758631851a9e84a7d8e006b13c6b3b7514