This PR makes small implementation changes to the function calling mechanism. Right now, when the model returns a “tool call” request, we make the recursive call immediately. What I suggest in this PR is building the final ChatResponse object first. We can then return it directly if there’s no tool call, and trigger the recursive call only if there is one.

With this change, we can correctly handle the observation scopes, making sure each observation contains the correct request/response pair in the context.
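To illustrate the flow described above, here is a minimal sketch in plain Java. All the types (`SketchRequest`, `SketchResponse`, `ToolCall`, `Observation`, `SketchChatModel`) are hypothetical stand-ins invented for this example, not the Spring AI classes, and the `observations` list only imitates what an observation scope would capture.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-ins for illustration only; these are not the Spring AI types.
record ToolCall(String name, String arguments) {}

record SketchRequest(List<String> messages) {}

record SketchResponse(String content, List<ToolCall> toolCalls) {
    boolean hasToolCalls() {
        return !toolCalls.isEmpty();
    }
}

// What a single observation would see: exactly one request/response pair.
record Observation(SketchRequest request, SketchResponse response) {}

class SketchChatModel {
    private final Function<SketchRequest, SketchResponse> provider; // stands in for the provider API call
    private final Function<ToolCall, String> toolExecutor;          // stands in for running a registered function
    final List<Observation> observations = new ArrayList<>();

    SketchChatModel(Function<SketchRequest, SketchResponse> provider,
                    Function<ToolCall, String> toolExecutor) {
        this.provider = provider;
        this.toolExecutor = toolExecutor;
    }

    SketchResponse call(SketchRequest request) {
        // 1. Always build the complete response object first.
        SketchResponse response = provider.apply(request);

        // 2. Close this call's observation with its own request/response pair,
        //    before any recursion starts.
        observations.add(new Observation(request, response));

        // 3. No tool call: return the response as-is.
        if (!response.hasToolCalls()) {
            return response;
        }

        // 4. Tool call: run the tools, append their results, then recurse.
        List<String> messages = new ArrayList<>(request.messages());
        for (ToolCall toolCall : response.toolCalls()) {
            messages.add("tool:" + toolCall.name() + "=" + toolExecutor.apply(toolCall));
        }
        return call(new SketchRequest(messages));
    }
}
```

In this sketch, a conversation that needs one tool round trip produces two observations, each holding the request and response of its own model call rather than the outer request paired with the final answer.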

We can also generalize the function calling logic a bit more. By working directly with a ChatResponse object, I was able to remove the need for a generic abstract class, moving even more logic up and reducing the amount of provider-specific code.

One more thing I changed is wrapping only the call to OpenAI within RetryTemplate. Right now, it wraps the entire implementation, so it catches many exceptions unrelated to the HTTP call and retries when it shouldn’t (affecting both runtime code and tests). I would suggest progressively making this change for all implementations. Besides making retry behaviour more predictable, it makes the code more readable by removing unnecessary nesting, especially once the observability changes land (which would add one more nested level).
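As a rough illustration of the narrower retry scope, the sketch below retries only the remote call and leaves request validation and response mapping outside. To keep it dependency-free, a hand-rolled `retry` helper stands in for Spring Retry's `RetryTemplate`; `TransientHttpException` and `NarrowRetrySketch` are hypothetical names made up for this example.

```java
import java.util.function.Supplier;

// Hypothetical exception marking a failure worth retrying (e.g. a transient HTTP error).
class TransientHttpException extends RuntimeException {
    TransientHttpException(String message) {
        super(message);
    }
}

class NarrowRetrySketch {

    // Hand-rolled stand-in for a retry template: retries only TransientHttpException.
    static <T> T retry(int maxAttempts, Supplier<T> httpCall) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        TransientHttpException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return httpCall.get();
            } catch (TransientHttpException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed
    }

    static String call(String prompt, Supplier<String> httpCall) {
        // Request validation sits outside the retry: a bad prompt fails immediately.
        if (prompt == null || prompt.isBlank()) {
            throw new IllegalArgumentException("prompt must not be blank");
        }

        // Only the remote call is retried.
        String raw = retry(3, httpCall);

        // Response mapping also sits outside the retry, so a mapping bug is not retried.
        return raw.trim();
    }
}
```

With this shape, a programming error before or after the HTTP call surfaces on the first attempt instead of being retried, and the method body stays flat rather than being nested inside a retry callback.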

Comment From: tzolov

Thanks @ThomasVitale! Streamlined the Generation constructor implementation, rebased, squashed, and merged at: 40d8671f3e4ec12b26f37a017e706da3a16b7ab5

Comment From: tzolov

Now we have to repeat this exercise for the remaining Function Calling models :)

Comment From: ThomasVitale

@tzolov great, thank you! Tomorrow, I'll update the observability PR and work on extending this function calling strategy to the other models.