OpenAI recently introduced audio multimodality support for both input and output.
Input audio support was added in https://github.com/spring-projects/spring-ai/issues/1560 and is exposed all the way up through the Spring AI abstractions.
Output audio, however, is currently supported only at the lower level (OpenAiApi). Its usage is demonstrated in this integration test: https://github.com/spring-projects/spring-ai/blob/bdb66e5770836dc9dec6be40af801d9cd9e41e2a/models/spring-ai-openai/src/test/java/org/springframework/ai/openai/api/OpenAiApiIT.java#L98-L118
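For context, a rough sketch of the request payload shape that such a low-level call sends to OpenAI's /v1/chat/completions endpoint. The field names ("modalities", "audio", "voice", "format") and the gpt-4o-audio-preview model come from the OpenAI API docs; the helper class itself is illustrative and not part of Spring AI:

```java
// Illustrative only: builds the JSON body for an audio-output chat completion.
// Field names follow the OpenAI API; this class is NOT a Spring AI type.
public class AudioRequestSketch {

    // Minimal chat-completions body asking for both text and audio output.
    public static String buildBody(String prompt) {
        return """
                {
                  "model": "gpt-4o-audio-preview",
                  "modalities": ["text", "audio"],
                  "audio": { "voice": "alloy", "format": "wav" },
                  "messages": [ { "role": "user", "content": "%s" } ]
                }""".formatted(prompt);
    }

    public static void main(String[] args) {
        System.out.println(buildBody("Tell me a joke about Spring."));
    }
}
```

The response then carries the audio in the assistant message (an id, base64-encoded data, a transcript, and an expiry), which is exactly the data that currently has no home in the higher-level abstractions.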
It would be nice to start identifying what abstractions the ChatResponse API needs in order to carry audio response data.
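As a starting point for that discussion, one possible shape for such an abstraction: a small value object that a Generation could expose alongside its text content. This is purely a proposal sketch, not existing Spring AI API; the field set mirrors what the OpenAI response returns (audio id, base64 data, transcript, format):

```java
import java.util.Base64;

// Hypothetical value object for audio returned in a ChatResponse.
// Not an existing Spring AI type; field set mirrors OpenAI's audio response.
public record AudioOutput(
        String id,          // provider-side audio id
        byte[] data,        // decoded audio bytes
        String transcript,  // text transcript of the spoken audio
        String format) {    // e.g. "wav" or "mp3"

    // Convenience factory decoding the base64 payload OpenAI returns.
    public static AudioOutput fromBase64(String id, String base64,
                                         String transcript, String format) {
        return new AudioOutput(id, Base64.getDecoder().decode(base64),
                transcript, format);
    }
}
```

Whether this belongs on Generation itself, on the response metadata, or behind a more general media/output abstraction is exactly the design question this issue is meant to surface.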