Motivation
Right now, Azure (Microsoft) is the biggest shareholder of OpenAI, and it puts a lot of effort into providing OpenAI services. I think it is reasonable to have as many supported models as possible.
Description
[!CAUTION] This PR introduces breaking changes. Classes from the `org.springframework.ai.openai.metadata.audio.transcription` package have been moved to the `org.springframework.ai.audio.transcription` package.
The duplicated code was moved to the core package.
The `AzureOpenAiAudioTranscriptionModel` has been added to the auto-configuration.
Its properties use the new `spring.ai.azure.openai.audio.transcription` prefix. This includes options properties that cover every option (see: `AzureOpenAiAudioTranscriptionOptions`).
The Azure SDK has been bumped to version `1.0.0-beta.9`. This upgrade renamed a JSON field: `prompt_annotations` became `prompt_filter_results` (ref: https://github.com/Azure/azure-rest-api-specs/pull/25880).
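For readers who parse the raw response themselves, the rename can be absorbed with a fallback lookup. The following is only a sketch: the `PromptFilterFieldLookup` class and its method are hypothetical and not part of Spring AI or the Azure SDK, and it assumes the response body has already been parsed into a `Map`.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class PromptFilterFieldLookup {

    // Field name after the rename described above.
    static final String CURRENT_FIELD = "prompt_filter_results";

    // Field name before the rename.
    static final String LEGACY_FIELD = "prompt_annotations";

    // Returns the prompt filter entries from an already-parsed response body,
    // preferring the current field name and falling back to the legacy one.
    // An empty list is returned when neither field holds a list.
    static List<?> promptFilterResults(Map<String, ?> responseBody) {
        Object value = responseBody.get(CURRENT_FIELD);
        if (value == null) {
            value = responseBody.get(LEGACY_FIELD);
        }
        if (value instanceof List) {
            return (List<?>) value;
        }
        return List.of();
    }

    public static void main(String[] args) {
        Map<String, Object> newer = Map.<String, Object>of(CURRENT_FIELD, List.of("entry"));
        Map<String, Object> older = Map.<String, Object>of(LEGACY_FIELD, List.of("entry"));
        System.out.println(promptFilterResults(newer).size());
        System.out.println(promptFilterResults(older).size());
        System.out.println(promptFilterResults(Map.<String, Object>of()).size());
    }
}
```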
Another significant change in the SDK was the replacement of `jackson-databind` with `azure-json` (ref: https://github.com/Azure/azure-sdk-for-java/pull/39825). As a result, the `AzureOpenAiEmbeddingModel` class can no longer use `ModelOptionsUtils`, which relies on the `@JsonProperty` annotation. I replaced it with manual assignment.
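To illustrate what "manual assignment" means here, below is a minimal sketch of merging runtime options over defaults field by field, without annotations or reflection. The `ManualOptionsMerge` and `EmbeddingOptions` types and their fields are hypothetical stand-ins, not the actual classes or code from this PR.

```java
import java.util.Objects;

// Hypothetical example types, for illustration only.
final class ManualOptionsMerge {

    // Trimmed-down options holder with two illustrative fields.
    static final class EmbeddingOptions {
        private String deploymentName;
        private String user;

        EmbeddingOptions deploymentName(String value) {
            this.deploymentName = value;
            return this;
        }

        EmbeddingOptions user(String value) {
            this.user = value;
            return this;
        }

        String getDeploymentName() {
            return deploymentName;
        }

        String getUser() {
            return user;
        }
    }

    // Returns the runtime value when it is set, otherwise the default.
    private static <T> T firstNonNull(T runtimeValue, T defaultValue) {
        return runtimeValue != null ? runtimeValue : defaultValue;
    }

    // Copies every field by hand: a runtime value wins when present,
    // otherwise the default is kept. A null runtime object means "defaults only".
    static EmbeddingOptions merge(EmbeddingOptions runtime, EmbeddingOptions defaults) {
        Objects.requireNonNull(defaults, "defaults must not be null");
        if (runtime == null) {
            return new EmbeddingOptions()
                    .deploymentName(defaults.getDeploymentName())
                    .user(defaults.getUser());
        }
        return new EmbeddingOptions()
                .deploymentName(firstNonNull(runtime.getDeploymentName(), defaults.getDeploymentName()))
                .user(firstNonNull(runtime.getUser(), defaults.getUser()));
    }

    public static void main(String[] args) {
        EmbeddingOptions defaults = new EmbeddingOptions().deploymentName("default-deployment").user("default-user");
        EmbeddingOptions runtime = new EmbeddingOptions().user("runtime-user");
        EmbeddingOptions merged = merge(runtime, defaults);
        System.out.println(merged.getDeploymentName() + " / " + merged.getUser());
    }
}
```

The trade-off is that every new option field has to be added to `merge` by hand, which is more verbose than annotation-driven merging but does not depend on any particular JSON library.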
TODO
There are a couple of things left to do. I'd rather not do them right now, to keep this PR easy to accept, but they should definitely be done after the PR is merged.
- [x] Add docs in the Audio Model API ➡ Transcription API ➡ Azure OpenAI section
- [x] Add `AudioTranscriptionMetadata` for both OpenAI and Azure OpenAI with information from the `VERBOSE_JSON` response format (@tzolov what do you think?)
Comment From: tzolov
Hi @piotrooo, good job so far. Moving the transcription package to the core is a good step as well.
FYI, the other day I tried to migrate to 1.0.0-beta.9. I fixed some compilation and other issues but got stuck on strange test failures (structured output stopped working), so I left it for when I have time. If you're interested, here is the branch: https://github.com/tzolov/spring-ai/tree/update_azure_openai_client_version
Comment From: piotrooo
Sure @tzolov 👍
I'll look at this on Monday. Could you give me a hint about the failing tests?
Comment From: tzolov
@piotrooo, it seems like the structured output converters (e.g. AzureOpenAiChatModelIT's listOutputConverter, mapOutputConverter, beanOutputConverter) are not working after the upgrade.
Comment From: piotrooo
@tzolov, it seems to be a bug in the newly introduced JSON serializer in the Azure SDK. Here is a referenced PR with a fix.
It looks like a problem with content serialization and deserialization in Azure. I made a test:
- At the Azure OpenAI Chat playground:
- Using cURL:
curl "https://<url>.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-05-01-preview" -H "Content-Type: application/json" -H "api-key: <key>" -d '{
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Generate the filmography of 5 movies for Tom Hanks.\nYour response should be in JSON format.\nDo not include any explanations, only provide a RFC8259 compliant JSON response following this format without deviation.\nDo not include markdown code blocks in your response.\nRemove the ```json markdown from the output.\nHere is the JSON Schema instance your output must adhere to:\n```{\n \"$schema\" : \"https://json-schema.org/draft/2020-12/schema\",\n \"type\" : \"object\",\n \"properties\" : {\n \"actor\" : {\n \"type\" : \"string\"\n },\n \"movies\" : {\n \"type\" : \"array\",\n \"items\" : {\n \"type\" : \"string\"\n }\n }\n }\n}```\n\n"
}
]
}
],
"max_tokens": 200,
"stream": false,
"model": "gpt-4o"
}' | jq .
Response:
{
"choices": [
{
"content_filter_results": {
"hate": {
"filtered": false,
"severity": "safe"
},
"self_harm": {
"filtered": false,
"severity": "safe"
},
"sexual": {
"filtered": false,
"severity": "safe"
},
"violence": {
"filtered": false,
"severity": "safe"
}
},
"finish_reason": "stop",
"index": 0,
"logprobs": null,
"message": {
"content": "{\n \"actor\": \"Tom Hanks\",\n \"movies\": [\n \"Forrest Gump\",\n \"Saving Private Ryan\",\n \"Cast Away\",\n \"The Green Mile\",\n \"Toy Story\"\n ]\n}",
"role": "assistant"
}
}
],
"created": 1719248590,
"id": "chatcmpl-9dhNuNcvUDiOpEgc0fuObQJPzDoMx",
"model": "gpt-4o-2024-05-13",
"object": "chat.completion",
"prompt_filter_results": [
{
"prompt_index": 0,
"content_filter_results": {
"hate": {
"filtered": false,
"severity": "safe"
},
"self_harm": {
"filtered": false,
"severity": "safe"
},
"sexual": {
"filtered": false,
"severity": "safe"
},
"violence": {
"filtered": false,
"severity": "safe"
}
}
}
],
"system_fingerprint": "fp_abc28019ad",
"usage": {
"completion_tokens": 47,
"prompt_tokens": 169,
"total_tokens": 216
}
}
The service itself responds correctly, so the problem appears to be on the SDK side. Unfortunately, we need to wait for the fix.
Comment From: piotrooo
@tzolov, it seems that everything works as expected. I've checked it in our Azure subscription on our internal models. Could you confirm it is working as expected?
Comment From: tzolov
@piotrooo thanks for the update. It looks like 1.0.0-beta.10 has another issue with JSON handling during streaming. I'm considering rolling it back to beta8. Will this affect your PR?
Comment From: piotrooo
> I'm considering rolling it back to beta8. Will this affect your PR?

Yes, for sure. As far as I remember, `beta8` doesn't support the `AudioTranscriptionTimestampGranularity` functionality.
Comment From: tzolov
Hi @piotrooo, can you please add the related documentation? I guess it should go under the https://docs.spring.io/spring-ai/reference/api/audio/transcriptions.html section.
The Antora docs source is under: https://github.com/spring-projects/spring-ai/tree/main/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/transcriptions and you can use the https://github.com/spring-projects/spring-ai/blob/main/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/transcriptions/openai-transcriptions.adoc as an example.
Finally you can add the page to the catalog in front of the openai transcription: https://github.com/spring-projects/spring-ai/blob/4aacab020b2fa6932ac3a69433a60c282eec617b/spring-ai-docs/src/main/antora/modules/ROOT/nav.adoc?plain=1#L63
Comment From: piotrooo
@tzolov I've just added docs.
Comment From: tzolov
Thanks @piotrooo. After rebasing I can see a compilation error, likely due to the recent improvements in the response metadata: https://github.com/spring-projects/spring-ai/pull/1070. Let me know if you have time to resolve it. Hopefully it will be a trivial change.
Comment From: piotrooo
@tzolov I think I did everything. The code compiles for me.
Comment From: tzolov
Thank you @piotrooo
I could not find integration tests for the Azure transcription service, so I've tried to add some in my merge branch: https://github.com/tzolov/spring-ai/tree/gh-902-pr
* https://github.com/tzolov/spring-ai/tree/gh-902-pr/models/spring-ai-azure-openai/src/test/java/org/springframework/ai/azure/openai/audio
* https://github.com/tzolov/spring-ai/blob/1ae9a7258d1b32f01f6eeedb4ffa5afba95001e7/spring-ai-spring-boot-autoconfigure/src/test/java/org/springframework/ai/autoconfigure/azure/AzureOpenAiAutoConfigurationIT.java#L167

But both ITs fail. Could you have a look, please?
Comment From: piotrooo
@tzolov I've added the missing test you mentioned. I've also tested it on our internal models.
- `AzureOpenAiAutoConfigurationIT.transcribe()`
- `AzureOpenAiAudioTranscriptionModelIT`
Comment From: tzolov
Thank you @piotrooo! Great stuff. Thanks for contributing the Azure transcription and for improving the API consistency! Looking forward to the next contribution ;)
Comment From: tzolov
Small test adjustments, rebased, squashed and merged at 0e97f9c579096ae65e0e3a2e12490aa78a16966f