Spring-ai Introduce AzureOpenAI transcription support

Motivation

Right now, Microsoft (Azure) is the biggest shareholder of OpenAI, and it puts a lot of effort into providing OpenAI services. I think it is reasonable to support as many of those models as possible.

Description

[!CAUTION] This PR introduces breaking changes. Classes from the org.springframework.ai.openai.metadata.audio.transcription package have been moved to the org.springframework.ai.audio.transcription package.

The code shared by the OpenAI and Azure OpenAI transcription models was moved to the core package.

The AzureOpenAiAudioTranscriptionModel has been added to the auto-configuration. Its properties use the newly introduced spring.ai.azure.openai.audio.transcription prefix, and the options properties cover every field of the options class (see: AzureOpenAiAudioTranscriptionOptions).

The Azure SDK has been bumped to version 1.0.0-beta.9. This upgrade renamed a JSON field: prompt_annotations became prompt_filter_results (ref: https://github.com/Azure/azure-rest-api-specs/pull/25880). Another significant change in the SDK was the replacement of jackson-databind with azure-json (ref: https://github.com/Azure/azure-sdk-for-java/pull/39825). As a result, the AzureOpenAiEmbeddingModel class can no longer use ModelOptionsUtils, which relies on the @JsonProperty annotation, so I replaced it with manual assignment.
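
To illustrate what "manual assignment" can look like once an annotation-driven merge is no longer available, here is a minimal sketch. It is not the code from this PR: the Options record, its field names, and the merge and pick helpers are hypothetical stand-ins and not the real Spring AI or Azure SDK types. The idea is simply that each field is copied explicitly, with a non-null runtime value taking precedence over the default.

```java
// Sketch only: hypothetical types, not the actual Spring AI or Azure SDK classes.
class ManualOptionsMerge {

    // Hypothetical options holder; the field names are illustrative assumptions.
    record Options(String deploymentName, String user, Integer dimensions) {}

    // Field-by-field merge without any JSON annotations or reflection:
    // a non-null runtime value overrides the corresponding default.
    static Options merge(Options runtime, Options defaults) {
        if (runtime == null) {
            return defaults;
        }
        return new Options(
                pick(runtime.deploymentName(), defaults.deploymentName()),
                pick(runtime.user(), defaults.user()),
                pick(runtime.dimensions(), defaults.dimensions()));
    }

    // Returns the override when it is set, otherwise the fallback.
    private static <T> T pick(T override, T fallback) {
        return override != null ? override : fallback;
    }

    public static void main(String[] args) {
        Options defaults = new Options("text-embedding-ada-002", "default-user", null);
        Options runtime = new Options(null, "alice", 256);
        // Keeps the default deployment name, takes user and dimensions from runtime.
        System.out.println(merge(runtime, defaults));
    }
}
```

The trade-off of this approach is that every new option field has to be added to the merge by hand, which is more verbose than an annotation-based copy but does not depend on a particular JSON library.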

TODO

There are a couple of things left to do. I don't want to do them right now, to keep the PR easy to accept, but they should definitely be done after the PR is merged.

[x] Add docs in the Audio Model API ➡ Transcription API ➡ Azure OpenAI section
[x] Add AudioTranscriptionMetadata for both OpenAI and Azure OpenAI with information from the VERBOSE_JSON response format (@tzolov what do you think?)

Comment From: tzolov

Hi @piotrooo, good job so far. Moving the transcription package to the core is a good step as well.

FYI, the other day I tried to migrate to 1.0.0-beta.9. I fixed some compilation and other issues but got stuck on strange test failures (structured output stopped working), so I left it for when I have time. If you're interested, here is the branch: https://github.com/tzolov/spring-ai/tree/update_azure_openai_client_version

Comment From: piotrooo

Sure @tzolov 👍

I'll look at this on Monday. Could you give me a hint about the failing tests?

Comment From: tzolov

@piotrooo, it seems like the structured output converters (e.g. AzureOpenAiChatModelIT's listOutputConverter, mapOutputConverter, beanOutputConverter) are not working after the upgrade.

Comment From: piotrooo

@tzolov, it seems to be a bug in the newly introduced JSON serializer in the Azure SDK. Here is a referenced PR with a fix.

It looks like a problem with content serialization and deserialization in the Azure SDK. I ran a test:

At the Azure OpenAI Chat playground:
Using cURL:

curl "https://<url>.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-05-01-preview" \
  -H "Content-Type: application/json" \
  -H "api-key: <key>" \
  -d '{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Generate the filmography of 5 movies for Tom Hanks.\nYour response should be in JSON format.\nDo not include any explanations, only provide a RFC8259 compliant JSON response following this format without deviation.\nDo not include markdown code blocks in your response.\nRemove the ```json markdown from the output.\nHere is the JSON Schema instance your output must adhere to:\n```{\n  \"$schema\" : \"https://json-schema.org/draft/2020-12/schema\",\n  \"type\" : \"object\",\n  \"properties\" : {\n    \"actor\" : {\n      \"type\" : \"string\"\n    },\n    \"movies\" : {\n      \"type\" : \"array\",\n      \"items\" : {\n        \"type\" : \"string\"\n      }\n    }\n  }\n}```\n\n"
        }
      ]
    }
  ],
  "max_tokens": 200,
  "stream": false,
  "model": "gpt-4o"
}'  | jq .

Response:

{
  "choices": [
    {
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "{\n  \"actor\": \"Tom Hanks\",\n  \"movies\": [\n    \"Forrest Gump\",\n    \"Saving Private Ryan\",\n    \"Cast Away\",\n    \"The Green Mile\",\n    \"Toy Story\"\n  ]\n}",
        "role": "assistant"
      }
    }
  ],
  "created": 1719248590,
  "id": "chatcmpl-9dhNuNcvUDiOpEgc0fuObQJPzDoMx",
  "model": "gpt-4o-2024-05-13",
  "object": "chat.completion",
  "prompt_filter_results": [
    {
      "prompt_index": 0,
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ],
  "system_fingerprint": "fp_abc28019ad",
  "usage": {
    "completion_tokens": 47,
    "prompt_tokens": 169,
    "total_tokens": 216
  }
}

The service itself responds correctly. Unfortunately, we need to wait for the fix in the SDK.

Comment From: piotrooo

@tzolov, it seems that everything works as expected. I've checked it in our Azure subscription with our internal models. Could you confirm it is working as expected?

Comment From: tzolov

@piotrooo thanks for the update. It looks like 1.0.0-beta.10 has another issue with JSON handling during streaming. I'm considering rolling it back to beta8. Will this affect your PR?

Comment From: piotrooo

> I'm considering rolling it back to beta8. Will this affect your PR?

Yes, for sure. beta8 doesn't have support for the AudioTranscriptionTimestampGranularity functionality as far as I remember.

Comment From: tzolov

Hi @piotrooo , Can you please add the related documentation? I guess it should be under the https://docs.spring.io/spring-ai/reference/api/audio/transcriptions.html section.

The Antora docs source is under: https://github.com/spring-projects/spring-ai/tree/main/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/transcriptions and you can use the https://github.com/spring-projects/spring-ai/blob/main/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/transcriptions/openai-transcriptions.adoc as an example.

Finally you can add the page to the catalog in front of the openai transcription: https://github.com/spring-projects/spring-ai/blob/4aacab020b2fa6932ac3a69433a60c282eec617b/spring-ai-docs/src/main/antora/modules/ROOT/nav.adoc?plain=1#L63

Comment From: piotrooo

@tzolov I've just added docs.

Comment From: tzolov

Thanks @piotrooo. After rebasing I can see a compilation error, likely due to the recent improvements in the response metadata: https://github.com/spring-projects/spring-ai/pull/1070 Let me know if you have time to resolve it. Hopefully it will be a trivial change.

Comment From: piotrooo

@tzolov I think I did everything. The code compiles for me.

Comment From: tzolov

Thank you @piotrooo

I could not find integration tests for the Azure transcription service, so I've tried to add some in my merge branch: https://github.com/tzolov/spring-ai/tree/gh-902-pr
* https://github.com/tzolov/spring-ai/tree/gh-902-pr/models/spring-ai-azure-openai/src/test/java/org/springframework/ai/azure/openai/audio
* https://github.com/tzolov/spring-ai/blob/1ae9a7258d1b32f01f6eeedb4ffa5afba95001e7/spring-ai-spring-boot-autoconfigure/src/test/java/org/springframework/ai/autoconfigure/azure/AzureOpenAiAutoConfigurationIT.java#L167

But both ITs fail. Could you have a look, please?

Comment From: piotrooo

@tzolov I've added the missing test you mentioned. I've also tested it on our internal models.

AzureOpenAiAutoConfigurationIT.transcribe()

Screenshot from 2024-07-22 08-30-19

AzureOpenAiAudioTranscriptionModelIT

Screenshot from 2024-07-22 08-29-04

Comment From: tzolov

Thank you @piotrooo! Great stuff! Thanks for contributing the Azure transcription and for improving the API consistency! Looking forward to the next contribution ;)

Comment From: tzolov

Small test adjustments, rebased, squashed and merged at 0e97f9c579096ae65e0e3a2e12490aa78a16966f