• Added support for OpenAI Text-to-Audio (Speech API) with streaming support: https://platform.openai.com/docs/api-reference/audio/createSpeech#:~:text=Speech%20to%20text-,Create%20speech,-

How to use it:

```java
@Autowired
public OpenAiAudioSpeechClient openAiAudioSpeechClient;

byte[] responseAsBytes = openAiAudioSpeechClient.call("Hello, world!");
```
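For instance, the returned bytes can be written straight to a file. A minimal sketch; the `hello.mp3` path is illustrative, and MP3 output is assumed from the response format configured below:

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Synthesize speech and persist the raw bytes to disk.
// The file name/extension assumes response-format=mp3 (see config below).
byte[] speech = openAiAudioSpeechClient.call("Hello, world!");
Files.write(Path.of("hello.mp3"), speech);
```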
Config:

```properties
# OpenAI API configuration
spring.ai.openai.api-key=your_api_key
spring.ai.openai.base-url=https://api.openai.com

# Speech synthesis options
spring.ai.openai.audio.speech.options.model=tts-1
spring.ai.openai.audio.speech.options.voice=alloy
spring.ai.openai.audio.speech.options.response-format=mp3
spring.ai.openai.audio.speech.options.speed=0.75
```

Manual options with metadata/rate-limit info and prompt style:


```java
OpenAiAudioSpeechOptions options = OpenAiAudioSpeechOptions.builder()
        .withSpeed(0.25f)
        .withModel(OpenAiAudioApi.TtsModel.TTS_1.value)
        .build();
SpeechPrompt speechPrompt = new SpeechPrompt("Hello, world!", options);
SpeechResponse responseWithMetadata = openAiAudioSpeechClient.call(speechPrompt);
OpenAiAudioSpeechResponseMetadata metadata = responseWithMetadata.getMetadata(); // rate-limit info

byte[] responseAsBytes = responseWithMetadata.getResult().getOutput();
```
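The rate-limit details can then be inspected. A minimal sketch, assuming the metadata exposes a `RateLimit` accessor in the style of other Spring AI response metadata types; the accessor names here are assumptions, not confirmed API:

```java
// Sketch: read rate-limit headers from the response metadata.
// getRateLimit()/getRequestsRemaining()/getTokensRemaining() are assumed
// accessor names, mirroring other Spring AI response metadata types.
RateLimit rateLimit = metadata.getRateLimit();
System.out.println("Requests remaining: " + rateLimit.getRequestsRemaining());
System.out.println("Tokens remaining: " + rateLimit.getTokensRemaining());
```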

Streaming speech audio directly from the OpenAI API:

```java
// SPEED is assumed to be a float constant defined elsewhere, e.g. 1.0f
OpenAiAudioSpeechOptions speechOptions = OpenAiAudioSpeechOptions.builder()
        .withVoice(OpenAiAudioApi.SpeechRequest.Voice.ALLOY)
        .withSpeed(SPEED)
        .withResponseFormat(OpenAiAudioApi.SpeechRequest.AudioResponseFormat.MP3)
        .withModel(OpenAiAudioApi.TtsModel.TTS_1.value)
        .build();
SpeechPrompt speechPrompt = new SpeechPrompt("Today is a wonderful day to build something people love!",
        speechOptions);
Flux<SpeechResponse> response = openAiAudioSpeechClient.stream(speechPrompt);
```
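The streamed chunks can then be reassembled on the client side. A minimal sketch; the blocking collection at the end is for illustration only:

```java
import java.io.ByteArrayOutputStream;

// Collect the streamed audio chunks into a single byte array.
// block() is used here only to keep the example short; in a reactive
// pipeline you would keep working with the Flux/Mono instead.
byte[] audio = response
        .map(speechResponse -> speechResponse.getResult().getOutput())
        .reduce(new ByteArrayOutputStream(), (stream, bytes) -> {
            stream.writeBytes(bytes);
            return stream;
        })
        .map(ByteArrayOutputStream::toByteArray)
        .block();
```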

Comment From: hemeda3

@tzolov it seems you worked on the speech API code and merged it to main; should I close this PR?

Comment From: tzolov

Hi @hemeda3, thanks for reaching out. I've worked on and merged the https://github.com/spring-projects/spring-ai/pull/300 PR. In the process I realised that the low-level API is scattered between the #300 and your #317 PRs. Additionally, it did not completely cover the underlying OpenAI Audio API spec. So I decided to adopt an old implementation I did for my Assistant AI explorations.

Next I realised that until we have at least two text-to-speech and speech-to-text client implementations from different AI vendors, it is premature to create common model abstractions under spring-ai-core/model. The latter are meant to facilitate portability between vendors, but with a single implementation there is not enough data to decide what the common abstractions should look like. Therefore I've moved the Audio Transcription prompt/response classes and the like inside the spring-ai-openai project (under the audio/transcription package). Later, when we have more audio clients, we can decide how to abstract those back into the core. Finally, as you can see, #300 implements a client for the transcription endpoint, while your PR adds a speech generation client.

Having said this, would you be interested in re-working your PR after the refactoring I did? You would have to base your client on the OpenAiAudioApi low-level client and move the code from spring-ai-core to the spring-ai-openai .../audio/speech package (e.g. next to .../audio/transcription). I would really appreciate your help. Do not hesitate to ask questions or suggest improvements.

Comment From: hemeda3

> Having said this, would you be interested in re-working your PR after the refactoring I did? You would have to base your client on the OpenAiAudioApi low-level client and move the code from spring-ai-core to the spring-ai-openai .../audio/speech package (e.g. next to .../audio/transcription). I would really appreciate your help. Do not hesitate to ask questions or suggest improvements.

@tzolov, thanks for the explanation 🙏. Actually, I was a bit confused since both APIs (speech + transcription) share the same OpenAI audio API at a low level, but your changes have clarified things for me. I'm happy to re-work my PR based on your updates. If I have any questions, I'll reach out. Thanks for the opportunity to contribute and learn.

Comment From: hemeda3

  • Rebased the client on the OpenAiAudioApi low-level client
  • Moved the code from spring-ai-core to the spring-ai-openai .../audio/speech package
  • Added stream support (the OpenAI speech response can be received as a stream)
  • Added metadata/rate-limit support using WebClient
  • Added tests for the speech properties and the stream/call methods
  • Updated the issue description with the new usage

Comment From: hemeda3

Hi @tzolov, should I add the documentation to the same PR or a new PR?

Comment From: tzolov

Hi @hemeda3, thanks for asking. Sure, it would be nice to add the docs too.

Comment From: hemeda3

Added speech API adoc:

  • Updated nav.adoc
  • Added speech.adoc

Comment From: markpollack

It took a while to get to, but it is now merged. Thanks, this was a great contribution!

Merged as 766b420f980b12700daec85c4a1fa480a6e1812e