- Added support for OpenAI Text-to-Speech (Speech API) with streaming support: https://platform.openai.com/docs/api-reference/audio/createSpeech#:~:text=Speech%20to%20text-,Create%20speech,-
How to use it:
```java
@Autowired
public OpenAiAudioSpeechClient openAiAudioSpeechClient;

byte[] responseAsBytes = openAiAudioSpeechClient.call("Hello, world!");
```
Configuration:
```properties
# OpenAI API configuration
spring.ai.openai.api-key=your_api_key
spring.ai.openai.base-url=https://api.openai.com

# Speech synthesis options
spring.ai.openai.audio.speech.options.model=tts-1
spring.ai.openai.audio.speech.options.voice=alloy
spring.ai.openai.audio.speech.options.response-format=mp3
spring.ai.openai.audio.speech.options.speed=0.75
```
Manual options with metadata/rate-limit info and prompt style:
```java
OpenAiAudioSpeechOptions options = OpenAiAudioSpeechOptions.builder()
    .withSpeed(0.25f)
    .withModel(OpenAiAudioApi.TtsModel.TTS_1.value)
    .build();

SpeechPrompt speechPrompt = new SpeechPrompt("Hello, world!", options);
SpeechResponse responseWithMetaData = openAiAudioSpeechClient.call(speechPrompt);

OpenAiAudioSpeechResponseMetadata metadata = responseWithMetaData.getMetadata(); // rate-limit info
byte[] responseAsBytes = responseWithMetaData.getResult().getOutput();
```
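Once the raw audio bytes are returned, persisting them is plain Java I/O. A minimal stdlib-only sketch (the placeholder byte array here stands in for the real `responseWithMetaData.getResult().getOutput()` payload):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SaveSpeech {

    // Writes the synthesized audio bytes to a temp file and returns its size.
    static long save(byte[] audioBytes) throws IOException {
        Path out = Files.createTempFile("speech", ".mp3");
        Files.write(out, audioBytes);
        return Files.size(out);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder standing in for responseWithMetaData.getResult().getOutput()
        byte[] responseAsBytes = {0x49, 0x44, 0x33}; // "ID3" mp3 tag stub
        System.out.println(save(responseAsBytes)); // prints 3
    }
}
```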
Streaming speech audio directly from the OpenAI API:
```java
OpenAiAudioSpeechOptions speechOptions = OpenAiAudioSpeechOptions.builder()
    .withVoice(OpenAiAudioApi.SpeechRequest.Voice.ALLOY)
    .withSpeed(SPEED)
    .withResponseFormat(OpenAiAudioApi.SpeechRequest.AudioResponseFormat.MP3)
    .withModel(OpenAiAudioApi.TtsModel.TTS_1.value)
    .build();

SpeechPrompt speechPrompt = new SpeechPrompt("Today is a wonderful day to build something people love!",
    speechOptions);

Flux<SpeechResponse> response = openAiAudioSpeechClient.stream(speechPrompt);
```
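When consuming the stream, each emitted `SpeechResponse` carries one audio fragment, and the fragments must be concatenated in order to reconstruct the playable file. A stdlib-only sketch of that concatenation (the `List` here is an assumed stand-in for the elements of the reactive `Flux<SpeechResponse>` stream):

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

public class AssembleStream {

    // Concatenates streamed audio fragments, in order, into one payload.
    static byte[] concat(List<byte[]> chunks) {
        ByteArrayOutputStream audio = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            audio.writeBytes(chunk);
        }
        return audio.toByteArray();
    }

    public static void main(String[] args) {
        // Each element stands in for one speechResponse.getResult().getOutput()
        // fragment emitted by the streaming endpoint.
        List<byte[]> chunks = List.of(new byte[] {1, 2}, new byte[] {3}, new byte[] {4, 5, 6});
        System.out.println(concat(chunks).length); // prints 6
    }
}
```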
Comment From: hemeda3
@tzolov it seems you worked on the speech API code and merged it to main; should I close this PR?
Comment From: tzolov
Hi @hemeda3, thanks for reaching out. I've worked on and merged the https://github.com/spring-projects/spring-ai/pull/300 PR. In the process I realised that the low-level API is scattered between the #300 and your #317 PRs. Additionally, it did not completely cover the underlying OpenAI Audio API spec. So I decided to adopt an old implementation I did for my Assistant AI explorations.
Next I realised that until we have at least two text-to-speech and speech-to-text client implementations from different AI vendors, it is premature to create common model abstractions under spring-ai-core/model. The latter are meant to facilitate portability between vendors, but with a single implementation there is not enough data to decide what the common abstractions should look like. Therefore I've moved the Audio Transcription prompt/response and the like inside the spring-ai-openai project (under the audio/transcription package). Later, when we have more audio clients, we can decide how to abstract those back into the core. Finally, as you can see, #300 implements a client for the transcription endpoint, while your PR adds a speech generation client.
Having said this, would you be interested in re-working your PR after the refactoring I did? You would have to base your client on the OpenAiAudioApi low-level client and move the code from spring-ai-core to the spring-ai-openai .../audio/speech package (e.g. next to .../audio/transcription). I would really appreciate your help. Do not hesitate to ask questions or suggest improvements.
Comment From: hemeda3
> Having said this, would you be interested in re-working your PR after the refactoring I did? You would have to base your client on the OpenAiAudioApi low-level client and move the code from spring-ai-core to the spring-ai-openai .../audio/speech package (e.g. next to .../audio/transcription). I would really appreciate your help. Do not hesitate to ask questions or suggest improvements.
@tzolov, thanks for the explanation 🙏. Actually, I was a bit confused since both APIs (speech + transcription) share the same OpenAI audio API at the low level, but your changes have clarified things for me. I'm happy to re-work my PR based on your updates. If I have any questions, I'll reach out. Thanks for the opportunity to contribute and learn.
Comment From: hemeda3
- Rebased the client on the OpenAiAudioApi low-level client
- Moved the code from spring-ai-core to the spring-ai-openai .../audio/speech package
- Added stream support (the OpenAI speech response can be received as a stream)
- Added metadata/rate-limit info using WebClient
- Added tests for speech properties and the stream/call methods
- Updated the issue description with the new usage
Comment From: hemeda3
Hi @tzolov, should I add the documentation to the same PR or a new PR?
Comment From: tzolov
Hi @hemeda3, thanks for asking. Sure, it would be nice to add the docs too.
Comment From: hemeda3
Added the speech API adoc:
- Updated nav.adoc
- Added speech.adoc
Comment From: markpollack
It took a while to get to, but it is now merged. Thanks, this was a great contribution!
Merged as 766b420f980b12700daec85c4a1fa480a6e1812e