Even though only OpenAI and Azure OpenAI currently offer transcription support, each is represented by a distinct transcription model type: OpenAiAudioTranscriptionModel and AzureOpenAiAudioTranscriptionModel. There is no common interface between the two, which means you have to explicitly inject the concrete type you are working with rather than a common abstraction.
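To illustrate the limitation, here is a hypothetical service that depends on the OpenAI variant directly (the service class and field names are made up for the example, and the import assumes the usual Spring AI package for the OpenAI model):

import org.springframework.ai.openai.OpenAiAudioTranscriptionModel;
import org.springframework.stereotype.Service;

@Service
public class TranscriptionService {

  private final OpenAiAudioTranscriptionModel transcriptionModel;

  // Switching providers means changing this constructor parameter to
  // AzureOpenAiAudioTranscriptionModel; no shared type can stand in for both.
  public TranscriptionService(OpenAiAudioTranscriptionModel transcriptionModel) {
    this.transcriptionModel = transcriptionModel;
  }

}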

They both implement Model<AudioTranscriptionPrompt, AudioTranscriptionResponse>, but it would be handier to have a common interface like the following that they each could implement:

import org.springframework.ai.audio.transcription.AudioTranscriptionPrompt;
import org.springframework.ai.audio.transcription.AudioTranscriptionResponse;
import org.springframework.ai.model.Model;

public interface TranscriptionModel extends Model<AudioTranscriptionPrompt, AudioTranscriptionResponse> {

  // Restates Model.call to make the transcription-specific signature explicit
  AudioTranscriptionResponse call(AudioTranscriptionPrompt transcriptionPrompt);

}

Similar to ChatModel, there may be an opportunity for some additional default convenience methods as well: perhaps one that accepts a Resource and returns a String, and another that accepts both a Resource and AudioTranscriptionOptions and returns a String, as sketched below.
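Here is a rough sketch of what those defaults might look like, assuming the TranscriptionModel interface proposed above and assuming AudioTranscriptionPrompt constructors that accept a Resource (and optionally AudioTranscriptionOptions). The method name transcribe is illustrative, not an existing API:

import org.springframework.ai.audio.transcription.AudioTranscriptionOptions;
import org.springframework.ai.audio.transcription.AudioTranscriptionPrompt;
import org.springframework.ai.audio.transcription.AudioTranscriptionResponse;
import org.springframework.ai.model.Model;
import org.springframework.core.io.Resource;

public interface TranscriptionModel extends Model<AudioTranscriptionPrompt, AudioTranscriptionResponse> {

  AudioTranscriptionResponse call(AudioTranscriptionPrompt transcriptionPrompt);

  // Hypothetical convenience method: transcribe a Resource with default options
  // and return just the transcribed text.
  default String transcribe(Resource audioResource) {
    return call(new AudioTranscriptionPrompt(audioResource)).getResult().getOutput();
  }

  // Hypothetical convenience method: same, but with explicit transcription options.
  default String transcribe(Resource audioResource, AudioTranscriptionOptions options) {
    return call(new AudioTranscriptionPrompt(audioResource, options)).getResult().getOutput();
  }

}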

Comment From: habuma

Note that I don't have the time right now to work on this and submit a PR, but I don't mind doing so when I get a moment. If someone wants to beat me to it, then by all means go for it.

Comment From: mudabirhussain

I will do it....

Comment From: markpollack

We won't have time to do this before GA, so we're postponing it until after.