Even though only OpenAI and Azure OpenAI currently offer transcription support, they are each represented by distinct transcription model types: OpenAiAudioTranscriptionModel and AzureOpenAiAudioTranscriptionModel. There is no common interface between the two, so you must explicitly inject the concrete type you are working with rather than injecting a common interface.
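To illustrate the pain point, here is a minimal sketch of what client code looks like today, assuming the usual Spring AI prompt/response types. The service class and method names are illustrative, and the package locations shown may differ by version:

import org.springframework.ai.audio.transcription.AudioTranscriptionPrompt;
import org.springframework.ai.audio.transcription.AudioTranscriptionResponse;
import org.springframework.ai.openai.OpenAiAudioTranscriptionModel;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;

@Service
public class TranscriptionService {

    // Coupled to the OpenAI implementation; switching to Azure OpenAI
    // means changing this field's type, not just the configuration.
    private final OpenAiAudioTranscriptionModel transcriptionModel;

    public TranscriptionService(OpenAiAudioTranscriptionModel transcriptionModel) {
        this.transcriptionModel = transcriptionModel;
    }

    public String transcribe(Resource audio) {
        AudioTranscriptionResponse response =
                transcriptionModel.call(new AudioTranscriptionPrompt(audio));
        return response.getResult().getOutput();
    }

}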
They both implement Model<AudioTranscriptionPrompt, AudioTranscriptionResponse>, but it would be handier to have a common interface like the following that each could implement:
public interface TranscriptionModel extends Model<AudioTranscriptionPrompt, AudioTranscriptionResponse> {

    AudioTranscriptionResponse call(AudioTranscriptionPrompt transcriptionPrompt);

}
Similar to ChatModel, there may be an opportunity for some additional default convenience methods as well: perhaps one that accepts a Resource and returns a String, and another that accepts a Resource and AudioTranscriptionOptions and returns a String, as sketched below.
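To make the idea concrete, here is a minimal sketch of what those default methods might look like, patterned after the convenience overloads on ChatModel. The AudioTranscriptionPrompt constructors and the getResult().getOutput() accessor chain are assumptions based on the existing prompt/response types; the overloaded method names are illustrative, not a settled API:

import org.springframework.ai.audio.transcription.AudioTranscriptionOptions;
import org.springframework.ai.audio.transcription.AudioTranscriptionPrompt;
import org.springframework.ai.audio.transcription.AudioTranscriptionResponse;
import org.springframework.ai.model.Model;
import org.springframework.core.io.Resource;

public interface TranscriptionModel extends Model<AudioTranscriptionPrompt, AudioTranscriptionResponse> {

    AudioTranscriptionResponse call(AudioTranscriptionPrompt transcriptionPrompt);

    // Convenience method (hypothetical): transcribe an audio Resource
    // using the model's default options.
    default String call(Resource audioResource) {
        return call(new AudioTranscriptionPrompt(audioResource))
                .getResult()
                .getOutput();
    }

    // Convenience method (hypothetical): transcribe an audio Resource
    // with explicitly provided options.
    default String call(Resource audioResource, AudioTranscriptionOptions options) {
        return call(new AudioTranscriptionPrompt(audioResource, options))
                .getResult()
                .getOutput();
    }

}

This mirrors the design of ChatModel, where a default String-returning call(String) overload wraps the canonical call(Prompt) method, so both implementations would get the convenience methods for free.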
Comment From: habuma
Note that I don't have the time right now to work on this and submit a PR, but I don't mind doing so when I get a moment. If someone wants to beat me to it, then by all means go for it.
Comment From: mudabirhussain
I will do it....
Comment From: markpollack
We won't have time to do this before GA, so we're postponing it until after.