Expected Behavior
Ollama's generate API accepts an additional field that is not present in the request model: the "images" field, which must be an array of base64-encoded images. With that field we can ask models like "llava" about those images.
For example, take this request to the generate endpoint, with the base64 contents of a screen capture of some text:
Request:
curl --location 'http://localhost:11434/api/generate' \
--header 'Content-Type: application/json' \
--data '{
  "model": "llava",
  "prompt": "What does this image say?",
  "stream": false,
  "images": ["iVBORw..."]
}'
Gives us the following response:
{
"model": "llava",
"created_at": "2024-03-10T08:15:19.437032Z",
"response": " The image shows a text that says:\n\n\"This is an example text\" ",
...
}
So the expected behaviour is that, with the chat client, I should be able to send an image resource as an attachment to any prompt.
Alternatively, I should be able to extend the current implementation to support functionality that is not yet supported.
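For reference, here is a minimal sketch of how the JSON body of the request above could be assembled from raw image bytes using only the JDK. The class GenerateBodySketch and its methods are hypothetical names made up for this illustration; they are not part of Spring AI or Ollama, and real code would more likely use a JSON library such as Jackson.

```java
import java.util.Base64;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of Spring AI or Ollama.
class GenerateBodySketch {

    // Escapes the characters that must be escaped inside a JSON string.
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.toString();
    }

    // Builds a non-streaming generate request body with base64-encoded images.
    static String buildBody(String model, String prompt, List<byte[]> images) {
        String encoded = images.stream()
                .map(bytes -> "\"" + Base64.getEncoder().encodeToString(bytes) + "\"")
                .collect(Collectors.joining(","));
        return "{\"model\":\"" + escapeJson(model) + "\","
                + "\"prompt\":\"" + escapeJson(prompt) + "\","
                + "\"stream\":false,"
                + "\"images\":[" + encoded + "]}";
    }

    public static void main(String[] args) {
        byte[] placeholder = {1, 2, 3}; // placeholder bytes, not a real image
        System.out.println(buildBody("llava", "What does this image say?", List.of(placeholder)));
    }
}
```

In practice the bytes would come from something like Files.readAllBytes on the image file, and the resulting string would be sent as the POST body to /api/generate.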
Current Behavior
I cannot ask Ollama about the contents of any image, as the "images" field is not defined in the payload record org.springframework.ai.ollama.api.OllamaApi.GenerateRequest.
Context
I have a local setup of Ollama with the "llava" model and am trying to get explanations, descriptions or insights about an image. I need to send both the text and the image, and found that the options of the Ollama generate endpoint are not fully supported, in particular the "images" field.
Comment From: mikrethor
I am tackling that issue at the moment. Is it OK if I modify the GenerateRequest to include images?
@JsonInclude(Include.NON_NULL)
public record GenerateRequest(
    @JsonProperty("model") String model,
    @JsonProperty("prompt") String prompt,
    // New component: base64-encoded images for multimodal models such as llava
    @JsonProperty("images") List<String> images,
    @JsonProperty("format") String format,
    @JsonProperty("options") Map<String, Object> options,
    @JsonProperty("system") String system,
    @JsonProperty("template") String template,
    @JsonProperty("context") List<Integer> context,
    @JsonProperty("stream") Boolean stream,
    @JsonProperty("raw") Boolean raw) {
}
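Adding a component to a record changes its canonical constructor, so existing call sites would stop compiling. One possible way around that, sketched below, is to add secondary constructors. GenerateRequestSketch is a simplified, hypothetical stand-in: the Jackson annotations are left out so it compiles with the JDK alone, and the convenience constructors are assumptions for illustration, not necessarily what the actual change contains.

```java
import java.util.List;
import java.util.Map;

// Simplified stand-in for the proposed record (Jackson annotations omitted).
record GenerateRequestSketch(
        String model,
        String prompt,
        List<String> images,
        String format,
        Map<String, Object> options,
        String system,
        String template,
        List<Integer> context,
        Boolean stream,
        Boolean raw) {

    // Hypothetical convenience constructor for text-only prompts.
    GenerateRequestSketch(String model, String prompt, Boolean stream) {
        this(model, prompt, null, null, null, null, null, null, stream, null);
    }

    // Hypothetical convenience constructor for prompts with base64-encoded images.
    GenerateRequestSketch(String model, String prompt, List<String> images, Boolean stream) {
        this(model, prompt, images, null, null, null, null, null, stream, null);
    }
}
```

With @JsonInclude(Include.NON_NULL) on the real record, a null images component would simply be left out of the serialized JSON, so text-only requests would look the same as before.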
Comment From: tzolov
@AlbertoPolo, @mikrethor just submitted a PR for it: https://github.com/spring-projects/spring-ai/pull/479 It has been tested successfully with LLaVa.
Here is a screenshot of the updated docs. The text is an actual LLaVa response: