Bug description
When I create a RAG application with SimilaritySearch, the search returns similar documents when using Azure OpenAI, but always returns zero documents with Ollama. The issue occurs specifically when I use the withSimilarityThreshold parameter with Ollama.
Environment
Latest version of the spring-ai BOM
Ollama with Llama 3.2
Azure OpenAI
PGVector
Steps to reproduce
When I use withSimilarityThreshold with Ollama, similarDocuments always contains 0 documents (it works with Azure OpenAI):
var similarity = 0.8;
var topk = 20;
var searchRequest = SearchRequest.query(question.question())
    .withTopK(topk)
    .withSimilarityThreshold(similarity);
List<Document> similarDocuments = vectorStore.similaritySearch(searchRequest);
Expected behavior
The search should return comparable results with Azure OpenAI and Ollama. However, Ollama consistently returns zero similar documents.
Observed Behavior
Azure OpenAI: Returns a list of documents that meet the similarity threshold.
Ollama: Returns zero documents, regardless of the similarity threshold set.
Additional Information
Configuration Consistency: The vectorStore configuration is identical for both platforms.
Error Logs: No explicit error messages are thrown; however, the empty response from Ollama does not align with the expected output.
[spring-ai-bootstraping] [nio-8080-exec-1] c.c.s.s.controller.AskRagController : Found 0 similar documents
Comment From: asaikali
What embedding model are you using with ollama?
Comment From: cjullien
I tested with the following models:
spring.ai.ollama.embedding.options.model=nomic-embed-text
spring.ai.ollama.embedding.options.model=mxbai-embed-large
spring.ai.ollama.embedding.options.model=chroma/all-minilm-l6-v2-f32
spring.ai.ollama.embedding.options.model=hellord/e5-mistral-7b-instruct:Q4_0
Comment From: 77fill
Look here: spring-ai/vector-stores/spring-ai-chroma-store/src/main/java/org/springframework/ai/vectorstore/ChromaVectorStore.java, method: doSimilaritySearch, version: 1.0.0-M4
I don't understand the condition (1 - distance) >= request.getSimilarityThreshold().
Isn't the similarity threshold between 0 and 1? What about (1 - distance)? Isn't it normally negative? Is that perhaps the reason why the document list becomes empty?
Comment From: markpollack
Cosine similarity measures the cosine of the angle between two vectors, indicating their directional alignment. This value ranges from -1 (exactly opposite) to 1 (exactly the same direction), with 0 signifying orthogonality.
Conversely, cosine distance quantifies the dissimilarity between vectors and is defined as 1 minus the cosine similarity. Therefore, cosine distance ranges from 0 (identical vectors) to 2 (diametrically opposed vectors).
pgvector returns the cosine distance, not the cosine similarity. To retrieve the cosine similarity from the result, you can subtract the returned cosine distance from 1.
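To make the relationship concrete, here is a minimal, self-contained sketch in plain Java. It does not use Spring AI or pgvector; the class name, method names, and sample distances are hypothetical and chosen only for illustration. It computes cosine similarity and cosine distance as defined above, then applies the same kind of check quoted earlier in the thread, (1 - distance) >= threshold.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Spring AI or pgvector.
class CosineThresholdSketch {

    // Cosine similarity: dot(a, b) / (|a| * |b|), which lies in [-1, 1].
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Cosine distance as defined above: 1 - similarity, which lies in [0, 2].
    static double cosineDistance(double[] a, double[] b) {
        return 1.0 - cosineSimilarity(a, b);
    }

    // Keeps the distances whose recovered similarity (1 - distance) meets the threshold.
    static List<Double> filterByThreshold(List<Double> distances, double threshold) {
        List<Double> kept = new ArrayList<>();
        for (double d : distances) {
            if ((1.0 - d) >= threshold) {
                kept.add(d);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] query = {1, 0};
        System.out.println(cosineDistance(query, new double[] {2, 0}));  // same direction
        System.out.println(cosineDistance(query, new double[] {0, 3}));  // orthogonal
        System.out.println(cosineDistance(query, new double[] {-1, 0})); // opposite direction

        // Made-up distances, filtered with the 0.8 threshold from the report.
        System.out.println(filterByThreshold(List.of(0.1, 0.35, 0.6), 0.8));
    }
}
```

Under these definitions, 1 - distance is negative only when the distance exceeds 1, that is, when the vectors point in roughly opposite directions. With a threshold of 0.8, a result passes only if its cosine distance is at most 0.2.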
Note that I think we have a mistake when using other metric types, e.g. Euclidean distance, in the current implementation.
https://omiid.me/notebook/32/pgvector-similarity-search-distance-functions
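One way to see why other metric types might need different handling is the sketch below. It is plain Java with hypothetical names and does not reproduce the Spring AI implementation. It assumes the same (1 - distance) >= threshold check is applied to a Euclidean distance, which is unbounded above, so two vectors pointing in the same direction can still be rejected.

```java
// Hypothetical illustration class; not part of Spring AI or pgvector.
class EuclideanThresholdSketch {

    // Euclidean (L2) distance: sqrt(sum((a_i - b_i)^2)), which lies in [0, infinity).
    static double euclideanDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // The conversion that is valid for cosine distance, applied here to an arbitrary distance.
    static boolean passesThreshold(double distance, double threshold) {
        return (1.0 - distance) >= threshold;
    }

    public static void main(String[] args) {
        double[] query = {3, 4};
        double[] sameDirection = {6, 8}; // query scaled by 2, so the cosine distance is 0

        double distance = euclideanDistance(query, sameDirection);
        System.out.println("Euclidean distance: " + distance);
        System.out.println("1 - distance: " + (1.0 - distance));
        System.out.println("Passes 0.8 threshold: " + passesThreshold(distance, 0.8));
    }
}
```

In this example the Euclidean distance is 5, so 1 - distance is -4 and the check fails even though the vectors are perfectly aligned. Whether this is the mistake referred to above is not confirmed in the thread.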
Nevertheless, we should investigate to clear this up, as I think our integration tests are not covering it.