Spring-ai Investigate Documents Used for VectorDB testing

I just noticed while writing the README for the Neo4j store that the sentence in the PgVector store README is a bit vague. Yes, it's true that the document will be in the result, but shouldn't we either state that it is expected to be the first in the collection or set the "top-k amount" to 1 in the example snippet?

https://github.com/spring-projects-experimental/spring-ai/blob/0a584f0dc2483ef9ad0741c8598998e489b64726/vector-stores/spring-ai-pgvector-store/README.md?plain=1#L141-L145

The wording is slightly tweaked in the Neo4j store README: https://github.com/spring-projects-experimental/spring-ai/blob/0a584f0dc2483ef9ad0741c8598998e489b64726/vector-stores/spring-ai-neo4j-store/README.md?plain=1#L114-L118
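
To make the distinction concrete, here is a minimal sketch of the two readings: "the document is somewhere in the result" versus "the document is the first result", plus the effect of limiting the result to the top k entries. `ResultAssertions` and its methods are hypothetical helpers for illustration only, not part of Spring AI, and the results are modelled as plain strings rather than `Document` instances.

```java
import java.util.List;

// Hypothetical helper, not part of Spring AI: contrasts the two assertion styles.
class ResultAssertions {

    // Weak claim: the expected text appears somewhere in the returned list.
    static boolean containsDocument(List<String> results, String expected) {
        return results.contains(expected);
    }

    // Strong claim: the expected text is the best match (first element).
    static boolean isTopHit(List<String> results, String expected) {
        return !results.isEmpty() && results.get(0).equals(expected);
    }

    // Emulates limiting an already ranked result list to its top k entries.
    static List<String> topK(List<String> ranked, int k) {
        return ranked.subList(0, Math.min(k, ranked.size()));
    }
}
```

With a top-k of 1 the two claims coincide, which is why either fix would remove the ambiguity in the README sentence.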

Comment From: markpollack

I think the testing of the vector stores is rather anemic. We sort of trust that they are implemented correctly, but with the data we are using, it is indeed a bit vague. I'll rename this to 'Investigate Documents Used for VectorDB testing'. Perhaps some of the methods you used to test the Neo4j vector store itself would be more generally applicable here, @meistermeier?

Comment From: meistermeier

Thanks for your input, Mark. I think you are reading more into the issue than I intended. My point was only about the statement that a List<Document> contains the document. It is just a small matter of wording.

Given your input, I am wondering whether an AbstractVectorStoreTestBase that contains the test dataset, invokes the similaritySearch methods, and asserts on the results would make sense. Some kind of "cheap" TCK. From my perspective, an enforced >90% hit ("Spring") should always be first in the list.
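
A rough sketch of what such a base class could look like, under loud assumptions: `ToyVectorStore` is a hypothetical stand-in and not the Spring AI `VectorStore` interface, documents are plain strings, the dataset is made up, and `WordOverlapStore` ranks by shared words instead of embeddings so the example runs without any database. The similarity threshold is left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical minimal store contract; NOT the Spring AI VectorStore interface.
interface ToyVectorStore {
    void add(List<String> documents);
    List<String> similaritySearch(String query, int topK);
}

// Shared dataset and check; each store's test only supplies the store instance.
abstract class AbstractVectorStoreTestBase {
    // Made-up dataset for illustration.
    static final List<String> DATASET = List.of(
            "Spring AI makes vector stores easy",
            "The world is big",
            "You walk forward facing the past");

    protected abstract ToyVectorStore createStore();

    // Returns true when the best hit for "Spring" is the Spring document.
    boolean springDocumentIsFirst() {
        ToyVectorStore store = createStore();
        store.add(DATASET);
        List<String> results = store.similaritySearch("Spring", 2);
        return !results.isEmpty() && results.get(0).equals(DATASET.get(0));
    }
}

// Toy implementation: ranks by number of shared lowercase words, no embeddings.
class WordOverlapStore implements ToyVectorStore {
    private final List<String> docs = new ArrayList<>();

    public void add(List<String> documents) {
        docs.addAll(documents);
    }

    public List<String> similaritySearch(String query, int topK) {
        Set<String> queryWords = words(query);
        List<String> ranked = new ArrayList<>(docs);
        // Highest overlap first; the sort is stable, so ties keep insertion order.
        ranked.sort(Comparator.comparingInt(
                (String d) -> overlap(queryWords, words(d))).reversed());
        return new ArrayList<>(ranked.subList(0, Math.min(topK, ranked.size())));
    }

    private static Set<String> words(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
    }

    private static int overlap(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }
}

// A concrete store test only has to say how to create its store.
class WordOverlapStoreTest extends AbstractVectorStoreTestBase {
    protected ToyVectorStore createStore() {
        return new WordOverlapStore();
    }
}
```

In a real version the check would presumably be a JUnit test method and the store would be backed by an actual embedding client; the point here is only that the dataset and the "first in the list" assertion live in one place.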

Comment From: markpollack

I see your point: "contains" is better than assuming it would be first. For now, though, I think we can spend our time more effectively on other areas. Should issues come up with our vector DB testing approach for new vector stores, we can revisit this. Thanks for the comment.