Holding off squashing/documentation further until I get some eyes on it.
This implements a VectorStore on MongoDB Atlas. This does NOT work with MongoDB hosted outside of Atlas, because vector search requires creating a search index through Atlas.
More on how that works under the hood here: https://www.mongodb.com/docs/atlas/atlas-search/atlas-search-overview/
As for configuration, I figured we could provide some defaults, but ideally users should be able to set the following themselves, since much of it depends on whatever they have set up in Atlas:
- path: The field you are using for your index in the Atlas Vector Search Index
- vector_index: The name of the vector search index
- vector_collection_name: The name of the collection your index was created on
- num_candidates: Number of nearest neighbors to use during the search. Value must be less than or equal to (<=) 10000. You can't specify a number less than the number of documents to return (limit).
- metadataFieldsToFilter: The metadata fields you want to be able to filter with.
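To make the num_candidates constraints concrete, here is a minimal sketch of a validation helper. The class and method names are hypothetical and not part of this PR; only the two bounds (at most 10000, and not less than limit) come from the description above.

```java
public final class NumCandidatesCheck {

    // Upper bound quoted in the description above.
    public static final int MAX_NUM_CANDIDATES = 10_000;

    private NumCandidatesCheck() {
    }

    /**
     * Returns numCandidates unchanged if it satisfies both constraints,
     * otherwise throws IllegalArgumentException.
     * (Hypothetical helper, for illustration only.)
     */
    public static int requireValidNumCandidates(int numCandidates, int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be at least 1");
        }
        if (numCandidates > MAX_NUM_CANDIDATES) {
            throw new IllegalArgumentException(
                    "num_candidates must be <= " + MAX_NUM_CANDIDATES);
        }
        if (numCandidates < limit) {
            throw new IllegalArgumentException(
                    "num_candidates must be >= limit (" + limit + ")");
        }
        return numCandidates;
    }
}
```

Failing fast like this would surface a bad configuration before the query ever reaches Atlas.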
We query the database using an aggregation search and perform a post-filter in the pipeline to filter out anything below the threshold value. https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage/#mongodb-pipeline-pipe.-vectorSearch
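For readers following along, the shape of such a pipeline can be sketched with plain java.util collections. This is not the code in the PR: the class name, the `score` field name, and the use of `$addFields` to expose the score are assumptions, and a real implementation would build the stages with the MongoDB driver's document types. The `$vectorSearch` fields and the `vectorSearchScore` metadata key are the ones documented for Atlas.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class VectorSearchPipelineSketch {

    private VectorSearchPipelineSketch() {
    }

    /** Builds a three-stage pipeline: search, expose the score, post-filter on it. */
    public static List<Map<String, Object>> buildPipeline(String indexName, String path,
            List<Double> queryVector, int numCandidates, int limit, double threshold) {

        // Stage 1: approximate nearest-neighbor search against the Atlas index.
        Map<String, Object> vectorSearch = new LinkedHashMap<>();
        vectorSearch.put("index", indexName);
        vectorSearch.put("path", path);
        vectorSearch.put("queryVector", queryVector);
        vectorSearch.put("numCandidates", numCandidates);
        vectorSearch.put("limit", limit);
        Map<String, Object> searchStage = Map.<String, Object>of("$vectorSearch", vectorSearch);

        // Stage 2: copy the similarity score into a regular field
        // ("score" is a name chosen for this sketch).
        Map<String, Object> scoreMeta = Map.<String, Object>of("$meta", "vectorSearchScore");
        Map<String, Object> scoreField = Map.<String, Object>of("score", scoreMeta);
        Map<String, Object> addScoreStage = Map.<String, Object>of("$addFields", scoreField);

        // Stage 3: the post-filter, dropping results below the threshold.
        Map<String, Object> atLeast = Map.<String, Object>of("$gte", threshold);
        Map<String, Object> scoreMatch = Map.<String, Object>of("score", atLeast);
        Map<String, Object> matchStage = Map.<String, Object>of("$match", scoreMatch);

        return List.of(searchStage, addScoreStage, matchStage);
    }
}
```

Because the threshold is applied after `$vectorSearch`, a query can return fewer than `limit` documents.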
One thing I have noticed (and why I have a terrible sleep(5000) in the test for now....) is that the indexing does not happen instantly.
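One possible alternative to the fixed sleep is to poll until the data becomes searchable, with a timeout. The helper below is only a sketch with hypothetical names; the condition passed in (for example, "a similarity search returns the expected number of documents") is left to the test.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

public final class AwaitSketch {

    private AwaitSketch() {
    }

    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true if the condition was met, false on timeout or interruption.
     */
    public static boolean awaitCondition(BooleanSupplier condition, Duration timeout,
            Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            }
            catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test could then wait only as long as indexing actually takes, and fail with a clear timeout instead of a flaky assertion.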
Comment From: tzolov
This is great! thanks for contributing @Kirbstomper !
Comment From: Kirbstomper
Added some more documentation and a builder for configuration
I'm thinking of renaming the module to vector-stores/mongodb-atlas to make it more obvious that this is not just for run-of-the-mill MongoDB. It will also differentiate it from a future VectorStore for something like Azure Cosmos DB for MongoDB, which has some vector storage support too.
Comment From: tzolov
To be consistent with the other store names you should use the spring-ai- prefix. E.g. vector-stores/spring-ai-mongodb-atlas.
I've noticed that MongoDB Atlas provides filtering support. Perhaps we can map our metadata filtering to it. We have done it for most of the other stores. If you have time (and interest) you can investigate this, but it is not critical. We can leave it for later improvements. I'm busy with other stuff until our 0.8.1 milestone release (coming week), but I will try to review your contribution for the next milestone (1.0.0-M1).
Comment From: Kirbstomper
@tzolov Implemented filtering, and creation of the search index if it doesn't exist. The search index will be created using the configured collection name, metadataFieldsToFilter, pathName, and vectorIndexName. If the search index does exist, then from what I can tell the operation just doesn't do anything.
Ideally I want to be able to check if this search index already exists and then update the index instead, but the updateSearchIndex operation doesn't seem to work very well with vector search index definitions.
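For context, an Atlas vector search index definition with filter fields has roughly the shape built below. This is a sketch using plain java.util collections with hypothetical names, not the code in this PR. The `numDimensions` and `similarity` settings are not mentioned in this thread and are assumptions taken from the Atlas index definition format.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class VectorIndexDefinitionSketch {

    private VectorIndexDefinitionSketch() {
    }

    /** Builds {"fields": [vector field, filter field, ...]} as nested maps. */
    public static Map<String, Object> buildDefinition(String path, int numDimensions,
            String similarity, List<String> filterPaths) {

        List<Map<String, Object>> fields = new ArrayList<>();

        // The embedding field that vector search runs against.
        Map<String, Object> vectorField = new LinkedHashMap<>();
        vectorField.put("type", "vector");
        vectorField.put("path", path);
        vectorField.put("numDimensions", numDimensions); // assumption: depends on the embedding model
        vectorField.put("similarity", similarity); // e.g. "cosine"
        fields.add(vectorField);

        // One "filter" entry per metadata field that queries may filter on.
        for (String filterPath : filterPaths) {
            Map<String, Object> filterField = new LinkedHashMap<>();
            filterField.put("type", "filter");
            filterField.put("path", filterPath);
            fields.add(filterField);
        }

        Map<String, Object> definition = new LinkedHashMap<>();
        definition.put("fields", fields);
        return definition;
    }
}
```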
Comment From: tzolov
Hey @Kirbstomper , thanks for your contribution. Good stuff.
I've reviewed it and made some small fixes before merging. Here are the leftovers, if you are interested in continuing with them.
- https://github.com/spring-projects/spring-ai/issues/455
- https://github.com/spring-projects/spring-ai/issues/456
Comment From: tzolov
Rebased, squashed and merged at 5f0123cc9228b1befff8bbf5fd76411f7683c14e
Follow up actions: https://github.com/spring-projects/spring-ai/issues/455 https://github.com/spring-projects/spring-ai/issues/456