This PR provides a more performant search function for the Elasticsearch vector store and removes the bean autoconfiguration.

In depth

Autoconfiguration

The current implementation of afterPropertiesSet() automatically creates a new index with a set of properties that only works with OpenAI, or any other model that works with vectors with a dimension of 1536; users adopting other models would currently have to manually delete the index. By default, Elasticsearch automatically creates the correct index settings when it receives the first PUT request for vectors, so in our opinion there's no need for such autoconfiguration.

Search

The function used now is script_score, which can be slow for large data samples since it does a brute force comparison with all vectors using the similarity function. This PR replaces it with the approximate knn search, more performant because it only scans the closest neighbours. The similarity functions available for knn can be easily configured in the index mapping by setting the correct name.

We would also like to contribute to the documentation, should that be done in a different PR?

Thank you @JM-Lab for the original implementation, would you like to review these changes?

(Disclosure: I work for Elastic)

Comment From: JM-Lab

Hi @l-trotta,

Thanks for the update. I liked how script_score was quick to implement due to available examples, but I'm pleased we're switching to KNN for better performance.

I find the automatic creation of index mappings convenient; however, for my projects, I typically need to define these mappings in advance. Could we consider adding an option to provide index mappings directly in the constructor?

Additionally, could you check if the code at ElasticsearchAiSearchFilterExpressionConverter.java needs any improvements?

I'm also looking forward to the official documentation for the Elasticsearch vector store of the Spring AI project, which I think @tzolov will address.

Comment From: l-trotta

Hi @JM-Lab,

Since the index mapping can be added with createIndexMapping(), and in most cases the autoconfiguration is enough, we chose not to add the mapping to the constructor to make it easier to configure it at the start.

The filter converter looked fine to me! Thanks again for your work.

Comment From: tzolov

@l-trotta , thank you for the improvements. Could you please rebase and update your PR after the https://github.com/spring-projects/spring-ai/pull/633 merge.

Comment From: l-trotta

Updated! I'd like to explain why I removed the dense-vector-indexing property: quoting the documentation,

If true, you can search this field using the kNN search API. Defaults to true.

And since this implementation uses kNN search, setting this to false would make the data unsearchable.

Comment From: ezimuel

@tzolov what's the status of this PR? Thanks.

Comment From: tzolov

@l-trotta thanks for your contribution and thank you for your patience.

While reviewing the PR i've noticed that you have removed the capability to override the similarityFunction at runtime? Is there any particular reason to reduce this flexibility? Either way it renders this ElasticsearchVectorStoreAutoConfigurationIT almost useless? If there are not strong reasons to have this immutable perhaps we should re-enable the withSimilarityFunction setter?

Comment From: l-trotta

@tzolov Thank you for the review, I'll explain my changes:

  • Removal of the withSimilarityFunction setter: in elasticsearch the similarity function is defined as a mapping property of the index and cannot be changed, trying to update the property in the mapping will result in the following error: { "error": { "root_cause": [ { "type": "illegal_argument_exception", "reason": """Mapper for [embedding] conflicts with existing mapper: Cannot update parameter [similarity] from [dot_product] to [cosine]""" } ], "type": "illegal_argument_exception", "reason": """Mapper for [embedding] conflicts with existing mapper: Cannot update parameter [similarity] from [dot_product] to [cosine]""" }, "status": 400 } so the only way to change mapping is to create another index, and in this case a new vector store instance.
  • Having elasticsearch-java explicitly defined in autoconfiguration: while testing the autoconfiguration I noticed it was importing an older version of the client (8.10.4), which breaks the current implementation of the vector store since the knn query was recently fixed with version 8.13.3, so I wanted to make sure that this would be the version used, but maybe it's an issue with my local environment? If everything works without that it can be removed.

I didn't notice ElasticsearchVectorStoreAutoConfigurationIT being broken, sorry for that, I updated it.

Comment From: tzolov

Thanks for the clarification @l-trotta !

It seems latest Spring Boot 3.3.x (the one we use for our auto-configs) is already at Elasticsearch Client 8.13.4. So I guess we are fine without the forced dependencies?

Perhaps you can mention (in a separate PR) this requirement in the docs. E.g. if someone is using pre 3.3. Boot they have to add the elastic search 8.13.4 dependency to their poms?

Comment From: tzolov

rebased, squashed and merged at 78f3797f8d2c06cb429e6705f7b5eade6259dcf3

Comment From: tzolov

@l-trotta thank you for the great work! Just merged the PR. Please check if the dependencies work as expected and consider adding a NOTE in the documentation as suggested above.

Comment From: l-trotta

@tzolov thank you! I'll test with an older boot version, if there could be version problems I'll open another PR for the documentation.

Comment From: l-trotta

is there any estimation as for when these changes will be released?

Comment From: ezimuel

@tzolov do you know when this PR will be public released? We would like to prepare some materials to promote this integration for Elasticsearch. Thanks!