Currently, the vector store automatically calls the embedding client to generate the document embedding without checking whether the document already has an embedding.

In this PR, I first check whether the document already has an embedding and only call the client to generate one if it does not. This avoids redundant embedding calls.
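
Roughly, the check looks like the sketch below. This is a minimal, self-contained illustration and not the actual Spring AI code: `Doc`, `CountingEmbeddingClient`, and `InMemoryStore` are hypothetical stand-ins for the document, embedding client, and vector store, and treating a null or empty list as "no embedding" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class EmbeddingSkipSketch {

    // Hypothetical stand-in for a document that may carry a pre-computed embedding.
    static final class Doc {
        final String content;
        List<Double> embedding; // null or empty is assumed to mean "not embedded yet"

        Doc(String content, List<Double> embedding) {
            this.content = content;
            this.embedding = embedding;
        }
    }

    // Hypothetical stand-in for the embedding client; it counts how often it is called.
    static final class CountingEmbeddingClient {
        int calls = 0;

        List<Double> embed(String text) {
            calls++;
            // Dummy one-dimensional "embedding" so the sketch needs no external service.
            return List.of((double) text.length());
        }
    }

    // Hypothetical stand-in for a vector store that skips already-embedded documents.
    static final class InMemoryStore {
        private final CountingEmbeddingClient client;
        final List<Doc> stored = new ArrayList<>();

        InMemoryStore(CountingEmbeddingClient client) {
            this.client = client;
        }

        void add(List<Doc> docs) {
            for (Doc doc : docs) {
                // Only call the client when the document has no embedding yet.
                if (doc.embedding == null || doc.embedding.isEmpty()) {
                    doc.embedding = client.embed(doc.content);
                }
                stored.add(doc);
            }
        }
    }

    public static void main(String[] args) {
        CountingEmbeddingClient client = new CountingEmbeddingClient();
        InMemoryStore store = new InMemoryStore(client);
        store.add(List.of(
                new Doc("hello", null),                // no embedding: client is called
                new Doc("world", List.of(0.1, 0.2)))); // pre-computed: client is skipped
        System.out.println("embedding calls: " + client.calls);
    }
}
```

In this sketch, adding one document without an embedding and one with a pre-computed embedding results in a single client call. Note that the store never verifies where a pre-computed embedding came from or whether it is still valid.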

  • Tests are green for impacted vector stores

Comment From: tzolov

If I'm not mistaken, this is the same as or related to https://github.com/spring-projects/spring-ai/pull/413 ?

But this change comes with some risks. For example, it is not clear when one would have to invalidate the pre-computed embedding (e.g. the index). Likely when ... Also, I'm not sure how useful this feature would be. What is the use case where you would repeatedly use the same Documents (with pre-computed embeddings) for searching? Or what are the reasons you might want to re-add a document that has a pre-computed embedding?

Maybe I'm missing some interesting use cases?

Right now we do not allow the Vector Store to use any embeddings other than those computed by the embedding model registered with the VectorStore. Using the embedding field would allow one to pre-compute the embeddings externally with a different embedding model, and the VectorStore would then store the document with the externally computed embedding. But I'm not sure if this is a real or needed use case, nor if this is the right approach to support it.

Comment From: tzolov

If the pre-computed embeddings are not applicable/useful for real use cases, IMO, we should remove the embedding field from the Document class.

Comment From: markpollack

See https://github.com/spring-projects/spring-ai/issues/1781

I think this was a design mistake to begin with; we shouldn't be caching/storing the embedding in the document in the first place.

Comment From: markpollack

We have removed the embedding field from Document; it isn't needed. It was a copy of the design from langchain way back when I started the project. This simplifies the flow of getting data into the vector database and removes the confusion.