I wanted to search for a document by id before adding it to the vector db (how do you do that?), but it seems the id changes after each embedding. As a result, it is not possible to check whether the document already exists before adding it to the vector db again. Furthermore, it seems to me that the document store is not persisted to the DB (Azure). Why? Thanks
Comment From: markpollack
I believe @tzolov has a branch where the ID is computed upfront based on content and metadata instead of a random GUID. I'll discuss adding that helper code to the 0.8.0 release.
In the meantime, you can use the constructor `public Document(String id, String content, Map<String, Object> metadata)` and assign the id yourself based on your own definition of uniqueness.
That would likely solve the problem for you, though I wanted to do a bit more research into this. For the next release, 0.9.0, we will focus more on solving realistic RAG-type situations. Generally speaking, vector db metadata management is an important topic, so bear with us.
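As a rough illustration of assigning the id yourself, here is a minimal sketch that derives a stable id from the content and metadata using the JDK's SHA-256 `MessageDigest`. The class `ContentBasedId`, the method `idFor`, and the in-memory `seen` map are hypothetical names made up for this example. They are not part of Spring AI, and the hashing scheme is only one possible definition of uniqueness.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Spring AI.
final class ContentBasedId {

    // Stand-in for "ids already stored"; a real application would ask its vector store instead.
    private static final Map<String, String> seen = new HashMap<>();

    // Returns a 64-character hex id that depends only on the content and metadata.
    static String idFor(String content, Map<String, Object> metadata) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(content.getBytes(StandardCharsets.UTF_8));
            // Sort the keys so that equal maps always produce the same hash.
            for (Map.Entry<String, Object> e : new TreeMap<>(metadata).entrySet()) {
                digest.update((byte) 0); // separator between fields
                digest.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0);
                digest.update(String.valueOf(e.getValue()).getBytes(StandardCharsets.UTF_8));
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-256 not available", ex);
        }
    }

    // Records the content under its id; returns false if that id was already present.
    static boolean addIfAbsent(String content, Map<String, Object> metadata) {
        return seen.putIfAbsent(idFor(content, metadata), content) == null;
    }
}
```

The string returned by `idFor` could then be passed as the first argument of the `Document` constructor quoted above, so that re-embedding the same content yields the same id.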
If you are noticing an issue with `AzureOpenAiEmbeddingClient`, please provide a description in a separate issue.
Comment From: nurlicht
@tzolov, This issue seems to be related to #113 (with the PR #272).
Comment From: williamspindox
Thanks. FYI I found an interesting real-life use case about that on this project: https://github.com/Angular2Guy/AIDocumentLibraryChat
Comment From: markpollack
Updated what was merged here to create a new message digest on demand, to avoid any potential thread-safety issues.
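For readers unfamiliar with the concern: a `java.security.MessageDigest` instance carries internal state between `update` and `digest` calls, so sharing one instance across threads can corrupt hashes. The sketch below shows the general "new digest per call" idea. It is not the merged Spring AI code, and `OnDemandDigest` and its methods are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.stream.IntStream;

// Hypothetical illustration, not the merged implementation.
final class OnDemandDigest {

    // Creates a fresh MessageDigest on every call, so no state is shared between threads.
    static String sha256Hex(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-256 not available", ex);
        }
    }

    // Hashes the same input from a parallel stream and counts the distinct results.
    static long distinctResults(String input, int calls) {
        return IntStream.range(0, calls)
                .parallel()
                .mapToObj(i -> sha256Hex(input))
                .distinct()
                .count();
    }
}
```

With a per-call digest, `distinctResults` should report a single distinct hash for a given input regardless of how many threads take part.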
Comment From: markpollack
You can now use the `JdkSha256HexIdGenerator` to check the document id before inserting. That should help in your case. You also raised a separate issue:
> the document store is not persistent on DB (azure), why?

Can you run the `AzureVectorStoreIT.java` integration test, please?
I will raise a separate issue for the Azure vector store problem and, in the meantime, close this one.