Hello,
While testing the latest build for spring-ai-elasticsearch-store I found myself unable to index documents, getting the following error:
failed to parse: The [dense_vector] field [embedding] in doc [document with id '5e09ac56-e171-4075-a597-e93755fa63a1'] has a different number of dimensions [0] than defined in the mapping [1536]
After some debugging, I found that most of the documents I was trying to index had an empty embedding field. After digging further, I suspect the following code is faulty:
https://github.com/spring-projects/spring-ai/blame/9f33e326b1fa7087c21b2d55d9f147db94332001/spring-ai-core/src/main/java/org/springframework/ai/embedding/TokenCountBatchingStrategy.java#L124
    if (currentSize + tokenCount > maxInputTokenCount) {
        batches.add(currentBatch);
        currentBatch.clear();
        currentSize = 0;
    }
currentBatch.clear() also clears the list that was just added to batches, because batches.add(currentBatch) stores a reference to the same list rather than a copy. That shared list is emptied every time the token limit is reached, so only the last few documents actually remain in batches, and only those documents are referenced when it's time to fill the embedding field.
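To illustrate the suspicion, here is a minimal standalone sketch. It is not the actual Spring AI code: the class and method names are hypothetical, plain strings stand in for documents, and a simple per-batch item count stands in for the token count. The first method repeats the add-then-clear pattern, and the second shows one possible fix, which is to start a fresh list instead of clearing the shared one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of Spring AI.
public class BatchAliasingSketch {

    // Repeats the reported pattern: the same list instance is stored and then cleared.
    static List<List<String>> batchWithClear(List<String> docs, int maxPerBatch) {
        List<List<String>> batches = new ArrayList<>();
        List<String> currentBatch = new ArrayList<>();
        for (String doc : docs) {
            if (currentBatch.size() + 1 > maxPerBatch) {
                batches.add(currentBatch);
                currentBatch.clear(); // also empties the list just stored in batches
            }
            currentBatch.add(doc);
        }
        if (!currentBatch.isEmpty()) {
            batches.add(currentBatch);
        }
        return batches;
    }

    // Possible fix: allocate a new list so stored batches are never mutated afterwards.
    static List<List<String>> batchWithNewList(List<String> docs, int maxPerBatch) {
        List<List<String>> batches = new ArrayList<>();
        List<String> currentBatch = new ArrayList<>();
        for (String doc : docs) {
            if (currentBatch.size() + 1 > maxPerBatch) {
                batches.add(currentBatch);
                currentBatch = new ArrayList<>(); // fresh list, old one stays intact
            }
            currentBatch.add(doc);
        }
        if (!currentBatch.isEmpty()) {
            batches.add(currentBatch);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("a", "b", "c", "d", "e");
        System.out.println(batchWithClear(docs, 2));   // prints [[e], [e], [e]]
        System.out.println(batchWithNewList(docs, 2)); // prints [[a, b], [c, d], [e]]
    }
}
```

In the first variant every entry in batches points to the same list, so only the last document survives, which would match the symptom of most documents ending up without an embedding. An equivalent fix would be batches.add(new ArrayList<>(currentBatch)) before the clear() call.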
Let me know if more details are needed, happy to help.