Changed the filter template in PgVercorStore to replace JSONPath expressions with JSONB field access, using standard SQL operators such as '=' instead of '==', 'AND' instead of '&&', and 'OR' instead of '||'. This adjustment addresses compatibility issues with the 'IN' operator, which previously returned parse errors. Also updated PgVectorFilterExpressionConverter and corresponding tests to align with these changes.
- Replaced
metadata::jsonb @@ '$.key == "value"'::jsonpath
withmetadata::jsonb->>'key' = 'value'
- Fixed parsing issues with
metadata::jsonb @@ '$.key in ["value"]'::jsonpath
by using JSONB field access. - Updated logical operators IN filter expressions to standard SQL syntax.
These changes improve the clarity, execution reliability, and compatibility of filter expressions in the database.
Comment From: tzolov
Thank you @dperezcabrera, looks like a nice improvement!
Could you please pull rebase to the upstream main (e.g. git pull -r upstream main
), resolve any eventual conflicts and (force) merge to you PR?
Comment From: dperezcabrera
Thank you. I've rebased and merged the changes as requested!
Comment From: tzolov
@dperezcabrera have you run the integration tests for the pgvector and for the autoconfiguraiton?
the ./mvnw clean install -pl vector-stores/spring-ai-pgvector-store -Pintegration-tests
is failing form me
Comment From: dperezcabrera
I have executed them; however, since I don't have an API key for OpenAI, this has been the result: run: 4, Failures: 0, Errors: 0, Skipped: 4.
Can you please send me the error logs by email?
Comment From: tzolov
@dperezcabrera for example
- searchWithFilters: COSINE_DISTANCE
org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [SELECT *, embedding [=] ? AS distance FROM vector_store WHERE embedding [=] ? [ ? AND (metadata::jsonb-]]'country' = 'BG' AND metadata::jsonb-]]'year' = 2020) ORDER BY distance LIMIT ? ]
```
* searchWithFilters: EUCLIDEAN_DISTANCE
```
org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [SELECT *, embedding [-] ? AS distance FROM vector_store WHERE embedding [-] ? [ ? AND (metadata::jsonb-]]'country' = 'BG' AND metadata::jsonb-]]'year' = 2020) ORDER BY distance LIMIT ? ]
```
* searchWithFilters: NEGATIVE_INNER_PRODUCT
org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [SELECT *, (1 + (embedding [#] ?)) AS distance FROM vector_store WHERE (1 + (embedding [#] ?)) [ ? AND (metadata::jsonb-]]'country' = 'BG' AND metadata::jsonb-]]'year' = 2020) ORDER BY distance LIMIT ? ]
```
Comment From: dperezcabrera
I see another problem, metadata::jsonb->>'year'
must be (metadata->>'year')::INTEGER
In protected void doKey(Key key, StringBuilder context) {
I can't do this change.
Comment From: markpollack
Hi, this is a great contribution to improve the speed of pgvector, you mention In protected void doKey(Key key, StringBuilder context) { I can't do this change.
what do you mean?
I can't easily share access keys, I would recommend getting an openai subscription, the cost is minimal.
Comment From: dperezcabrera
Here: (https://github.com/dperezcabrera/spring-ai/blob/bebbfceab6e22a6bdeda3c3e168aceaf52994abd/spring-ai-core/src/main/java/org/springframework/ai/vectorstore/filter/converter/PgVectorFilterExpressionConverter.java#L64C1-L76C1)
I need to know the type of the key. If fieldname is an integer, the method must use this template (metadata->>'fieldname')::INTEGER. If fieldname is a string, then: metadata::jsonb->>'fieldname'. If fieldname is a date, (metadata->>'fieldname')::DATE. This ensures the key is correctly interpreted in the database query. This will require a significant change in the common classes and may have side effects on other implementations, which need to be carefully considered before proposing it.
Now I have and access key, before I only had API keys for Microsoft's Azure, and they didn't work with the integrated tests.
Comment From: markpollack
bumping to RC1, apologies
Comment From: dperezcabrera
I think it is best to close this pull request, and address this functionality in the future. If you agree, close it.
Comment From: tzolov
Thank you for the effort @dperezcabrera. We can re-open it in the future if you have time to work on it.
Causing due to some unsolved issues (check the log above).