The field name `file_name` is not compatible with the filter expression parsing.
SearchRequest searchRequest = SearchRequest.defaults()
        .withTopK(4)
        .withFilterExpression(PagePdfDocumentReader.METADATA_FILE_NAME + " == 'medicaid-wa-faqs.pdf'");
where `public static final String METADATA_FILE_NAME = "file_name"`, throws the exception:
Caused by: org.antlr.v4.runtime.NoViableAltException: null
at org.antlr.v4.runtime.atn.ParserATNSimulator.noViableAlt(ParserATNSimulator.java:2014) ~[antlr4-runtime-4.13.1.jar:4.13.1]
at org.antlr.v4.runtime.atn.ParserATNSimulator.execATN(ParserATNSimulator.java:445) ~[antlr4-runtime-4.13.1.jar:4.13.1]
at org.antlr.v4.runtime.atn.ParserATNSimulator.adaptivePredict(ParserATNSimulator.java:371) ~[antlr4-runtime-4.13.1.jar:4.13.1]
at org.springframework.ai.vectorstore.filter.antlr4.FiltersParser.booleanExpression(FiltersParser.java:556) ~[spring-ai-core-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
at org.springframework.ai.vectorstore.filter.antlr4.FiltersParser.where(FiltersParser.java:199) ~[spring-ai-core-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
at org.springframework.ai.vectorstore.filter.FilterExpressionTextParser.parse(FilterExpressionTextParser.java:147) ~[spring-ai-core-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
... 46 common frames omitted
The underscore seems to be the issue. I suggest we switch to camel case for the metadata fields that document readers add.
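To illustrate how a token rule can reject a key such as `file_name`, here is a minimal, self-contained sketch. Both patterns are hypothetical stand-ins, not the actual rules from the Spring AI filter grammar; the class and method names are also invented for this example.

```java
import java.util.regex.Pattern;

public class IdentifierRuleDemo {

    // Hypothetical identifier rule that accepts letters and digits only.
    static final Pattern WITHOUT_UNDERSCORE = Pattern.compile("[a-zA-Z][a-zA-Z0-9]*");

    // Hypothetical identifier rule that also accepts underscores.
    static final Pattern WITH_UNDERSCORE = Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");

    // True only if the whole key is matched as one identifier token.
    static boolean isSingleToken(Pattern rule, String key) {
        return rule.matcher(key).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSingleToken(WITHOUT_UNDERSCORE, "file_name")); // false
        System.out.println(isSingleToken(WITH_UNDERSCORE, "file_name"));    // true
        System.out.println(isSingleToken(WITHOUT_UNDERSCORE, "fileName"));  // true
    }
}
```

Under the first rule, a camel-case key such as `fileName` is accepted while `file_name` is not, which matches the behaviour described above.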
Comment From: iAMSagar44
Is this specific to PgVectorStore, or does it affect other vector stores too? It works fine for me when I use the Filter.Expression record. For example, the code below works (I am using Spring AI version 1.0.0-M1):
final var expression = new Filter.Expression(Filter.ExpressionType.EQ, new Filter.Key("file_name"),
        new Filter.Value("sample_file.pdf"));
List<Document> similarDocuments = vectorStore
        .similaritySearch(SearchRequest.query(message)
                .withFilterExpression(expression));
Here are some more observations on metadata filtering:
- When I convert the expression with PgVectorFilterExpressionConverter first, the search no longer works. For example, the code below fails:
FilterExpressionConverter converter = new PgVectorFilterExpressionConverter();
final var expression = new Filter.Expression(Filter.ExpressionType.EQ, new Filter.Key("file_name"),
        new Filter.Value("sample_file.pdf"));
final var convertExpression = converter.convertExpression(expression);
List<Document> similarDocuments = vectorStore
        .similaritySearch(SearchRequest.query(message)
                .withFilterExpression(convertExpression));
I see the following error in the logs:
Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed: org.springframework.ai.vectorstore.filter.FilterExpressionTextParser$FilterExpressionParseException: Source: <unknown>, Line: 1:7, Error: no viable alternative at input '.'] with root cause
org.antlr.v4.runtime.NoViableAltException: null
at org.antlr.v4.runtime.atn.ParserATNSimulator.noViableAlt(ParserATNSimulator.java:2014) ~[antlr4-runtime-4.13.1.jar:4.13.1]
- The `IN` filter type does not seem to work at all. For example, the following code fails:
final var expression = new Filter.Expression(Filter.ExpressionType.IN, new Filter.Key("file_name"),
        new Filter.Value(List.of("file1.pdf", "file2.pdf", "file3.pdf")));
List<Document> similarDocuments = vectorStore
        .similaritySearch(SearchRequest.query(message)
                .withFilterExpression(expression));
I get the following error:
Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed: org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [SELECT *, embedding <=> ? AS distance FROM vector_store WHERE embedding <=> ? < ? AND metadata::jsonb @@ '$.file_name in ["file1.pdf","file2.pdf","file3.pdf"]'::jsonpath ORDER BY distance LIMIT ? ]] with root cause
org.postgresql.util.PSQLException: ERROR: syntax error at or near " " of jsonpath input
Position: 110
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2725) ~[postgresql-42.7.3.jar:42.7.3]
I tried with a metadata key that has no underscore and still got the same error.
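The generated SQL above contains `$.file_name in [...]`, and the PostgreSQL jsonpath language does not define an `in` operator, which would explain the syntax error regardless of the key name. One possible rewrite is to expand the list into a chain of `==` comparisons joined with `||`. The sketch below shows that expansion in plain Java; the class and method names are hypothetical, and it is not presented as the fix the project adopted.

```java
import java.util.List;
import java.util.stream.Collectors;

public class JsonPathInExpansion {

    // Quote a value as a jsonpath string literal, escaping backslashes and double quotes.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Expand "key IN (v1, v2, ...)" into "($.key == v1 || $.key == v2 || ...)".
    static String expandIn(String key, List<String> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("IN requires at least one value");
        }
        return values.stream()
                .map(v -> "$." + key + " == " + quote(v))
                .collect(Collectors.joining(" || ", "(", ")"));
    }

    public static void main(String[] args) {
        // Prints: ($.file_name == "file1.pdf" || $.file_name == "file2.pdf" || $.file_name == "file3.pdf")
        System.out.println(expandIn("file_name", List.of("file1.pdf", "file2.pdf", "file3.pdf")));
    }
}
```

The resulting string could take the place of the `in [...]` fragment inside the `'...'::jsonpath` literal; note that single quotes in values would still need escaping for the enclosing SQL literal, which this sketch does not handle.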
Comment From: markpollack
Thanks for the investigation. There is a deeper bug lurking here.
Comment From: dafriz
I have raised https://github.com/spring-projects/spring-ai/pull/1483 with a fix for the original issue: unescaped underscores were not accepted in filter expression keys.
For the second issue mentioned above ("When I use the PgVectorFilterExpressionConverter, the above code does not work"), I would say it is valid to throw an error in that case, where the result of `PgVectorFilterExpressionConverter.convertExpression` is passed to `SearchRequest.withFilterExpression`, because the portable `SearchRequest` is not designed to be aware of PgVector-style expressions.
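A simplified model of the two layers may make this clearer. In the sketch below, `parsePortable` stands in for the portable text parser and `toNative` stands in for a store-specific converter; both are hypothetical simplifications written for this example, not Spring AI code. Feeding the converter's output back into the portable parser fails because the native string is written in a different syntax.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PortableVsNativeFilter {

    // Simplified portable syntax: <identifier> == '<value>'
    static final Pattern PORTABLE =
            Pattern.compile("\\s*([A-Za-z_][A-Za-z0-9_]*)\\s*==\\s*'([^']*)'\\s*");

    // Stand-in for the portable text parser: returns {key, value} or throws.
    static String[] parsePortable(String text) {
        Matcher m = PORTABLE.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a portable filter expression: " + text);
        }
        return new String[] { m.group(1), m.group(2) };
    }

    // Stand-in for a store-specific converter that emits jsonpath-style text.
    static String toNative(String key, String value) {
        return "$." + key + " == \"" + value + "\"";
    }

    public static void main(String[] args) {
        // Intended flow: portable text is parsed, then converted by the store.
        String[] parsed = parsePortable("file_name == 'sample_file.pdf'");
        String nativeText = toNative(parsed[0], parsed[1]);
        System.out.println(nativeText);

        // Misuse: native text handed back to the portable parser is rejected.
        try {
            parsePortable(nativeText);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

In this model, the caller only ever supplies the portable form, and the conversion to the store's native syntax happens once, inside the store.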