The field name file_name is not compatible with the filter expression parsing.

        SearchRequest searchRequest = SearchRequest.defaults()
                .withTopK(4)
                .withFilterExpression(PagePdfDocumentReader.METADATA_FILE_NAME + " == 'medicaid-wa-faqs.pdf'");

where `public static final String METADATA_FILE_NAME = "file_name";`

throws the exception

Caused by: org.antlr.v4.runtime.NoViableAltException: null
    at org.antlr.v4.runtime.atn.ParserATNSimulator.noViableAlt(ParserATNSimulator.java:2014) ~[antlr4-runtime-4.13.1.jar:4.13.1]
    at org.antlr.v4.runtime.atn.ParserATNSimulator.execATN(ParserATNSimulator.java:445) ~[antlr4-runtime-4.13.1.jar:4.13.1]
    at org.antlr.v4.runtime.atn.ParserATNSimulator.adaptivePredict(ParserATNSimulator.java:371) ~[antlr4-runtime-4.13.1.jar:4.13.1]
    at org.springframework.ai.vectorstore.filter.antlr4.FiltersParser.booleanExpression(FiltersParser.java:556) ~[spring-ai-core-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
    at org.springframework.ai.vectorstore.filter.antlr4.FiltersParser.where(FiltersParser.java:199) ~[spring-ai-core-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
    at org.springframework.ai.vectorstore.filter.FilterExpressionTextParser.parse(FilterExpressionTextParser.java:147) ~[spring-ai-core-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
    ... 46 common frames omitted

The underscore seems to be the issue. I suggest we switch to camel case for the metadata fields that document readers add.
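
A minimal sketch of what the camel-case suggestion could look like on the caller's side. `MetadataKeys`, `toCamelCase` and `camelCaseKeys` are hypothetical names for illustration, not part of Spring AI, and `page_number` is only an example key:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Spring AI: rewrites snake_case metadata keys to camelCase.
final class MetadataKeys {

    // Converts e.g. "file_name" to "fileName"; keys without underscores are returned unchanged.
    static String toCamelCase(String key) {
        StringBuilder out = new StringBuilder(key.length());
        boolean upperNext = false;
        for (char c : key.toCharArray()) {
            if (c == '_') {
                // Drop the underscore; capitalise the next character, but never the first one.
                upperNext = out.length() > 0;
            } else if (upperNext) {
                out.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Returns a copy of the metadata with every key converted, preserving insertion order.
    static Map<String, Object> camelCaseKeys(Map<String, Object> metadata) {
        Map<String, Object> result = new LinkedHashMap<>();
        metadata.forEach((k, v) -> result.put(toCamelCase(k), v));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = new LinkedHashMap<>();
        metadata.put("file_name", "medicaid-wa-faqs.pdf");
        metadata.put("page_number", 3); // illustrative key only
        System.out.println(camelCaseKeys(metadata));
    }
}
```

Note that two different keys can collapse to the same camel-case name (for example `a_b` and `aB`), in which case the later entry wins in this sketch.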

Comment From: iAMSagar44

Is this specific to PgVectorStore, or does it affect other vector stores too? It works fine for me when I use the Filter.Expression record. For example, the code below works (I am using Spring AI version 1.0.0-M1):

final var expression = new Filter.Expression(Filter.ExpressionType.EQ, new Filter.Key("file_name"),
                new Filter.Value("sample_file.pdf"));

        List<Document> similarDocuments = vectorStore
                .similaritySearch(SearchRequest.query(message)
                        .withFilterExpression(expression));

Here are some more observations on metadata filtering:

  1. When I use the PgVectorFilterExpressionConverter, the above code does not work. For example, the code below fails:
FilterExpressionConverter converter = new PgVectorFilterExpressionConverter();
final var expression = new Filter.Expression(Filter.ExpressionType.EQ, new Filter.Key("file_name"),
                new Filter.Value("sample_file.pdf"));

        final var convertExpression = converter.convertExpression(expression);

        List<Document> similarDocuments = vectorStore
                .similaritySearch(SearchRequest.query(message)
                        .withFilterExpression(convertExpression));

I see the following error in the logs -

Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed: org.springframework.ai.vectorstore.filter.FilterExpressionTextParser$FilterExpressionParseException: Source: <unknown>, Line: 1:7, Error: no viable alternative at input '.'] with root cause

org.antlr.v4.runtime.NoViableAltException: null
    at org.antlr.v4.runtime.atn.ParserATNSimulator.noViableAlt(ParserATNSimulator.java:2014) ~[antlr4-runtime-4.13.1.jar:4.13.1]
  2. The 'IN' filter type does not seem to work at all. For example, the following code does not work:
final var expression = new Filter.Expression(Filter.ExpressionType.IN, new Filter.Key("file_name"),
                new Filter.Value(List.of("file1.pdf", "file2.pdf", "file3.pdf")));

        List<Document> similarDocuments = vectorStore
                .similaritySearch(SearchRequest.query(message)
                        .withFilterExpression(expression));

I get the following error -

 Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed: org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [SELECT *, embedding <=> ? AS distance FROM vector_store WHERE embedding <=> ? < ?  AND metadata::jsonb @@ '$.file_name in ["file1.pdf","file2.pdf","file3.pdf"]'::jsonpath  ORDER BY distance LIMIT ? ]] with root cause

org.postgresql.util.PSQLException: ERROR: syntax error at or near " " of jsonpath input
  Position: 110
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2725) ~[postgresql-42.7.3.jar:42.7.3]

I tried with a metadata key without an underscore and still got the same error.
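
The generated jsonpath above uses `in`, which, as far as I know, is not an operator in PostgreSQL's jsonpath language. One way to express the same filter is to expand the list into equality checks joined by `||`. The sketch below only builds that predicate string. `JsonPathIn`, `quote` and `inPredicate` are hypothetical names, and this is not necessarily how Spring AI does or should generate it:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not Spring AI code: expands an IN filter into a jsonpath predicate.
final class JsonPathIn {

    // Quotes a value as a jsonpath string literal, escaping backslashes and double quotes.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds ($.key == "v1" || $.key == "v2" ...). Assumes key is a plain identifier.
    static String inPredicate(String key, List<String> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("IN requires at least one value");
        }
        return values.stream()
                .map(v -> "$." + key + " == " + quote(v))
                .collect(Collectors.joining(" || ", "(", ")"));
    }

    public static void main(String[] args) {
        System.out.println(inPredicate("file_name", List.of("file1.pdf", "file2.pdf", "file3.pdf")));
    }
}
```

If the result is embedded in a single-quoted SQL literal as in the query above, any single quotes in the values would also need SQL escaping, which this sketch does not handle.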

Comment From: markpollack

Thanks for the investigation, there is a deeper bug lurking here.

Comment From: dafriz

I have raised https://github.com/spring-projects/spring-ai/pull/1483 with a fix to the original issue - that unescaped underscores were not accepted in filter expression keys.
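
To illustrate the kind of change involved: the grammar itself is not quoted in this thread, so the two patterns below are hypothetical stand-ins for an identifier rule before and after allowing underscores, not the actual Spring AI lexer rules.

```java
import java.util.regex.Pattern;

// Hypothetical stand-ins for a lexer's identifier rule; NOT the actual Spring AI grammar.
final class IdentifierRules {

    // Assumed "before" rule: a letter followed by letters or digits only.
    static final Pattern WITHOUT_UNDERSCORE = Pattern.compile("[A-Za-z][A-Za-z0-9]*");

    // Assumed "after" rule: underscores are also allowed.
    static final Pattern WITH_UNDERSCORE = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // True when the whole key matches the given identifier rule.
    static boolean accepts(Pattern rule, String key) {
        return rule.matcher(key).matches();
    }

    public static void main(String[] args) {
        for (String key : new String[] {"file_name", "fileName"}) {
            System.out.println(key + ": before=" + accepts(WITHOUT_UNDERSCORE, key)
                    + ", after=" + accepts(WITH_UNDERSCORE, key));
        }
    }
}
```

Under these assumed rules, `file_name` is rejected by the first pattern and accepted by the second, while `fileName` passes both.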

For the 2nd issue mentioned above - "When I use the PgVectorFilterExpressionConverter, the above code does not work" - I would say that throwing an error is valid in that case. The result of PgVectorFilterExpressionConverter.convertExpression is being passed to SearchRequest.withFilterExpression, and the portable SearchRequest is not designed to be aware of PgVector-style expressions.