Discussed in https://github.com/spring-projects/spring-ai/discussions/501
Originally posted by **iAMSagar44** March 24, 2024
Currently, the DocumentReader interface implementations (e.g. PagePdfDocumentReader) add the page number and file name to the metadata map. These values eventually populate the 'metadata' column.
I want to add more metadata (e.g. author) during the ETL process so that I can use these fields in similarity search filters. Apart from writing my own implementation of the DocumentReader interface, I could not find a way to do this.
Is there any other way to cater for this use case?
Comment From: markpollack
Hi. You can update the metadata in a document after PagePdfDocumentReader has created it. As an example, look at this data loading application, which adds the filename and the version to each document.
That said, the PagePdfDocumentReader should be more subclass-friendly. I will make a change for that.
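A minimal sketch of the post-processing approach (enrich the metadata after the reader has produced the documents), assuming the returned documents expose a mutable metadata map, as `Document.getMetadata()` does in the Spring AI versions this sketch assumes; verify against yours. To keep the example self-contained, `SimpleDocument` is a hypothetical stand-in for `org.springframework.ai.document.Document`, and the metadata keys are illustrative, not necessarily the exact keys PagePdfDocumentReader writes. With Spring AI on the classpath, the same loop would run over the `List<Document>` returned by the reader's `get()`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class MetadataEnricher {

    // Hypothetical stand-in for org.springframework.ai.document.Document:
    // only the content and a mutable metadata map are modelled here.
    static final class SimpleDocument {
        private final String content;
        private final Map<String, Object> metadata;

        SimpleDocument(String content, Map<String, Object> metadata) {
            this.content = content;
            this.metadata = new HashMap<>(metadata); // copy so the map is mutable
        }

        String getContent() { return content; }

        Map<String, Object> getMetadata() { return metadata; }
    }

    // Returns a function that adds the given entries to every document's metadata.
    // putIfAbsent leaves keys the reader already set (page number, file name) untouched.
    static Function<List<SimpleDocument>, List<SimpleDocument>> addMetadata(Map<String, Object> extra) {
        return docs -> {
            for (SimpleDocument doc : docs) {
                extra.forEach((key, value) -> doc.getMetadata().putIfAbsent(key, value));
            }
            return docs;
        };
    }

    public static void main(String[] args) {
        // Simulates a reader's output: one document per page with page number and file name set.
        List<SimpleDocument> pages = new ArrayList<>();
        pages.add(new SimpleDocument("page one text", Map.of("page_number", 1, "file_name", "report.pdf")));
        pages.add(new SimpleDocument("page two text", Map.of("page_number", 2, "file_name", "report.pdf")));

        // Hypothetical extra fields to filter on later in similarity searches.
        List<SimpleDocument> enriched =
                addMetadata(Map.of("author", "Jane Doe", "version", "1.0")).apply(pages);

        for (SimpleDocument doc : enriched) {
            System.out.println(doc.getMetadata().get("page_number") + " " + doc.getMetadata().get("author"));
        }
    }
}
```

Using `putIfAbsent` keeps the reader's own entries from being overwritten; switch to `putAll` if the new values should take precedence.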
Comment From: markpollack
See https://github.com/spring-projects/spring-ai/pull/1093