After running the sample application (ai-openai-rag), I noticed that the vector_store table contains 12 entries, but the page_number field (in the metadata column) is 10 for every entry.
`vector_store=# select metadata from vector_store;
metadata
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
(12 rows)`
It looks like the doSplitDocuments method in the TextSplitter class is assigning this incorrect page_number to all the documents.
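To illustrate how a splitter could end up stamping the same page_number on every chunk, here is a minimal standalone sketch. It does not use Spring AI's classes: `Doc`, `splitMerged`, `splitPerDocument`, and `samplePages` are hypothetical names, and merging all metadata into one shared map is only an assumed cause, not something confirmed from the TextSplitter source.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PageMetadataSketch {

    // Hypothetical stand-in for a document: text plus its metadata.
    record Doc(String text, Map<String, Object> metadata) {}

    // Assumed buggy pattern: metadata from every input document is merged
    // into a single map, so each later page_number overwrites the earlier one
    // and every output chunk ends up with the last page's value.
    static List<Doc> splitMerged(List<Doc> docs) {
        Map<String, Object> merged = new HashMap<>();
        for (Doc d : docs) {
            merged.putAll(d.metadata());
        }
        List<Doc> out = new ArrayList<>();
        for (Doc d : docs) {
            out.add(new Doc(d.text(), merged));
        }
        return out;
    }

    // Per-document pattern: each chunk keeps a copy of its own source metadata.
    static List<Doc> splitPerDocument(List<Doc> docs) {
        List<Doc> out = new ArrayList<>();
        for (Doc d : docs) {
            out.add(new Doc(d.text(), new HashMap<>(d.metadata())));
        }
        return out;
    }

    // Builds one Doc per page, numbered 1..count, all from the same file.
    static List<Doc> samplePages(int count) {
        List<Doc> pages = new ArrayList<>();
        for (int page = 1; page <= count; page++) {
            Map<String, Object> metadata = new HashMap<>();
            metadata.put("file_name", "medicaid-wa-faqs.pdf");
            metadata.put("page_number", page);
            pages.add(new Doc("text of page " + page, metadata));
        }
        return pages;
    }

    public static void main(String[] args) {
        List<Doc> pages = samplePages(10);
        // Merged metadata: the first chunk reports the last page.
        System.out.println(splitMerged(pages).get(0).metadata().get("page_number"));      // prints 10
        // Per-document metadata: the first chunk reports its own page.
        System.out.println(splitPerDocument(pages).get(0).metadata().get("page_number")); // prints 1
    }
}
```

If the real cause is similar, the symptom matches what I see in the table: every row carries the page number of the last page processed.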
Comment From: yucaowang
+1
Comment From: injae-kim
Seems fixed by https://github.com/spring-projects/spring-ai/commit/6753e242e8f9fa5dfc8ee239ba2add6c1c2b6855 so can we close this issue?
https://github.com/spring-projects/spring-ai/pull/934 -> Enhance test about incorrect page number case
Comment From: markpollack