After running this sample application (ai-openai-rag), I noticed that there are 12 entries in the vector_store table, but the page_number field (in the metadata column) is 10 for every entry.

```
vector_store=# select metadata from vector_store;
metadata
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
{"file_name": "/target/classes/data/medicaid-wa-faqs.pdf", "page_number": 10}
(12 rows)
```

It looks like the `doSplitDocuments` method in the `TextSplitter` class is assigning this incorrect page_number to all the documents.
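As a hypothetical illustration of how a pipeline could end up stamping every chunk with the same page_number, the self-contained Java sketch below contrasts two ways of attaching metadata to chunks: reusing one mutable map for all of them versus giving each chunk its own copy. This is not the actual Spring AI `TextSplitter` or `PagePdfDocumentReader` code, and it is unconfirmed whether this pattern is the cause here; the class, record, and method names (`SharedMetadataDemo`, `Chunk`, `splitWithSharedMap`, `splitWithCopiedMap`) are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration only (not Spring AI code): shows how sharing one
 * mutable metadata map across chunks makes every chunk report the last page.
 */
public class SharedMetadataDemo {

    /** Minimal stand-in for a document chunk: text plus metadata. */
    record Chunk(String text, Map<String, Object> metadata) {}

    /** Problematic variant: one map is mutated per page and referenced by every chunk. */
    static List<Chunk> splitWithSharedMap(List<String> pages) {
        List<Chunk> chunks = new ArrayList<>();
        Map<String, Object> metadata = new HashMap<>();
        for (int i = 0; i < pages.size(); i++) {
            // Overwrites the value that all previously created chunks also see.
            metadata.put("page_number", i + 1);
            for (String part : pages.get(i).split("\n\n")) {
                chunks.add(new Chunk(part, metadata));
            }
        }
        return chunks;
    }

    /** Safer variant: each chunk receives its own copy of its page's metadata. */
    static List<Chunk> splitWithCopiedMap(List<String> pages) {
        List<Chunk> chunks = new ArrayList<>();
        for (int i = 0; i < pages.size(); i++) {
            Map<String, Object> pageMetadata = new HashMap<>();
            pageMetadata.put("page_number", i + 1);
            for (String part : pages.get(i).split("\n\n")) {
                chunks.add(new Chunk(part, new HashMap<>(pageMetadata)));
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Three "pages"; the first one splits into two chunks.
        List<String> pages = List.of("page one\n\nmore", "page two", "page three");
        for (Chunk c : splitWithSharedMap(pages)) {
            System.out.println("shared: " + c.metadata().get("page_number"));
        }
        for (Chunk c : splitWithCopiedMap(pages)) {
            System.out.println("copied: " + c.metadata().get("page_number"));
        }
    }
}
```

Running `main` prints `shared: 3` for all four chunks, while the copied variant prints 1, 1, 2, 3. The symptom in the query output above (every row showing the same page_number) is what the shared-map variant would produce, which is why copying metadata per chunk is a common defensive choice.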

Comment From: yucaowang

+1

Comment From: injae-kim

This seems to be fixed by https://github.com/spring-projects/spring-ai/commit/6753e242e8f9fa5dfc8ee239ba2add6c1c2b6855, so can we close this issue?

https://github.com/spring-projects/spring-ai/pull/934 -> Enhance the test for the incorrect page number case

Comment From: markpollack

#934 adds a test and doesn't change any code. The test creates the metadata with page_number set explicitly and verifies that it is preserved as expected. What is reported in this issue looks like a bug in PagePdfDocumentReader, so this issue should remain open. Also, that sample app needs to be updated; I'll fix that.