This is my code:

    // Read the file with Tika.
    List<Document> documents = new TikaDocumentReader(resource).read();

    // Split the extracted text into token-based chunks using the configured slice settings.
    return new TokenTextSplitter(knowledgeBaseFileSlice.getDefaultChunkSize(), knowledgeBaseFileSlice.getMinChunkSizeChars(),
            knowledgeBaseFileSlice.getMinChunkLengthToEmbed(), knowledgeBaseFileSlice.getMaxNumChunks(),
            knowledgeBaseFileSlice.isKeepSeparator()).apply(documents);

When I iterate over the documents, getText() returns the file's metadata along with the document text. I don't understand why this happens.

Comment From: MusicBoooox

The output looks like this:

docProps/app.xml Normal.dotm 1 0 0 0 0 0 false false 0 WPS Office_6.11.0.8885_F1E327BC-269C-435d-A152-05C5408002CA 0

docProps/core.xml 2024-12-20T15:26:00Z HBN HBN 2024-12-20T15:27:20Z 1

docProps/custom.xml 2052-6.11.0.8885 1D2337B3384BA7C4251C65677B81A9CB_41

word/styles.xml

word/settings.xml

word/theme/theme1.xml

word/document.xml RAG背景知识文档 RAG(Retrieval-Augmented Generation)是一种结合信息检索和生成模型的混合方法,旨在提高文本生成任务的质量和准确性。
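
In the output above, each package entry name (`docProps/app.xml`, `word/styles.xml`, and so on) appears before that entry's extracted content, and the actual document text follows `word/document.xml`. As a stopgap, here is a minimal sketch that post-processes the string returned by getText() and keeps only what comes after that marker. It is an assumption-laden workaround, not a fix for the underlying cause: `DocxBodyExtractor` and `extractBody` are hypothetical names, the code assumes the marker appears once and that no unwanted entries follow it, and the sample input in `main` is illustrative rather than real output.

```java
public class DocxBodyExtractor {

    // Entry name that, in the output above, precedes the main document text.
    private static final String BODY_ENTRY = "word/document.xml";

    /**
     * Returns the text that follows the first "word/document.xml" marker.
     * If the marker is absent, the input is returned unchanged.
     * Assumption: nothing unwanted follows the marker.
     */
    public static String extractBody(String extractedText) {
        int start = extractedText.indexOf(BODY_ENTRY);
        if (start < 0) {
            return extractedText; // no marker, so nothing to strip
        }
        return extractedText.substring(start + BODY_ENTRY.length()).trim();
    }

    public static void main(String[] args) {
        // Illustrative input modeled on the layout shown above.
        String raw = "docProps/app.xml Normal.dotm 1 0 0\n\n"
                + "word/styles.xml\n\n"
                + "word/document.xml Body text";
        System.out.println(extractBody(raw));
    }
}
```

The helper would be applied to each document's text before the documents are passed to the splitter. If other entries show up after `word/document.xml` for some files, this simple cut-off would keep them, so it is worth checking against real output first.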