This is my code:
List<Document> documents = new TikaDocumentReader(resource).read();
return new TokenTextSplitter(
        knowledgeBaseFileSlice.getDefaultChunkSize(),
        knowledgeBaseFileSlice.getMinChunkSizeChars(),
        knowledgeBaseFileSlice.getMinChunkLengthToEmbed(),
        knowledgeBaseFileSlice.getMaxNumChunks(),
        knowledgeBaseFileSlice.isKeepSeparator())
        .apply(documents);
When I iterate over the documents, getText() returns the file's metadata mixed in with the body text, and I don't understand why.
Comment From: MusicBoooox
The output looks like this:
docProps/app.xml Normal.dotm 1 0 0 0 0 0 false false 0 WPS Office_6.11.0.8885_F1E327BC-269C-435d-A152-05C5408002CA 0
docProps/core.xml 2024-12-20T15:26:00Z HBN HBN 2024-12-20T15:27:20Z 1
docProps/custom.xml 2052-6.11.0.8885 1D2337B3384BA7C4251C65677B81A9CB_41
word/styles.xml
word/settings.xml
word/theme/theme1.xml
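word/document.xml RAG Background Knowledge Document RAG (Retrieval-Augmented Generation) is a hybrid approach that combines information retrieval with generative models, aiming to improve the quality and accuracy of text generation tasks.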
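
For context on the names in that output: a .docx file is a ZIP container, and docProps/app.xml, docProps/core.xml, word/styles.xml and word/document.xml are the names of parts inside it. One possible reading, which is an assumption and not something confirmed in this thread, is that the file was walked as a generic archive, so each part name and its text ended up in getText(). The sketch below uses only java.util.zip to show where such names come from. The class DocxParts, its methods, and the in-memory sample container are hypothetical stand-ins; it does not use Tika or Spring AI.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class DocxParts {

    // Builds a tiny in-memory ZIP whose entry names mimic some of the parts
    // seen in the output above. Hypothetical stand-in for a real .docx:
    // the XML bodies are placeholders, not valid Word content.
    public static byte[] sampleContainer() throws IOException {
        String[] names = {
            "docProps/app.xml", "docProps/core.xml",
            "word/styles.xml", "word/document.xml"
        };
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<xml/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // Lists the entry (part) names of a ZIP container in stored order.
    public static List<String> listParts(byte[] container) throws IOException {
        List<String> parts = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(container))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                parts.add(e.getName());
            }
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        // Prints one part name per line.
        listParts(sampleContainer()).forEach(System.out::println);
    }
}
```

Running the same listParts method over the bytes of the real .docx would show whether its part names match the prefixes in the output. If they do, checking how the file type is being detected would be a reasonable next step.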