This is my code:

    Resource resource = new FileSystemResource(filePath);
    List<Document> documents = new TikaDocumentReader(resource).read();
    return new TokenTextSplitter(
            knowledgeBaseFileSlice.getDefaultChunkSize(),
            knowledgeBaseFileSlice.getMinChunkSizeChars(),
            knowledgeBaseFileSlice.getMinChunkLengthToEmbed(),
            knowledgeBaseFileSlice.getMaxNumChunks(),
            knowledgeBaseFileSlice.isKeepSeparator()).apply(documents);

After TikaDocumentReader reads a Word document, the content it returns includes not only the text of the document but also the XML metadata parts of the file. If I call getText() on the result, the output includes content like this:

docProps/app.xml Normal.dotm 1 0 0 0 0 0 false false 0 WPS Office_10.1.0.7698_F1E327BC-269C-435d-A152-05C5408002CA 0

docProps/core.xml 2023-08-26T18:18:00Z admin admin 2023-08-26T18:18:41Z 1

docProps/custom.xml 2052-10.1.0.7698

word/styles.xml

word/settings.xml

word/theme/theme1.xml

word/document.xml
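
The names in this output (docProps/app.xml, word/document.xml, and so on) look like the entries inside the .docx file itself, because a .docx is a ZIP package. That suggests, though I have not confirmed it, that the file is being read as a plain ZIP archive rather than as a Word document. As a minimal sketch to check this, the following lists the ZIP entries of a file using only the JDK, so the names can be compared with the output above. `DocxEntryLister` and `listEntries` are hypothetical names I made up for this check; they are not part of Tika or Spring AI.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class DocxEntryLister {

    // Returns the names of all entries inside a ZIP-based file such as a .docx.
    public static List<String> listEntries(Path file) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(file.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Expects the path to the Word document as the only argument.
        if (args.length != 1) {
            System.err.println("usage: java DocxEntryLister <path-to-docx>");
            return;
        }
        for (String name : listEntries(Path.of(args[0]))) {
            System.out.println(name);
        }
    }
}
```

If the printed entry names match the ones in the getText() output, the reader is most likely walking the archive instead of parsing the document body.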

What went wrong?