JSON, Text, Markdown, PDF Page, PDF Paragraph, Tika (DOCX, PPTX, HTML…)
Can the ETL Pipeline support more ways to read file content, such as a byte[], an HTTP URL, or an uploaded MultipartFile, so that the file's text content can be read directly?
Comment From: alexcheng1982
You need to provide your own implementations of DocumentReader.
Comment From: markpollack
TextReader(Resource resource) takes a Spring Resource, so that could handle byte[] and URLs. Let me know, @OnceCrazyer, if that works for your use case.
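
To illustrate why a single Resource-based constructor can cover these cases: a Spring Resource is essentially something that can open an InputStream, so a byte[], a URL, and an upload all reduce to the same read path. Below is a minimal, library-free sketch of that idea, not Spring AI code. `PlainTextReader` and `StreamSource` are hypothetical names invented for this example, and UTF-8 is an assumed encoding. In a real project one would presumably wrap the bytes in a ByteArrayResource, the URL in a UrlResource, or call MultipartFile.getResource(), and pass the result to TextReader.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical reader, for illustration only; not part of Spring AI.
class PlainTextReader {

    // Hypothetical stand-in for Spring's Resource: anything that can open a stream.
    @FunctionalInterface
    interface StreamSource {
        InputStream open() throws IOException;
    }

    private final StreamSource source;

    PlainTextReader(StreamSource source) {
        this.source = source;
    }

    // Source backed by an in-memory byte array.
    static StreamSource fromBytes(byte[] bytes) {
        return () -> new ByteArrayInputStream(bytes);
    }

    // Source backed by a URL (http, https, file, ...).
    static StreamSource fromUrl(URL url) {
        return url::openStream;
    }

    // Reads the whole source as text, assuming UTF-8 encoding.
    String read() {
        try (InputStream in = source.open()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape, `new PlainTextReader(PlainTextReader.fromBytes(bytes)).read()` and `new PlainTextReader(PlainTextReader.fromUrl(url)).read()` share one code path, and an uploaded MultipartFile could be adapted the same way with `file::getInputStream`.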