- Provides rudimentary text extraction for a multitude of document formats, including PDF, Word (doc/docx), PowerPoint (ppt/pptx), and many more.
- Generates a single Document from the extracted text.
- Performs no pre- or post-processing or cleansing of the text.
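
The behaviour listed above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `SingleDocumentReaderSketch`, the `Document` record, and the `TextExtractor` hook are hypothetical names, and the plain-text extractor in `main` merely stands in for whatever parsing library performs the real format detection and extraction.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

public class SingleDocumentReaderSketch {

    /** Hypothetical stand-in for the project's Document type. */
    record Document(String text, Map<String, Object> metadata) {}

    /** Hypothetical hook; a real reader would delegate to a parsing library here. */
    @FunctionalInterface
    interface TextExtractor {
        String extract(InputStream in) throws IOException;
    }

    /**
     * Extracts the text of one resource and wraps it in exactly one Document.
     * The text is passed through untouched: no trimming, splitting, or cleansing.
     */
    static List<Document> read(InputStream in, String sourceName, TextExtractor extractor)
            throws IOException {
        String text = extractor.extract(in);
        return List.of(new Document(text, Map.of("source", sourceName)));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a UTF-8 plain-text extractor used purely for demonstration.
        TextExtractor plainText = s -> new String(s.readAllBytes(), StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(
                "  Hello,\n\nworld  ".getBytes(StandardCharsets.UTF_8));

        List<Document> docs = read(in, "sample.txt", plainText);
        System.out.println("documents: " + docs.size());
        System.out.println("text: [" + docs.get(0).text() + "]");
    }
}
```

Because nothing is cleaned up, leading whitespace and blank lines in the source survive in the resulting Document; any splitting or normalisation would have to happen in a later step.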
Resolves #12
Comment From: markpollack
Added Javadocs and cleaned up the code a little. Merged as f9ca032cb39dc762cc57ac12c05ea35088b3d3cd