• Provides rudimentary text extraction for a multitude of document formats, including PDF, Word (doc/docx), PowerPoint (ppt/pptx), and many more.
  • Generates a single Document for the extracted text.
  • Performs no pre- or post-processing or cleansing of the text.
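
The contract described above (one Document per input, text passed through untouched) can be sketched roughly as follows. This is only an illustration, not the code in this PR: `Document`, `RawTextReader`, and the `Function<byte[], String>` extractor are hypothetical names, and the extractor used in `main` is a plain UTF-8 decoder standing in for a real format-aware parser.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class SingleDocumentReaderSketch {

    // Hypothetical value type: the extracted text plus minimal metadata.
    record Document(String text, Map<String, Object> metadata) {}

    // Hypothetical reader: delegates extraction and wraps the result in exactly one Document.
    static final class RawTextReader {
        private final Function<byte[], String> extractor;
        private final String sourceName;

        RawTextReader(Function<byte[], String> extractor, String sourceName) {
            this.extractor = extractor;
            this.sourceName = sourceName;
        }

        List<Document> read(byte[] content) {
            // The extracted text is used as-is: no trimming, splitting, or cleansing.
            String text = extractor.apply(content);
            return List.of(new Document(text, Map.of("source", sourceName)));
        }
    }

    public static void main(String[] args) {
        // Stand-in extractor (assumption): decodes UTF-8 bytes.
        // A real extractor would parse PDF, DOC/DOCX, PPT/PPTX, and so on.
        Function<byte[], String> plainText = bytes -> new String(bytes, StandardCharsets.UTF_8);

        RawTextReader reader = new RawTextReader(plainText, "notes.txt");
        byte[] input = "  Page 1\n\n\nPage 2  ".getBytes(StandardCharsets.UTF_8);
        List<Document> docs = reader.read(input);

        System.out.println("documents: " + docs.size());
        System.out.println("text length: " + docs.get(0).text().length());
    }
}
```

Because the whitespace and blank lines survive unchanged, any chunking or cleanup would have to happen in a later step outside the reader.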

Resolves #12

Comment From: markpollack

Added Javadocs and cleaned up the code a little. Merged as f9ca032cb39dc762cc57ac12c05ea35088b3d3cd