• Currently the Document mixes multiple concerns that should be isolated in dedicated classes.
  • Encapsulate the content formatting into a separate ContentFormatter abstraction.
  • Refactor the document loading and indexing strategy.
  • Move toward an ETL processing model.
  • Add metadata-enriching transformers as DocumentTransformer implementations.
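
The separation described in the list above can be sketched roughly as follows. This is a minimal illustration, not the project's actual API: the names ContentFormatter and DocumentTransformer come from the notes, but their method signatures, the simplified Doc record, and the WordCountEnricher and MetadataFirstFormatter classes are assumptions made for this example (Java 16+ for records).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class EtlSketch {

    // Hypothetical, simplified document: carries content and metadata only,
    // with no formatting or loading logic of its own.
    record Doc(String content, Map<String, Object> metadata) {}

    // Assumed shape of the formatter abstraction: renders a document as text.
    interface ContentFormatter {
        String format(Doc doc);
    }

    // Assumed shape of a transformer: the "T" step of an ETL pipeline,
    // mapping one batch of documents to another.
    interface DocumentTransformer extends Function<List<Doc>, List<Doc>> {}

    // Hypothetical metadata-enriching transformer: adds a word count.
    static class WordCountEnricher implements DocumentTransformer {
        @Override
        public List<Doc> apply(List<Doc> docs) {
            List<Doc> out = new ArrayList<>();
            for (Doc d : docs) {
                // Copy the metadata so the input document is left unchanged.
                Map<String, Object> meta = new LinkedHashMap<>(d.metadata());
                meta.put("wordCount", d.content().trim().split("\\s+").length);
                out.add(new Doc(d.content(), meta));
            }
            return out;
        }
    }

    // Hypothetical formatter: metadata lines first, then the content.
    static class MetadataFirstFormatter implements ContentFormatter {
        @Override
        public String format(Doc doc) {
            StringBuilder sb = new StringBuilder();
            doc.metadata().forEach((k, v) -> sb.append(k).append(": ").append(v).append('\n'));
            return sb.append(doc.content()).toString();
        }
    }

    public static void main(String[] args) {
        // Extract: a hard-coded document stands in for a real reader.
        List<Doc> extracted = List.of(new Doc("hello ETL world", Map.of("source", "a.txt")));
        // Transform: enrich the metadata.
        List<Doc> transformed = new WordCountEnricher().apply(extracted);
        // Load: printing stands in for writing to an index or vector store.
        ContentFormatter formatter = new MetadataFirstFormatter();
        transformed.forEach(d -> System.out.println(formatter.format(d)));
    }
}
```

Keeping the document a plain data holder means formatting rules and enrichment steps can be swapped or chained without touching the document type itself.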