- Currently the Document mixes multiple concerns that should be isolated in dedicated classes.
- Encapsulate the content formatting into a separate ContentFormatter abstraction.
- Refactor the document loading and indexing strategy.
- Move toward an ETL processing model.
- Add metadata-enriching transformers as DocumentTransformer implementations.
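
The separation described above could look roughly like the following sketch. The type shapes here are assumptions made for illustration, not the project's actual API: `Document` is reduced to content plus metadata, and `MetadataPrefixFormatter`, `WordCountEnricher`, and the `word_count` key are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified stand-in: a document only holds data.
record Document(String content, Map<String, Object> metadata) {}

// Formatting lives outside Document, behind its own abstraction.
interface ContentFormatter {
    String format(Document document);
}

// A transformer maps a batch of documents to a new batch (the "T" in ETL).
interface DocumentTransformer extends Function<List<Document>, List<Document>> {}

// Hypothetical formatter: renders each metadata entry, then the content.
class MetadataPrefixFormatter implements ContentFormatter {
    @Override
    public String format(Document document) {
        StringBuilder sb = new StringBuilder();
        document.metadata().forEach((k, v) -> sb.append(k).append(": ").append(v).append('\n'));
        return sb.append(document.content()).toString();
    }
}

// Hypothetical metadata-enriching transformer: adds a word count per document.
class WordCountEnricher implements DocumentTransformer {
    @Override
    public List<Document> apply(List<Document> documents) {
        List<Document> enriched = new ArrayList<>();
        for (Document doc : documents) {
            // Copy the metadata so the input document is left unchanged.
            Map<String, Object> metadata = new LinkedHashMap<>(doc.metadata());
            String trimmed = doc.content().trim();
            metadata.put("word_count", trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length);
            enriched.add(new Document(doc.content(), metadata));
        }
        return enriched;
    }
}
```

Because a transformer in this sketch is just a `Function` over a list of documents, several enrichers can be chained with `andThen` before the result is handed to an indexing step, and the formatter can be swapped without touching `Document`.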