This is an (admittedly somewhat naive) implementation of Loader that loads plain text documents. This enables unstructured documents (I tested with board game rules) to be used in embeddings.
Because it was written quickly and with limited understanding of how best to write such a loader, I'm certain that there's a lot of room for improvement. Nonetheless, I'm submitting it for consideration as a starting point for a better implementation.
Comment From: habuma
Cleaned up some not-so-neat code in TextLoader. I've had a chance to try this multiple times with lots of different text files and it works really well. Could it be better? Probably. But even in this form, it's quite good.
Comment From: markpollack
By default we should add the filename the text came from to the metadata. We could also allow the user to pass in a Map that contains additional metadata. We should also allow the encoding to be specified, with the default being UTF-8.
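A minimal sketch of what this suggestion could look like, using only the JDK. It is not the TextLoader from this PR: the class name PlainTextLoaderSketch, the LoadedText record, and the "source" metadata key are all hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the actual TextLoader in this PR.
class PlainTextLoaderSketch {

    // Hypothetical metadata key holding the originating file name.
    static final String SOURCE_KEY = "source";

    // Simple holder for the loaded text plus its metadata.
    record LoadedText(String text, Map<String, Object> metadata) {}

    private final Path path;
    private final Charset charset;
    private final Map<String, Object> customMetadata;

    // Defaults: UTF-8 encoding and no caller-supplied metadata.
    PlainTextLoaderSketch(Path path) {
        this(path, StandardCharsets.UTF_8, Map.of());
    }

    PlainTextLoaderSketch(Path path, Charset charset, Map<String, Object> customMetadata) {
        this.path = path;
        this.charset = charset;
        this.customMetadata = Map.copyOf(customMetadata);
    }

    LoadedText load() throws IOException {
        String text = Files.readString(path, charset);
        Map<String, Object> metadata = new HashMap<>();
        // The file name is added by default ...
        metadata.put(SOURCE_KEY, path.getFileName().toString());
        // ... and caller-supplied entries are applied afterwards, so they win on a key clash.
        metadata.putAll(customMetadata);
        return new LoadedText(text, metadata);
    }
}
```

Calling new PlainTextLoaderSketch(path).load() would read the file as UTF-8 and return metadata containing only the file name. Letting caller-supplied entries override the default is one possible design choice; the opposite order would be just as easy to implement.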
Comment From: tzolov
Thanks @habuma, I've polished it a bit and added tests. Squashed and pushed as 5c23b9d3d89c806031ee2148f531cfad05c56f9d