When truncating the token, we need to find the last punctuation mark. The current code searches for each punctuation mark separately, so whenever a given mark is absent, that search scans the full length of chunkText. I defined the punctuation marks as an enum and changed the loop to terminate as soon as one of them is found.
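For illustration, here is a minimal sketch of the idea rather than the actual diff. The class name `PunctuationSearch`, the method `lastPunctuationIndex`, and the exact set of punctuation marks are assumptions made for this example. It scans `chunkText` once from the end and stops at the first character that matches any enum constant.

```java
// Hypothetical sketch: names and the punctuation set are assumptions,
// not the code from this PR.
final class PunctuationSearch {

    // Punctuation marks treated as valid truncation points (assumed set).
    enum Punctuation {
        PERIOD('.'),
        QUESTION_MARK('?'),
        EXCLAMATION_MARK('!'),
        NEWLINE('\n');

        private final char symbol;

        Punctuation(char symbol) {
            this.symbol = symbol;
        }

        // Returns true if the character equals any punctuation constant.
        static boolean matches(char c) {
            for (Punctuation p : values()) {
                if (p.symbol == c) {
                    return true;
                }
            }
            return false;
        }
    }

    // Walks backwards through chunkText and returns the index of the last
    // punctuation mark, or -1 if none is present. The loop exits on the
    // first match, so the text is traversed at most once.
    static int lastPunctuationIndex(String chunkText) {
        for (int i = chunkText.length() - 1; i >= 0; i--) {
            if (Punctuation.matches(chunkText.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    private PunctuationSearch() {
    }
}
```

If no punctuation is present, the single backward pass still visits every character once, but it replaces one full scan per punctuation mark with one scan in total.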
Thanks.
Comment From: tzolov
Hi @hgs-study , thanks for your contribution.
Could you please rebase your PR on top of the main branch and also check the failing TextReaderTests?
I had the impression that this punctuation change streamlines the code but doesn't alter the split semantics itself.
The failing TextReaderTests seem to suggest otherwise, though?
Comment From: markpollack
Closing, as the current functionality has passing tests and there has been no feedback on the questions above. Please resubmit if you want, and thanks in any case.