Line 2814 of parsers.py raises a TypeError when self.delimiter is None:
"object of type 'NoneType' has no len()"
Here is the code where the error occurs:
```python
if len(self.delimiter) > 1 and self.quoting != csv.QUOTE_NONE:
    # see gh-13374
    reason = ('Error could possibly be due to quotes being '
              'ignored when a multi-char delimiter is used.')
    msg += '. ' + reason
```
I propose the following fix, which I believe should be a safe replacement:
```python
if self.delimiter is not None and len(self.delimiter) > 1 and self.quoting != csv.QUOTE_NONE:
```
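For illustration, here is a minimal, self-contained sketch comparing the original check with the proposed guard. `ParserState` is a hypothetical stand-in for the parser object (it is not part of pandas) that holds only the two attributes the condition reads; the condition and message text come from the code above.

```python
import csv


class ParserState:
    """Hypothetical stand-in for the parser: only the attributes the check uses."""

    def __init__(self, delimiter, quoting):
        self.delimiter = delimiter
        self.quoting = quoting


REASON = ('Error could possibly be due to quotes being '
          'ignored when a multi-char delimiter is used.')


def error_message_original(state, msg):
    # Original check: len(None) raises TypeError before quoting is examined.
    if len(state.delimiter) > 1 and state.quoting != csv.QUOTE_NONE:
        msg += '. ' + REASON
    return msg


def error_message_guarded(state, msg):
    # Proposed check: `is not None` is False for a None delimiter, so `and`
    # short-circuits and len() is never called.
    if (state.delimiter is not None and len(state.delimiter) > 1
            and state.quoting != csv.QUOTE_NONE):
        msg += '. ' + REASON
    return msg
```

With a None delimiter, `error_message_original` raises `TypeError: object of type 'NoneType' has no len()`, while `error_message_guarded` returns the message unchanged. A multi-char delimiter such as `'::'` still gets the extra hint.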
Comment From: TomAugspurger
@C5G6M7, can you write a small, reproducible example that currently fails, and submit a fix with it as a test case?
Comment From: C5G6M7
@TomAugspurger Yes, I can work on this tonight. I encountered it on quite a big dataset while running inside another application that uses pandas, so I'll have to do a bit of debugging to reproduce this with a simpler input.
In general, though, if self.delimiter is ever None when this line executes, it will raise an error. I applied the quick patch proposed above to my pandas installation and the problem went away. I believe the change is safe, since the code inside the conditional only builds an error message about multi-char delimiters, which wouldn't apply anyway if the delimiter were None.
However, there could be another issue earlier in the code if the intended invariant is that either the delimiter always has a default string value (such as a comma), so that len() always works, or self.quoting == csv.QUOTE_NONE whenever the delimiter has no value that supports len().
I'm not 100% sure, but reordering the conditions so that "self.quoting != csv.QUOTE_NONE" is evaluated first might also fix the issue: if it evaluates to False, "len(self.delimiter)" is never checked.
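A small sketch of the reordering idea, using a hypothetical helper `reordered_check` (not pandas code). Because `and` short-circuits, moving the quoting test first skips `len()` only when quoting is `csv.QUOTE_NONE`. With any other quoting mode, a None delimiter still reaches `len()`, so the reorder alone does not cover every case. The explicit `is not None` guard does.

```python
import csv


def reordered_check(delimiter, quoting):
    # `and` evaluates left to right and stops at the first falsy operand,
    # so len(delimiter) only runs when quoting != csv.QUOTE_NONE.
    return quoting != csv.QUOTE_NONE and len(delimiter) > 1


# Safe: the first operand is False, so len(None) is never evaluated.
print(reordered_check(None, csv.QUOTE_NONE))  # False

# Still fails: the first operand is True, so len(None) runs and raises TypeError.
try:
    reordered_check(None, csv.QUOTE_MINIMAL)
except TypeError as exc:
    print(exc)  # object of type 'NoneType' has no len()
```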
Comment From: C5G6M7
@TomAugspurger I'm still working on reproducing this. I removed the edits I initially made to handle it and haven't encountered the error again yet. Note that it can only occur with files that have bad lines, which means the column names must be passed explicitly so that pandas does not automatically create the extra columns.
Unfortunately, I can't remember which file caused this. I'll keep running this, and as soon as I encounter a file that produces the issue again, I will update this thread.
Comment From: markjszy
@TomAugspurger
It looks like it was already fixed in development a few months back, with the same sort of solution that @C5G6M7 proposed:
Commit: https://github.com/pandas-dev/pandas/commit/23050dca1b404d23527132c0277f3d40dc41cab8
This looks isolated enough to be backported; otherwise the issue can simply be closed, since it has been fixed for a future release.
Comment From: TomAugspurger
Indeed, dupe of https://github.com/pandas-dev/pandas/issues/13374