Pandas read_fwf - parsers.py PythonParser._rows_to_columns line 2814 object of type 'NoneType' has no len()

Line 2814 in parsers.py throws an error if self.delimiter is None:

"object of type 'NoneType' has no len()"

Here is the current line of code where the error happens:

if len(self.delimiter) > 1 and self.quoting != csv.QUOTE_NONE:
    # see gh-13374
    reason = ('Error could possibly be due to quotes being '
        'ignored when a multi-char delimiter is used.')
    msg += '. ' + reason

I propose the following fix, which I believe should be a safe replacement:

if self.delimiter is not None and len(self.delimiter) > 1 and self.quoting != csv.QUOTE_NONE:
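
A minimal sketch of the difference, run outside pandas. The function names and the standalone `delimiter`/`quoting` arguments are hypothetical stand-ins for `self.delimiter` and `self.quoting`; this is not the actual parser code:

```python
import csv


def hint_unguarded(delimiter, quoting):
    # Mirrors the current condition: len(None) raises TypeError.
    return len(delimiter) > 1 and quoting != csv.QUOTE_NONE


def hint_guarded(delimiter, quoting):
    # Mirrors the proposed condition: the None check short-circuits
    # before len() is ever called.
    return (delimiter is not None and len(delimiter) > 1
            and quoting != csv.QUOTE_NONE)


unguarded_error = None
try:
    hint_unguarded(None, csv.QUOTE_MINIMAL)
except TypeError as exc:
    unguarded_error = str(exc)

guarded_none = hint_guarded(None, csv.QUOTE_MINIMAL)
guarded_multi = hint_guarded("::", csv.QUOTE_MINIMAL)
```

With a `None` delimiter the unguarded form raises the same TypeError as in the report, while the guarded form simply evaluates to False and the multi-char hint is skipped.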

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.3
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Comment From: TomAugspurger

@C5G6M7 can you write a small, reproducible example that currently fails, and submit a fix with it as a test case?

Comment From: C5G6M7

@TomAugspurger Yes, I can work on this tonight. I encountered it on quite a big dataset running inside another application that uses pandas, so I'm going to have to do a bit of debugging to see how to reproduce this with a simpler input.

In general, though, if self.delimiter is ever None when this line executes, it raises an error. I applied the quick patch proposed above to my pandas installation and the problem went away. I believe the change is safe, because the code inside the conditional only appends an error-message hint about multi-char delimiters, which would not apply anyway if the delimiter were None.

However, there could be another issue earlier in the code if the parser is expected to guarantee one of two things: either the delimiter always has a default string value (such as a comma) and therefore supports len(), or self.quoting == csv.QUOTE_NONE whenever the delimiter has no such value.

I'm not 100% sure, but rearranging the conditionals might also fix the issue: if "self.quoting != csv.QUOTE_NONE" is evaluated first and is false, "len(self.delimiter)" is never checked.
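
A quick check of the reordering idea, using a hypothetical stand-in function rather than the parser itself. It suggests that reordering only helps when quoting is `csv.QUOTE_NONE`; with any other quoting mode `len(None)` is still reached, so the explicit `is not None` check looks like the safer option:

```python
import csv


def hint_reordered(delimiter, quoting):
    # Quoting check first: len() is only reached when quoting is enabled.
    return quoting != csv.QUOTE_NONE and len(delimiter) > 1


# Quoting disabled: the first operand is False, so len() never runs.
reordered_no_quoting = hint_reordered(None, csv.QUOTE_NONE)

# Quoting enabled: the first operand is True, so len(None) is evaluated.
try:
    hint_reordered(None, csv.QUOTE_MINIMAL)
    reordered_raises = False
except TypeError:
    reordered_raises = True
```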

Comment From: C5G6M7

@TomAugspurger still working on reproducing this. I removed the edits I initially made to handle this and haven't encountered the error again yet. It can only occur with files that have bad lines, which means the column names must be passed explicitly so that the extra columns are not created automatically.

Unfortunately I can't remember which file it was that caused this. I'm going to continue running this and as soon as I encounter a file that produces the issue again I will update this.

Comment From: markjszy

@TomAugspurger

It looks like it was already fixed in development a few months back, with the same sort of solution that @C5G6M7 proposed:

Commit: https://github.com/pandas-dev/pandas/commit/23050dca1b404d23527132c0277f3d40dc41cab8

This looks isolated enough to be backported, or else just closed since it has been fixed in a future release.

Comment From: TomAugspurger

Indeed, dupe of https://github.com/pandas-dev/pandas/issues/13374
