Line 2549 of `parsers.py` should be

self.delimiter = b'\r\n' + bytes(delimiter, 'utf8') if delimiter else b'\n\r\t '

instead of

self.delimiter = '\r\n' + delimiter if delimiter else '\n\r\t '

Otherwise an exception is raised on line 2605 of the same module, which expects `delimiter` to be bytes, not str.
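The kind of mismatch described here can be shown without pandas. This is a minimal sketch that assumes only standard Python 3 behavior; `strip_field` is a hypothetical helper, not a pandas function. It shows that `strip` rejects an argument whose type differs from the object it is called on, whichever way round the mix-up is.

```python
def strip_field(field, delimiter):
    """Hypothetical helper: strip `delimiter` characters from `field`.

    Returns the stripped value, or the TypeError raised when the two
    arguments mix str and bytes.
    """
    try:
        return field.strip(delimiter)
    except TypeError as exc:
        return exc


# Matching types work in both flavours.
print(strip_field(' 123 \r\n', '\r\n\t '))    # str field, str delimiter
print(strip_field(b' 123 \r\n', b'\r\n\t '))  # bytes field, bytes delimiter

# Mixed types fail on Python 3; the helper returns the exception instead.
print(repr(strip_field(' 123 \r\n', b'\r\n\t ')))  # str field, bytes delimiter
print(repr(strip_field(b' 123 \r\n', '\r\n\t ')))  # bytes field, str delimiter
```

Which of the two mixed cases applies in a given run depends on whether the reader yields str or bytes lines, so the type of both the line and the delimiter is worth checking.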

output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.0-12-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: pt_BR.UTF-8

pandas: 0.18.1
nose: 1.2.1
pip: 8.1.2
setuptools: 22.0.5
Cython: 0.24
numpy: 1.11.0
scipy: 0.17.1
statsmodels: 0.6.1
xarray: 0.7.2
IPython: 4.2.0
sphinx: 1.3.5
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: 0.7.3
lxml: None
bs4: 4.4.1
html5lib: 0.999
httplib2: 0.9.1
apiclient: None
sqlalchemy: 1.0.13
pymysql: 0.7.2.None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None

Comment From: jreback

pls show a reproducible example

Comment From: fccoelho

I'd have to upload a FWF file to give an example.

But basically, if you have a file like this:

123 1231 2312 1231
213 3455 3534 5345

If you try to load it with `pd.read_fwf(fobj, colspecs=[(0,3),(4,8),(9,13),(14,18)])`, you will see the problem. If you use `colspecs='infer'`, the bug does not show up.
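To make the `colspecs` argument concrete: each pair is a half-open `(from, to)` character interval on the line. The sketch below is a simplified stand-in for fixed-width splitting, not the pandas implementation. `split_fixed_width` is a hypothetical name, and stripping with the characters `'\n\r\t '` mirrors the default delimiter quoted at the top of this report.

```python
def split_fixed_width(line, colspecs, delimiter='\n\r\t '):
    """Cut `line` into fields using half-open (start, stop) intervals.

    Simplified illustration only; pandas' own reader does more work.
    """
    return [line[start:stop].strip(delimiter) for start, stop in colspecs]


colspecs = [(0, 3), (4, 8), (9, 13), (14, 18)]
rows = ['123 1231 2312 1231\n', '213 3455 3534 5345\n']

for row in rows:
    print(split_fixed_width(row, colspecs))
```

Note that both `line` and `delimiter` are str here; passing one of them as bytes would make the `strip` call fail.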


Flávio Codeço Coelho


Comment From: TomAugspurger

Looks correct here

from io import StringIO

import pandas as pd

f = StringIO('''123 1231 2312 1231
213 3455 3534 5345
''')

df = pd.read_fwf(f, colspecs=[(0,3),(4,8),(9,13),(14,18)], header=None)
df

Out[14]:
     0     1     2     3
0  123  1231  2312  1231
1  213  3455  3534  5345

Comment From: jreback

I agree, this does seem to work, so you may have a LOCALE issue, just a guess. Pls provide an exact reproduction.
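Since the locale is only a guess at this point, it may help to attach the interpreter's locale and encoding settings to the reproduction. This is a sketch using standard-library calls only; the function name `describe_text_environment` is hypothetical, and which of these settings (if any) affects `read_fwf` is not established in this thread.

```python
import locale
import os
import sys


def describe_text_environment():
    """Collect locale and encoding settings that can influence text I/O."""
    return {
        'LANG': os.environ.get('LANG'),
        'LC_ALL': os.environ.get('LC_ALL'),
        # Encoding that text-mode open() typically falls back to
        # when no encoding is passed explicitly.
        'preferred_encoding': locale.getpreferredencoding(False),
        'filesystem_encoding': sys.getfilesystemencoding(),
        'default_encoding': sys.getdefaultencoding(),
    }


for key, value in describe_text_environment().items():
    print('{}: {}'.format(key, value))
```

Running this on both the failing machine and a machine where the example works would show whether the environments actually differ.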

Comment From: fccoelho

Here is an example which raises this TypeError:

Traceback (most recent call last):
  File "test_fwf.py", line 40, in <module>
    df  = pd.read_fwf('fwf.csv', colspecs=col_specs)
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 593, in read_fwf
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 315, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 645, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 805, in _make_engine
    self._engine = klass(self.f, **self.options)
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 2619, in __init__
    PythonParser.__init__(self, f, **kwds)
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 1608, in __init__
    self.columns, self.num_original_columns = self._infer_columns()
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 1823, in _infer_columns
    line = self._buffered_line()
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 1975, in _buffered_line
    return self._next_line()
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 2006, in _next_line
    orig_line = next(self.data)
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 2606, in __next__
    for (fromm, to) in self.colspecs]
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 2606, in <listcomp>
    for (fromm, to) in self.colspecs]
TypeError: strip arg must be None or str

Comment From: gfyoung

@fccoelho: I ran your scripts from the gist exactly as they are, using the same Python version on Ubuntu 14.04, and cannot reproduce the error that you are getting.

Comment From: jreback

@gfyoung did you account for the LOCALE? I suspect that's the issue here.

Comment From: gfyoung

Oh no, I didn't account for it - I was just saying that his example couldn't be reproduced! :smile: Yes, I do suspect that LOCALE could be the problem here. @fccoelho , when you apply the patch you described above, do the errors go away?

Comment From: gfyoung

@jreback : Seems like this issue has gone stale, and since we can't seem to reproduce, I move that we close this one for now.

Comment From: jreback

@gfyoung does the example above repro the problem?

Comment From: gfyoung

@jreback : Not on my machine.

Comment From: jreback

ok, closing.

@fccoelho if you are still experiencing this, pls comment, but we will need an example that people can repro with.

Pandas bug in pandas.read_fwf
