Line 2549 of parsers.py should be
self.delimiter = b'\r\n' + bytes(delimiter, 'utf8') if delimiter else b'\n\r\t '
instead of
self.delimiter = '\r\n' + delimiter if delimiter else '\n\r\t '
Otherwise an exception is raised on line 2605 of the same module, as it expects delimiter to be bytes, not str.
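The type mismatch described above can be shown in isolation. This is a minimal sketch in plain Python 3, not pandas code; the helper name `strip_raises_type_error` and the `','` delimiter are made up for illustration. It shows that `strip` rejects an argument whose type differs from the object it is called on, in either direction, so whether a str or a bytes delimiter is the right one depends on whether the reader hands back str or bytes lines.

```python
def strip_raises_type_error(line, chars):
    """Return True if line.strip(chars) raises TypeError."""
    try:
        line.strip(chars)
    except TypeError:
        return True
    return False


# Same shape as the two assignments quoted above, with ',' as a stand-in delimiter.
str_delimiter = '\r\n' + ','
bytes_delimiter = b'\r\n' + bytes(',', 'utf8')

# Matching types work.
print(strip_raises_type_error('123 \r\n', str_delimiter))      # False
print(strip_raises_type_error(b'123 \r\n', bytes_delimiter))   # False

# Mixed types fail in both directions.
print(strip_raises_type_error('123 \r\n', bytes_delimiter))    # True
print(strip_raises_type_error(b'123 \r\n', str_delimiter))     # True
```

If the lines come from a file opened in text mode they are str, and a bytes delimiter would be the failing combination rather than the fix.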
output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.0-12-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: pt_BR.UTF-8

pandas: 0.18.1
nose: 1.2.1
pip: 8.1.2
setuptools: 22.0.5
Cython: 0.24
numpy: 1.11.0
scipy: 0.17.1
statsmodels: 0.6.1
xarray: 0.7.2
IPython: 4.2.0
sphinx: 1.3.5
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: 0.7.3
lxml: None
bs4: 4.4.1
html5lib: 0.999
httplib2: 0.9.1
apiclient: None
sqlalchemy: 1.0.13
pymysql: 0.7.2.None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None
Comment From: jreback
pls show a reproducible example
Comment From: fccoelho
I'd have to upload a FWF file to give an example.
But basically, if you have a file like this:
123 1231 2312 1231
213 3455 3534 5345
and try to load it with pd.read_fwf(fobj, colspecs=[(0,3),(4,8),(9,13),(14,18)]), you will see the problem. If you use colspecs='infer', the bug does not show up.
Comment From: TomAugspurger
Looks correct here
import pandas as pd
from io import StringIO
f = StringIO('''123 1231 2312 1231
213 3455 3534 5345
''')
df = pd.read_fwf(f, colspecs=[(0,3),(4,8),(9,13),(14,18)], header=None)
df
Out[14]:
     0     1     2     3
0  123  1231  2312  1231
1  213  3455  3534  5345
Comment From: jreback
I agree, this does seem to work, so you may have a LOCALE issue, just a guess. Pls provide an exact reproduction.
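To help check the locale guess, something like the following could be run on the affected machine and on one where the example works, then compared. It is only a sketch using the standard library; the function name `describe_text_environment` is made up, and nothing here is confirmed to be the cause of the error.

```python
import locale
import os
import sys


def describe_text_environment():
    """Collect the settings that decide how text files are decoded by default."""
    return {
        # Encoding used by open() in text mode when none is given.
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
        "default_encoding": sys.getdefaultencoding(),
        # Raw environment variables; None when unset.
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
    }


for name, value in describe_text_environment().items():
    print("{}: {}".format(name, value))
```

Running the failing script with a different `LANG` value in the environment would be another way to test the same guess.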
Comment From: fccoelho
Here is an example which raises this TypeError:
Traceback (most recent call last):
  File "test_fwf.py", line 40, in <module>
    df = pd.read_fwf('fwf.csv', colspecs=col_specs)
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 593, in read_fwf
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 315, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 645, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 805, in _make_engine
    self._engine = klass(self.f, **self.options)
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 2619, in __init__
    PythonParser.__init__(self, f, **kwds)
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 1608, in __init__
    self.columns, self.num_original_columns = self._infer_columns()
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 1823, in _infer_columns
    line = self._buffered_line()
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 1975, in _buffered_line
    return self._next_line()
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 2006, in _next_line
    orig_line = next(self.data)
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 2606, in __next__
    for (fromm, to) in self.colspecs]
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 2606, in <listcomp>
    for (fromm, to) in self.colspecs]
TypeError: strip arg must be None or str
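The last frames point at a list comprehension over `self.colspecs`. The traceback shows only the `for` clause, so the following is a simplified stand-in and not the pandas implementation: the function `split_fixed_width` is made up, and the slice-then-strip body is an assumption based on the error message and on the delimiter assignment quoted at the top of the thread.

```python
def split_fixed_width(line, colspecs, delimiter=None):
    """Cut one line into fields by (start, stop) positions and strip padding."""
    # Mirrors the str form of the delimiter assignment quoted in the report.
    strip_chars = '\r\n' + delimiter if delimiter else '\n\r\t '
    return [line[start:stop].strip(strip_chars) for (start, stop) in colspecs]


colspecs = [(0, 3), (4, 8), (9, 13), (14, 18)]

# A str line with str strip characters works.
print(split_fixed_width('123 1231 2312 1231\n', colspecs))
# ['123', '1231', '2312', '1231']

# A bytes line with str strip characters raises TypeError.
try:
    split_fixed_width(b'123 1231 2312 1231\n', colspecs)
except TypeError as exc:
    print('TypeError:', exc)
```

In this sketch the failure only appears when the line type and the strip characters disagree, which is why the type of the object the reader iterates over matters for a reproduction.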
Comment From: gfyoung
@fccoelho : ran your scripts exactly as they are from the gist using the same Python version on Ubuntu 14.04 and cannot reproduce the error that you are getting.
Comment From: jreback
@gfyoung did you account for the LOCALE? I suspect that's the issue here.
Comment From: gfyoung
Oh no, I didn't account for it - I was just saying that his example couldn't be reproduced! :smile: Yes, I do suspect that LOCALE could be the problem here. @fccoelho , when you apply the patch you described above, do the errors go away?
Comment From: gfyoung
@jreback : Seems like this issue has gone stale, and since we can't seem to reproduce, I move that we close this one for now.
Comment From: jreback
@gfyoung does the example above repro the problem?
Comment From: gfyoung
@jreback : Not on my machine.
Comment From: jreback
OK, closing.
@fccoelho if you are still experiencing this, pls comment, but we will need an example that people can repro with.