I had been successfully parsing a CSV file with 0.14.1; I upgraded yesterday and now the following code breaks:
df = pd.io.parsers.read_csv(fname, skiprows=range(1, 9))
here is the file:
https://www.dropbox.com/s/grhi6e9vihjf92t/testtown2.csv?dl=0
Versions:
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.8.final.0
python-bits: 64
OS: Linux
OS-release: 3.14-1-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.15.2
nose: None
Cython: 0.20.1
numpy: 1.9.2
scipy: None
statsmodels: None
IPython: 2.1.0
sphinx: None
patsy: None
dateutil: 2.4.1
pytz: 2014.10
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: None
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.6.8
lxml: None
bs4: 4.3.2
html5lib: None
httplib2: 0.9
apiclient: None
rpy2: None
sqlalchemy: 0.9.9
pymysql: None
psycopg2: 2.6 (dt dec pq3 ext lo64)
Comment From: jreback
Straight out of the docs: http://pandas.pydata.org/pandas-docs/stable/io.html#ignoring-line-comments-and-empty-lines
There was an API change in 0.15.0; you may have to adjust skiprows
and/or set skip_blank_lines=False
to resolve the ambiguity.
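For anyone hitting the same ambiguity, here is a minimal sketch (not from the thread; the data string is invented for illustration) of how the 0.15 blank-line default interacts with row counting, and what skip_blank_lines=False restores:

```python
from io import StringIO
import pandas as pd

data = "a,b\n\n1,2\n\n3,4\n"

# Default since 0.15: blank lines are dropped before parsing,
# so only the two data rows remain.
default = pd.read_csv(StringIO(data))

# skip_blank_lines=False keeps blank lines as all-NaN rows, so row
# positions (and therefore skiprows offsets) count every physical line.
kept = pd.read_csv(StringIO(data), skip_blank_lines=False)

print(len(default), len(kept))  # 2 4
```

This is why a skiprows list tuned against 0.14.1 can point at the wrong physical lines after upgrading.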
Comment From: bastianlb
The problem is that I don't have control over the format of the lines I want to skip, so "commenting" them out is not an option. skip_blank_lines=False
does not seem to work (the lines aren't blank). I know which lines I want to ignore in advance; I just can't predict their format.
Thanks for the help.
Comment From: bastianlb
I suppose I could use a regex to indicate which rows should be omitted, but it seems that only single-character comment markers are currently supported.
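Since read_csv only takes a single comment character, one workaround is to pre-filter the text with a regex before handing it to pandas. A sketch (the all-caps metadata pattern is an assumption based on the sample file posted below, not something pandas provides):

```python
import re
from io import StringIO
import pandas as pd

raw = (
    "NAME,flow,depth\n"
    "UNITS,cfs,ft\n"   # metadata row to drop
    "DATA,,\n"         # metadata row to drop
    "1,1.45,0.47\n"
    "2,1.21,0.51\n"
)

# Assumed pattern: metadata rows begin with an all-caps tag in the
# first field. The first line is kept regardless, as the real header.
meta = re.compile(r"^[A-Z]+,")
lines = raw.splitlines(keepends=True)
clean = lines[:1] + [ln for ln in lines[1:] if not meta.match(ln)]

df = pd.read_csv(StringIO("".join(clean)))
```

This avoids skiprows entirely, so it is also insensitive to how many metadata lines a given file contains.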
Comment From: jreback
Can you post lines 1-12 or so?
Comment From: bastianlb
NAME,testtown sewer flow,testtown sewer depth,testtown sewer velocity
CONTEXT,sewer,,
TYPE,flow,depth,velocity
START, 1/1/11 0:00,,
END, 12/30/13 23:00,,
TZ, US/Eastern,,
INC,1,,
UNITS,cfs,ft,ft/s
DATA,,,
1,1.451139854,0.470901008,1.720064044
2,1.20869145,0.514026212,1.629014523
3,12,0.419756899,1.517084435
4,12,0.418530849,1.40467042
5,12,0.317392778,1.358932374
6,12,0.26695894,1.32262311
7,12,0.428965151,1.381718785
8,12,0.289426375,1.449016799
9,1.173475721,0.429696648,1.614747635
The file is also available for download in the original post, if that helps.
Comment From: jreback
Works for me in 0.15.2:
In [2]: pd.read_csv(StringIO(data),skiprows=range(1,9))
Out[2]:
NAME testtown sewer flow testtown sewer depth testtown sewer velocity
0 1 1.451140 0.470901 1.720064
1 2 1.208691 0.514026 1.629015
2 3 12.000000 0.419757 1.517084
3 4 12.000000 0.418531 1.404670
4 5 12.000000 0.317393 1.358932
5 6 12.000000 0.266959 1.322623
6 7 12.000000 0.428965 1.381719
7 8 12.000000 0.289426 1.449017
8 9 1.173476 0.429697 1.614748
Comment From: bastianlb
OK, I realized the source of the problem is that this file came from a Windows machine. I can read it as follows:
df = pd.io.parsers.read_csv(fname, skiprows=range(1, 9), lineterminator="\r")
However, this code now breaks on files of the same format with "\n" terminators. Could it be that this flexibility was lost in 0.15?
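One way to sidestep the terminator difference is to normalize newlines before parsing. A sketch using Python 3's universal-newline text mode (read_any_terminator is a made-up helper, not a pandas API):

```python
import os
import tempfile
from io import StringIO
import pandas as pd

def read_any_terminator(path, **kwargs):
    # Text mode with the default newline=None translates "\r", "\n",
    # and "\r\n" all to "\n" (universal newlines), so read_csv sees one
    # consistent terminator regardless of which OS produced the file.
    with open(path, newline=None) as f:
        return pd.read_csv(StringIO(f.read()), **kwargs)

# Demo: a file terminated with bare "\r" (newline="" disables
# translation on write, so the raw "\r" bytes are preserved).
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline=""
) as tmp:
    tmp.write("a,b\rskip me,0\r1,2\r3,4\r")
    path = tmp.name

df = read_any_terminator(path, skiprows=[1])
os.remove(path)
```

With the terminators normalized, the same skiprows list works for "\r", "\n", and "\r\n" files alike.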
Comment From: bastianlb
Interestingly enough, read_csv is able to parse both kinds of line terminators as long as I don't specify a skiprows argument, and in 0.14.1 this worked with skiprows.
Comment From: gfyoung
Not really seeing much of an issue anymore?
>>> from pandas import read_csv
>>> from pandas.compat import StringIO
>>>
>>> data = '1\ra\r2'
>>> read_csv(StringIO(data), skiprows=1, engine='c')
a
0 2
>>> read_csv(StringIO(data), skiprows=1, engine='python')
...
_csv.Error: new-line character seen in unquoted field -
do you need to open the file in universal-newline mode?
The Python engine breakage is expected, because the Python engine does not accept custom line terminators.
Comment From: gfyoung
@jreback : I think this issue can be closed based on what I said above.
Comment From: jorisvandenbossche
@bastianlb If you still have problems with this, feel free to reopen.