As far as I can see, pandas.read_csv() and pandas.read_excel() handle the skiprows argument differently. I have the same data in a CSV file and in an Excel file:
abc def
1 10
2 11
3 12
4 13
5 14
I want to use different column names when I read the data, so I pass the desired column names in the names argument and skip the first (header) row of my CSV and Excel files:
pd.read_excel('test.xlsx', skiprows=0, names=['foo', 'bar'])
returns my expected result:
foo bar
0 1 10
1 2 11
2 3 12
3 4 13
4 5 14
I get the same expected result with pd.read_csv('test.csv', skiprows=1, names=['foo', 'bar']). But pd.read_csv('test.csv', skiprows=0, names=['foo', 'bar']) keeps the first (header) row of the input file as a data row:
foo bar
0 abc def
1 1 10
2 2 11
3 3 12
4 4 13
5 5 14
Is this the expected behavior of skiprows, or is something wrong in pandas.read_csv()?
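For reference, the CSV half of this can be reproduced without any files on disk. The sketch below assumes test.csv is comma-separated and feeds the same rows through io.StringIO; CSV_TEXT and the variable names are illustrative, not part of the original files.

```python
import io

import pandas as pd

# Assumed contents of test.csv (comma-separated; the real delimiter may differ).
CSV_TEXT = "abc,def\n1,10\n2,11\n3,12\n4,13\n5,14\n"

# skiprows=0 with names: the original header line is read as a data row.
kept = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=0, names=["foo", "bar"])

# skiprows=1 with names: the original header line is dropped before parsing.
skipped = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=1, names=["foo", "bar"])

print(kept.shape)              # (6, 2)
print(kept.iloc[0].tolist())   # ['abc', 'def']
print(skipped.shape)           # (5, 2)
```

Because the header text ends up in the data with skiprows=0, both columns of that frame are read as strings rather than integers.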
Comment From: TomAugspurger
I think I would do this using the header argument.
In [17]: pd.read_excel("foo.xlsx", names=['foo', 'bar'], header=0)
Out[17]:
foo bar
0 1 10
1 2 11
2 3 12
3 4 13
4 5 14
In [18]: pd.read_csv("foo.csv", names=['foo', 'bar'], header=0)
Out[18]:
foo bar
0 1 10
1 2 11
2 3 12
3 4 13
4 5 14
This is a symptom of https://github.com/pandas-dev/pandas/issues/11889
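A small check of the header=0 approach for the CSV case, as a sketch only: it assumes comma-separated input held in an in-memory buffer (CSV_TEXT is a hypothetical stand-in for foo.csv). The Excel call is left out because it would need a real workbook and an Excel engine installed.

```python
import io

import pandas as pd

# Hypothetical stand-in for foo.csv; assumes a comma delimiter.
CSV_TEXT = "abc,def\n1,10\n2,11\n3,12\n4,13\n5,14\n"

# header=0 marks the first line as the header; names then replaces its labels.
replaced = pd.read_csv(io.StringIO(CSV_TEXT), names=["foo", "bar"], header=0)

# Dropping the first line by hand gives the same frame for this input.
via_skiprows = pd.read_csv(io.StringIO(CSV_TEXT), names=["foo", "bar"], skiprows=1)

print(list(replaced.columns))         # ['foo', 'bar']
print(replaced.equals(via_skiprows))  # True
```

Passing header=0 states the intent (replace the existing header) directly, so the same keyword arguments work for both readers, as in the two calls above.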
Comment From: TomAugspurger
Since you actually hit that API inconsistency, your input would be appreciated in #11889.