Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
from io import StringIO
import pandas as pd
data = """a,a,a,b,c,c
q,r,s,t,u,v
1,2,3,4,5,6
7,8,9,10,11,12"""
pd.read_csv(StringIO(data), header=[1, 0])
Issue Description
read_csv accepts a non-increasing list for the header argument when building MultiIndex columns, but the result is wrong. For instance, the snippet above produces
NaN
a a.1 a.2 b c c.1
0 a a a b c c
1 q r s t u v
2 1 2 3 4 5 6
3 7 8 9 10 11 12
i.e., a DataFrame whose columns are
MultiIndex([(nan, 'a'),
(nan, 'a.1'),
(nan, 'a.2'),
(nan, 'b'),
(nan, 'c'),
(nan, 'c.1')],
)
Parsing the data with header=[0, 0] is also accepted, and behaves sensibly:
a b c
a a.1 a.2 b c c.1
0 q r s t u v
1 1 2 3 4 5 6
2 7 8 9 10 11 12
but I can't see a reason to support redundant (repeated) header levels.
Expected Behavior
I propose that non-increasing header arguments raise a ValueError('header elements must be increasing') or something to that effect.
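To make the proposal concrete, here is a minimal sketch of the kind of check I have in mind. The function name `validate_header` is hypothetical and this is not pandas' actual parser code. It also assumes the check should be strict, so that repeated levels such as [0, 0] are rejected along with decreasing ones such as [1, 0].

```python
from typing import Sequence


def validate_header(header: Sequence[int]) -> None:
    """Hypothetical check (not pandas' implementation).

    Raises ValueError unless the header row numbers are strictly increasing.
    """
    # Compare each element with its successor; any pair that does not
    # strictly increase (equal or decreasing) is rejected.
    for prev, curr in zip(header, header[1:]):
        if curr <= prev:
            raise ValueError("header elements must be increasing")


validate_header([0, 1])  # passes silently

try:
    validate_header([1, 0])
except ValueError as err:
    print(err)  # header elements must be increasing
```

If redundant levels should stay allowed, the comparison would be `curr < prev` instead.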
Installed Versions
Comment From: simonjayhawkins
Thanks @ahawryluk for the report.
> I propose that non-increasing header arguments raise a ValueError('header elements must be increasing') or something to that effect.
That makes sense. Let's keep this issue labelled as a bug for now. The documentation should probably be updated too.
The alternative would be an enhancement to allow this, but the added code complexity is probably unnecessary when a simple swaplevel gives the expected result in this case:
pd.read_csv(StringIO(data), header=[0, 1]).swaplevel(axis=1)
q r s t u v
a a a b c c
0 1 2 3 4 5 6
1 7 8 9 10 11 12
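For anyone who wants to try the workaround, here is a self-contained version of it using the data from the report. The variable names `df` and `swapped` are just for illustration; the expected column tuples in the comment follow from the output shown above.

```python
from io import StringIO

import pandas as pd

data = """a,a,a,b,c,c
q,r,s,t,u,v
1,2,3,4,5,6
7,8,9,10,11,12"""

# Read the two header rows in increasing order, then swap the column levels
# so that the second header row becomes the outer level.
df = pd.read_csv(StringIO(data), header=[0, 1])
swapped = df.swaplevel(axis=1)

# Columns are now ('q', 'a'), ('r', 'a'), ('s', 'a'), ('t', 'b'), ('u', 'c'), ('v', 'c')
print(list(swapped.columns))
```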
> but I can't see a reason to support redundant header levels.
Agreed. This should probably raise a ValueError too, although that would be a breaking change if any users currently depend on this behavior.
Contributions, PRs, and further investigation or suggestions are welcome.
Comment From: ahawryluk
take