Code Sample, a copy-pastable example if possible
import pandas as pd
mi = pd.MultiIndex.from_tuples([['b','d'],['b','c'],['b','c'],['a','c']], names=list('AB'))
mi = mi.set_levels(pd.CategoricalIndex(mi.levels[0], categories=['b','a']), level='A')
df = pd.DataFrame([[1,1,1,1],[1,1,1,1]], columns=mi)
df.sum(axis=1, level=[0,1])
returns:
A  b     a
B  c  d  c   d
0  2  1  1 NaN
1  2  1  1 NaN
Problem description
A spurious column (a,d) containing only NaNs is included in the result. In addition, the second column level is no longer in its original order (c now comes before d under b).
Expected Output
The same result as when level 'A' is not a CategoricalIndex:
import pandas as pd
mi = pd.MultiIndex.from_tuples([['b','d'],['b','c'],['b','c'],['a','c']], names=list('AB'))
df = pd.DataFrame([[1,1,1,1],[1,1,1,1]], columns=mi)
df.sum(axis=1, level=[0,1])
which correctly returns:
A  b     a
B  d  c  c
0  1  2  1
1  1  2  1
Output of pd.show_versions()
Comment From: kdebrab
This seems to have been resolved in the meantime. In both Python 3.6 (pandas version 0.25.1) and Python 2.7 (pandas version 0.24.2), I now get the expected result when executing the code sample. The spurious column no longer appears.
Comment From: kdebrab
Maybe not. Rather, it seems that the example itself no longer works: mi = mi.set_levels(pd.Categorical(mi.levels[0], categories=['b','a']), level='A') no longer has the intended effect...
Comment From: kdebrab
OK, I see: I have to replace pd.Categorical() by pd.CategoricalIndex() in order to reproduce the issue. Conclusion: the issue has not been solved yet! I updated the code sample accordingly.
Comment From: jbrockmendel
The level keyword is no longer present in DataFrame reductions, and df.groupby(level=[0,1], axis=1).sum() looks right to me. Closing.
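For readers arriving later, here is a minimal sketch of a groupby-based replacement for the removed level keyword. It is not an official recipe. The helper name sum_by_column_levels is hypothetical (it is not a pandas API). Grouping on the transposed frame is a choice made here because newer pandas versions deprecate axis=1 in groupby. Passing observed=True is an assumption about how to avoid unobserved category combinations such as (a,d); the categorical result is printed rather than asserted because it may vary between pandas versions.

```python
import pandas as pd


def sum_by_column_levels(df, levels):
    """Sum columns that share the same labels on the given column levels.

    Hypothetical helper, not part of pandas. It transposes so that the
    grouping happens on the row index, then transposes back.
    """
    # observed=True asks groupby to keep only the category combinations
    # that actually occur; sort=False keeps first-appearance order.
    grouped = df.T.groupby(level=levels, sort=False, observed=True)
    return grouped.sum().T


tuples = [("b", "d"), ("b", "c"), ("b", "c"), ("a", "c")]
mi = pd.MultiIndex.from_tuples(tuples, names=list("AB"))
df = pd.DataFrame([[1, 1, 1, 1], [1, 1, 1, 1]], columns=mi)

# Plain (non-categorical) column levels, as in the "Expected Output" sample.
plain = sum_by_column_levels(df, ["A", "B"])
print(plain)

# Same frame, but with level 'A' turned into a CategoricalIndex.
cat_level = pd.CategoricalIndex(mi.levels[0], categories=["b", "a"])
df_cat = df.copy()
df_cat.columns = mi.set_levels(cat_level, level="A")
categorical = sum_by_column_levels(df_cat, ["A", "B"])
print(categorical)
```

In the plain case the two (b,c) columns are summed into one, and the columns keep the order (b,d), (b,c), (a,c).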