Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd
df = pd.DataFrame(np.ones([2, 8]))
df.iloc[0, 0] = np.nan
df.columns = pd.MultiIndex.from_product([list('ab'), list('cd'), list('ef')])
df.sum(axis=1, level=[0, 1], skipna=False)
Problem description
In Python 3.6 a ValueError is raised: ValueError: No axis named 1 for object type <class 'pandas.core.series.Series'>. In Python 2.7 the above works as expected.
Expected Output
The same as in Python 2.7, namely the dataframe:
     a         b     
     c    d    c    d
0  NaN  2.0  2.0  2.0
1  2.0  2.0  2.0  2.0
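As a cross-check on where these numbers come from, here is a NumPy sketch (not how pandas computes it). It relies on the fact that MultiIndex.from_product orders the eight columns as (a,c,e), (a,c,f), (a,d,e), (a,d,f), ..., so each consecutive pair shares its first two levels. Summing over the innermost level is then a sum over the last axis after a reshape, and NumPy's plain sum propagates NaN the way skipna=False should.

```python
import numpy as np

values = np.ones([2, 8])
values[0, 0] = np.nan

# Columns come in consecutive pairs that differ only in the innermost
# level ('e'/'f'), so reshape to (rows, 4 groups, 2 members) and sum
# over the last axis. np.sum does not skip NaN, so the NaN group stays NaN.
expected = values.reshape(2, 4, 2).sum(axis=2)
print(expected)
```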
Output of pd.show_versions()
Comment From: shangyian
I was able to reproduce this discrepancy between Python 2.7 and 3.6 as well.
Tried to dig into the code, and the main difference I noticed between the two is in _python_agg_general: it calls _iterate_slices, which has only a single slice in Python 3.6 but two in Python 2.7 (see https://github.com/pandas-dev/pandas/blob/15bbb28865fa1a259e4d61b85604dea1c5f2f249/pandas/core/groupby.py#L1043-L1049). Somehow self._selection isn't being populated correctly.
Comment From: TomAugspurger
Thanks for looking @shangyian. Are you interested in digging further? Do you need any pointers on where to look next?
Comment From: shangyian
Sure, I can try to dig further.
My main issue right now is that I don't quite understand where _selection in _iterate_slices() is being populated. Any help with pinning that down would be much appreciated.
Comment From: TomAugspurger
I'm not sure if this is the root cause, but look at the self.grouper.agg_series(obj, f) call at https://github.com/pandas-dev/pandas/blob/6ef4be3f8f269f147b5abedecf7da6f19af305d3/pandas/core/groupby.py#L1051
For Py2 that call raises a TypeError, which is caught:
(Pdb) self.grouper.agg_series(obj, f)
*** TypeError: unbound method sum() must be called with DataFrame instance as first argument (got Series instance instead)
For Py3 it raises a ValueError, which is not caught:
ipdb> self.grouper.agg_series(obj, f)
*** ValueError: No axis named 1 for object type <class 'pandas.core.series.Series'>
I'm not sure what the correct behavior is here, but maybe that helps :)
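The two messages are consistent with a Python 3 language change: Python 2 "unbound methods" type-check their first argument, so calling DataFrame.sum with a Series fails immediately with a TypeError, while Python 3 removed unbound methods, so the same call runs the function body with the Series as self and only fails later, at the axis check, with a ValueError. The sketch below reproduces that pattern with two toy classes; Frame, Column, total and the helper function are hypothetical stand-ins, not pandas code.

```python
# Hypothetical toy classes that only mimic the shape of the failure;
# they are not pandas' DataFrame/Series implementation.
class Frame:
    """2-D container: valid axes are 0 and 1."""

    _axes = (0, 1)

    def total(self, axis=0):
        # The axis check runs inside the method body, against whatever
        # object was passed as `self`.
        if axis not in self._axes:
            raise ValueError(f"No axis named {axis} for object type {type(self)}")
        return "summed"


class Column:
    """1-D container: only axis 0 exists."""

    _axes = (0,)


def call_frame_method_on_column():
    """Call Frame.total with a Column as `self` and report the exception type."""
    try:
        return Frame.total(Column(), axis=1)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


print(call_frame_method_on_column())  # prints: ValueError (under Python 3)
```

Under Python 2 the equivalent call would be rejected before the body runs, which matches the TypeError shown in the Py2 session above.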
Comment From: shangyian
Ah, there it is - thanks! 👍
If we just have the resulting function catch both TypeError and ValueError, it'll produce the same result as Py2. The crux of the issue seems to be that it first tries a Series-level aggregation, which can't work here, and it's supposed to move on to the _python_apply_general method after catching the error.
I'll add a PR for this.
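A minimal sketch of the proposed fallback, assuming the control flow described above (try the per-Series path, fall back to the general path on failure). The function names and the two "path" callables are hypothetical stand-ins for grouper.agg_series and _python_apply_general, not the actual patch.

```python
def aggregate_with_fallback(obj, func, series_path, general_path):
    """Try the per-Series path first; fall back to the general path.

    `series_path` and `general_path` are hypothetical stand-ins for
    grouper.agg_series and _python_apply_general.
    """
    try:
        return series_path(obj, func)
    except (TypeError, ValueError):
        # Py2 surfaced a TypeError and Py3 a ValueError for the same
        # situation; treat both as "func can't be applied per Series".
        return general_path(obj, func)


def series_path_py3(obj, func):
    raise ValueError("No axis named 1 for object type Series")


def series_path_py2(obj, func):
    raise TypeError("unbound method sum() must be called with DataFrame instance")


def general_path(obj, func):
    return func(obj)


print(aggregate_with_fallback([1, 2, 3], sum, series_path_py3, general_path))  # 6
```

One trade-off of this approach: catching every ValueError also hides genuine errors raised inside func, which then get silently retried on the slower general path.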
Comment From: jbrockmendel
The level keyword is no longer present in Series/DataFrame reductions. Closing.
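Since the level keyword is gone, the pandas deprecation message pointed users to groupby instead. One way to get the output expected in the original report is sketched below. It is an assumption-based workaround rather than an official recipe: it transposes instead of grouping along axis=1 (which newer pandas versions deprecate), and it uses a lambda with Series.sum(skipna=False) so the NaN still propagates, since a plain groupby sum skips NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.ones([2, 8]))
df.iloc[0, 0] = np.nan
df.columns = pd.MultiIndex.from_product([list("ab"), list("cd"), list("ef")])

# Transpose so the column levels become index levels, group on the first
# two levels, sum each group with skipna=False, then transpose back.
result = (
    df.T.groupby(level=[0, 1])
    .agg(lambda s: s.sum(skipna=False))
    .T
)
print(result)
```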