This works:
df = pd.DataFrame({'A': [1, 2, 1, 2, 1, 2, 3], 'B': [1, 1, 1, 2, 2, 2, 2]})
df.groupby('B').agg(pd.Series.mode)
but this doesn't:
df.groupby('B').agg('mode')
...
AttributeError: Cannot access callable attribute 'mode' of 'DataFrameGroupBy' objects, try using the 'apply' method
I thought all the series aggregate methods propagated automatically to groupby, but I've probably misunderstood?
Comment From: patricksurry
Hmm, I guess this might be because pd.Series.mode() returns a series, not a scalar. So maybe I need my own mode that decides how to handle the multi-modal case, e.g. pd.Series.mode().mean() or whatever?
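A minimal sketch of that idea, assuming the multi-modal case should be resolved by averaging the modes. The helper name `mean_of_modes` and the small tie-containing frame are made up for illustration and are not part of pandas:

```python
import pandas as pd

# Hypothetical data: group B == 1 has two modes (1 and 2), group B == 2 has one.
ties_df = pd.DataFrame({'A': [1, 2, 1, 2, 5], 'B': [1, 1, 1, 1, 2]})

def mean_of_modes(s: pd.Series) -> float:
    """Reduce a group to one scalar by averaging all of its modes."""
    # Series.mode() returns a Series (possibly several values), so reduce it.
    return s.mode().mean()

result = ties_df.groupby('B')['A'].agg(mean_of_modes)
print(result)
```

Because the function returns a scalar per group, `agg` accepts it; swapping `.mean()` for `.iloc[0]` or `.max()` would be a different tie-breaking choice.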
Comment From: TomAugspurger
"might be because pd.Series.mode() returns a series, not a scalar"
Correct. IIRC there's an older issue about this, where we decided to keep our behavior of always returning a series, and not adding a flag to reduce if possible. I could be misremembering though.
In these cases I'll usually just use scipy's mode:
df.groupby('B').agg(lambda x: scipy.stats.mode(x)[0])
scipy.stats.mode returns a tuple of (mode, count), and we just want the mode.
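A self-contained version of the scipy approach might look like the sketch below. It assumes SciPy is installed and that the column is numeric. The helper name `scipy_mode` is hypothetical, and the `np.ravel(...)[0]` step is an added precaution because, depending on the SciPy version, the mode can come back as a scalar or as a length-1 array:

```python
import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame({'A': [1, 2, 1, 2, 1, 2, 3], 'B': [1, 1, 1, 2, 2, 2, 2]})

def scipy_mode(s: pd.Series):
    """Return a single mode per group using scipy.stats.mode."""
    # stats.mode returns a (mode, count) result; element 0 is the mode.
    mode_value = stats.mode(s.to_numpy())[0]
    # Flatten so this works whether SciPy returned a scalar or a 1-element array.
    return np.ravel(mode_value)[0]

modes = df.groupby('B')['A'].agg(scipy_mode)
print(modes)
```

Note that when several values tie, scipy reports only the smallest one, so the multi-modal case is resolved silently.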
Comment From: jreback
Here's a mini-example; could be like .value_counts()
In [6]: df = DataFrame({'A' : [1,2,1,2], 'B' : [1,1,1,1]})
In [7]: df
Out[7]:
   A  B
0  1  1
1  2  1
2  1  1
3  2  1
In [8]: df.groupby('A').B.value_counts()
Out[8]:
A  B
1  1    2
2  1    2
Name: B, dtype: int64
In [9]: df.groupby('A').B.apply(lambda x: x.mode())
Out[9]:
A
1  0    1
2  0    1
Name: B, dtype: int64
Comment From: kernc
What about when grouping Series?
I have no issue with .agg('mode') returning the first mode, if any, while issuing a warning if there were multiple modes.
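Until something like that exists, the behaviour can be approximated in user code. This is only a sketch of the proposal, not pandas behaviour: the helper `first_mode`, its warning text, and the choice of NaN for groups with no mode are all assumptions.

```python
import warnings
import pandas as pd

def first_mode(s: pd.Series):
    """Return the first (smallest) mode, warning if there are several."""
    modes = s.mode()  # sorted Series of modes; empty if the group is all-NaN
    if modes.empty:
        return float('nan')
    if len(modes) > 1:
        warnings.warn(f"multiple modes found ({list(modes)}); using the first")
    return modes.iloc[0]

# Grouping a Series by its index: 'x' has two modes (1 and 2), 'y' has one.
s = pd.Series([1, 2, 1, 2, 3, 3, 3], index=list('xxxxyyy'))

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = s.groupby(level=0).agg(first_mode)

print(result)
print([str(w.message) for w in caught])
```

The same function can be passed to `df.groupby(...).agg(first_mode)` for a DataFrame, since it is applied to one column of one group at a time.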
Comment From: gosuto-inzasheru
I encountered this problem and ended up settling for this:
.agg({'column': lambda x: pd.Series.mode(x)[0]})
But yes, I agree with @kernc, I would not mind .agg('mode') returning the first mode if multiple modes are returned.
Comment From: mroeschke
xref https://github.com/pandas-dev/pandas/issues/19254
Comment From: rhshadrach
Closing as a duplicate of #19254