- [ ] pd.Summary
- [x] dict-of-dict #9052
- [ ] named functions #10100
From SO
In [100]: df = pd.DataFrame({'A': ['group1', 'group1', 'group2', 'group2', 'group3', 'group3'],
'B': [10, 12, 10, 25, 10, 12],
'C': [100, 102, 100, 250, 100, 102]})
In [101]: df
Out[101]:
A B C
0 group1 10 100
1 group1 12 102
2 group2 10 100
3 group2 25 250
4 group3 10 100
5 group3 12 102
In [102]: df.groupby('A').agg(['mean',lambda x: x.iloc[1]])
Out[102]:
B C
mean <lambda> mean <lambda>
A
group1 11.0 12 101 102
group2 17.5 25 175 250
group3 11.0 12 101 102
It would be nice to be able to use lambda x: x.nth(1), or maybe just the string 'nth(1)':
In [103]: df.groupby('A').agg(['mean','nth(1)'])
Comment From: jreback
@shoyer here's my use case. This is a specification engine (similar in concept to pd.Grouper) that lets one easily specify the ordering of output summary values (e.g. no need for a dict-like) and bind metadata (mostly the name) to the function without too much boilerplate. It should be fully back-compat.
S = pd.Summary
# 8593
df.groupby('A').agg([S('mean',name='nth(1)')])
# 10100
dfAggregated = grouped.agg([np.mean,np.std,S(lambda v: v.mean()/v.max(),name='normalized mean')])
# 9052
df.groupby('B').agg({ 'A': [S('mean',name='mean1'),S('median',name='med1')],
'C': [S('mean',name='mean2'),S('median',name='med2')],
})
# maybe could also do
df.groupby('B').agg([ S('mean',column='A',name='mean1'),
S('median',column='A',name='med1'),
S('mean',column='C',name='mean2'),
S('median',column='C',name='med2')
])
Comment From: RylanSchaeffer
I know this isn't the place, but how does one use nth() with a groupby's .agg() method?
Comment From: jbrockmendel
Discussed on today's dev call. The main comments were: 1) does NamedAgg cover some of this? 2) "No" to doing eval on "nth(1)". 3) pd.Grouper is one of the harder-to-reason-about parts of the codebase and we don't want more things like it. Closing as no action.
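For reference, here is a rough sketch of how much of the above NamedAgg seems to cover, assuming pandas 0.25 or later. The output names (b_mean, b_second, c_mean, c_norm_mean) are made up for illustration, and the lambda using .iloc[1] stands in for nth(1), since .agg() does not accept 'nth(1)' as a string.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['group1', 'group1', 'group2', 'group2', 'group3', 'group3'],
    'B': [10, 12, 10, 25, 10, 12],
    'C': [100, 102, 100, 250, 100, 102],
})

# Named aggregation: each keyword becomes an output column, in the order given.
# The keyword names are illustrative, not part of any pandas API.
result = df.groupby('A').agg(
    b_mean=pd.NamedAgg(column='B', aggfunc='mean'),
    # Stand-in for nth(1): take the second row of each group by position.
    b_second=pd.NamedAgg(column='B', aggfunc=lambda s: s.iloc[1]),
    # A plain (column, aggfunc) tuple is equivalent to pd.NamedAgg.
    c_mean=('C', 'mean'),
    # The "normalized mean" example from #10100, with an explicit name.
    c_norm_mean=('C', lambda s: s.mean() / s.max()),
)
print(result)
```

This gives explicit output names and ordering without a dict-of-dict, but it does not bind the name to the function itself the way the proposed S(...) would. The .iloc[1] lambda also raises an IndexError for any group with fewer than two rows.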