• [ ] pd.Summary
  • [x] dict-of-dict #9052
  • [ ] named functions #10100

From SO

In [100]: df = pd.DataFrame({'A': ['group1', 'group1', 'group2', 'group2', 'group3', 'group3'],
                   'B': [10, 12, 10, 25, 10, 12],
                   'C': [100, 102, 100, 250, 100, 102]})

In [101]: df
Out[101]: 
        A   B    C
0  group1  10  100
1  group1  12  102
2  group2  10  100
3  group2  25  250
4  group3  10  100
5  group3  12  102

In [102]: df.groupby('A').agg(['mean',lambda x: x.iloc[1]])
Out[102]: 
           B               C          
        mean  <lambda>  mean  <lambda>
A                                     
group1  11.0        12   101       102
group2  17.5        25   175       250
group3  11.0        12   101       102

Would be nice to be able to use lambda x: x.nth(1), or maybe the string 'nth(1)':

In [103]: df.groupby('A').agg(['mean', 'nth(1)'])
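For comparison, here is a minimal sketch of what already works without any new API. It assumes the same df as above. The helper name `second` is made up for illustration. A plain named function passed to agg labels its output column with the function's `__name__` instead of `<lambda>`, and `GroupBy.nth` can be called directly on the groupby object outside of agg.

```python
import pandas as pd

df = pd.DataFrame({'A': ['group1', 'group1', 'group2', 'group2', 'group3', 'group3'],
                   'B': [10, 12, 10, 25, 10, 12],
                   'C': [100, 102, 100, 250, 100, 102]})

# Hypothetical helper: a named function, so the output column is labelled
# 'second' rather than '<lambda>'.
def second(x):
    return x.iloc[1]

out = df.groupby('A').agg(['mean', second])

# GroupBy.nth is a method of the groupby object itself, not of the Series
# passed to agg; it selects the second row of each group.
second_rows = df.groupby('A').nth(1)
```

The exact index of `second_rows` depends on the pandas version (recent versions keep the original row labels), so only the selected values should be relied on here.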

Comment From: jreback

@shoyer here's my use case. This is a specification engine (similar in concept to pd.Grouper) that lets one easily specify the ordering of output summary values (e.g. no need for dict-like) and bind metadata to the function (mostly the name) without too much boilerplate. Should be fully back-compat.

S = pd.Summary

# 8593
df.groupby('A').agg([S('mean',name='nth(1)')])

# 10100
dfAggregated = grouped.agg([np.mean,np.std,S(lambda v: v.mean()/v.max(),name='normalized mean')])

# 9052
df.groupby('B').agg({ 'A': [S('mean',name='mean1'),S('median',name='med1')],
                      'C': [S('mean',name='mean2'),S('median',name='med2')],
                    })

# maybe could also do
df.groupby('B').agg([ S('mean',column='A',name='mean1'),
                      S('median',column='A',name='med1'),
                      S('mean',column='C',name='mean2'),
                      S('median',column='C',name='med2')
              ])

Comment From: RylanSchaeffer

I know this isn't the place, but how does one use nth() with a groupby's .agg() method?

Comment From: jbrockmendel

Discussed on today's dev call. Main comments were: 1) does NamedAgg cover some of this? 2) "No" to doing eval on "nth(1)". 3) pd.Grouper is one of the harder-to-reason-about parts of the codebase and we don't want more things like it. Closing as no action.
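As a rough illustration of point 1, the sketch below uses the existing pd.NamedAgg to get named, ordered output columns, including a named lambda like the "normalized mean" case from #10100. It reuses the df from the issue body. The output column names (b_mean, b_second, c_norm_mean) are made up for the example, and this is not meant as a full replacement for the proposed pd.Summary.

```python
import pandas as pd

df = pd.DataFrame({'A': ['group1', 'group1', 'group2', 'group2', 'group3', 'group3'],
                   'B': [10, 12, 10, 25, 10, 12],
                   'C': [100, 102, 100, 250, 100, 102]})

# Each keyword is the output column name; NamedAgg binds an input column
# to an aggregation. Output columns follow the keyword order.
result = df.groupby('A').agg(
    b_mean=pd.NamedAgg(column='B', aggfunc='mean'),
    b_second=pd.NamedAgg(column='B', aggfunc=lambda x: x.iloc[1]),
    c_norm_mean=pd.NamedAgg(column='C', aggfunc=lambda v: v.mean() / v.max()),
)
```

This covers naming and ordering per column. It does not cover the string form 'nth(1)', which the call explicitly declined to evaluate.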