Pandas KeyError using custom aggregation function

import pandas as pd

df = pd.DataFrame.from_dict([
    {'A': 1, 'B': 'cat', 'values': 1},
    {'A': 1, 'B': 'cat', 'values': 1},
    {'A': 1, 'B': 'dog', 'values': 1},
])
df.set_index(['A', 'B'], drop=True, inplace=True)

# works: grouping by a single key
g = df.groupby(['A'])
print(g.agg(lambda x: {'values': x['values'].mean()}))

# doesn't work: grouping by two keys raises KeyError: 'values'
g = df.groupby(['A', 'B'])
print(g.agg(lambda x: {'values': x['values'].mean()}))

Problem description

The second example raises `KeyError: 'values'`; the first one works as expected.

Expected Output

    values
A        
1     1.0
       values
A B          
1 cat       1.0
  dog       1.0

Output of `pd.show_versions()`

- commit: None
- python: 2.7.12.final.0
- python-bits: 64
- OS: Linux
- OS-release: 4.10.0-37-generic
- machine: x86_64
- processor: x86_64
- byteorder: little
- LC_ALL: None
- LANG: en_US.UTF-8
- LOCALE: None.None
- pandas: 0.20.3
- pytest: None
- pip: 9.0.1
- setuptools: 36.5.0
- Cython: None
- numpy: 1.13.3
- scipy: 1.0.0
- xarray: None
- IPython: 5.5.0
- sphinx: None
- patsy: None
- dateutil: 2.6.1
- pytz: 2017.2
- blosc: None
- bottleneck: None
- tables: None
- numexpr: None
- feather: None
- matplotlib: 2.0.2
- openpyxl: None
- xlrd: None
- xlwt: None
- xlsxwriter: None
- lxml: None
- bs4: 4.6.0
- html5lib: 1.0b10
- sqlalchemy: None
- pymysql: None
- psycopg2: None
- jinja2: 2.9.6
- s3fs: None
- pandas_gbq: None
- pandas_datareader: None

Comment From: pzajec

I've tested it with the newest release (v0.21); the error stays the same.

Comment From: jreback

What you are writing is quite convoluted. Why not simply write this?

In [10]: df.groupby(['A']).mean()
Out[10]: 
   values
A        
1       1

In [11]: df.groupby(['A', 'B']).mean()
Out[11]: 
       values
A B          
1 cat       1
  dog       1

Comment From: pzajec

This is just a simplified example; in the end I need an actual custom aggregation function. My real case looks more like this:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame.from_dict([
    {'A': 1, 'B': 'cat', 'values': np.array([1, 2])},
    {'A': 1, 'B': 'cat', 'values': np.array([1, 2])},
    {'A': 1, 'B': 'dog', 'values': np.array([1, 2])},
])
df.set_index(['A', 'B'], drop=True, inplace=True)

print(df.groupby(['A']).mean())
print(df.groupby(['A', 'B']).mean())
```

This throws pandas.core.base.DataError because of the numpy arrays in the dataframe. I actually have a set of columns that I would like to aggregate differently, so using a custom aggregation function is unavoidable. Designing the data frame this way could be a design flaw. Nevertheless, I am wondering whether something is wrong with my approach from the first post, or whether pandas is not working as expected.

Comment From: jreback

Well, pandas is trying to figure out what you are actually doing, so it tries a couple of things to see what raises and what doesn't. Your function *happens* to work in the first case, but not in the second.

```python
In [17]: def f(x):
    ...:     print(type(x))
    ...:     return {'values': x['values'].mean()}
    ...:

In [18]: df.groupby(['A']).agg(f)
Out[18]:
   values
A
1     1.0

In [19]: df.groupby(['A', 'B']).agg(f)

... long traceback
KeyError: 'values'
```

As I said, you are fighting pandas here, and you have to control what happens in your lambda. If you want this to be robust, you need a defensive function.
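For illustration, here is a minimal sketch of what such a defensive function might look like. It assumes the function can be handed either a whole group (a DataFrame) or a single column (a Series), depending on the pandas version and the number of grouping keys. The name `mean_of_values` is hypothetical and not part of pandas; the data is the simplified frame from the first post.

```python
import pandas as pd

df = pd.DataFrame(
    [
        {'A': 1, 'B': 'cat', 'values': 1},
        {'A': 1, 'B': 'cat', 'values': 1},
        {'A': 1, 'B': 'dog', 'values': 1},
    ]
).set_index(['A', 'B'])


def mean_of_values(x):
    """Hypothetical defensive aggregator: accepts a DataFrame or a Series."""
    if isinstance(x, pd.DataFrame):
        # Called with the whole group: pick the column explicitly.
        x = x['values']
    # At this point x is a Series, so no column lookup is needed.
    return x.mean()


by_a = df.groupby(['A']).agg(mean_of_values)
by_ab = df.groupby(['A', 'B']).agg(mean_of_values)

# Alternative: select the column first, so the function always gets a Series.
by_ab_col = df.groupby(['A', 'B'])['values'].agg(lambda s: s.mean())

print(by_a)
print(by_ab)
print(by_ab_col)
```

Selecting the column before calling `agg` is probably the simpler of the two options when each column needs its own aggregation, since the function then never has to guess what it was given.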
