import pandas as pd
df = pd.DataFrame.from_dict([
{'A': 1, 'B': 'cat', 'values': 1},
{'A': 1, 'B': 'cat', 'values': 1},
{'A': 1, 'B': 'dog', 'values': 1},
])
df.set_index(['A', 'B'], drop=True, inplace=True)
# works
g = df.groupby(['A'])
print(g.agg(lambda x: {'values': x['values'].mean()}))
# doesn't work: raises KeyError: 'values'
g = df.groupby(['A', 'B'])
print(g.agg(lambda x: {'values': x['values'].mean()}))
Problem description
The second example currently raises KeyError: 'values'; the first one works as expected.
Expected Output
   values
A
1     1.0
       values
A B
1 cat     1.0
  dog     1.0
Output of pd.show_versions()
Comment From: pzajec
I've tested it with the newest release (v0.21); the error stays the same.
Comment From: jreback
What you are writing is quite convoluted. Why not simply write this?
In [10]: df.groupby(['A']).mean()
Out[10]:
   values
A
1       1
In [11]: df.groupby(['A', 'B']).mean()
Out[11]:
       values
A B
1 cat       1
  dog       1
Comment From: pzajec
This is just a simplified example. I need an actual custom aggregation function in the end. My example would be more like this:
```python
import pandas as pd
import numpy as np

df = pd.DataFrame.from_dict([
    {'A': 1, 'B': 'cat', 'values': np.array([1, 2])},
    {'A': 1, 'B': 'cat', 'values': np.array([1, 2])},
    {'A': 1, 'B': 'dog', 'values': np.array([1, 2])},
])
df.set_index(['A', 'B'], drop=True, inplace=True)

print(df.groupby(['A']).mean())
print(df.groupby(['A', 'B']).mean())
```
This throws pandas.core.base.DataError because of the numpy arrays in the dataframe. I actually have a set of columns I would like to aggregate differently, so using a custom aggregation function is unavoidable. Designing the data frame this way could be a design flaw. Nevertheless, I am wondering whether there is something wrong with my approach from the first post, or is pandas not working as expected?
Comment From: jreback
Well, pandas is trying to figure out what you are actually doing, so it tries a couple of things to see what raises and what doesn't. Your function *happens* to work in the first case, but not in the second.
In [17]: def f(x):
    ...:     print(type(x))
    ...:     return {'values': x['values'].mean()}
    ...:
In [18]: df.groupby(['A']).agg(f)
1 1.0
In [19]: df.groupby(['A', 'B']).agg(f)
... long traceback
As I said, you are fighting pandas here, and you have to control what happens in your lambda. If you want this to be robust, you need a defensive function.
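For illustration, here is one possible reading of "defensive function". This is a minimal sketch that assumes scalar cells, as in the first post. The name `defensive_mean` is hypothetical and not part of pandas. The idea is that the function checks what it was handed (a whole sub-DataFrame or a single column) before indexing into it, and returns a plain scalar rather than a dict.

```python
import pandas as pd

df = pd.DataFrame(
    [
        {'A': 1, 'B': 'cat', 'values': 1},
        {'A': 1, 'B': 'cat', 'values': 1},
        {'A': 1, 'B': 'dog', 'values': 1},
    ]
).set_index(['A', 'B'])


def defensive_mean(x):
    # Hypothetical helper: depending on the code path pandas tries,
    # x may be a sub-DataFrame or a single column (Series).
    if isinstance(x, pd.DataFrame):
        x = x['values']
    # Return one scalar per group instead of a dict.
    return x.mean()


# Group explicitly by index level so the keys are unambiguous.
by_a = df.groupby(level='A').agg(defensive_mean)
by_a_b = df.groupby(level=['A', 'B']).agg(defensive_mean)
print(by_a)
print(by_a_b)
```

Because the function no longer assumes that `x['values']` is always valid, it should not matter which of the two call styles pandas ends up using for a given grouping.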