This only happens when the user-supplied function mutates its input.
Originally from: http://stackoverflow.com/questions/20691168/pandas-apply-to-data-frame-groupby/20705226#20705226
This could be fixed automatically, or at least the error message could tell the user about the duplicate index values.
In [40]: df = DataFrame(dict(A = ['foo','foo','bar','bar'], B = np.random.randn(4)),index=[1,1,2,2])
In [41]: df
Out[41]:
     A         B
1  foo  0.971425
1  foo -1.151693
2  bar  1.265031
2  bar -0.219011
[4 rows x 2 columns]
In [42]: def f(x):
   ....:     x['std'] = x['B'].std()
   ....:     return x
   ....:
Calling apply directly fails:
In [44]: df.groupby('A').apply(f)
ValueError: cannot reindex from a duplicate axis
With a unique index everything works, so this should be straightforward to fix:
In [45]: df.reset_index().groupby('A').apply(f).set_index('index')
Out[45]:
         A         B       std
index
1      foo  0.971425  1.501271
1      foo -1.151693  1.501271
2      bar  1.265031  1.049376
2      bar -0.219011  1.049376
[4 rows x 3 columns]
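For this particular case (one per-group statistic broadcast back to every row), a possible alternative that avoids mutating the group inside the function is `groupby(...).transform`. The sketch below is not from the original report: it uses fixed values for `B` instead of `np.random.randn` so the result is reproducible, and it assigns the raw values via `to_numpy()` so that no index alignment is involved at all.

```python
import numpy as np
import pandas as pd

# Fixed values (an assumption for reproducibility; the report used random data).
df = pd.DataFrame(
    {"A": ["foo", "foo", "bar", "bar"], "B": [1.0, 3.0, 2.0, 6.0]},
    index=[1, 1, 2, 2],  # duplicate index labels, as in the report
)

# transform returns one value per input row, in the original row order,
# so the user function never has to modify the group it receives.
group_std = df.groupby("A")["B"].transform("std")

# Work on a copy and assign plain values, sidestepping alignment on the
# duplicate index.
result = df.copy()
result["std"] = group_std.to_numpy()
print(result)
```

The original `df` is left untouched and the duplicate index is preserved in `result`, so the `reset_index()` / `set_index('index')` round trip is not needed here. This only covers functions that reduce to a per-group value; it is not a general replacement for `apply`.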
Comment From: TomAugspurger
Duplicate of https://github.com/pandas-dev/pandas/issues/19437