Related: #6490
I am new to pandas but have run into a problem which I believe is a bug.
I tried to execute the following:
df_sum = pd.pivot_table(df, rows=['Konskund_MEAB','ProdID'], cols=['Year'],
                        aggfunc=lambda x: np.average(x['snittpris'], weights=x['Antal_forsandelser']))
This produces a number of errors.
The following commands produce the desired results:
rows = ['Konskund_MEAB','ProdID']
cols = ['Year']
g = df.groupby(rows + cols)
s_av = g.apply(lambda x: np.average(x['snittpris'], weights=x['Antal_forsandelser']))
df_av = s_av.unstack(cols)
Originally posted on Stack Overflow.
Another, more knowledgeable user has verified the problem.
Reproducible example
df = pd.DataFrame([[1] * 5], columns=['Konskund_MEAB','ProdID', 'Year', 'snittpris', 'Antal_forsandelser'])
rows = ['Konskund_MEAB','ProdID']
cols = ['Year']
aggfunc = lambda x: np.average(x['snittpris'], weights=x['Antal_forsandelser'])
df.pivot_table(index=rows, columns=cols, aggfunc=aggfunc)
# KeyError: 'snittpris'
Comment From: hayd
Thanks for reporting this!
I seem to remember that the issue was something like the aggfunc not having access to unused columns, but I also recall getting some strange errors about SNDArrays. I can't recall exactly what debugging I did to see them...
Simple example, perhaps needs a better one:
df = pd.DataFrame([[1] * 5], columns=['Konskund_MEAB','ProdID', 'Year', 'snittpris', 'Antal_forsandelser'])
rows = ['Konskund_MEAB','ProdID']
cols = ['Year']
aggfunc = lambda x: np.average(x['snittpris'], weights=x['Antal_forsandelser'])
df.pivot_table(index=rows, columns=cols, aggfunc=aggfunc)
# KeyError: 'snittpris'
Expected result:
In [11]: df.groupby(rows + cols).apply(aggfunc).unstack(cols)
Out[11]:
Year                  1
Konskund_MEAB ProdID
1             1       1
(Edit: oops, this may be an agg rather than an apply.)
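To illustrate the "no access to other columns" hypothesis, here is a minimal sketch that calls the same aggfunc by hand, once on the whole frame and once on a single column. It assumes the aggregation path hands the function one column at a time as a Series, which would be consistent with the KeyError above but is not confirmed here. The names `whole_frame_result` and `single_column_error` are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1] * 5],
    columns=['Konskund_MEAB', 'ProdID', 'Year', 'snittpris', 'Antal_forsandelser'],
)
aggfunc = lambda x: np.average(x['snittpris'], weights=x['Antal_forsandelser'])

# Called on the whole DataFrame, both columns can be looked up by name.
whole_frame_result = aggfunc(df)

# Called on a single column (a Series), the lookup goes against the row
# index, which has no 'snittpris' label, so a KeyError is raised.
# Assumption: this mimics what the aggfunc receives during aggregation.
try:
    aggfunc(df['snittpris'])
    single_column_error = None
except KeyError as exc:
    single_column_error = exc

print(whole_frame_result)
print(repr(single_column_error))
```

If that assumption holds, any aggfunc that needs two columns at once cannot work through a per-column aggregation, regardless of which columns are "unused".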
Comment From: wabee
I cannot figure out how to tag this issue properly so that the developers get a proper heads-up.
Comment From: hayd
(I wasn't sure if this is bug/enhancement, but tagged as a bug... @wabee reporting here is sufficient, just needs someone to find time to explore a fix :) )
Comment From: jbrockmendel
This is difficult to reproduce without having df available. Can you post an MCVE?
Comment From: MarcoGorelli
I've edited the original post to include the MCVE from hayd (haven't looked into it further though)
Comment From: shteken
I looked into this a bit, and it seems that the issue originates from the agg function. Therefore, it is not specific to pivot tables:
df = pd.DataFrame([[1] * 5], columns=['Konskund_MEAB','ProdID', 'Year', 'snittpris', 'Antal_forsandelser'])
rows = ['Konskund_MEAB','ProdID']
cols = ['Year']
aggfunc = lambda x: np.average(x['snittpris'], weights=x['Antal_forsandelser'])
g = df.groupby(rows + cols)
g.agg(aggfunc)
# KeyError: 'snittpris'
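Until agg handles this, the groupby/apply/unstack route from the original post seems to be the practical workaround. Below is a rough sketch on made-up data with more than one row per group, so the weights actually matter. The sample values and the helper name `weighted_price` are hypothetical. Selecting the two value columns before `apply` is my own choice, so that the function only sees what it needs.

```python
import numpy as np
import pandas as pd

# Made-up sample data, for illustration only.
df = pd.DataFrame({
    'Konskund_MEAB': ['a', 'a', 'a', 'b'],
    'ProdID': [1, 1, 1, 2],
    'Year': [2013, 2013, 2014, 2013],
    'snittpris': [10.0, 20.0, 30.0, 40.0],
    'Antal_forsandelser': [1, 3, 2, 5],
})
rows = ['Konskund_MEAB', 'ProdID']
cols = ['Year']

def weighted_price(group):
    # 'group' is a DataFrame holding the rows of one (rows + cols) combination.
    return np.average(group['snittpris'], weights=group['Antal_forsandelser'])

# apply passes each group as a DataFrame, so both columns are available.
s_av = df.groupby(rows + cols)[['snittpris', 'Antal_forsandelser']].apply(weighted_price)

# Move Year from the row index into the columns, like a pivot table.
df_av = s_av.unstack(cols)
print(df_av)
```

For the group ('a', 1, 2013) this gives (10 * 1 + 20 * 3) / 4 = 17.5, and combinations with no data, such as ('b', 2, 2014), come out as NaN after the unstack.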