Related: #6490

I am new to pandas but have run into a problem which I believe is a bug.

I tried to execute the following:

df_sum=pd.pivot_table(df,rows=['Konskund_MEAB','ProdID'],cols=['Year'], aggfunc=lambda x: np.average(x['snittpris'],
weights=x['Antal_forsandelser']))

This produces a number of errors.

The following commands produce the desired results:

rows = ['Konskund_MEAB','ProdID']
cols = ['Year']
g = df.groupby(rows + cols)
s_av = g.apply(lambda x: np.average(x['snittpris'], weights=x['Antal_forsandelser']))
df_av = s_av.unstack(cols)

Originally posted on Stack Overflow.

Another, more knowledgeable user has verified the problem.


Reproducible example

df = pd.DataFrame([[1] * 5], columns=['Konskund_MEAB','ProdID', 'Year', 'snittpris', 'Antal_forsandelser'])
rows = ['Konskund_MEAB','ProdID']
cols = ['Year']
aggfunc = lambda x: np.average(x['snittpris'], weights=x['Antal_forsandelser'])

df.pivot_table(index=rows, columns=cols, aggfunc=aggfunc)
# KeyError: 'snittpris'

Comment From: hayd

Thanks for reporting this!

I seem to remember that the issue was something like the aggfunc not having access to unused columns, but I also recall getting some strange errors about SNDArrays. I can't recall exactly what debugging I did to see them...

Simple example, perhaps needs a better one:

df = pd.DataFrame([[1] * 5], columns=['Konskund_MEAB','ProdID', 'Year', 'snittpris', 'Antal_forsandelser'])
rows = ['Konskund_MEAB','ProdID']
cols = ['Year']
aggfunc = lambda x: np.average(x['snittpris'], weights=x['Antal_forsandelser'])

df.pivot_table(index=rows, columns=cols, aggfunc=aggfunc)
# KeyError: 'snittpris'

Expected result:

In [11]: df.groupby(rows + cols).apply(aggfunc).unstack(cols)
Out[11]:
Year                  1
Konskund_MEAB ProdID
1             1       1

(Edit: oops, this may be an agg rather than an apply.)

Comment From: wabee

I cannot figure out how to tag this issue properly so that the developers get a heads-up.

Comment From: hayd

(I wasn't sure if this is bug/enhancement, but tagged as a bug... @wabee reporting here is sufficient, just needs someone to find time to explore a fix :) )

Comment From: jbrockmendel

This is difficult to reproduce without having df available. Can you post an MCVE?

Comment From: MarcoGorelli

I've edited the original post to include the MCVE from hayd (haven't looked into it further though)

Comment From: shteken

I looked into this a bit, and the problem seems to originate in the agg function, so it is not specific to pivot tables:

df = pd.DataFrame([[1] * 5], columns=['Konskund_MEAB','ProdID', 'Year', 'snittpris', 'Antal_forsandelser'])
rows = ['Konskund_MEAB','ProdID']
cols = ['Year']
aggfunc = lambda x: np.average(x['snittpris'], weights=x['Antal_forsandelser'])

g = df.groupby(rows + cols)
g.agg(aggfunc)
# KeyError: 'snittpris'
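
For anyone landing here, a minimal sketch of the apply-based workaround from the original post, on slightly richer data. As far as I can tell, agg hands the function one column at a time, so indexing by column name inside it fails, while apply hands it the whole sub-frame of each group. The sample values and the name weighted_price are made up for illustration. The value columns are selected explicitly so the group keys are not part of the frame passed to the function.

```python
import numpy as np
import pandas as pd

# Made-up sample data using the column names from this issue.
df = pd.DataFrame(
    {
        "Konskund_MEAB": [1, 1, 1, 1],
        "ProdID": [1, 1, 2, 2],
        "Year": [2013, 2013, 2013, 2014],
        "snittpris": [10.0, 20.0, 30.0, 40.0],
        "Antal_forsandelser": [1, 3, 2, 5],
    }
)
rows = ["Konskund_MEAB", "ProdID"]
cols = ["Year"]


def weighted_price(group):
    # `group` is a DataFrame here, so both columns are reachable by name.
    return np.average(group["snittpris"], weights=group["Antal_forsandelser"])


# apply passes each group's sub-frame; unstack moves Year into the columns,
# which gives the shape pivot_table was expected to return.
s_av = df.groupby(rows + cols)[["snittpris", "Antal_forsandelser"]].apply(weighted_price)
df_av = s_av.unstack(cols)

# First group: (10*1 + 20*3) / (1 + 3) = 17.5
print(df_av)
```

Combinations with no data (here ProdID 1 in 2014) come out as NaN after the unstack.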