I have been using a "hack" for sometime that has broken in 0.18.0. I have a use case where the aggfunc should return an object (so that I can iterate over the DataFrame and grab properties of the object). I am not interested in a singe numerical value. I might be abusing Pandas a bit here but I have seen others use the same hack.
This code worked prior to 1.18.0:
df.pivot_table('Object', index=['Index1', 'Index2'], columns=['Col1'], aggfunc=lambda x: x)
This now raises the "Function does not reduce":
File "<path>/views.py" in get_context_data
290. pt = df.pivot_table('Object', index=['Index1', 'Index2'], columns=['Col1'], aggfunc=lambda x: x)
File "<path>/lib/python2.7/site-packages/pandas/tools/pivot.py" in pivot_table
121. agged = grouped.agg(aggfunc)
File "<path>/lib/python2.7/site-packages/pandas/core/groupby.py" in aggregate
3586. return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
File "<path>/lib/python2.7/site-packages/pandas/core/groupby.py" in aggregate
3111. return self._python_agg_general(arg, *args, **kwargs)
File "<path>/lib/python2.7/site-packages/pandas/core/groupby.py" in _python_agg_general
764. result, counts = self.grouper.agg_series(obj, f)
File "<path>/lib/python2.7/site-packages/pandas/core/groupby.py" in agg_series
1863. return self._aggregate_series_pure_python(obj, func)
File "<path>/lib/python2.7/site-packages/pandas/core/groupby.py" in _aggregate_series_pure_python
1896. raise ValueError('Function does not reduce')
The root of the issue is obfuscated a bit by the exception handling. The actual issue is that BaseGrouper has no _is_builtin_func method which is called here: https://github.com/pydata/pandas/blob/master/pandas/core/groupby.py#L1865
Typically self is an object which gets it's _is_builtin_func method from SelectionMixin however when aggfunc=lambda x: x self is a BaseGrouper which only inherits from object and does not provide _is_builtin_func.
I have monkey patched _is_builtin_func method (just returns the func passed in) onto BaseGrouper as a temporary fix. Not sure the best way to fix this. Two naive ways of fixing this would be:
1) add _is_builtin_func(self, func): return func
to BaseGrouper
2) check hasattr(self, '_is_builtin_func')
before calling it, if not just use func that was passed in.
These both smell - I am sure there is a better solution.
output of pd.show_versions()
INSTALLED VERSIONS
commit: None python: 2.7.10.final.0 python-bits: 64 OS: Darwin OS-release: 15.4.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8
pandas: 0.18.0 nose: None pip: 8.1.1 setuptools: 20.2.2 Cython: None numpy: 1.11.0 scipy: None statsmodels: None xarray: None IPython: 4.1.2 sphinx: None patsy: None dateutil: 2.5.2 pytz: 2016.3 blosc: None bottleneck: None tables: None numexpr: None matplotlib: None openpyxl: 2.3.4 xlrd: 0.9.4 xlwt: None xlsxwriter: 0.8.4 lxml: 3.6.0 bs4: 4.4.1 html5lib: 0.9999999 httplib2: None apiclient: None sqlalchemy: None pymysql: None psycopg2: None jinja2: 2.8 boto: 2.39.0
Comment From: jreback
pls show a copy pastable example that repros.
Comment From: samkuehn
import pandas as pd
class TestObject(object):
prop1 = None
prop2 = None
prop3 = None
prop4 = None
prop5 = None
def __init__(self, **kwargs):
for key, value in kwargs.items():
setattr(self, key, value)
obj_list = [
TestObject(prop1='1', prop2='11', prop3='foo1', prop4='bar1', prop5=1),
TestObject(prop1='1', prop2='22', prop3='foo2', prop4='bar2', prop5=2),
TestObject(prop1='1', prop2='33', prop3='foo3', prop4='bar3', prop5=3),
TestObject(prop1='1', prop2='44', prop3='foo4', prop4='bar4', prop5=3),
TestObject(prop1='1', prop2='55', prop3='foo5', prop4='bar5', prop5=5),
]
df = pd.DataFrame.from_records(
[(obj.prop1,
obj.prop2,
obj.prop3,
obj.prop4,
obj.prop5,
) for obj in obj_list],
columns=(
'prop1',
'prop2',
'prop3',
'prop4',
'prop5',
),
coerce_float=True,
)
pt = df.pivot_table('prop1', index=['prop2', 'prop3'], columns=['prop4'], aggfunc=lambda x: x)
Comment From: jreback
the aggfunc
must be a reducer, so this is correct
furthermore you should coerce types as working with objects is not performant and non-idiomatic
In [17]: df2 = df.apply(pd.to_numeric, errors='ignore')
In [18]: df.dtypes
Out[18]:
prop1 object
prop2 object
prop3 object
prop4 object
prop5 int64
dtype: object
In [19]: df2.dtypes
Out[19]:
prop1 int64
prop2 int64
prop3 object
prop4 object
prop5 int64
dtype: object
In [20]: df2.pivot_table('prop1', index=['prop2', 'prop3'], columns=['prop4'])
Out[20]:
prop4 bar1 bar2 bar3 bar4 bar5
prop2 prop3
11 foo1 1.0 NaN NaN NaN NaN
22 foo2 NaN 1.0 NaN NaN NaN
33 foo3 NaN NaN 1.0 NaN NaN
44 foo4 NaN NaN NaN 1.0 NaN
55 foo5 NaN NaN NaN NaN 1.0
You prob want to simply groupby as that is what pd.pivot_table
is actually doing under the hood.
Then you can iterate if you really want to.
In [25]: for g, grp in df2.groupby(['prop2','prop3']):
....: print g, grp
....:
(11, 'foo1') prop1 prop2 prop3 prop4 prop5
0 1 11 foo1 bar1 1
(22, 'foo2') prop1 prop2 prop3 prop4 prop5
1 1 22 foo2 bar2 2
(33, 'foo3') prop1 prop2 prop3 prop4 prop5
2 1 33 foo3 bar3 3
(44, 'foo4') prop1 prop2 prop3 prop4 prop5
3 1 44 foo4 bar4 3
(55, 'foo5') prop1 prop2 prop3 prop4 prop5
4 1 55 foo5 bar5 5
Comment From: samkuehn
My initial example over simplified. Not sure the suggested solutions will work in my case. I understand that what I am doing here is probably blasphemous but it has been working. This example is more representative of what I am actually trying to do:
import random
import pandas as pd
from pandas.core.groupby import BaseGrouper
def _is_builtin_func(self, func):
return func
# This to make the example work - comment out to break
BaseGrouper._is_builtin_func = _is_builtin_func
class TestObject(object):
prop1 = None
prop2 = None
prop3 = None
prop4 = None
prop5 = None
obj = None
def method1(self):
return 'this is prop1 {}'.format(self.prop1)
def randumb_method(self):
return 'randumb number {}'.format(random.randint(0, 99999999))
def __init__(self, **kwargs):
for key, value in kwargs.items():
setattr(self, key, value)
obj_list = [
TestObject(prop1='1', prop2='11', prop3='foo1', prop4='bar1', prop5=1),
TestObject(prop1='1', prop2='22', prop3='foo2', prop4='bar2', prop5=2),
TestObject(prop1='1', prop2='33', prop3='foo3', prop4='bar3', prop5=3),
TestObject(prop1='1', prop2='44', prop3='foo4', prop4='bar4', prop5=3),
TestObject(prop1='1', prop2='55', prop3='foo5', prop4='bar5', prop5=5),
]
df = pd.DataFrame.from_records(
[(obj.prop1,
obj.prop2,
obj.prop3,
obj.prop4,
obj.prop5,
obj,
) for obj in obj_list],
columns=(
'prop1',
'prop2',
'prop3',
'prop4',
'prop5',
'obj',
),
coerce_float=True,
)
pt = df.pivot_table('obj', index=['prop1', 'prop2'], columns=['prop3'], aggfunc=lambda x: x)
pt.fillna('', inplace=True)
# The below loop is contrived - I am actually building and HTML table in a Django template.
for i in pt.iterrows():
for j in i[1]:
if j:
print(j.prop4)
print(j.prop5)
print(j.method1())
print(j.randumb_method())
I know you will probably look and this and wonder if I have lost my mind. Probably the wrong tool/approach but it was the best I could come up with.
Comment From: jreback
@samkuehn really not sure what you are trying to do. patching internal methods is not a good idea.
Comment From: samkuehn
I am trying to construct an HTML table using a pivot table where the values of the pivot table are objects. Perhaps it would have been more clear if I had written my loop like this:
print('<table>')
for i in pt.iterrows():
print('<tr>')
for j in i[1]:
if j:
# add properties of the TestObject to the table cell
print('<td> prop4:{} - randumb_method: {}</td>'.format(j.prop4, j.randumb_method()))
else:
print('<td>Nothing to see here</td>')
print('</tr>')
print('</table>')
I realize patching internal methods is not a good idea but it was the only way to get it to work after the update to 0.18.0. I know that you said "working with objects is not performant and non-idiomatic" so I am willing to look for a new solution outside of Pandas but wanted to post the issue in case others were running into this after upgrading. I am almost certain that I saw a post using the aggfunc=lambda x: x hack on StackOverflow somewhere so I would assume that others are using it too.
Comment From: TomAugspurger
@samkuehn are you able to generate the HTML in pandas, and pass that to your Django template? I'll take a closer look later.
Comment From: samkuehn
Unfortunately I don't think so. The HTML I need to generate is pretty specific and I need a lot of control. I don't the the Pandas to_html is capable of generating the markup I need.
Comment From: TomAugspurger
DataFrame.style is much more flexible than to_html. I'm guessing it'd be capable (happy to help if you hit issues)
Comment From: samkuehn
It is quite possible that DataFrame.style would allow me to generate the correct markup. The issue is that I need the value of the pivot table to be an object (Django model object) so that I can use its properties in the template.