Pandas Exception: "Function does not reduce" for pivot_table when aggfunc=lambda x: x

I have been using a "hack" for sometime that has broken in 0.18.0. I have a use case where the aggfunc should return an object (so that I can iterate over the DataFrame and grab properties of the object). I am not interested in a singe numerical value. I might be abusing Pandas a bit here but I have seen others use the same hack.

This code worked prior to 1.18.0: df.pivot_table('Object', index=['Index1', 'Index2'], columns=['Col1'], aggfunc=lambda x: x)

This now raises the "Function does not reduce":

File "<path>/views.py" in get_context_data
  290.             pt = df.pivot_table('Object', index=['Index1', 'Index2'], columns=['Col1'], aggfunc=lambda x: x)
File "<path>/lib/python2.7/site-packages/pandas/tools/pivot.py" in pivot_table
  121.     agged = grouped.agg(aggfunc)
File "<path>/lib/python2.7/site-packages/pandas/core/groupby.py" in aggregate
  3586.         return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
File "<path>/lib/python2.7/site-packages/pandas/core/groupby.py" in aggregate
  3111.                 return self._python_agg_general(arg, *args, **kwargs)
File "<path>/lib/python2.7/site-packages/pandas/core/groupby.py" in _python_agg_general
  764.                 result, counts = self.grouper.agg_series(obj, f)
File "<path>/lib/python2.7/site-packages/pandas/core/groupby.py" in agg_series
  1863.             return self._aggregate_series_pure_python(obj, func)
File "<path>/lib/python2.7/site-packages/pandas/core/groupby.py" in _aggregate_series_pure_python
  1896.                     raise ValueError('Function does not reduce')

The root of the issue is obfuscated a bit by the exception handling. The actual issue is that BaseGrouper has no _is_builtin_func method which is called here: https://github.com/pydata/pandas/blob/master/pandas/core/groupby.py#L1865

Typically self is an object which gets it's _is_builtin_func method from SelectionMixin however when aggfunc=lambda x: x self is a BaseGrouper which only inherits from object and does not provide _is_builtin_func.

I have monkey patched _is_builtin_func method (just returns the func passed in) onto BaseGrouper as a temporary fix. Not sure the best way to fix this. Two naive ways of fixing this would be: 1) add _is_builtin_func(self, func): return func to BaseGrouper 2) check hasattr(self, '_is_builtin_func') before calling it, if not just use func that was passed in.

These both smell - I am sure there is a better solution.

output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None python: 2.7.10.final.0 python-bits: 64 OS: Darwin OS-release: 15.4.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8

pandas: 0.18.0 nose: None pip: 8.1.1 setuptools: 20.2.2 Cython: None numpy: 1.11.0 scipy: None statsmodels: None xarray: None IPython: 4.1.2 sphinx: None patsy: None dateutil: 2.5.2 pytz: 2016.3 blosc: None bottleneck: None tables: None numexpr: None matplotlib: None openpyxl: 2.3.4 xlrd: 0.9.4 xlwt: None xlsxwriter: 0.8.4 lxml: 3.6.0 bs4: 4.4.1 html5lib: 0.9999999 httplib2: None apiclient: None sqlalchemy: None pymysql: None psycopg2: None jinja2: 2.8 boto: 2.39.0

Comment From: jreback

pls show a copy pastable example that repros.

Comment From: samkuehn

import pandas as pd


class TestObject(object):
    prop1 = None
    prop2 = None
    prop3 = None
    prop4 = None
    prop5 = None

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

obj_list = [
    TestObject(prop1='1', prop2='11', prop3='foo1', prop4='bar1', prop5=1),
    TestObject(prop1='1', prop2='22', prop3='foo2', prop4='bar2', prop5=2),
    TestObject(prop1='1', prop2='33', prop3='foo3', prop4='bar3', prop5=3),
    TestObject(prop1='1', prop2='44', prop3='foo4', prop4='bar4', prop5=3),
    TestObject(prop1='1', prop2='55', prop3='foo5', prop4='bar5', prop5=5),
]
df = pd.DataFrame.from_records(
    [(obj.prop1,
      obj.prop2,
      obj.prop3,
      obj.prop4,
      obj.prop5,
      ) for obj in obj_list],
    columns=(
        'prop1',
        'prop2',
        'prop3',
        'prop4',
        'prop5',
    ),
    coerce_float=True,
)
pt = df.pivot_table('prop1', index=['prop2', 'prop3'], columns=['prop4'], aggfunc=lambda x: x)

Comment From: jreback

the aggfunc must be a reducer, so this is correct

furthermore you should coerce types as working with objects is not performant and non-idiomatic

In [17]: df2 = df.apply(pd.to_numeric, errors='ignore')

In [18]: df.dtypes
Out[18]: 
prop1    object
prop2    object
prop3    object
prop4    object
prop5     int64
dtype: object

In [19]: df2.dtypes
Out[19]: 
prop1     int64
prop2     int64
prop3    object
prop4    object
prop5     int64
dtype: object

In [20]: df2.pivot_table('prop1', index=['prop2', 'prop3'], columns=['prop4'])
Out[20]: 
prop4        bar1  bar2  bar3  bar4  bar5
prop2 prop3                              
11    foo1    1.0   NaN   NaN   NaN   NaN
22    foo2    NaN   1.0   NaN   NaN   NaN
33    foo3    NaN   NaN   1.0   NaN   NaN
44    foo4    NaN   NaN   NaN   1.0   NaN
55    foo5    NaN   NaN   NaN   NaN   1.0

You prob want to simply groupby as that is what pd.pivot_table is actually doing under the hood. Then you can iterate if you really want to.

In [25]: for g, grp in df2.groupby(['prop2','prop3']):
   ....:     print g, grp
   ....:     
(11, 'foo1')    prop1  prop2 prop3 prop4  prop5
0      1     11  foo1  bar1      1
(22, 'foo2')    prop1  prop2 prop3 prop4  prop5
1      1     22  foo2  bar2      2
(33, 'foo3')    prop1  prop2 prop3 prop4  prop5
2      1     33  foo3  bar3      3
(44, 'foo4')    prop1  prop2 prop3 prop4  prop5
3      1     44  foo4  bar4      3
(55, 'foo5')    prop1  prop2 prop3 prop4  prop5
4      1     55  foo5  bar5      5

Comment From: samkuehn

My initial example over simplified. Not sure the suggested solutions will work in my case. I understand that what I am doing here is probably blasphemous but it has been working. This example is more representative of what I am actually trying to do:

import random

import pandas as pd
from pandas.core.groupby import BaseGrouper


def _is_builtin_func(self, func):
    return func


# This to make the example work - comment out to break
BaseGrouper._is_builtin_func = _is_builtin_func


class TestObject(object):
    prop1 = None
    prop2 = None
    prop3 = None
    prop4 = None
    prop5 = None
    obj = None

    def method1(self):
        return 'this is prop1 {}'.format(self.prop1)

    def randumb_method(self):
        return 'randumb number {}'.format(random.randint(0, 99999999))

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)


obj_list = [
    TestObject(prop1='1', prop2='11', prop3='foo1', prop4='bar1', prop5=1),
    TestObject(prop1='1', prop2='22', prop3='foo2', prop4='bar2', prop5=2),
    TestObject(prop1='1', prop2='33', prop3='foo3', prop4='bar3', prop5=3),
    TestObject(prop1='1', prop2='44', prop3='foo4', prop4='bar4', prop5=3),
    TestObject(prop1='1', prop2='55', prop3='foo5', prop4='bar5', prop5=5),
]
df = pd.DataFrame.from_records(
    [(obj.prop1,
      obj.prop2,
      obj.prop3,
      obj.prop4,
      obj.prop5,
      obj,
      ) for obj in obj_list],
    columns=(
        'prop1',
        'prop2',
        'prop3',
        'prop4',
        'prop5',
        'obj',
    ),
    coerce_float=True,
)
pt = df.pivot_table('obj', index=['prop1', 'prop2'], columns=['prop3'], aggfunc=lambda x: x)
pt.fillna('', inplace=True)

# The below loop is contrived - I am actually building and HTML table in a Django template.
for i in pt.iterrows():
    for j in i[1]:
        if j:
            print(j.prop4)
            print(j.prop5)
            print(j.method1())
            print(j.randumb_method())

I know you will probably look and this and wonder if I have lost my mind. Probably the wrong tool/approach but it was the best I could come up with.

Comment From: jreback

@samkuehn really not sure what you are trying to do. patching internal methods is not a good idea.

Comment From: samkuehn

I am trying to construct an HTML table using a pivot table where the values of the pivot table are objects. Perhaps it would have been more clear if I had written my loop like this:

print('<table>')
for i in pt.iterrows():
    print('<tr>')
    for j in i[1]:
        if j:
            # add properties of the TestObject to the table cell
            print('<td> prop4:{} - randumb_method: {}</td>'.format(j.prop4, j.randumb_method())) 
        else:
            print('<td>Nothing to see here</td>')
    print('</tr>')
print('</table>')

I realize patching internal methods is not a good idea but it was the only way to get it to work after the update to 0.18.0. I know that you said "working with objects is not performant and non-idiomatic" so I am willing to look for a new solution outside of Pandas but wanted to post the issue in case others were running into this after upgrading. I am almost certain that I saw a post using the aggfunc=lambda x: x hack on StackOverflow somewhere so I would assume that others are using it too.

Comment From: TomAugspurger

@samkuehn are you able to generate the HTML in pandas, and pass that to your Django template? I'll take a closer look later.

Comment From: samkuehn

Unfortunately I don't think so. The HTML I need to generate is pretty specific and I need a lot of control. I don't the the Pandas to_html is capable of generating the markup I need.

Comment From: TomAugspurger

DataFrame.style is much more flexible than to_html. I'm guessing it'd be capable (happy to help if you hit issues)

Comment From: samkuehn

It is quite possible that DataFrame.style would allow me to generate the correct markup. The issue is that I need the value of the pivot table to be an object (Django model object) so that I can use its properties in the template.

Pandas Exception: "Function does not reduce" for pivot_table when aggfunc=lambda x: x

output of pd.show_versions()

INSTALLED VERSIONS

output of `pd.show_versions()`