Assume we have a simple DataFrame like this:
x = pd.DataFrame({'a':[1,1,2,2],'b':['a','a','a','b'],'c':[[1,2],[3,4],[5,6],[7,8]]})
   a  b       c
0  1  a  [1, 2]
1  1  a  [3, 4]
2  2  a  [5, 6]
3  2  b  [7, 8]
and we want to do a groupby operation that concatenates the lists in column c
together. This works fine when grouping on one column:
x.groupby('b')['c'].sum()
b
a    [1, 2, 3, 4, 5, 6]
b                [7, 8]
dtype: object
x.groupby('a')['c'].sum()
a
1    [1, 2, 3, 4]
2    [5, 6, 7, 8]
dtype: object
But it fails when grouping on multiple columns (i.e. x.groupby(['a','b'])['c'].sum()) with ValueError: Function does not reduce.
This is based on this SO question: https://stackoverflow.com/questions/40224793/pandas-concatenate-list-columns-with-groupby-when-grouping-on-multiple-columns/40226389#40226389
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 17.1.1
Cython: 0.22.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 5.1.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.5.3
pytz: 2015.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.6.1
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 1.0.0
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: None
jinja2: 2.7.3
boto: 2.33.0
pandas_datareader: None
Comment From: jreback
Using lists inside of cells is an anti-pattern.
The reason this is hard to do is that lists are being returned; results are normally sampled and then coerced based on the returned dtypes. Since this function doesn't reduce, that's a hard thing to do. It is possible, but the result type cannot always be guessed.
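For illustration, here is a minimal sketch of what avoiding lists in cells could look like: the same data stored in long format, one scalar per row. It assumes pandas >= 0.25 for DataFrame.explode (newer than the 0.18.1 reported above), and the names long_df and totals are made up for this example.

```python
import pandas as pd

x = pd.DataFrame({
    'a': [1, 1, 2, 2],
    'b': ['a', 'a', 'a', 'b'],
    'c': [[1, 2], [3, 4], [5, 6], [7, 8]],
})

# One scalar per row instead of one list per cell.
# Assumption: DataFrame.explode is available (pandas >= 0.25).
long_df = x.explode('c').reset_index(drop=True)

# explode leaves the column as object dtype, so cast it back to integers.
long_df['c'] = long_df['c'].astype('int64')

# With scalars in the cells, ordinary reductions work on any number of keys.
totals = long_df.groupby(['a', 'b'])['c'].sum()
print(totals)
```

In this layout the groupby reduces to a scalar per group, so no dtype guessing is involved.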
If you really, really want to work with this type of structure, you can do:
In [6]: x.groupby(['a','b']).apply(lambda x: x.sum())
Out[6]:
     a   b             c
a b
1 a  2  aa  [1, 2, 3, 4]
2 a  2   a        [5, 6]
  b  2   b        [7, 8]
This is pretty non-performant though (as is keeping lists inside cells).
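The apply call above also sums columns a and b. If only the concatenated lists are wanted, one possible variant is to select column c and flatten each group explicitly. This is a sketch, not an officially supported path: concat_lists is a hypothetical helper name, and behavior may differ between pandas versions.

```python
from itertools import chain

import pandas as pd

x = pd.DataFrame({
    'a': [1, 1, 2, 2],
    'b': ['a', 'a', 'a', 'b'],
    'c': [[1, 2], [3, 4], [5, 6], [7, 8]],
})


def concat_lists(series):
    """Flatten a Series of lists into a single list (hypothetical helper)."""
    return list(chain.from_iterable(series))


# Apply the helper to column c only, once per (a, b) group.
result = x.groupby(['a', 'b'])['c'].apply(concat_lists)
print(result)
```

This still runs a Python function once per group, so the performance caveat above applies here too.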
So if you want to see whether you can get this to work in the general case without changing any existing behavior, we would accept a PR.
Closing as this is not a bug, more like an unsupported type.