Assume we have a simple DataFrame like this:

import pandas as pd
x = pd.DataFrame({'a':[1,1,2,2],'b':['a','a','a','b'],'c':[[1,2],[3,4],[5,6],[7,8]]})

   a  b       c
0  1  a  [1, 2]
1  1  a  [3, 4]
2  2  a  [5, 6]
3  2  b  [7, 8]

and we want to do a groupby operation that concatenates the lists in column c together. This works fine when grouping on one column:

x.groupby('b')['c'].sum()

b
a    [1, 2, 3, 4, 5, 6]
b                [7, 8]
dtype: object

x.groupby('a')['c'].sum()

a
1    [1, 2, 3, 4]
2    [5, 6, 7, 8]
dtype: object

But it fails when grouping on multiple columns: x.groupby(['a','b'])['c'].sum() raises ValueError: Function does not reduce.

This is based on this SO question: https://stackoverflow.com/questions/40224793/pandas-concatenate-list-columns-with-groupby-when-grouping-on-multiple-columns/40226389#40226389

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 17.1.1
Cython: 0.22.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 5.1.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.5.3
pytz: 2015.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.6.1
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 1.0.0
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: None
jinja2: 2.7.3
boto: 2.33.0
pandas_datareader: None

Comment From: jreback

Using lists inside of cells is an anti-pattern.
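For illustration, here is a minimal sketch of one way to avoid lists in cells: reshape the data so that each list element gets its own row. The names `long` and `totals` are hypothetical and not part of the original report, and this is only one possible layout, not a recommended pandas recipe.

```python
import pandas as pd

x = pd.DataFrame({'a': [1, 1, 2, 2],
                  'b': ['a', 'a', 'a', 'b'],
                  'c': [[1, 2], [3, 4], [5, 6], [7, 8]]})

# Build a "long" frame: one row per list element instead of one list per cell.
# (`long` is a hypothetical name used only for this sketch.)
long = pd.DataFrame(
    [(a, b, v) for a, b, c in zip(x['a'], x['b'], x['c']) for v in c],
    columns=['a', 'b', 'c'],
)

# With scalar cells, ordinary reductions work on any combination of keys.
totals = long.groupby(['a', 'b'])['c'].sum()
```

In this layout the values that belonged to one list are simply the rows sharing the same keys, so grouping on one or several columns behaves the same way.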

The reason this is hard to do is that the aggregation returns lists. Results are normally sampled and then coerced based on the returned dtypes; since a list result doesn't reduce, that's a hard thing to do. It is possible, but the right output cannot always be guessed.

If you really, really want to work with this type of structure, you can do:

In [6]: x.groupby(['a','b']).apply(lambda x: x.sum())
Out[6]: 
     a   b             c
a b                     
1 a  2  aa  [1, 2, 3, 4]
2 a  2   a        [5, 6]
  b  2   b        [7, 8]

This is pretty non-performant though (as is keeping lists inside cells).
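If only the concatenated lists are needed, another option is to iterate over the groups and flatten them in plain Python, which sidesteps the question of whether the function reduces. This is a rough sketch rather than a supported pandas idiom; the name `merged` is hypothetical, and it is not expected to be any faster than the apply call above.

```python
from itertools import chain

import pandas as pd

x = pd.DataFrame({'a': [1, 1, 2, 2],
                  'b': ['a', 'a', 'a', 'b'],
                  'c': [[1, 2], [3, 4], [5, 6], [7, 8]]})

# Iterating a groupby yields (key, group) pairs; with two grouping columns
# each key is a tuple. Each group is a Series of lists, which
# chain.from_iterable flattens into a single list.
# (`merged` is a hypothetical name used only for this sketch.)
merged = {key: list(chain.from_iterable(group))
          for key, group in x.groupby(['a', 'b'])['c']}
```

Here `merged[(1, 'a')]` is `[1, 2, 3, 4]`, and the two remaining groups keep their single lists unchanged.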

So if you want to see whether you can get this to work in the general case w/o changing any existing behavior, we would accept a PR.

Closing as this is not a bug, more like an unsupported type.