Feature request: it would be nice if aggregate allowed a class as an argument, using its constructor as the function to aggregate by. In the code sample below, aggregate treats frozenset as an iterable of functions because the class happens to have an __iter__ instance method.
I'd suggest a check like:
>>> import inspect
>>> inspect.isclass(frozenset)
True
>>> inspect.isclass(frozenset())
False
(I don't think a sane person would make a class with a @staticmethod __iter__
returning the functions to aggregate by)
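To make the suggestion concrete, here is a rough sketch of how such a check could sit in front of a duck-typed "is this a list of functions?" test. Both looks_list_like and normalize_agg_arg are hypothetical names made up for illustration; they are not pandas functions, and the real dispatch in pandas may well look different.

```python
import inspect


def looks_list_like(obj):
    # Hypothetical stand-in for a duck-typed check: anything exposing
    # __iter__ (other than strings) is assumed to be a list of functions.
    return hasattr(obj, "__iter__") and not isinstance(obj, (str, bytes))


def normalize_agg_arg(arg):
    """Return the list of functions to aggregate by (hypothetical helper)."""
    # A class is treated as a single callable: its constructor.
    if inspect.isclass(arg):
        return [arg]
    # Only instances reach the duck-typed check.
    if looks_list_like(arg):
        return list(arg)
    # Plain functions, lambdas and builtins.
    return [arg]


# The class itself passes the duck-typed check, which is the problem:
naive = looks_list_like(frozenset)

# With the isclass guard in front, the class is kept as one function:
funcs_for_class = normalize_agg_arg(frozenset)
funcs_for_list = normalize_agg_arg([min, max])
```

Under these assumptions the class is passed through as a single aggregation function, while a real list of functions is still expanded as before.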
Code Sample, a copy-pastable example if possible
>>> import numpy as np
>>> import pandas as pd
>>> pd.DataFrame(np.ones((2,2)), columns=[0,1]).groupby(1)[[0]].agg(frozenset)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/limyreth/.local/lib/python3.5/site-packages/pandas/core/groupby.py", line 3597, in aggregate
    return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
  File "/home/limyreth/.local/lib/python3.5/site-packages/pandas/core/groupby.py", line 3114, in aggregate
    result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
  File "/home/limyreth/.local/lib/python3.5/site-packages/pandas/core/base.py", line 564, in _aggregate
    return self._aggregate_multiple_funcs(arg, _level=_level), None
  File "/home/limyreth/.local/lib/python3.5/site-packages/pandas/core/base.py", line 616, in _aggregate_multiple_funcs
    return concat(results, keys=keys, axis=1)
  File "/home/limyreth/.local/lib/python3.5/site-packages/pandas/tools/merge.py", line 845, in concat
    copy=copy)
  File "/home/limyreth/.local/lib/python3.5/site-packages/pandas/tools/merge.py", line 878, in __init__
    raise ValueError('No objects to concatenate')
ValueError: No objects to concatenate
It can be worked around by using a lambda:
pd.DataFrame(np.ones((2,2)), columns=[0,1]).groupby(1)[[0]].agg(lambda x: frozenset(x))
Expected Output
         0
1
1.0  (1.0)
output of pd.show_versions()
INSTALLED VERSIONS
commit: None python: 3.5.2.final.0 python-bits: 64 OS: Linux OS-release: 4.6.4-1-ARCH machine: x86_64 processor: byteorder: little LC_ALL: None LANG: en_US.UTF-8
pandas: 0.18.1 nose: 1.3.7 pip: 8.1.1 setuptools: 20.6.7 Cython: None numpy: 1.11.1 scipy: 0.18.0 statsmodels: None xarray: None IPython: None sphinx: 1.4.5 patsy: None dateutil: 2.5.3 pytz: 2016.6.1 blosc: None bottleneck: 1.0.0 tables: None numexpr: 2.4.6 matplotlib: 1.5.1 openpyxl: None xlrd: None xlwt: None xlsxwriter: None lxml: None bs4: None html5lib: None httplib2: None apiclient: None sqlalchemy: 1.1.0b2 pymysql: None psycopg2: None jinja2: 2.8 boto: None pandas_datareader: None
Comment From: jreback
Use .apply; .agg has some guarantees about inference on the return type.
In [5]: pd.DataFrame(np.ones((2,2)), columns=[0,1]).groupby(1)[[0]].apply(frozenset)
Out[5]:
1
1.0 (0)
dtype: object
Comment From: timdiels
Thanks, though note that when selecting multiple columns, one still has to use agg(lambda):
>>> df = pd.DataFrame([[1,2,1],[3,4,1],[5,6,2],[7,8,2]])
>>> df['irrelevant'] = [1,2,3,4]
>>> df
   0  1  2  irrelevant
0  1  2  1           1
1  3  4  1           2
2  5  6  2           3
3  7  8  2           4
>>> df.groupby(2)[[0,1]].apply(frozenset)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/limyreth/.local/lib/python3.5/site-packages/pandas/core/groupby.py", line 651, in apply
    return self._python_apply_general(f)
  File "/home/limyreth/.local/lib/python3.5/site-packages/pandas/core/groupby.py", line 660, in _python_apply_general
    not_indexed_same=mutated or self.mutated)
  File "/home/limyreth/.local/lib/python3.5/site-packages/pandas/core/groupby.py", line 3375, in _wrap_applied_output
    return (Series(values, index=key_index, name=self.name)
  File "/home/limyreth/.local/lib/python3.5/site-packages/pandas/core/series.py", line 233, in __init__
    self.name = name
  File "/home/limyreth/.local/lib/python3.5/site-packages/pandas/core/generic.py", line 2694, in __setattr__
    object.__setattr__(self, name, value)
  File "/home/limyreth/.local/lib/python3.5/site-packages/pandas/core/series.py", line 309, in name
    raise TypeError('Series.name must be a hashable type')
TypeError: Series.name must be a hashable type
>>> df.groupby(2)[[0,1]].agg(lambda x: frozenset(x))
        0       1
2
1  (1, 3)  (2, 4)
2  (5, 7)  (8, 6)