I tried to mimic this with a datetime dtype column.
Code Sample, a copy-pastable example if possible
import pandas as pd
df = pd.DataFrame(dict(
    date=['2006-12-01', '2008-12-01', '2011-11-01'],
    id=[0, 1, 1],
))
df.date = pd.to_datetime(df.date)
print df.iloc[df.groupby(by = 'id')['date'].idxmin().values.ravel()]
print df.iloc[df.groupby(by = 'id')['date'].agg(pd.Series.idxmin)]
Expected Output
Both results should be the same, but the second raises the following error:
File "/home/benjello/openfisca/junk/pandas_datetime.py", line 21, in <module>
print df.iloc[df.groupby(by = 'id')['date'].agg(pd.Series.idxmin)]
File "/home/benjello/.local/lib/python2.7/site-packages/pandas/core/indexing.py", line 1296, in __getitem__
return self._getitem_axis(key, axis=0)
File "/home/benjello/.local/lib/python2.7/site-packages/pandas/core/indexing.py", line 1599, in _getitem_axis
self._is_valid_list_like(key, axis)
File "/home/benjello/.local/lib/python2.7/site-packages/pandas/core/indexing.py", line 1537, in _is_valid_list_like
if len(arr) and (arr.max() >= l or arr.min() < -l):
output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 4.5.0-2-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 20.10.1
Cython: 0.24
numpy: 1.11.1
scipy: 0.17.1
statsmodels: None
xarray: None
IPython: 4.0.3
sphinx: 1.3.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.0
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: 0.7.3
lxml: 3.6.0
bs4: 4.4.1
html5lib: 0.999
httplib2: 0.9.1
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None
Comment From: jreback
This is a bug in .transform (which is what .agg is calling), see #10972.
You can just use .apply:
In [7]: df.groupby(by = 'id')['date'].apply(pd.Series.idxmin)
Out[7]:
id
0 0
1 1
Name: date, dtype: int64
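For completeness, here is a minimal sketch of how the .apply workaround could be used to select the earliest row per id. It reuses the example frame from the report and assumes the default integer index. The names first_idx and earliest are illustrative only. Because idxmin returns index labels rather than positions, the sketch selects with .loc instead of .iloc. The two only coincide here because the index is the default 0..n-1 range.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2006-12-01", "2008-12-01", "2011-11-01"]),
    "id": [0, 1, 1],
})

# Index label of the earliest date within each id group,
# computed with .apply as suggested above instead of .agg.
first_idx = df.groupby("id")["date"].apply(pd.Series.idxmin)

# idxmin yields index labels, so .loc is the matching indexer.
earliest = df.loc[first_idx]
print(earliest)
```

If the frame had a non-default index, .iloc with these values would pick the wrong rows or raise, which is why label-based selection is the safer choice here.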