Code Sample, a copy-pastable example if possible
from datetime import datetime
import pandas as pd
import pytz
tz = pytz.timezone('Asia/Hong_Kong')
datetimes = [datetime(2017, 3, 21, 1, 0, 0, 0, tz), datetime(2017, 3, 21, 2, 0, 0, 0, tz),
             datetime(2017, 3, 21, 3, 0, 0, 0, tz), datetime(2017, 3, 21, 4, 0, 0, 0, tz)]
df = pd.DataFrame.from_dict({'groups': ['a', 'a', 'b', 'b'], 'dt': datetimes})
print('Expected output:')
print(df.groupby('groups').aggregate('max'))
print('Observed output:')
print(df.groupby('groups').aggregate({'dt': 'max'}))
Problem description
Expected Output
df.groupby('some_column').aggregate({'some_datetime64_column': 'some_aggregate_function'})
and
df.groupby('some_column').aggregate('some_aggregate_function')
should return the same result for some_datetime64_column when that column has a datetime64 dtype with a timezone specified.
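A minimal regression check for this expectation, assuming a pandas release that includes the fix (0.20.0 or later, per the maintainer's reply) and that pytz is installed. Unlike the reproducer, it builds the timestamps with tz.localize(), the pattern pytz documents for attaching a zone, so the values carry +08:00. The names by_name and by_dict are only illustrative.

```python
from datetime import datetime

import pandas as pd
import pytz

# Build the tz-aware column with tz.localize(); passing a pytz zone via the
# datetime constructor would attach the zone's LMT offset instead of +08:00.
tz = pytz.timezone('Asia/Hong_Kong')
datetimes = [tz.localize(datetime(2017, 3, 21, hour)) for hour in range(1, 5)]
df = pd.DataFrame({'groups': ['a', 'a', 'b', 'b'], 'dt': datetimes})

# The two spellings of the same aggregation from the report.
by_name = df.groupby('groups').aggregate('max')
by_dict = df.groupby('groups').aggregate({'dt': 'max'})

# Both should produce identical frames, including the tz-aware dtype;
# assert_frame_equal raises AssertionError if they differ.
pd.testing.assert_frame_equal(by_name, by_dict)
print(by_dict['dt'].dtype)
```

On an affected version such as 0.19.2 the assert_frame_equal call is where the discrepancy would surface.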
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.103
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.19.2
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.1
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.3.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: 0.7.9.None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: 2.42.0
pandas_datareader: 0.2.1
Comment From: jorisvandenbossche
@AlexandreBeaulne Thanks for the report! This has been fixed on master in the meantime (the fix will be in the 0.20.0 release):
In [68]: df.groupby('groups').aggregate('max')
Out[68]:
                              dt
groups
a      2017-03-21 02:23:00+08:00
b      2017-03-21 04:23:00+08:00
In [69]: df.groupby('groups').aggregate({'dt': 'max'})
Out[69]:
                              dt
groups
a      2017-03-21 02:23:00+08:00
b      2017-03-21 04:23:00+08:00
Comment From: jorisvandenbossche
This was fixed here: https://github.com/pandas-dev/pandas/pull/15433
Comment From: viveshok
@jorisvandenbossche thanks for your quick answer, much appreciated.
I searched for the issue before opening this one but didn't find it.
Thanks again!