- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
import pandas as pd
df = pd.DataFrame({'timestamp': ['2021-03-28T00:37:28.952+01:00', '2021-03-28T04:09:36.106+02:00'], 'data': [1, 2]})
df = df.set_index('timestamp')
df.index = pd.to_datetime(df.index)
df.resample('H').size()
Issue Description
pandas.DataFrame.resample() fails when the index crosses a daylight saving time boundary. The error is shown below:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
Expected Behavior
timestamp
2021-03-28 00:00:00+01:00 1
2021-03-28 01:00:00+01:00 0
2021-03-28 02:00:00+01:00 0
2021-03-28 03:00:00+02:00 0
2021-03-28 03:00:00+02:00 0
2021-03-28 04:00:00+02:00 1
Freq: H, dtype: int64
Installed Versions
Comment From: joooeey
If you're okay with displaying your times in UTC instead of local time, you can do pd.to_datetime(df.index, utc=True).
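To see why normalizing to UTC helps, here is a minimal standard-library sketch (no pandas; the variable names are illustrative only). It parses the two timestamps from the report and shows that each carries a different fixed UTC offset, while converting both to UTC puts them on one shared timeline:

```python
from datetime import datetime, timezone

# The two timestamps from the reproducible example.
raw = ["2021-03-28T00:37:28.952+01:00", "2021-03-28T04:09:36.106+02:00"]

parsed = [datetime.fromisoformat(s) for s in raw]

# Each value carries its own fixed UTC offset (+01:00 and +02:00).
offsets = [dt.utcoffset() for dt in parsed]

# Converting to UTC gives both values the same time zone.
as_utc = [dt.astimezone(timezone.utc) for dt in parsed]

print(offsets)
print([dt.isoformat() for dt in as_utc])
```

The differing offsets are what "mixed" means here: both strings describe the same local clock, but on either side of the DST switch.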
Comment From: MarcoGorelli
Thanks for your issue.
The issue is that your index isn't a DatetimeIndex, as it contains mixed timezones (which isn't allowed, and in fact, IMO, should be deprecated - this issue is a perfect example of how misleading it is: https://github.com/pandas-dev/pandas/issues/50887).
You should instead pass utc=True and then convert to your time zone - then, DST will be handled fine:
In [33]: import pandas as pd
...: df = pd.DataFrame({'timestamp': ['2021-03-28T00:37:28.952+01:00', '2021-03-28T04:09:36.106+02:00'], 'data': [1, 2]})
...: df = df.set_index('timestamp')
...: df.index = pd.to_datetime(df.index, utc=True).tz_convert('Europe/Brussels')
In [34]: df.resample('H').size()
Out[34]:
timestamp
2021-03-28 00:00:00+01:00 1
2021-03-28 01:00:00+01:00 0
2021-03-28 03:00:00+02:00 0
2021-03-28 04:00:00+02:00 1
Freq: H, dtype: int64
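A quick way to confirm the diagnosis before resampling is to check the index type. The sketch below is only an illustration under a few assumptions: pandas is installed, the data is the same as in the report, and the lowercase "h" alias is used in place of "H" because newer pandas releases prefer it. The helper variable names are made up for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "timestamp": ["2021-03-28T00:37:28.952+01:00", "2021-03-28T04:09:36.106+02:00"],
        "data": [1, 2],
    }
).set_index("timestamp")

# Before conversion the index holds plain strings, so resample() would reject it.
is_datetime_before = isinstance(df.index, pd.DatetimeIndex)

# Parse as UTC, then convert to the local zone used in the comment above.
df.index = pd.to_datetime(df.index, utc=True).tz_convert("Europe/Brussels")
is_datetime_after = isinstance(df.index, pd.DatetimeIndex)

# "h" is assumed here as the hourly alias; the original example uses "H".
counts = df.resample("h").size()

print(is_datetime_before, is_datetime_after)
print(counts)
```

Checking isinstance(df.index, pd.DatetimeIndex) right after conversion surfaces the problem earlier than the TypeError raised by resample().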
closing then