Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
i1 = pd.date_range('2020-10-25 1:00', freq='H', periods= 4, tz='Europe/Berlin')
# DatetimeIndex(['2020-10-25 01:00:00+02:00', '2020-10-25 02:00:00+02:00',
# '2020-10-25 02:00:00+01:00', '2020-10-25 03:00:00+01:00'],
# dtype='datetime64[ns, Europe/Berlin]', freq='H')
ts1 = i1[1]
# Timestamp('2020-10-25 02:00:00+0200', tz='Europe/Berlin', freq='H')
# This error...
i1.floor('H') # AmbiguousTimeError: Cannot infer dst time from 2020-10-25 02:00:00, try using the 'ambiguous' argument
# ...can be solved using the `ambiguous` argument if we floor an index:
i1.floor('H', ambiguous='infer')
# But that won't work on the single timestamp:
ts1.floor('H') # AmbiguousTimeError: Cannot infer dst time from 2020-10-25 02:00:00, try using the 'ambiguous' argument
ts1.floor('H', ambiguous='infer') # ValueError: Cannot infer offset with only one time.
Issue Description
All timestamps in the above example are well defined: each has both a timezone (Europe/Berlin) and a UTC offset (+2 or +1). Flooring them to the full hour (or quarter hour) should be possible, but raises an error instead.
Expected Behavior
In this case, I don't think the result of either flooring operation is ambiguous. The first should return i1, and the second should return ts1.
Installed Versions
Comment From: mroeschke
I think this is the expected behavior as stated in the docs in the Notes section https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.floor.html
If the timestamps have a timezone, flooring will take place relative to the local (“wall”) time and re-localized to the same timezone. When flooring near daylight savings time, use nonexistent and ambiguous to control the re-localization behavior.
Comment From: rwijtvliet
Thank you for looking into this.
The `nonexistent` argument does not apply here (we're dealing only with existing timestamps), and `ambiguous` is only used for DatetimeIndex. (I've updated the example above to use Timestamp as well, as this too is part of my use case.)
Current (imperfect) workaround for timestamps:
floored_ts1 = ts1.tz_convert('UTC').floor('H').tz_convert(ts1.tz) # NB: incorrect if ts1 has fractional offset (like 03:30 or 03:45) to UTC.
floored_ts1.freq = ts1.freq
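The caveat about fractional offsets can be made concrete. A minimal sketch, assuming a timestamp in Asia/Kolkata (UTC+05:30), picked here purely for illustration: flooring via UTC lands on a UTC hour boundary, which is half an hour off in local time.

```python
import pandas as pd

# Illustrative timestamp: 01:11 local time in a zone with a +05:30 offset.
ts = pd.Timestamp("2020-10-25 01:11", tz="Asia/Kolkata")

# Workaround from above: floor in UTC, then convert back.
via_utc = ts.tz_convert("UTC").floor("h").tz_convert(ts.tz)

# Built-in floor works on local wall time (this zone has no DST, so no ambiguity).
via_wall = ts.floor("h")

print(via_utc.isoformat())   # 2020-10-25T00:30:00+05:30
print(via_wall.isoformat())  # 2020-10-25T01:00:00+05:30
```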
Comment From: mroeschke
and ambiguous is only used for DatetimeIndex
This is a DatetimeIndex
In [4]: type(i1)
Out[4]: pandas.core.indexes.datetimes.DatetimeIndex
Since this DatetimeIndex has Timestamps that fall within the DST transition, this is an ambiguous operation since it's equivalent to
In [5]: i1.tz_localize(None).floor("H").tz_localize("Europe/Berlin")
AmbiguousTimeError: Cannot infer dst time from 2020-10-25 02:00:00, try using the 'ambiguous' argument
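To see where the ambiguity comes from, it may help to look at the intermediate naive values. A small sketch; building the index from UTC is an assumption made here so that construction itself cannot run into the transition:

```python
import pandas as pd

# Same four hours as i1, built in UTC and converted to Berlin time.
i1 = pd.date_range("2020-10-24 23:00", periods=4, freq="h", tz="UTC").tz_convert("Europe/Berlin")

# Dropping the timezone keeps only the wall-clock reading.
naive = i1.tz_localize(None)

# 02:00+02:00 and 02:00+01:00 collapse into the same naive value, so
# re-localizing cannot tell them apart without extra information.
print(i1.is_unique, naive.is_unique)  # True False
```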
So you'll need to specify one of the ambiguous options:
- ‘infer’ will attempt to infer fall dst-transition hours based on order
- bool-ndarray where True signifies a DST time, False designates a non-DST time (note that this flag is only applicable for ambiguous times)
- ‘NaT’ will return NaT where there are ambiguous times
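As a hedged illustration of these options: for a DatetimeIndex, either 'infer' or an explicit boolean array resolves the re-localization, and a single Timestamp accepts a plain bool for `ambiguous` (per the `Timestamp.floor` documentation). Deriving the flag from `dst()` is a suggestion added here, not something proposed in this thread.

```python
import numpy as np
import pandas as pd

i1 = pd.date_range("2020-10-24 23:00", periods=4, freq="h", tz="UTC").tz_convert("Europe/Berlin")

# Index: let pandas infer the DST side from the order of the repeated hour...
by_infer = i1.floor("h", ambiguous="infer")

# ...or pass one flag per element (True = DST), read off each timestamp.
is_dst = np.array([bool(ts.dst()) for ts in i1])
by_flags = i1.floor("h", ambiguous=is_dst)

# Single Timestamp: 'infer' is not available, but a bool is.
ts1 = i1[1]  # 02:00+02:00, still on summer time
ts2 = i1[2]  # 02:00+01:00, already on winter time
floored1 = ts1.floor("h", ambiguous=bool(ts1.dst()))
floored2 = ts2.floor("h", ambiguous=bool(ts2.dst()))

print(by_infer.equals(i1), by_flags.equals(i1))  # True True
print(floored1 == ts1, floored2 == ts2)          # True True
```

The flag is only consulted for wall times that are actually ambiguous, so passing it for every element is harmless.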
Comment From: rwijtvliet
Thanks @mroeschke for your answer.
My question was originally about DatetimeIndex, but I now recognise the issue is primarily with Timestamp, which is why I changed the bug report yesterday afternoon.
Either way, maybe the method below is helpful. Compared to the current implementation, it...
- ...does work for `Timestamp`;
- ...does not require the `ambiguous` argument for `DatetimeIndex`; and
- ...retains the `freq` information.
import pandas as pd
import pytz
def floor2(self, freq):
    if self.tz is None:
        return self.__class__(self.floor(freq), freq=self.freq)
    # All timestamps are converted to the anchor's UTC offset.
    anchor = self if isinstance(self, pd.Timestamp) else self[0]
    # Turn into fixed offset to eliminate ambiguity...
    # (the offset is passed in minutes, as an int)
    fo = pytz.FixedOffset(int(anchor.utcoffset().total_seconds() // 60))
    # (fo = 'UTC' gives incorrect result if UTC-offset is not integer number of hours)
    # ...then floor and convert back to original timezone...
    newinstance = self.tz_convert(fo).floor(freq).tz_convert(self.tz)
    # ...and keep original frequency.
    return self.__class__(newinstance, freq=self.freq)
pd.DatetimeIndex.floor2 = floor2
pd.Timestamp.floor2 = floor2
# Works for flooring DatetimeIndex to quarterhour
# ------------------------------------------------------
i0 = pd.date_range("2020-10-25 1:11", freq="H", periods=4)
i0.floor2("15T")
# DatetimeIndex(['2020-10-25 01:00:00', '2020-10-25 02:00:00',
# '2020-10-25 03:00:00', '2020-10-25 04:00:00'],
# dtype='datetime64[ns]', freq='H')
i1 = pd.date_range("2020-10-25 1:11", freq="H", periods=4, tz="Europe/Berlin")
i1.floor2("15T")
# DatetimeIndex(['2020-10-25 01:00:00+02:00', '2020-10-25 02:00:00+02:00',
# '2020-10-25 02:00:00+01:00', '2020-10-25 03:00:00+01:00'],
# dtype='datetime64[ns, Europe/Berlin]', freq='H')
i2 = pd.date_range("2020-10-25 1:11", freq="H", periods=4, tz="Asia/Kolkata")
i2.floor2("15T")
# DatetimeIndex(['2020-10-25 01:00:00+05:30', '2020-10-25 02:00:00+05:30',
# '2020-10-25 03:00:00+05:30', '2020-10-25 04:00:00+05:30'],
# dtype='datetime64[ns, Asia/Kolkata]', freq='H')
# Works for flooring Timestamp to quarterhour
# --------------------------------------------------
i0[0].floor2("15T")
# Timestamp('2020-10-25 01:00:00', freq='H')
i1[0].floor2("15T")
# Timestamp('2020-10-25 01:00:00+0200', tz='Europe/Berlin', freq='H')
i2[0].floor2("15T")
# Timestamp('2020-10-25 01:00:00+0530', tz='Asia/Kolkata', freq='H')
It's not finished/perfect yet - e.g. flooring to days is not done correctly. What's needed is a check: if the flooring is to the nearest hour, or shorter, the above code can be used. If it's to a frequency that's longer than an hour, the original code could be used.
But this is quickly turning into a code review. Let me know if you think it's helpful if I turn this into a pull request.
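The check described above might look roughly like the following for a single Timestamp. This is only a sketch under assumptions: `floor_checked` is a hypothetical name, the frequency is assumed to be a fixed-length string that `pd.Timedelta` can parse (such as "15min" or "1h"), `datetime.timezone` stands in for `pytz.FixedOffset`, and retaining `freq` is left out.

```python
import datetime as dt

import pandas as pd


def floor_checked(ts: pd.Timestamp, freq: str) -> pd.Timestamp:
    """Hypothetical helper: floor `ts`, sidestepping DST ambiguity for steps up to one hour."""
    if ts.tz is None or pd.Timedelta(freq) > pd.Timedelta(hours=1):
        # Naive timestamps and longer frequencies: keep the built-in wall-time floor.
        return ts.floor(freq)
    # Freeze the current UTC offset so every wall time maps to exactly one instant...
    fixed = dt.timezone(ts.utcoffset())
    # ...floor there, then convert back to the original timezone.
    return ts.tz_convert(fixed).floor(freq).tz_convert(ts.tz)


# Both readings of the repeated hour keep their own offset.
summer = pd.Timestamp("2020-10-25 00:11", tz="UTC").tz_convert("Europe/Berlin")  # 02:11+02:00
winter = pd.Timestamp("2020-10-25 01:11", tz="UTC").tz_convert("Europe/Berlin")  # 02:11+01:00
print(floor_checked(summer, "15min").isoformat())  # 2020-10-25T02:00:00+02:00
print(floor_checked(winter, "15min").isoformat())  # 2020-10-25T02:00:00+01:00
```

Frequencies longer than an hour fall through to the existing `floor`, so they can still raise `AmbiguousTimeError` when the floored wall time itself is ambiguous.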
Comment From: lucasjamar
I think this is the expected behavior as stated in the docs in the Notes section https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.floor.html
If the timestamps have a timezone, flooring will take place relative to the local (“wall”) time and re-localized to the same timezone. When flooring near daylight savings time, use nonexistent and ambiguous to control the re-localization behavior.
@mroeschke, why is "wall" time selected for flooring the datetime? Why are times not converted to UTC by default, floored, and then converted back to the original timezone?