Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
i1 = pd.date_range('2020-10-25 1:00', freq='H', periods= 4, tz='Europe/Berlin')
# DatetimeIndex(['2020-10-25 01:00:00+02:00', '2020-10-25 02:00:00+02:00',
# '2020-10-25 02:00:00+01:00', '2020-10-25 03:00:00+01:00'],
# dtype='datetime64[ns, Europe/Berlin]', freq='H')
ts1 = i1[1]
# Timestamp('2020-10-25 02:00:00+0200', tz='Europe/Berlin', freq='H')
# This error...
i1.floor('H') # AmbiguousTimeError: Cannot infer dst time from 2020-10-25 02:00:00, try using the 'ambiguous' argument
# ...can be solved using the `ambiguous` argument if we floor an index:
i1.floor('H', ambiguous='infer')
# But that won't work on the single timestamp:
ts1.floor('H') # AmbiguousTimeError: Cannot infer dst time from 2020-10-25 02:00:00, try using the 'ambiguous' argument
ts1.floor('H', ambiguous='infer') # ValueError: Cannot infer offset with only one time.
Issue Description
All timestamps in the above example are well-defined, having both a timezone (Europe/Berlin) and a UTC offset (+2 or +1). Flooring them to the nearest full hour or quarter-hour should be possible, but raises an error instead.
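To illustrate the claim, here is a minimal sketch that is not part of the original report. It builds the same four instants from UTC and checks that each one, including the two that read 02:00 on the wall clock, is a distinct instant. Building the index from UTC with a "60min" step is an assumption made only to keep the snippet independent of frequency-alias spelling.

```python
import pandas as pd

# Four consecutive hours spanning the end of DST in Berlin (2020-10-25).
utc = pd.date_range("2020-10-24 23:00", periods=4, freq="60min", tz="UTC")
local = utc.tz_convert("Europe/Berlin")

# Wall-clock hours repeat (02:00 occurs twice)...
wall_hours = [ts.hour for ts in local]
# ...but the UTC offsets differ, so every element is a unique instant.
offsets_h = [ts.utcoffset().total_seconds() / 3600 for ts in local]

print(wall_hours)
print(offsets_h)
print(local.tz_convert("UTC").is_unique)
```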
Expected Behavior
In this case, I don't think the result of either flooring operation is ambiguous. The first should return i1, and the second should return ts1.
Installed Versions
Comment From: mroeschke
I think this is the expected behavior as stated in the docs in the Notes section https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.floor.html
If the timestamps have a timezone, flooring will take place relative to the local (“wall”) time and re-localized to the same timezone. When flooring near daylight savings time, use nonexistent and ambiguous to control the re-localization behavior.
Comment From: rwijtvliet
Thank you for looking into this.
The nonexistent argument does not apply here (we're dealing only with existing timestamps), and ambiguous is only used for DatetimeIndex. (I've updated the example above to use Timestamp as well, as this too is part of my use case.)
Current (imperfect) workaround for timestamps:
floored_ts1 = ts1.tz_convert('UTC').floor('H').tz_convert(ts1.tz) # NB: incorrect if ts1 has fractional offset (like 03:30 or 03:45) to UTC.
floored_ts1.freq = ts1.freq
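The caveat about fractional offsets can be made concrete with a small sketch. The timestamp and the "60min" step below are chosen for illustration only. Because the floor happens on the UTC clock, the result lands on a local half hour instead of a local full hour.

```python
import pandas as pd

# 01:11 local time in a zone with a +05:30 offset.
ts = pd.Timestamp("2020-10-25 01:11", tz="Asia/Kolkata")

# The UTC round-trip workaround: the floor happens on the UTC clock (19:41 -> 19:00).
via_utc = ts.tz_convert("UTC").floor("60min").tz_convert(ts.tz)

# Flooring directly works here because there is no DST transition near this time.
via_wall = ts.floor("60min")

print(via_utc)   # lands on 00:30 local time
print(via_wall)  # lands on 01:00 local time
```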
Comment From: mroeschke
and ambiguous is only used for DatetimeIndex
This is a DatetimeIndex
In [4]: type(i1)
Out[4]: pandas.core.indexes.datetimes.DatetimeIndex
Since this DatetimeIndex has Timestamps that fall within the DST transition, this is an ambiguous operation since it's equivalent to
In [5]: i1.tz_localize(None).floor("H").tz_localize("Europe/Berlin")
AmbiguousTimeError: Cannot infer dst time from 2020-10-25 02:00:00, try using the 'ambiguous' argument
So you'll need to specify one of the ambiguous options:
- ‘infer’ will attempt to infer fall dst-transition hours based on order
- bool-ndarray where True signifies a DST time, False designates a non-DST time (note that this flag is only applicable for ambiguous times)
- ‘NaT’ will return NaT where there are ambiguous times
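For a single Timestamp, 'infer' has no neighbouring values to compare against, but Timestamp.floor also accepts a plain boolean for ambiguous. One possible approach, sketched here and not taken from the thread, is to pass the DST flag of the input itself, which should keep the result on the same side of the transition as the input. The helper name floor_keep_dst is hypothetical.

```python
import numpy as np
import pandas as pd

def floor_keep_dst(ts, freq):
    """Hypothetical helper: floor a tz-aware Timestamp, resolving an
    ambiguous wall time with the DST flag of the input itself."""
    is_dst = bool(ts.dst())  # dst() is a zero timedelta outside DST
    return ts.floor(freq, ambiguous=is_dst)

# The two instants that both read 02:00 on the Berlin wall clock.
first = pd.Timestamp("2020-10-25 00:00", tz="UTC").tz_convert("Europe/Berlin")
second = pd.Timestamp("2020-10-25 01:00", tz="UTC").tz_convert("Europe/Berlin")

print(floor_keep_dst(first, "60min"))
print(floor_keep_dst(second, "60min"))

# Same idea for an index: one boolean per element instead of 'infer'.
idx = pd.DatetimeIndex([first, second])
mask = np.array([bool(t.dst()) for t in idx])
print(idx.floor("60min", ambiguous=mask))
```

Both calls return the input unchanged, since each timestamp already sits on a full hour; only the UTC offset distinguishes them.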
Comment From: rwijtvliet
Thanks @mroeschke for your answer.
My question was originally about DatetimeIndex, but I now recognise the issue is primarily with Timestamp, which is why I changed the bug report yesterday afternoon.
Either way, maybe the method below is helpful. Compared to the current implementation, it...
- ...does work for Timestamp;
- ...does not require the ambiguous argument for DatetimeIndex; and
- ...retains the freq information.
import pandas as pd
import pytz
def floor2(self, freq):
    if self.tz is None:
        return self.__class__(self.floor(freq), freq=self.freq)
    # All timestamps are converted to the anchor's UTC offset.
    anchor = self if isinstance(self, pd.Timestamp) else self[0]
    # Turn into fixed offset to eliminate ambiguity...
    fo = pytz.FixedOffset(anchor.utcoffset().total_seconds() / 60)
    # (fo = 'UTC' gives incorrect result if UTC-offset is not integer number of hours)
    # ...then floor and convert back to original timezone...
    newinstance = self.tz_convert(fo).floor(freq).tz_convert(self.tz)
    # ...and keep original frequency.
    return self.__class__(newinstance, freq=self.freq)
pd.DatetimeIndex.floor2 = floor2
pd.Timestamp.floor2 = floor2
# Works for flooring DatetimeIndex to quarterhour
# ------------------------------------------------------
i0 = pd.date_range("2020-10-25 1:11", freq="H", periods=4)
i0.floor2("15T")
#DatetimeIndex(['2020-10-25 01:00:00', '2020-10-25 02:00:00',
# '2020-10-25 03:00:00', '2020-10-25 04:00:00'],
# dtype='datetime64[ns]', freq='H')
i1 = pd.date_range("2020-10-25 1:11", freq="H", periods=4, tz="Europe/Berlin")
i1.floor2("15T")
# DatetimeIndex(['2020-10-25 01:00:00+02:00', '2020-10-25 02:00:00+02:00',
# '2020-10-25 02:00:00+01:00', '2020-10-25 03:00:00+01:00'],
# dtype='datetime64[ns, Europe/Berlin]', freq='H')
i2 = pd.date_range("2020-10-25 1:11", freq="H", periods=4, tz="Asia/Kolkata")
i2.floor2("15T")
# DatetimeIndex(['2020-10-25 01:00:00+05:30', '2020-10-25 02:00:00+05:30',
# '2020-10-25 03:00:00+05:30', '2020-10-25 04:00:00+05:30'],
# dtype='datetime64[ns, Asia/Kolkata]', freq='H')
# Works for flooring Timestamp to quarterhour
# --------------------------------------------------
i0[0].floor2("15T")
# Timestamp('2020-10-25 01:00:00', freq='H')
i1[0].floor2("15T")
# Timestamp('2020-10-25 01:00:00+0200', tz='Europe/Berlin', freq='H')
i2[0].floor2("15T")
# Timestamp('2020-10-25 01:00:00+0530', tz='Asia/Kolkata', freq='H')
It's not finished yet: for example, flooring to days is not done correctly. What's needed is a check on the target frequency. If the flooring is to one hour or shorter, the above code can be used; if it is to a frequency longer than an hour, the original code could be used.
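The check described above might look roughly like the following sketch for a single Timestamp. The name floor_timestamp is hypothetical. It uses datetime.timezone in place of pytz.FixedOffset, and it assumes that freq is a fixed-length string that pd.Timedelta can parse (such as "15min") and that the zone's DST shift is a whole number of hours.

```python
from datetime import timezone

import pandas as pd

ONE_HOUR = pd.Timedelta(hours=1)

def floor_timestamp(ts, freq):
    """Hypothetical helper: fixed-offset flooring for steps of at most
    one hour, regular wall-time flooring otherwise."""
    if ts.tz is None or pd.Timedelta(freq) > ONE_HOUR:
        # Longer steps (and naive timestamps) keep the existing behaviour.
        return ts.floor(freq)
    # Freeze the current UTC offset so no wall time is ambiguous...
    fixed = timezone(ts.utcoffset())
    # ...floor there, then return to the original timezone.
    return ts.tz_convert(fixed).floor(freq).tz_convert(ts.tz)

# 02:11 on either side of the Berlin DST transition of 2020-10-25.
before = pd.Timestamp("2020-10-25 00:11", tz="UTC").tz_convert("Europe/Berlin")
after = pd.Timestamp("2020-10-25 01:11", tz="UTC").tz_convert("Europe/Berlin")

print(floor_timestamp(before, "15min"))
print(floor_timestamp(after, "60min"))
```

Extending this to a DatetimeIndex would need a per-element offset instead of a single anchor, which this sketch does not attempt.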
But this is quickly turning into a code review. Let me know if you think it's helpful if I turn this into a pull request.
Comment From: lucasjamar
I think this is the expected behavior as stated in the docs in the Notes section https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.floor.html
If the timestamps have a timezone, flooring will take place relative to the local (“wall”) time and re-localized to the same timezone. When flooring near daylight savings time, use nonexistent and ambiguous to control the re-localization behavior.
@mroeschke, why is "wall" time selected for flooring the datetime? Why are times not converted to UTC by default, floored, and then converted back to the original timezone?
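One possible consideration, offered as an illustration and not as the maintainers' answer: flooring on the UTC clock does not align with local clock boundaries once the step is longer than the UTC offset's granularity. The sketch below compares the two approaches for a one-day step. The timestamp is arbitrary, and the day is spelled as "1440min" only to avoid alias differences between pandas versions.

```python
import pandas as pd

ONE_DAY = "1440min"  # one day, written in minutes

ts = pd.Timestamp("2020-07-01 15:30", tz="Europe/Berlin")

# Floor on the local wall clock: local midnight.
wall = ts.floor(ONE_DAY)

# Floor on the UTC clock, then convert back: midnight UTC, which is 02:00 in Berlin.
via_utc = ts.tz_convert("UTC").floor(ONE_DAY).tz_convert(ts.tz)

print(wall)
print(via_utc)
```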