From 2021-06-09 dev call, new discussion of https://github.com/pandas-dev/pandas/issues/22864
Problem: Frequency string 'D'
and pd.offsets.Day
is defined to be a fixed 24 hour period since it's a subclass of Tick
.
In the context of timezones with a DST crossing, 'D'
acts as a calendar day (23/24/25H) instead for the following operations:
pd.date_range(start, end, freq="D")
df.resample("D")...
Original Settled Solution: Deprecate all behavior where Frequency string 'D'
and pd.offsets.Day
is a fixed 24 hours in favor of "24H"
. A private _Day
offset would be used where appropriate internally and swapped out once the deprecation is enforced.
(Note: I lost steam last time catching all the warnings issued in the testing suite given the above solution touches datetimes, timedeltas, offsets, methods, etc)
Other Solutions
- Implement a new offset, e.g.
"'DayDST'"/pd.offsets.CalendarDay"
, that users can migrate to. - Deprecate
Tick
s (Day
is a subclass) all together since they are redundant withTimedelta
s
cc @pandas-dev/pandas-core
(@jbrockmendel) Updating with checkboxes to keep track of issues I think this would resolve: - [ ] #51716 - [ ] #35388
Comment From: jbrockmendel
Giving this another shot, still have 563 failing tests on the current branch. Some points to figure out and/or decide:
1) ATM multiplication/division with a Day can return a sub-day Tick. What do we do if Day ceases to be a Tick?
2) What do we do when adding a Timedelta-like to a Day?
3) Do we allow adding Day to TimedeltaArray/Index by treating it as 24H? If we do that might make dt64tz + day + tdi
behavior ambiguous
4) In Rolling._validate we do freq.nanos
which raises if we make Day non-tick. Do we check for Day and convert it to 24H in this case? Or will that cause problems in tzaware cases?
5) ATM Tick comparison considers Day(1) == Hour(24) and similar with other Tick subclasses. If Day is no longer a Tick, then Day(1) != Hour(24) and many existing tests break. Do we want to update the tests or patch __eq__
. (For dev purposes ive patched __eq__
calls inside tm
, am assuming this isnt a long-term soluton)
6) when passing a freq to TimedeltaIndex, or timedelta_range, I think disallow Day objects but if users pass "D"-like strings we can call that close enough and cast that to 24H.
Will update this as I go.
Comment From: mroeschke
Not sure if https://github.com/pandas-dev/pandas/issues/22864#issuecomment-437199726 provides any answers to the above
Comment From: MarcoGorelli
What's the suggestion for when the result would fall on an ambiguous or non-existent timestamp?
Like
Timestamp('2015-03-28T2:30').tz_localize('Europe/Warsaw') + pd.tseries.offsets.Day(1)
Just raise?
Comment From: jbrockmendel
i think so, yes.