From the 2021-06-09 dev call: new discussion of https://github.com/pandas-dev/pandas/issues/22864

Problem: The frequency string 'D' and pd.offsets.Day are defined to be a fixed 24-hour period, since Day is a subclass of Tick.

However, in timezones with a DST transition, 'D' instead acts as a calendar day (23, 24, or 25 hours) in the following operations:

  • pd.date_range(start, end, freq="D")
  • df.resample("D")...
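To make the discrepancy concrete, here is a small check. It is a sketch that assumes a reasonably recent pandas with tz data available; Europe/Warsaw and its 2015-03-29 spring-forward date are chosen only for illustration. It compares date_range with freq="D" against a fixed 24-hour step, spelled as pd.offsets.Hour(24) to avoid the "24H"/"24h" alias spelling that differs between pandas versions.

```python
import pandas as pd

# Europe/Warsaw moves clocks forward on 2015-03-29 (02:00 -> 03:00 local),
# so that calendar day lasts only 23 hours.
TZ = "Europe/Warsaw"

# freq="D": each point lands on local midnight, i.e. calendar days.
calendar = pd.date_range("2015-03-28", periods=3, freq="D", tz=TZ)

# A fixed 24-hour step for comparison.
fixed = pd.date_range("2015-03-28", periods=3, freq=pd.offsets.Hour(24), tz=TZ)


def elapsed_steps(idx):
    """Absolute time elapsed between consecutive points of a tz-aware index."""
    return list(idx[1:] - idx[:-1])


print(elapsed_steps(calendar))  # 24 hours, then 23 hours across the DST jump
print(elapsed_steps(fixed))     # 24 hours each time
print(fixed[-1])                # local clock no longer reads midnight
```

The "D" range stays on local midnight (the calendar-day behavior), while the fixed 24-hour range drifts to 01:00 local time after the jump.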

Original settled solution: Deprecate all behavior where the frequency string 'D' and pd.offsets.Day mean a fixed 24 hours, in favor of "24H". A private _Day offset would be used internally where appropriate and swapped out once the deprecation is enforced.

(Note: last time I lost steam catching all the warnings issued in the test suite, since the above solution touches datetimes, timedeltas, offsets, methods, etc.)

Other Solutions

  1. Implement a new offset, e.g. 'DayDST'/pd.offsets.CalendarDay, that users can migrate to.
  2. Deprecate Ticks altogether (Day is a subclass), since they are redundant with Timedeltas.
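Option 2 rests on sub-day Ticks already behaving exactly like Timedeltas. A quick sanity check of that premise, as a sketch: Day is deliberately left out because its semantics are exactly what is being debated, and the timestamp is placed just before a DST jump only to show that these Ticks add absolute durations.

```python
import pandas as pd

# Start one hour before the Europe/Warsaw spring-forward jump.
ts = pd.Timestamp("2015-03-29 01:00", tz="Europe/Warsaw")

pairs = [
    (pd.offsets.Hour(3), pd.Timedelta(hours=3)),
    (pd.offsets.Minute(90), pd.Timedelta(minutes=90)),
    (pd.offsets.Second(45), pd.Timedelta(seconds=45)),
]

for tick, td in pairs:
    # The Tick converts to exactly the matching Timedelta...
    assert pd.Timedelta(tick) == td
    # ...and both add the same absolute duration, even across the DST jump.
    assert ts + tick == ts + td
    print(tick, "->", ts + tick)
```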

cc @pandas-dev/pandas-core

(@jbrockmendel) Updating with checkboxes to keep track of issues I think this would resolve:
- [ ] #51716
- [ ] #35388

Comment From: jbrockmendel

Giving this another shot; I still have 563 failing tests on the current branch. Some points to figure out and/or decide:

1) Currently, multiplication/division with a Day can return a sub-day Tick. What do we do if Day ceases to be a Tick?

2) What do we do when adding a Timedelta-like to a Day?

3) Do we allow adding Day to TimedeltaArray/Index by treating it as 24H? If we do, that might make the behavior of dt64tz + Day + tdi ambiguous.

4) In Rolling._validate we access freq.nanos, which raises if Day is no longer a Tick. Do we check for Day and convert it to 24H in this case? Or will that cause problems in tz-aware cases?

5) Currently, Tick comparison considers Day(1) == Hour(24), and similarly for other Tick subclasses. If Day is no longer a Tick, then Day(1) != Hour(24) and many existing tests break. Do we want to update the tests or patch __eq__? (For dev purposes I've patched __eq__ calls inside tm; I'm assuming this isn't a long-term solution.)

6) When passing a freq to TimedeltaIndex or timedelta_range, I think we should disallow Day objects, but if users pass "D"-like strings we can call that close enough and cast them to 24H.
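For point 3, a sketch of why dt64tz + Day + tdi could become ambiguous, assuming Day is read either as a calendar day applied to the timestamp first or as 24H folded into the timedeltas first. The calendar-day step is emulated here by shifting the local wall time; this is an illustration of the two readings, not how pandas implements Day.

```python
import pandas as pd

# A tz-aware timestamp the day before the Europe/Warsaw spring-forward jump.
ts = pd.Timestamp("2015-03-28 12:00", tz="Europe/Warsaw")
tdi = pd.TimedeltaIndex([pd.Timedelta(hours=1), pd.Timedelta(hours=2)])

# Reading 1: apply Day to the timestamp first as a calendar day
# (same local wall-clock time on the next date), then add the timedeltas.
next_day = (ts.tz_localize(None) + pd.Timedelta(days=1)).tz_localize(ts.tz)
calendar_first = next_day + tdi

# Reading 2: fold Day into the timedeltas first as a fixed 24H,
# then add the combined durations to the timestamp.
fixed_first = ts + (pd.Timedelta(hours=24) + tdi)

print(calendar_first)
print(fixed_first)
```

Because 2015-03-29 is only 23 hours long in Warsaw, the second reading lands one hour later in local time than the first, so the result depends on which operand Day is associated with.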

Will update this as I go.
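For point 4, a sketch of the tz-aware concern: converting Day to a fixed 24H window changes which observations share a window on the day after a 23-hour day. The series and dates are made up for illustration, and the window is passed as pd.Timedelta(hours=24) to stand in for "24H".

```python
import pandas as pd

# One observation at each local midnight around the Europe/Warsaw
# spring-forward date; 2015-03-29 is a 23-hour calendar day.
idx = pd.date_range("2015-03-28", periods=3, freq="D", tz="Europe/Warsaw")
ser = pd.Series(1.0, index=idx)

# A fixed 24-hour window, as if Day were converted to 24H in Rolling._validate.
# Time-based windows default to closed="right", i.e. (t - 24h, t].
counts = ser.rolling(pd.Timedelta(hours=24)).sum()
print(counts)
```

Under calendar-day semantics one would presumably expect each window to hold a single daily observation; with the fixed 24H window the last point also picks up the previous midnight, which is only 23 hours earlier.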

Comment From: mroeschke

Not sure if https://github.com/pandas-dev/pandas/issues/22864#issuecomment-437199726 answers any of the above.

Comment From: MarcoGorelli

What's the suggestion for when the result would fall on an ambiguous or non-existent timestamp?

For example:

pd.Timestamp('2015-03-28T02:30').tz_localize('Europe/Warsaw') + pd.tseries.offsets.Day(1)

Just raise?

Comment From: jbrockmendel

I think so, yes.
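A minimal sketch of the "just raise" behavior, using a hypothetical add_calendar_days helper (not a pandas API) that keeps the local wall time and relies on tz_localize's default nonexistent="raise". Fixed 24-hour arithmetic is shown alongside for contrast.

```python
import pandas as pd

ts = pd.Timestamp("2015-03-28T02:30").tz_localize("Europe/Warsaw")


def add_calendar_days(ts, n):
    """Hypothetical helper: add n calendar days, keeping the local wall time.

    tz_localize defaults to nonexistent="raise", so a wall time that falls in
    a DST gap raises instead of being shifted.
    """
    wall = ts.tz_localize(None) + pd.Timedelta(days=n)
    return wall.tz_localize(ts.tz)


# Fixed 24-hour arithmetic is always well defined...
print(ts + pd.Timedelta(hours=24))  # 2015-03-29 03:30:00+02:00

# ...but 02:30 local time does not exist on 2015-03-29 in Warsaw.
try:
    add_calendar_days(ts, 1)
except Exception as err:  # the exact exception class varies by pandas version
    print(f"raised {type(err).__name__}: {err}")
```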