Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
pd.date_range("2020-3-28", periods=5, freq="D", tz="Europe/London").tz_convert("UTC")
Issue Description
The above returns:
DatetimeIndex(['2020-03-28 00:00:00+00:00', '2020-03-29 00:00:00+00:00',
'2020-03-29 23:00:00+00:00', '2020-03-30 23:00:00+00:00',
'2020-03-31 23:00:00+00:00'],
dtype='datetime64[ns, UTC]', freq='D')
The spacing is not constant (one step is only 23 hours), so freq should not be "D".
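One way to confirm the irregular spacing, independent of whatever `.freq` reports, is to subtract consecutive entries and check whether every step is identical. The names `steps` and `is_regular` below are only illustrative, and on affected versions `idx.freq` may still print as a Day offset even though the steps differ:

```python
import pandas as pd

idx = pd.date_range("2020-3-28", periods=5, freq="D", tz="Europe/London").tz_convert("UTC")

# Elapsed time between consecutive entries (a TimedeltaIndex of length 4).
steps = idx[1:] - idx[:-1]

# A fixed frequency implies every step is identical; here one step is 23 hours.
is_regular = steps.nunique() == 1

print(idx.freq, list(steps), is_regular)
```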
Expected Behavior
DatetimeIndex(['2020-03-28 00:00:00+00:00', '2020-03-29 00:00:00+00:00',
'2020-03-29 23:00:00+00:00', '2020-03-30 23:00:00+00:00',
'2020-03-31 23:00:00+00:00'],
dtype='datetime64[ns, UTC]', freq=None)
Installed Versions
Comment From: MarcoGorelli
indeed
In [45]: Timestamp('2020-03-29 23:00:00+00:00') - Timestamp('2020-03-29 00:00:00+00:00')
Out[45]: Timedelta('0 days 23:00:00')
Thanks for the report!
Comment From: jbrockmendel
Yah, this is a Hard Problem. The .freq is actually wrong even before the .tz_convert. This is due to a before-my-time "we'll just pretend" in the date_range implementation (grep for "We break Day arithmetic"). xref #41943 and links there, attempted implementation #44364.
Day needs to be changed to mean day-respecting-dst, and users specifically wanting 24 hours should use "24h". We determined a while back that there wasn't a nice way to do this as a deprecation, so it would have to be a breaking change in a major release (it is listed in #44823). I tried last month to do this just under the wire for the RC but found it was not a weekend-sized problem.
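A short sketch of the two points above, assuming current pandas semantics. First, the local index is already irregular before `.tz_convert`, because subtracting tz-aware timestamps measures elapsed time. Second, it contrasts "24 hours" with "one calendar day" across the 2020-03-29 DST transition. `pd.DateOffset(days=1)` is used here only as an existing stand-in for day-respecting-DST arithmetic; it is not the proposed change to Day, and the example deliberately avoids `pd.offsets.Day` because its meaning is what the comment proposes to change:

```python
import pandas as pd

# Same index as in the report, before the .tz_convert("UTC") call.
local = pd.date_range("2020-3-28", periods=5, freq="D", tz="Europe/London")

# Subtracting tz-aware timestamps gives elapsed (absolute) time, so the
# step across the DST transition is already 23 hours in local time.
local_steps = local[1:] - local[:-1]

# Midnight on the transition day in London (still GMT at this point).
ts = pd.Timestamp("2020-03-29", tz="Europe/London")

# Fixed 24 hours of elapsed time: lands at 01:00 BST on 2020-03-30.
plus_24h = ts + pd.Timedelta(hours=24)

# One calendar day on the wall clock: lands at 00:00 BST on 2020-03-30,
# which is only 23 hours of elapsed time.
plus_calendar_day = ts + pd.DateOffset(days=1)

print(list(local_steps))
print(plus_24h, plus_calendar_day)
```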
I'd love to see this addressed for 3.0, am kind of hoping @mroeschke will make another attempt at implementing. I suspect this would solve many of the extant 'timezones' issues.