Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
pd.date_range("2020-3-28", periods=5, freq="D", tz="Europe/London").tz_convert("UTC")

Issue Description

The above returns:

DatetimeIndex(['2020-03-28 00:00:00+00:00', '2020-03-29 00:00:00+00:00',
               '2020-03-29 23:00:00+00:00', '2020-03-30 23:00:00+00:00',
               '2020-03-31 23:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='D')

The spacing is not constant (one step is 23 hours instead of 24), so freq should not be "D".

Expected Behavior

DatetimeIndex(['2020-03-28 00:00:00+00:00', '2020-03-29 00:00:00+00:00',
               '2020-03-29 23:00:00+00:00', '2020-03-30 23:00:00+00:00',
               '2020-03-31 23:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq=None)

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 2e218d10984e9919f0296931d92ea851c6a6faf5
python           : 3.10.4.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
Version          : 10.0.19044
machine          : AMD64
processor        : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : English_Belgium.1252

pandas           : 1.5.3
numpy            : 1.24.2
pytz             : 2022.7.1
dateutil         : 2.8.2
setuptools       : 67.2.0
pip              : 23.0
Cython           : None
pytest           : 6.2.4
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : 3.0.8
lxml.etree       : 4.9.2
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.1.2
IPython          : 8.4.0
pandas_datareader: None
bs4              : 4.11.1
bottleneck       : 1.3.6
brotli           : None
fastparquet      : 2023.2.0
fsspec           : 2021.11.0
gcsfs            : None
matplotlib       : 3.6.2
numba            : None
numexpr          : 2.8.4
odfpy            : None
openpyxl         : 3.0.10
pandas_gbq       : None
pyarrow          : 11.0.0
pyreadstat       : None
pyxlsb           : None
s3fs             : 2021.11.0
scipy            : 1.10.0
snappy           : None
sqlalchemy       : None
tables           : None
tabulate         : 0.8.10
xarray           : 2022.12.0
xlrd             : None
xlwt             : None
zstandard        : None
tzdata           : None

Comment From: MarcoGorelli

indeed

In [45]: Timestamp('2020-03-29 23:00:00+00:00') - Timestamp('2020-03-29 00:00:00+00:00')
Out[45]: Timedelta('0 days 23:00:00')

Thanks for the report!

Comment From: jbrockmendel

Yah, this is a Hard Problem. The .freq is actually wrong even before the .tz_convert. This is due to a before-my-time "we'll just pretend" in the date_range implementation (grep for "We break Day arithmetic"). xref #41943 and links there, attempted implementation #44364.
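To illustrate the point about the index before .tz_convert, here is a minimal sketch (not taken from the pandas source) that builds the same range without converting to UTC and measures the elapsed time between neighbours. The variable names are only for illustration, and the sketch deliberately does not inspect .freq, since what that attribute reports may differ between pandas versions.

```python
import pandas as pd

# Same range as in the report, but without the tz_convert step.
local = pd.date_range("2020-3-28", periods=5, freq="D", tz="Europe/London")

# Subtracting tz-aware timestamps measures elapsed (absolute) time, so two
# wall-clock midnights that straddle the DST change are not 24 hours apart.
elapsed = [later - earlier for earlier, later in zip(local[:-1], local[1:])]

for start, step in zip(local[:-1], elapsed):
    print(start, "->", step)
```

Every entry sits at local midnight, yet the step that crosses the clock change on 2020-03-29 is shorter than the others, so the irregular spacing is already present before any conversion.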

Day needs to be changed to mean day-respecting-dst, and users specifically wanting 24 hours should use "24h". We determined a while back that there wasn't a nice way to do this as a deprecation, so it would have to be a breaking change in a major release (it is listed in #44823). I tried last month to do this just under the wire for the RC but found it was not a weekend-sized problem.
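The difference between the two meanings can be sketched from the user side by comparing freq="D" with freq="24h" over the same DST change. This is only an illustration of the observable values, assuming date_range behaves as in the report; the helper elapsed_hours is a made-up name, not a pandas function.

```python
import pandas as pd

start, tz = "2020-03-28", "Europe/London"

# "D": one entry per calendar day, each at local midnight.
calendar_days = pd.date_range(start, periods=5, freq="D", tz=tz)

# "24h": one entry every 24 elapsed hours, regardless of the wall clock.
fixed_24h = pd.date_range(start, periods=5, freq="24h", tz=tz)


def elapsed_hours(index):
    """Hours elapsed between consecutive entries, measured in UTC."""
    utc = index.tz_convert("UTC")
    return [(b - a) / pd.Timedelta(hours=1) for a, b in zip(utc[:-1], utc[1:])]


# Local hour of each entry, followed by the elapsed hours between entries.
print([ts.hour for ts in calendar_days], elapsed_hours(calendar_days))
print([ts.hour for ts in fixed_24h], elapsed_hours(fixed_24h))
```

With "D" the local hour stays at 0 and one step is 23 hours long; with "24h" every step is 24 hours and the local hour drifts to 1 after the clock change. Which of the two a user wants depends on whether the data follows the calendar or elapsed time.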

I'd love to see this addressed for 3.0, am kind of hoping @mroeschke will make another attempt at implementing. I suspect this would solve many of the extant 'timezones' issues.