Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import pytz
from dateutil.tz import gettz
# CET = pytz.timezone("Europe/Paris")  # pytz variant: no duplicated values
CET = gettz("Europe/Paris")
start = pd.Timestamp("2021-10-31 01:45", tz=CET)
idx = pd.date_range(start, pd.Timestamp("2021-10-31 03:45", tz=CET), freq="30T")
pd.Series(idx, idx)

# 2021-10-31 01:45:00+02:00   2021-10-31 01:45:00+02:00
# 2021-10-31 02:15:00+02:00   2021-10-31 02:15:00+02:00
# 2021-10-31 02:45:00+02:00   2021-10-31 02:45:00+02:00
# 2021-10-31 02:15:00+01:00   2021-10-31 02:15:00+02:00 --> offset of index changes here
# 2021-10-31 02:45:00+01:00   2021-10-31 02:45:00+02:00
# 2021-10-31 03:15:00+01:00   2021-10-31 03:15:00+01:00 --> offset of values changes here
# 2021-10-31 03:45:00+01:00   2021-10-31 03:45:00+01:00
# Freq: 30T, dtype: datetime64[ns, tzfile('/usr/share/zoneinfo/Europe/Paris')]

Issue Description

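I noticed unexpectedly duplicated values when building a Series from a timestamp range that crosses the end of DST, using the Europe/Paris timezone from dateutil (gettz("Europe/Paris")). In the output above, the index switches from +02:00 to +01:00 one hour before the values do, so the values 02:15+02:00 and 02:45+02:00 each appear twice. The same timezone from pytz does not show the issue.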
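
For context, the wall-clock times 02:15 and 02:45 occur twice on 2021-10-31 in Europe/Paris, which is why the index legitimately repeats them with different offsets. The following is a minimal sketch using only dateutil (not part of the original report); the variable names are illustrative.

```python
from datetime import datetime, timedelta
from dateutil import tz

paris = tz.gettz("Europe/Paris")

# 02:15 on the day DST ends exists twice; fold selects which occurrence.
first = datetime(2021, 10, 31, 2, 15, tzinfo=paris)  # fold=0, before the clock change
second = tz.enfold(first, fold=1)                    # fold=1, after the clock change

is_ambiguous = tz.datetime_ambiguous(first)

# Same wall-clock time, but the two instants are one hour apart in UTC.
gap = second.astimezone(tz.UTC) - first.astimezone(tz.UTC)

print(is_ambiguous, first.utcoffset(), second.utcoffset(), gap)
```

The two datetimes differ only in their fold attribute, so any repr that drops or misreads the fold would print the wrong offset for one of them.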

Expected Behavior

The values of the Series above should contain no duplicates: each value should be displayed with the same offset as its index label.

Installed Versions

INSTALLED VERSIONS
------------------
commit           : bb1f651536508cdfef8550f93ace7849b00046ee
python           : 3.10.2.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 21.3.0
Version          : Darwin Kernel Version 21.3.0: Wed Jan 5 21:37:58 PST 2022; root:xnu-8019.80.24~20/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.UTF-8
pandas           : 1.4.0
numpy            : 1.21.0
pytz             : 2020.1
dateutil         : 2.8.2
pip              : 22.0.4
setuptools       : 58.1.0
Cython           : None
pytest           : None
hypothesis       : None
...
xarray           : None
xlrd             : 2.0.1
xlwt             : None
zstandard        : None

Comment From: Anupam-USP

I am able to understand the issue, but how should I proceed?

Comment From: mroeschke

The issue might be in the repr, as the underlying int64 values are the same with tz="Europe/Paris" and with gettz("Europe/Paris"):

In [17]: import pandas as pd
    ...: import pytz
    ...: from dateutil.tz import gettz
    ...: # CET = pytz.timezone("Europe/Paris")
    ...: CET = "Europe/Paris"
    ...: start = pd.Timestamp("2021-10-31 01:45", tz=CET)
    ...: idx = pd.date_range(start, pd.Timestamp("2021-10-31 03:45", tz=CET), freq="30T")
    ...: pd.Series(idx, idx).astype("i8")
Out[17]:
2021-10-31 01:45:00+02:00    1635637500000000000
2021-10-31 02:15:00+02:00    1635639300000000000
2021-10-31 02:45:00+02:00    1635641100000000000
2021-10-31 02:15:00+01:00    1635642900000000000
2021-10-31 02:45:00+01:00    1635644700000000000
2021-10-31 03:15:00+01:00    1635646500000000000
2021-10-31 03:45:00+01:00    1635648300000000000
Freq: 30T, dtype: int64

In [18]: import pandas as pd
    ...: import pytz
    ...: from dateutil.tz import gettz
    ...: # CET = pytz.timezone("Europe/Paris")
    ...: CET = gettz("Europe/Paris")
    ...: start = pd.Timestamp("2021-10-31 01:45", tz=CET)
    ...: idx = pd.date_range(start, pd.Timestamp("2021-10-31 03:45", tz=CET), freq="30T")
    ...: pd.Series(idx, idx).astype("i8")
Out[18]:
2021-10-31 01:45:00+02:00    1635637500000000000
2021-10-31 02:15:00+02:00    1635639300000000000
2021-10-31 02:45:00+02:00    1635641100000000000
2021-10-31 02:15:00+01:00    1635642900000000000
2021-10-31 02:45:00+01:00    1635644700000000000
2021-10-31 03:15:00+01:00    1635646500000000000
2021-10-31 03:45:00+01:00    1635648300000000000
Freq: 30T, dtype: int64
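
One way to check the data without relying on the repr is to compare the stored epoch integers of the values against those of the index. This is a rough sketch, not the code from the eventual fix; it assumes a pandas version that accepts the "30min" alias (used here in place of "30T"), and the variable names are illustrative.

```python
import pandas as pd
from dateutil.tz import gettz

tz = gettz("Europe/Paris")
start = pd.Timestamp("2021-10-31 01:45", tz=tz)
end = pd.Timestamp("2021-10-31 03:45", tz=tz)
idx = pd.date_range(start, end, freq="30min")
ser = pd.Series(idx, index=idx)

# Compare the underlying integers rather than the printed strings.
index_i8 = ser.index.asi8
values_i8 = pd.DatetimeIndex(ser).asi8

same_instants = bool((index_i8 == values_i8).all())
values_unique = bool(ser.is_unique)

print(len(ser), same_instants, values_unique)
```

If both flags are True while the printed values still show repeated timestamps, the duplication is confined to how the values are rendered.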

Comment From: jbrockmendel

Fixed by #49684.