Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import pytz
from dateutil.tz import gettz
# CET = pytz.timezone("Europe/Paris")
CET = gettz("Europe/Paris")
start = pd.Timestamp("2021-10-31 01:45", tz=CET)
idx = pd.date_range(start, pd.Timestamp("2021-10-31 03:45", tz=CET), freq="30T")
pd.Series(idx, idx)
# 2021-10-31 01:45:00+02:00 2021-10-31 01:45:00+02:00
# 2021-10-31 02:15:00+02:00 2021-10-31 02:15:00+02:00
# 2021-10-31 02:45:00+02:00 2021-10-31 02:45:00+02:00
# 2021-10-31 02:15:00+01:00 2021-10-31 02:15:00+02:00 --> offset of index changes here
# 2021-10-31 02:45:00+01:00 2021-10-31 02:45:00+02:00
# 2021-10-31 03:15:00+01:00 2021-10-31 03:15:00+01:00 --> offset of values changes here
# 2021-10-31 03:45:00+01:00 2021-10-31 03:45:00+01:00
# Freq: 30T, dtype: datetime64[ns, tzfile('/usr/share/zoneinfo/Europe/Paris')]
Issue Description
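I noticed unexpected duplicated values when manipulating timestamp ranges with the CET (Europe/Paris) timezone from dateutil. Across the DST fall-back on 2021-10-31, the index switches from +02:00 to +01:00 at the right place, but the series values display 02:15 and 02:45 with +02:00 a second time and only switch to +01:00 at 03:15. The same timezone from pytz does not show this problem.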
Expected Behavior
No duplicates in the values of the series above: each value should be displayed with the same UTC offset as its index label.
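For reference, a minimal standard-library sketch of the labels I would expect, built by stepping 30 minutes at a time in UTC and formatting with fixed offsets. It assumes Europe/Paris fell back from +02:00 to +01:00 at 01:00 UTC on 2021-10-31; the names `paris_label`, `CEST`, `CET_STD` and `TRANSITION_UTC` are illustrative only and not part of pandas or dateutil.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Europe/Paris switched from UTC+02:00 to UTC+01:00
# at 01:00 UTC on 2021-10-31 (end of summer time).
CEST = timezone(timedelta(hours=2))     # summer time, fixed offset
CET_STD = timezone(timedelta(hours=1))  # standard time, fixed offset
TRANSITION_UTC = datetime(2021, 10, 31, 1, 0, tzinfo=timezone.utc)


def paris_label(instant_utc):
    """Format a UTC instant as Paris wall-clock time with its offset."""
    tz = CEST if instant_utc < TRANSITION_UTC else CET_STD
    return instant_utc.astimezone(tz).isoformat(sep=" ")


# Same start and step as the report: 01:45+02:00, every 30 minutes, 7 points.
start_utc = datetime(2021, 10, 31, 1, 45, tzinfo=CEST).astimezone(timezone.utc)
expected = [paris_label(start_utc + timedelta(minutes=30 * i)) for i in range(7)]

for label in expected:
    print(label)
```

The wall-clock times 02:15 and 02:45 each appear twice, but with different offsets, so the seven labels are all distinct. That matches the index in the output above, and is what I would expect for the values as well.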
Installed Versions
INSTALLED VERSIONS
------------------
commit : bb1f651536508cdfef8550f93ace7849b00046ee
python : 3.10.2.final.0
python-bits : 64
OS : Darwin
OS-release : 21.3.0
Version : Darwin Kernel Version 21.3.0: Wed Jan 5 21:37:58 PST 2022; root:xnu-8019.80.24~20/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8
pandas : 1.4.0
numpy : 1.21.0
pytz : 2020.1
dateutil : 2.8.2
pip : 22.0.4
setuptools : 58.1.0
Cython : None
pytest : None
hypothesis : None
...
xarray : None
xlrd : 2.0.1
xlwt : None
zstandard : None
Comment From: Anupam-USP
I am able to understand the issue, but how should I proceed?
Comment From: mroeschke
The issue might be in the repr, as the underlying int64 values are the same with a pytz-backed timezone string (In [17]) and with dateutil's gettz (In [18]):
In [17]: import pandas as pd
...: import pytz
...: from dateutil.tz import gettz
...: # CET = pytz.timezone("Europe/Paris")
...: CET = "Europe/Paris"
...: start = pd.Timestamp("2021-10-31 01:45", tz=CET)
...: idx = pd.date_range(start, pd.Timestamp("2021-10-31 03:45", tz=CET), freq="30T")
...: pd.Series(idx, idx).astype("i8")
Out[17]:
2021-10-31 01:45:00+02:00 1635637500000000000
2021-10-31 02:15:00+02:00 1635639300000000000
2021-10-31 02:45:00+02:00 1635641100000000000
2021-10-31 02:15:00+01:00 1635642900000000000
2021-10-31 02:45:00+01:00 1635644700000000000
2021-10-31 03:15:00+01:00 1635646500000000000
2021-10-31 03:45:00+01:00 1635648300000000000
Freq: 30T, dtype: int64
In [18]: import pandas as pd
...: import pytz
...: from dateutil.tz import gettz
...: # CET = pytz.timezone("Europe/Paris")
...: CET = gettz("Europe/Paris")
...: start = pd.Timestamp("2021-10-31 01:45", tz=CET)
...: idx = pd.date_range(start, pd.Timestamp("2021-10-31 03:45", tz=CET), freq="30T")
...: pd.Series(idx, idx).astype("i8")
Out[18]:
2021-10-31 01:45:00+02:00 1635637500000000000
2021-10-31 02:15:00+02:00 1635639300000000000
2021-10-31 02:45:00+02:00 1635641100000000000
2021-10-31 02:15:00+01:00 1635642900000000000
2021-10-31 02:45:00+01:00 1635644700000000000
2021-10-31 03:15:00+01:00 1635646500000000000
2021-10-31 03:45:00+01:00 1635648300000000000
Freq: 30T, dtype: int64
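As a quick cross-check of those numbers, here is a small sketch using only the standard library. It copies the int64 values from Out[18] by hand (so it does not exercise pandas itself) and confirms they are distinct instants spaced exactly 30 minutes apart, which is consistent with the stored data being fine and only the display being off.

```python
from datetime import datetime, timezone

# Nanoseconds since the Unix epoch, copied from Out[18] above.
values_ns = [
    1635637500000000000,
    1635639300000000000,
    1635641100000000000,
    1635642900000000000,
    1635644700000000000,
    1635646500000000000,
    1635648300000000000,
]

# Every consecutive pair should differ by exactly 30 minutes.
steps_ns = {later - earlier for earlier, later in zip(values_ns, values_ns[1:])}
thirty_minutes_ns = 30 * 60 * 10**9

# Decode the first and last instants in UTC for a sanity check.
first_utc = datetime.fromtimestamp(values_ns[0] // 10**9, tz=timezone.utc)
last_utc = datetime.fromtimestamp(values_ns[-1] // 10**9, tz=timezone.utc)

print(steps_ns == {thirty_minutes_ns})
print(first_utc.isoformat(), last_utc.isoformat())
```

The range spans three hours of real time for what looks like two hours on the wall clock, which is what a fall-back transition should produce.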
Comment From: jbrockmendel
Fixed by #49684.