Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import pytz
from dateutil.tz import gettz
# CET = pytz.timezone("Europe/Paris")
CET = gettz("Europe/Paris")
start = pd.Timestamp("2021-10-31 01:45", tz=CET)
idx = pd.date_range(start, pd.Timestamp("2021-10-31 03:45", tz=CET), freq="30T")
pd.Series(idx, idx)
# 2021-10-31 01:45:00+02:00 2021-10-31 01:45:00+02:00
# 2021-10-31 02:15:00+02:00 2021-10-31 02:15:00+02:00
# 2021-10-31 02:45:00+02:00 2021-10-31 02:45:00+02:00
# 2021-10-31 02:15:00+01:00 2021-10-31 02:15:00+02:00 --> offset of index changes here
# 2021-10-31 02:45:00+01:00 2021-10-31 02:45:00+02:00
# 2021-10-31 03:15:00+01:00 2021-10-31 03:15:00+01:00 --> offset of values changes here
# 2021-10-31 03:45:00+01:00 2021-10-31 03:45:00+01:00
# Freq: 30T, dtype: datetime64[ns, tzfile('/usr/share/zoneinfo/Europe/Paris')]
Issue Description
I noticed unexpected duplicated values when manipulating timestamp ranges with the CET timezone of dateutil. There is no issue with the same timezone from pytz.
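As background, the repeated wall-clock times in the index are expected: Europe/Paris left daylight saving time on 2021-10-31, so the hour from 02:00 to 03:00 occurred twice, once at +02:00 and once at +01:00. The following is a minimal sketch using only `datetime` and `dateutil`, independent of pandas. The names `paris`, `instants`, `local`, `wall_times` and `offsets` are illustrative and not part of the original report.

```python
from datetime import datetime, timedelta, timezone

from dateutil.tz import gettz

paris = gettz("Europe/Paris")

# Seven distinct UTC instants, 30 minutes apart, starting at 01:45 Paris summer time.
start_utc = datetime(2021, 10, 30, 23, 45, tzinfo=timezone.utc)
instants = [start_utc + timedelta(minutes=30 * i) for i in range(7)]

# Convert each instant to Paris local time.
local = [t.astimezone(paris) for t in instants]
for t in local:
    print(t.isoformat(), "fold =", t.fold)

# The wall-clock readings repeat, but the UTC offsets tell them apart.
wall_times = [t.replace(tzinfo=None) for t in local]
offsets = [t.utcoffset() for t in local]
print(len(set(instants)), "distinct instants,", len(set(wall_times)), "distinct wall-clock times")
```

Only the offset distinguishes the two occurrences of 02:15 and 02:45, which is why a wrong offset in the values column shows up as apparent duplicates.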
Expected Behavior
The values of the Series above should contain no duplicates: each value should show the same UTC offset as its index label.
Installed Versions
INSTALLED VERSIONS
------------------
commit : bb1f651536508cdfef8550f93ace7849b00046ee
python : 3.10.2.final.0
python-bits : 64
OS : Darwin
OS-release : 21.3.0
Version : Darwin Kernel Version 21.3.0: Wed Jan 5 21:37:58 PST 2022; root:xnu-8019.80.24~20/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8
pandas : 1.4.0
numpy : 1.21.0
pytz : 2020.1
dateutil : 2.8.2
pip : 22.0.4
setuptools : 58.1.0
Cython : None
pytest : None
hypothesis : None
...
xarray : None
xlrd : 2.0.1
xlwt : None
zstandard : None
Comment From: Anupam-USP
I am able to understand the issue, but how should I proceed?
Comment From: mroeschke
The issue might be in the repr, as the underlying values are the same with both timezone sources:
In [17]: import pandas as pd
...: import pytz
...: from dateutil.tz import gettz
...: # CET = pytz.timezone("Europe/Paris")
...: CET = "Europe/Paris"
...: start = pd.Timestamp("2021-10-31 01:45", tz=CET)
...: idx = pd.date_range(start, pd.Timestamp("2021-10-31 03:45", tz=CET), freq="30T")
...: pd.Series(idx, idx).astype("i8")
Out[17]:
2021-10-31 01:45:00+02:00 1635637500000000000
2021-10-31 02:15:00+02:00 1635639300000000000
2021-10-31 02:45:00+02:00 1635641100000000000
2021-10-31 02:15:00+01:00 1635642900000000000
2021-10-31 02:45:00+01:00 1635644700000000000
2021-10-31 03:15:00+01:00 1635646500000000000
2021-10-31 03:45:00+01:00 1635648300000000000
Freq: 30T, dtype: int64
In [18]: import pandas as pd
...: import pytz
...: from dateutil.tz import gettz
...: # CET = pytz.timezone("Europe/Paris")
...: CET = gettz("Europe/Paris")
...: start = pd.Timestamp("2021-10-31 01:45", tz=CET)
...: idx = pd.date_range(start, pd.Timestamp("2021-10-31 03:45", tz=CET), freq="30T")
...: pd.Series(idx, idx).astype("i8")
Out[18]:
2021-10-31 01:45:00+02:00 1635637500000000000
2021-10-31 02:15:00+02:00 1635639300000000000
2021-10-31 02:45:00+02:00 1635641100000000000
2021-10-31 02:15:00+01:00 1635642900000000000
2021-10-31 02:45:00+01:00 1635644700000000000
2021-10-31 03:15:00+01:00 1635646500000000000
2021-10-31 03:45:00+01:00 1635648300000000000
Freq: 30T, dtype: int64
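The same comparison can be made explicit without reading the repr. This is a minimal sketch, assuming a pandas version that accepts the `"30min"` alias (used here in place of `"30T"`) and exposes `asi8` on both the index and `Series.array`. The variable names are illustrative.

```python
import numpy as np
import pandas as pd
from dateutil.tz import gettz

tz = gettz("Europe/Paris")
start = pd.Timestamp("2021-10-31 01:45", tz=tz)
end = pd.Timestamp("2021-10-31 03:45", tz=tz)
idx = pd.date_range(start, end, freq="30min")
s = pd.Series(idx, index=idx)

# Integer epoch representation of the index labels and of the stored values.
index_i8 = s.index.asi8
values_i8 = s.array.asi8

# Do values and index hold the same instants?
same_instants = bool((index_i8 == values_i8).all())
# Are the stored values free of duplicates?
no_duplicates = len(np.unique(values_i8)) == len(values_i8)
# Is the spacing between consecutive values constant?
evenly_spaced = len(np.unique(np.diff(values_i8))) == 1

print(same_instants, no_duplicates, evenly_spaced)
```

If all three checks are true, any duplicates seen in the printed Series come from how the values are formatted rather than from the stored data.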
Comment From: jbrockmendel
Fixed by #49684.