Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import pytz
from zoneinfo import ZoneInfo
from datetime import datetime, timedelta
# Wall times 01:00-01:45 twice (the repeated hour), followed by 02:00 and 02:15
dts = (
    [datetime(2022, 11, 6, 1, 0) + timedelta(minutes=15 * i) for i in range(4)] * 2
    + [datetime(2022, 11, 6, 2, 0) + timedelta(minutes=15 * i) for i in range(2)]
)
s = pd.Series(dts)
zoneinfo_infer = s.dt.tz_localize(ZoneInfo('US/Eastern'), ambiguous = 'infer')
print('Result of `tz_localize` with zoneinfo: (all ambiguous times showing UTC -4:00)')
print(zoneinfo_infer)
print('-'*80)
pytz_infer = s.dt.tz_localize(pytz.timezone('US/Eastern'), ambiguous = 'infer')
print('Result of `tz_localize` with pytz: (correctly showing DST transition)')
print(pytz_infer)
# should raise AmbiguousTimeError, but doesn't
s.dt.tz_localize(ZoneInfo('US/Eastern'))
# does raise AmbiguousTimeError
s.dt.tz_localize(pytz.timezone('US/Eastern'))
Issue Description
On 1.5.0rc0
Series.dt.tz_localize with a zoneinfo.ZoneInfo timezone does not correctly recognize a daylight saving time transition.
Looks like the cause of this is:
* DatetimeArray.tz_localize calls tz_localize_to_utc, which instantiates a Localizer class as info
* Localizer sets self.use_tzlocal = True because the tz is zoneinfo
* info.use_tzlocal makes each value get set using _tz_localize_using_tzinfo_api, bypassing all of the ambiguous time checks
* _tz_localize_using_tzinfo_api uses the tzinfo.utcoffset method to convert the value to UTC (via a round trip through np_datetimestruct to a new naive datetime instance)
* The new naive datetime's fold attr is 0 by default, so utcoffset always returns the Daylight Time offset for wall times inside the transition
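The last point can be checked without pandas. A minimal sketch using only the standard library, assuming America/New_York is an acceptable stand-in for US/Eastern (the two keys refer to the same zone, but the legacy name is not installed on every system):

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

tz = ZoneInfo("America/New_York")

# 01:30 on 2022-11-06 occurs twice: once in EDT, once in EST.
first = datetime(2022, 11, 6, 1, 30, tzinfo=tz)           # fold defaults to 0
second = datetime(2022, 11, 6, 1, 30, tzinfo=tz, fold=1)  # second occurrence

print(first.utcoffset())   # offset of the first occurrence (daylight time)
print(second.utcoffset())  # offset of the second occurrence (standard time)
```

With fold left at 0, zoneinfo resolves the wall time to the first (daylight time) occurrence, which is why every ambiguous value ends up at UTC-4 in the report.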
Expected Behavior
Usage of zoneinfo should match that of pytz or dateutil
Installed Versions
Comment From: jamesdow21
In a weird wrinkle, localizing with pytz first and then converting to zoneinfo makes the series display incorrectly, even though the underlying values are correct
In [60]: s.dt.tz_localize(pytz.timezone('US/Eastern'), ambiguous = 'infer')
Out[60]:
0 2022-11-06 01:00:00-04:00
1 2022-11-06 01:15:00-04:00
2 2022-11-06 01:30:00-04:00
3 2022-11-06 01:45:00-04:00
4 2022-11-06 01:00:00-05:00
5 2022-11-06 01:15:00-05:00
6 2022-11-06 01:30:00-05:00
7 2022-11-06 01:45:00-05:00
8 2022-11-06 02:00:00-05:00
9 2022-11-06 02:15:00-05:00
dtype: datetime64[ns, US/Eastern]
In [61]: s.dt.tz_localize(pytz.timezone('US/Eastern'), ambiguous = 'infer')[6]
Out[61]: Timestamp('2022-11-06 01:30:00-0500', tz='US/Eastern')
In [62]: s.dt.tz_localize(pytz.timezone('US/Eastern'), ambiguous = 'infer').dt.tz_convert(ZoneInfo('US/Eastern'))
Out[62]:
0 2022-11-06 01:00:00-04:00
1 2022-11-06 01:15:00-04:00
2 2022-11-06 01:30:00-04:00
3 2022-11-06 01:45:00-04:00
4 2022-11-06 01:00:00-04:00
5 2022-11-06 01:15:00-04:00
6 2022-11-06 01:30:00-04:00
7 2022-11-06 01:45:00-04:00
8 2022-11-06 02:00:00-05:00
9 2022-11-06 02:15:00-05:00
dtype: datetime64[ns, US/Eastern]
In [63]: s.dt.tz_localize(pytz.timezone('US/Eastern'), ambiguous = 'infer').dt.tz_convert(ZoneInfo('US/Eastern'))[6]
Out[63]: Timestamp('2022-11-06 01:30:00-0500', tz='US/Eastern')
In [64]: s.dt.tz_localize(pytz.timezone('US/Eastern'), ambiguous = 'infer').dt.tz_convert(ZoneInfo('US/Eastern')).diff()
Out[64]:
0 NaT
1 0 days 00:15:00
2 0 days 00:15:00
3 0 days 00:15:00
4 0 days 00:15:00
5 0 days 00:15:00
6 0 days 00:15:00
7 0 days 00:15:00
8 0 days 00:15:00
9 0 days 00:15:00
dtype: timedelta64[ns]
But localizing directly with zoneinfo produces wrong values for the ambiguous times
In [69]: s.dt.tz_localize(ZoneInfo('US/Eastern')).diff()
Out[69]:
0 NaT
1 0 days 00:15:00
2 0 days 00:15:00
3 0 days 00:15:00
4 -1 days +23:15:00
5 0 days 00:15:00
6 0 days 00:15:00
7 0 days 00:15:00
8 0 days 01:15:00
9 0 days 00:15:00
dtype: timedelta64[ns]
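The two odd deltas in Out[69] are what you would expect if every ambiguous wall time is resolved with fold=0. A pure-Python check of that arithmetic, again assuming America/New_York as a stand-in for US/Eastern and not touching any pandas internals:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

tz = ZoneInfo("America/New_York")

# Same wall times as in the report: 01:00-01:45 twice, then 02:00 and 02:15.
dts = (
    [datetime(2022, 11, 6, 1, 0) + timedelta(minutes=15 * i) for i in range(4)] * 2
    + [datetime(2022, 11, 6, 2, 0) + timedelta(minutes=15 * i) for i in range(2)]
)

# Attach the zone without touching fold (it stays 0), then compare in UTC.
as_utc = [dt.replace(tzinfo=tz).astimezone(timezone.utc) for dt in dts]
deltas = [later - earlier for earlier, later in zip(as_utc, as_utc[1:])]

for position, delta in enumerate(deltas, start=1):
    print(position, delta)
```

Position 4 comes out as minus 45 minutes (the clock appears to jump back to the first 01:00) and position 8 as plus 75 minutes (the jump from 01:45 daylight time to 02:00 standard time), matching the pandas output above.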
Comment From: mroeschke
Thanks for the report and diagnosis! Pull requests to correctly set fold in the code path you highlighted would be welcome
cc @jbrockmendel
Comment From: jamesdow21
Haven't had to work with Cython before today, but I think I understand the logic of these sections
I'm going to try my best to see if I can figure out how to get this working
Comment From: jbrockmendel
The relevant logic for ambiguous="infer" lives in pandas._libs.tslibs.tz_conversion._get_dst_hours. ATM it is only implemented for dateutil/pytz (zoneinfo goes through the if info.use_tzlocal: branch of tz_localize_to_utc). It looks like _get_dst_hours relies on result_a and result_b, which themselves rely on info.deltas, which isn't exposed by zoneinfo. So fixing this will require figuring out how to accomplish the same thing using available zoneinfo APIs.
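One possible direction, sketched in pure Python and not taken from the pandas code base: zoneinfo does not expose transition tables, but comparing utcoffset for fold=0 and fold=1 is enough to tell whether a wall time is repeated. The helper names is_ambiguous and infer_fold below are hypothetical, and the inference rule (switch to the second occurrence once the wall clock steps backward inside a run of ambiguous times) is a simplifying assumption, not the behavior of _get_dst_hours.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def is_ambiguous(naive, tz):
    """Return True if this wall time occurs twice in tz."""
    first = naive.replace(tzinfo=tz, fold=0).utcoffset()
    second = naive.replace(tzinfo=tz, fold=1).utcoffset()
    # In a repeated hour the first occurrence has the larger UTC offset;
    # in a skipped hour (spring forward) the relation is reversed.
    return first > second


def infer_fold(naive_times, tz):
    """Attach tz to ordered wall times, choosing fold from the ordering."""
    localized = []
    prev = None           # previous wall time, if it was ambiguous
    second_pass = False   # True once the wall clock has stepped backward
    for naive in naive_times:
        if is_ambiguous(naive, tz):
            if prev is not None and naive <= prev:
                second_pass = True
            prev = naive
        else:
            prev, second_pass = None, False
        localized.append(naive.replace(tzinfo=tz, fold=int(second_pass)))
    return localized


tz = ZoneInfo("America/New_York")  # stand-in for US/Eastern
dts = (
    [datetime(2022, 11, 6, 1, 0) + timedelta(minutes=15 * i) for i in range(4)] * 2
    + [datetime(2022, 11, 6, 2, 0) + timedelta(minutes=15 * i) for i in range(2)]
)
localized = infer_fold(dts, tz)
offsets = [dt.utcoffset() for dt in localized]
as_utc = [dt.astimezone(timezone.utc) for dt in localized]
steps = [later - earlier for earlier, later in zip(as_utc, as_utc[1:])]
print(offsets)
print(steps)
```

On the series from the report this gives UTC-4 for the first four values and UTC-5 for the rest, with every step equal to 15 minutes. The sketch silently treats an ambiguous run with no backward step as the first occurrence, and it is a per-element Python loop, so a real fix would still need a vectorized equivalent.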