Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import pytz
from zoneinfo import ZoneInfo
from datetime import datetime, timedelta

dts = [datetime(2022, 11, 6, 1, 0) + timedelta(minutes = 15*i) for i in range(4)]*2 + [datetime(2022, 11, 6, 2, 0) + timedelta(minutes = 15*i) for i in range(2)]

s = pd.Series(dts)

zoneinfo_infer = s.dt.tz_localize(ZoneInfo('US/Eastern'), ambiguous = 'infer')
print('Result of `tz_localize` with zoneinfo (all ambiguous times show UTC-04:00):')
print(zoneinfo_infer)
print('-'*80)

pytz_infer = s.dt.tz_localize(pytz.timezone('US/Eastern'), ambiguous = 'infer')
print('Result of `tz_localize` with pytz (correctly shows the DST transition):')
print(pytz_infer)


# should raise AmbiguousTimeError, but doesn't
s.dt.tz_localize(ZoneInfo('US/Eastern'))

# does raise AmbiguousTimeError
s.dt.tz_localize(pytz.timezone('US/Eastern'))

Issue Description

On 1.5.0rc0

Series.dt.tz_localize with a zoneinfo.ZoneInfo timezone does not correctly recognize a daylight saving time transition.

Looks like the cause of this is:

  • DatetimeArray.tz_localize calls tz_localize_to_utc, which instantiates a Localizer class as info
  • Localizer sets self.use_tzlocal = True because the tz is zoneinfo
  • info.use_tzlocal makes each value get set using _tz_localize_using_tzinfo_api, bypassing all of the ambiguous time checks
  • _tz_localize_using_tzinfo_api uses the tzinfo.utcoffset method to convert the value to UTC (via a round trip through np_datetimestruct to a new naive datetime instance)
  • The new naive datetime's fold attr is 0 by default, so utcoffset always returns the Daylight Time offset for an ambiguous time during a transition
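The effect of the default fold can be seen with the standard library alone. This is a minimal sketch, not the pandas code path; it assumes the canonical zone name America/New_York (which US/Eastern aliases) and that tz data is available on the system:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

tz = ZoneInfo("America/New_York")

# 01:30 on 2022-11-06 occurs twice: once in EDT, then again in EST.
wall = datetime(2022, 11, 6, 1, 30)

first = wall.replace(tzinfo=tz)           # fold defaults to 0 (first occurrence)
second = wall.replace(tzinfo=tz, fold=1)  # fold=1 selects the second occurrence

first_offset_hours = first.utcoffset().total_seconds() / 3600
second_offset_hours = second.utcoffset().total_seconds() / 3600

print(first_offset_hours)   # Daylight Time offset
print(second_offset_hours)  # Standard Time offset
```

Because a freshly constructed naive datetime always has fold=0, a conversion that relies only on utcoffset can never produce the second (Standard Time) occurrence.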

Expected Behavior

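tz_localize with a zoneinfo timezone should behave the same as with pytz or dateutil: infer the DST transition when ambiguous = 'infer', and raise AmbiguousTimeError by default.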

Installed Versions

INSTALLED VERSIONS
------------------
commit : 224458ee25d92ccdf289d1ae2741d178df4f323e
python : 3.10.1.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252
pandas : 1.5.0rc0
numpy : 1.23.1
pytz : 2022.1
dateutil : 2.8.2
setuptools : 63.4.1
pip : 22.2.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : 1.0.2
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.4.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.2
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 7.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : 2022.1

Comment From: jamesdow21

In a weird wrinkle, localizing with pytz first and then converting to zoneinfo makes the series display incorrectly (rows 4-7 show -04:00) even though the underlying values are correct:

In [60]: s.dt.tz_localize(pytz.timezone('US/Eastern'), ambiguous = 'infer')
Out[60]:
0   2022-11-06 01:00:00-04:00
1   2022-11-06 01:15:00-04:00
2   2022-11-06 01:30:00-04:00
3   2022-11-06 01:45:00-04:00
4   2022-11-06 01:00:00-05:00
5   2022-11-06 01:15:00-05:00
6   2022-11-06 01:30:00-05:00
7   2022-11-06 01:45:00-05:00
8   2022-11-06 02:00:00-05:00
9   2022-11-06 02:15:00-05:00
dtype: datetime64[ns, US/Eastern]

In [61]: s.dt.tz_localize(pytz.timezone('US/Eastern'), ambiguous = 'infer')[6]
Out[61]: Timestamp('2022-11-06 01:30:00-0500', tz='US/Eastern')

In [62]: s.dt.tz_localize(pytz.timezone('US/Eastern'), ambiguous = 'infer').dt.tz_convert(ZoneInfo('US/Eastern'))
Out[62]:
0   2022-11-06 01:00:00-04:00
1   2022-11-06 01:15:00-04:00
2   2022-11-06 01:30:00-04:00
3   2022-11-06 01:45:00-04:00
4   2022-11-06 01:00:00-04:00
5   2022-11-06 01:15:00-04:00
6   2022-11-06 01:30:00-04:00
7   2022-11-06 01:45:00-04:00
8   2022-11-06 02:00:00-05:00
9   2022-11-06 02:15:00-05:00
dtype: datetime64[ns, US/Eastern]

In [63]: s.dt.tz_localize(pytz.timezone('US/Eastern'), ambiguous = 'infer').dt.tz_convert(ZoneInfo('US/Eastern'))[6]
Out[63]: Timestamp('2022-11-06 01:30:00-0500', tz='US/Eastern')

In [64]: s.dt.tz_localize(pytz.timezone('US/Eastern'), ambiguous = 'infer').dt.tz_convert(ZoneInfo('US/Eastern')).diff()
Out[64]:
0               NaT
1   0 days 00:15:00
2   0 days 00:15:00
3   0 days 00:15:00
4   0 days 00:15:00
5   0 days 00:15:00
6   0 days 00:15:00
7   0 days 00:15:00
8   0 days 00:15:00
9   0 days 00:15:00
dtype: timedelta64[ns]

But localizing directly with zoneinfo produces wrong values for the ambiguous times:

In [69]: s.dt.tz_localize(ZoneInfo('US/Eastern')).diff()
Out[69]:
0                 NaT
1     0 days 00:15:00
2     0 days 00:15:00
3     0 days 00:15:00
4   -1 days +23:15:00
5     0 days 00:15:00
6     0 days 00:15:00
7     0 days 00:15:00
8     0 days 01:15:00
9     0 days 00:15:00
dtype: timedelta64[ns]

Comment From: mroeschke

Thanks for the report and diagnosis! Pull requests to correctly set fold in the code path you highlighted would be welcome

cc @jbrockmendel

Comment From: jamesdow21

Haven't had to work with Cython before today, but I think I understand the logic of these sections

I'm going to try my best to see if I can figure out how to get this working

Comment From: jbrockmendel

The relevant logic for ambiguous="infer" lives in pandas._libs.tslibs.tz_conversion._get_dst_hours. ATM it is only implemented for dateutil/pytz (zoneinfo goes through the if info.use_tzlocal: branch of tz_localize_to_utc). It looks like _get_dst_hours relies on result_a and result_b which themselves rely on info.deltas, which isn't exposed by zoneinfo. So fixing this will require figuring out how to accomplish the same thing using available zoneinfo APIs.
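One possible direction, sketched with the standard library only: since zoneinfo does not expose transition deltas, an ambiguous or nonexistent wall time can be detected by comparing utcoffset() for fold=0 and fold=1, following the PEP 495 semantics. The helper name classify_wall_time is hypothetical and is not part of pandas or zoneinfo; the zone name America/New_York is assumed as the canonical equivalent of US/Eastern.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def classify_wall_time(naive: datetime, tz: ZoneInfo) -> str:
    """Hypothetical helper: classify a naive wall time using only utcoffset() and fold."""
    off0 = naive.replace(tzinfo=tz, fold=0).utcoffset()
    off1 = naive.replace(tzinfo=tz, fold=1).utcoffset()
    if off0 == off1:
        return "unique"
    # PEP 495: for a repeated wall time, fold=0 gives the earlier occurrence,
    # which has the larger UTC offset (e.g. -04:00 before -05:00). For a skipped
    # wall time, fold=0 gives the pre-transition offset, which is the smaller one.
    return "ambiguous" if off0 > off1 else "nonexistent"


tz = ZoneInfo("America/New_York")
results = {
    "fall_back": classify_wall_time(datetime(2022, 11, 6, 1, 30), tz),
    "spring_forward": classify_wall_time(datetime(2022, 3, 13, 2, 30), tz),
    "ordinary": classify_wall_time(datetime(2022, 11, 6, 2, 0), tz),
}
print(results)
```

This only classifies individual values; an ambiguous = 'infer' implementation would still need to walk the ambiguous run and switch from fold=0 to fold=1 where the wall time steps backwards, which is the part _get_dst_hours handles for pytz/dateutil.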