Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
timestr = '2023-11-05T08:30:00Z'
ts = pd.Timestamp.fromisoformat(timestr)
assert ts == pd.to_datetime(timestr)

timestr = '2023-11-05T08:30:00+0000'
ts = pd.Timestamp.fromisoformat(timestr)
assert ts == pd.to_datetime(timestr)

ts = pd.Timestamp.utcnow()
assert ts == pd.Timestamp.fromisoformat(ts.isoformat())

Issue Description

Timestamp.fromisoformat seems to ignore the timezone offset in an ISO 8601 formatted string and returns a timezone-naive Timestamp. pd.to_datetime does not seem to have this problem: it returns the correct timezone-aware Timestamp. As a consequence, the round-trip conversion pd.Timestamp.fromisoformat(ts.isoformat()) loses the timezone information.

Expected Behavior

I would expect the Timestamp returned by fromisoformat to be timezone-aware if the ISO string contains UTC offset information. This is how datetime.datetime.fromisoformat in the standard library behaves.
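For reference, a minimal sketch of the standard-library behavior described above, using only `datetime.datetime.fromisoformat`. The sample strings are taken from this thread; offsets written with a colon (`-04:00`) are accepted on Python 3.7+, while a trailing `Z` or a colon-less offset like `+0000` needs Python 3.11 or later, so those forms are left out here.

```python
from datetime import datetime, timedelta, timezone

# ISO 8601 strings with explicit offsets, mapped to the offset we expect back.
samples = {
    "2024-08-02T00:00:00+00:00": timedelta(0),
    "2024-08-02T00:00:00-04:00": timedelta(hours=-4),
}

for text, expected_offset in samples.items():
    dt = datetime.fromisoformat(text)
    # The stdlib returns a timezone-aware datetime carrying the parsed offset.
    assert dt.tzinfo is not None
    assert dt.utcoffset() == expected_offset
    # Round-tripping through isoformat() keeps both the instant and the offset.
    roundtrip = datetime.fromisoformat(dt.isoformat())
    assert roundtrip == dt and roundtrip.utcoffset() == expected_offset
    print(text, "->", repr(dt))

# The same round trip for "now" in UTC, mirroring the Timestamp.utcnow() check.
now = datetime.now(timezone.utc)
assert datetime.fromisoformat(now.isoformat()).tzinfo is not None
```

This is the contract the report expects pd.Timestamp.fromisoformat to honor as well.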

Installed Versions

INSTALLED VERSIONS
------------------
commit : 3d492f175077788df89b6fc82608c201164d81e2
python : 3.11.6.final.0
python-bits : 64
OS : Linux
OS-release : 6.6.3-arch1-1
Version : #1 SMP PREEMPT_DYNAMIC Wed, 29 Nov 2023 00:37:40 +0000
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.utf8
LOCALE : en_US.UTF-8

pandas : 2.2.0.dev0+831.g3d492f1750
numpy : 1.26.2
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 65.5.0
pip : 23.2.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.17.2
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.3
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

Comment From: nekkomira

Hi, I tried addressing the bug, but it seemed a bit out of scope and too complex for me. It does seem that fromisoformat drops the timezone information, as you indicated. I added a test case based on your example as a benchmark for future fixes, to ensure that a corrected implementation handles timezone data as expected. The PR is linked above, but here's the link as well: https://github.com/pandas-dev/pandas/pull/56477

Comment From: lithomas1

Hi, I can reproduce this bug on main.

fromisoformat doesn't appear to be a method that we override ourselves (it is inherited from the stdlib datetime), but it calls back into the Timestamp constructor, so that's where I'd look to see if something has gone wrong.

(That would be the Cython file pandas/_libs/tslibs/timestamps.pyx.)
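To illustrate the mechanism described above, here is a minimal pure-Python sketch. For a subclass, the stdlib's `datetime.fromisoformat` builds its result by calling the subclass itself with positional arguments, including `tzinfo`, so a constructor that does not forward that argument silently produces a naive object. `LossyDatetime` is a hypothetical class invented for this illustration; it is not pandas' Timestamp code and does not claim to show where the actual bug lives.

```python
from datetime import datetime


class LossyDatetime(datetime):
    """Hypothetical datetime subclass whose constructor drops a positional tzinfo."""

    def __new__(cls, year, month, day, hour=0, minute=0, second=0,
                microsecond=0, tzinfo=None, *, fold=0):
        # The illustrated bug: tzinfo is accepted but never passed on.
        return super().__new__(cls, year, month, day, hour, minute,
                               second, microsecond, fold=fold)


s = "2024-08-02T00:00:00-04:00"
good = datetime.fromisoformat(s)      # plain stdlib class keeps the offset
bad = LossyDatetime.fromisoformat(s)  # routed through LossyDatetime.__new__

print(good.tzinfo)  # UTC-04:00
print(bad.tzinfo)   # None: the wall-clock fields survive, the offset does not
```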

Comment From: bionicles

This is a "wtf" issue: we need proper timezone support in pandas, and losing offsets is not an option. This is extremely serious because it can cause timestamps to be off by an entire DAY in production for many apps right now (!)
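A worked example of how a dropped offset can shift the calendar date, using only the stdlib. It assumes that downstream code treats naive timestamps as UTC, which is a common convention but an assumption here; the `edt` fixed offset is also introduced only for illustration.

```python
from datetime import datetime, timedelta, timezone

s = "2024-08-02T00:00:00-04:00"
edt = timezone(timedelta(hours=-4))  # fixed -04:00 offset, for illustration

aware = datetime.fromisoformat(s)    # 2024-08-02 00:00-04:00, i.e. 04:00 UTC
naive = aware.replace(tzinfo=None)   # what is left once the offset is dropped

# Assumption: later code interprets naive timestamps as UTC.
misread = naive.replace(tzinfo=timezone.utc)

print(aware.astimezone(edt).date())    # 2024-08-02
print(misread.astimezone(edt).date())  # 2024-08-01: the local date moved back a day
print(aware - misread)                 # 4:00:00 error in the instant itself
```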

Comment From: bionicles

reproduction

import datetime

import pandas as pd
import pytest


utc_example = "2024-08-02T00:00:00-00:00"
edt_example = "2024-08-02T00:00:00-04:00"
zulu_example = "2024-08-02T00:00:00Z"


@pytest.mark.parametrize("iso8601_str", [zulu_example, utc_example, edt_example])
def test_pandas_does_not_ignore_timezone(iso8601_str: str):
    pd_timestamp = pd.Timestamp.fromisoformat(iso8601_str)
    print("PANDAS pd.Timestamp:")
    print(f"{iso8601_str} -> {pd_timestamp=}")
    tz = pd_timestamp.tz
    print(f"{tz=}")
    assert tz is not None, "pandas pd.Timestamp.fromisoformat lost timezone"


@pytest.mark.parametrize("iso8601_str", [zulu_example, utc_example, edt_example])
def test_python_datetime_does_not_ignore_timezone(iso8601_str: str):
    py_datetime = datetime.datetime.fromisoformat(iso8601_str)
    print("python stdlib datetime.datetime:")
    print(f"{iso8601_str} -> {py_datetime}")
    tzinfo = py_datetime.tzinfo
    print(f"{tzinfo=}")
    assert tzinfo is not None, "python datetime.datetime.fromisoformat lost timezone"

Result (screenshot of the pytest run): yes, Python gets it right and pandas gets it wrong.

Comment From: bionicles

yo, workaround:

pd.to_datetime works, but it turns "America/New_York" into a fixed "UTC-4:00" offset, so string equality checks on the timezone fail :(

However, I can't find the code for pd.Timestamp.fromisoformat to look at it. Does anyone know where it is?
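A possible workaround sketch along the lines of the pd.to_datetime approach mentioned above: parse with the stdlib (which keeps the offset) and wrap the result in `pd.Timestamp`, then compare offsets with `utcoffset()` instead of comparing timezone strings. This assumes a pandas install and, for strings ending in `Z`, Python 3.11 or later; it has not been checked against every pandas release.

```python
from datetime import datetime, timedelta

import pandas as pd

s = "2024-08-02T00:00:00-04:00"

# The stdlib parser keeps the offset; pd.Timestamp preserves an aware tzinfo.
ts = pd.Timestamp(datetime.fromisoformat(s))
print(repr(ts))

assert ts.tz is not None
assert ts == pd.to_datetime(s)  # same instant as the to_datetime workaround

# Compare offsets rather than timezone names: a fixed -04:00 offset and
# "America/New_York" in summer agree on utcoffset() but stringify differently.
assert ts.utcoffset() == timedelta(hours=-4)
```

An ISO string only carries an offset, not a zone name, so getting "America/New_York" back requires an explicit `ts.tz_convert("America/New_York")` after parsing.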

Comment From: j-hendricks

take