Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
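import pandas as pd
# 'Z' suffix: fromisoformat should agree with to_datetime (fails on affected versions)
timestr = '2023-11-05T08:30:00Z'
ts = pd.Timestamp.fromisoformat(timestr)
assert ts == pd.to_datetime(timestr)
# Numeric UTC offset: same comparison (fails on affected versions)
timestr = '2023-11-05T08:30:00+0000'
ts = pd.Timestamp.fromisoformat(timestr)
assert ts == pd.to_datetime(timestr)
# Round trip through isoformat() loses the timezone (fails on affected versions)
ts = pd.Timestamp.now(tz="UTC")
assert ts == pd.Timestamp.fromisoformat(ts.isoformat())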
Issue Description
Timestamp.fromisoformat seems to ignore the timezone offset in an ISO 8601 formatted string and returns a timezone-naive Timestamp. pd.to_datetime does not seem to have this problem: it returns the correct timezone-aware Timestamp. As a consequence, the round trip pd.Timestamp.fromisoformat(ts.isoformat()) loses the timezone information.
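One way to pin the problem down is a small check that a parser keeps the same UTC offset that the standard library finds in the string. This is a minimal sketch, assuming datetime.datetime.fromisoformat is an acceptable reference; offset_preserved and drop_offset are hypothetical helpers, not pandas APIs. Any callable returning a datetime-like object with utcoffset() (which pd.Timestamp provides) can be passed in.

```python
from datetime import datetime
from typing import Callable


def offset_preserved(parse: Callable[[str], datetime], iso_str: str) -> bool:
    """Hypothetical helper: True if `parse` keeps the UTC offset found in `iso_str`.

    The stdlib parser is used as the reference; for a string without an offset,
    both sides are None and the check passes.
    """
    expected = datetime.fromisoformat(iso_str).utcoffset()
    actual = parse(iso_str).utcoffset()
    return actual == expected


def drop_offset(s: str) -> datetime:
    """Hypothetical broken parser that mimics the reported bug by discarding tzinfo."""
    return datetime.fromisoformat(s).replace(tzinfo=None)


print(offset_preserved(datetime.fromisoformat, "2024-08-02T00:00:00-04:00"))  # True
print(offset_preserved(drop_offset, "2024-08-02T00:00:00-04:00"))  # False
```

Against pandas, the same check could be run as offset_preserved(pd.Timestamp.fromisoformat, ...) and offset_preserved(pd.to_datetime, ...); according to this report, only the first would fail on affected versions.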
Expected Behavior
I would expect the Timestamp returned by fromisoformat to be timezone-aware if the ISO string contains UTC offset information. This is how datetime.datetime.fromisoformat in the standard library behaves.
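For reference, the standard-library behavior described above can be checked directly with a short sketch that uses only datetime. It sticks to +HH:MM offsets because datetime.fromisoformat only accepts the 'Z' suffix from Python 3.11 onwards.

```python
from datetime import datetime, timedelta

# ISO 8601 strings with explicit offsets, mapped to the offset the stdlib should keep.
cases = {
    "2024-08-02T00:00:00+00:00": timedelta(0),
    "2024-08-02T00:00:00-04:00": timedelta(hours=-4),
}

for iso_str, expected_offset in cases.items():
    dt = datetime.fromisoformat(iso_str)
    # The parsed object is timezone-aware and carries the offset from the string.
    assert dt.tzinfo is not None
    assert dt.utcoffset() == expected_offset
    # Round trip: isoformat() writes the offset back out, so nothing is lost.
    roundtrip = datetime.fromisoformat(dt.isoformat())
    assert roundtrip == dt and roundtrip.utcoffset() == dt.utcoffset()
    print(f"{iso_str} -> tzinfo={dt.tzinfo}")
```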
Installed Versions
Comment From: nekkomira
Hi, I tried addressing the bug, but it seemed a bit out of scope and too complex for me. It does seem that fromisoformat is dropping the timezone information, as you indicated. While working on it, I added a test case based on your example as a benchmark for future fixes, to make sure a corrected implementation handles timezone data as expected. The PR is listed above, but here's the link as well: https://github.com/pandas-dev/pandas/pull/56477
Comment From: lithomas1
Hi, I can reproduce this bug on main.
fromisoformat doesn't appear to be a method that we override ourselves (it comes from the stdlib), but it calls back into the Timestamp constructor, so that's where I'd look to see if something has gone wrong. (That would be in the Cython file pandas/_libs/tslibs/timestamps.pyx.)
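The callback described above can be observed with a plain datetime subclass. In this sketch, RecordingDatetime is a hypothetical stand-in for Timestamp that only records the arguments its constructor receives. It shows that CPython's datetime.fromisoformat, when called on a subclass, builds the result by calling cls(...) with the parsed tzinfo as the eighth positional argument. It says nothing about how pandas' own constructor handles those arguments.

```python
from datetime import datetime, timedelta


class RecordingDatetime(datetime):
    """Hypothetical datetime subclass that logs every constructor call."""

    calls: list = []

    def __new__(cls, *args, **kwargs):
        # Record exactly what the caller passed, then defer to datetime.
        cls.calls.append((args, kwargs))
        return super().__new__(cls, *args, **kwargs)


dt = RecordingDatetime.fromisoformat("2024-08-02T00:00:00-04:00")
args, kwargs = RecordingDatetime.calls[-1]

# fromisoformat called our constructor with year..microsecond, then tzinfo.
print(args[:7])  # (2024, 8, 2, 0, 0, 0, 0)
print(args[7])  # UTC-04:00
print(type(dt).__name__, dt.utcoffset() == timedelta(hours=-4))
```

If a subclass constructor did not treat that eighth positional argument as tzinfo, the result would come back naive even though the stdlib parsed the offset correctly, so that is one thing worth checking in timestamps.pyx.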
Comment From: bionicles
This is a "wtf" issue: we need proper timezone support in pandas, and losing offsets is not an option. This is extremely serious, because dropping a UTC offset shifts a timestamp by several hours, which is enough to put it on the wrong calendar day in production for many apps right now (!)
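A concrete standard-library illustration of how a dropped offset can change the date (the timestamp is made up for the example): 22:00 on 1 August at -04:00 is already 2 August in UTC, so code that strips the offset and then treats the wall time as UTC is off by four hours and lands on the wrong day.

```python
from datetime import datetime, timezone

aware = datetime.fromisoformat("2024-08-01T22:00:00-04:00")

# Correct handling: convert the instant to UTC before taking the date.
correct_utc = aware.astimezone(timezone.utc)

# Buggy handling: drop the offset, then (wrongly) assume the wall time was UTC.
naive = aware.replace(tzinfo=None)
wrong_utc = naive.replace(tzinfo=timezone.utc)

print(correct_utc.isoformat())  # 2024-08-02T02:00:00+00:00
print(wrong_utc.isoformat())  # 2024-08-01T22:00:00+00:00
print(correct_utc - wrong_utc)  # 4:00:00
print(correct_utc.date(), wrong_utc.date())  # 2024-08-02 2024-08-01
```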
Comment From: bionicles
reproduction
import datetime
import pandas as pd
import pytest
utc_example = "2024-08-02T00:00:00-00:00"
edt_example = "2024-08-02T00:00:00-04:00"
zulu_example = "2024-08-02T00:00:00Z"
# Run with `pytest -s` to see the printed output.
@pytest.mark.parametrize("iso8601_str", [zulu_example, utc_example, edt_example])
def test_pandas_does_not_ignore_timezone(iso8601_str: str):
    pd_timestamp = pd.Timestamp.fromisoformat(iso8601_str)
    print("PANDAS pd.Timestamp:")
    print(f"{iso8601_str} -> {pd_timestamp=}")
    tz = pd_timestamp.tz
    print(f"{tz=}")
    assert tz is not None, "pandas pd.Timestamp.fromisoformat lost timezone"


@pytest.mark.parametrize("iso8601_str", [zulu_example, utc_example, edt_example])
def test_python_datetime_does_not_ignore_timezone(iso8601_str: str):
    py_datetime = datetime.datetime.fromisoformat(iso8601_str)
    print("python stdlib datetime.datetime:")
    print(f"{iso8601_str} -> {py_datetime}")
    tzinfo = py_datetime.tzinfo
    print(f"{tzinfo=}")
    assert tzinfo is not None, "python datetime.datetime.fromisoformat lost timezone"
result
Yes, Python gets it right and pandas gets it wrong.
Comment From: bionicles
Yo, workaround: pd.to_datetime works, but it converts the "America/New_York" timezone to "UTC-4:00" and fails string equality checks on the timezone :(
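On the timezone-name mismatch: an ISO 8601 string only carries a UTC offset such as -04:00, not an IANA zone name like America/New_York, so no parser can recover the zone name from it, and the stdlib likewise reports a fixed offset. A minimal stdlib sketch of comparing offsets (or instants) instead of timezone strings; same_offset is a hypothetical helper, and the "EDT" name is made up for the example.

```python
from datetime import datetime, timedelta, timezone


def same_offset(a: datetime, b: datetime) -> bool:
    """Hypothetical helper: True if both aware datetimes share a UTC offset, whatever their tz names."""
    return a.utcoffset() == b.utcoffset()


parsed = datetime.fromisoformat("2024-08-02T00:00:00-04:00")
# A value carrying the same offset under a different tz name ("EDT").
reference = datetime(2024, 8, 2, tzinfo=timezone(timedelta(hours=-4), "EDT"))

print(str(parsed.tzinfo), str(reference.tzinfo))  # UTC-04:00 EDT
print(str(parsed.tzinfo) == str(reference.tzinfo))  # False: names differ
print(same_offset(parsed, reference))  # True: same offset
print(parsed == reference)  # True: aware datetimes compare as instants
```

With pandas, the analogous approach would be to compare the timestamps themselves or their utcoffset() values, or to convert explicitly with tz_convert("America/New_York") when a named zone is actually needed.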
However, I can't find the code for pd.Timestamp.fromisoformat in order to look at it. Does anyone know where that is?
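To find where an inherited method lives, one can walk the class's MRO and check each class's own namespace. defining_class below is a hypothetical helper, shown on a stand-in datetime subclass so that it runs with the stdlib alone.

```python
from datetime import datetime


def defining_class(cls: type, name: str):
    """Hypothetical helper: first class in cls.__mro__ whose own namespace defines `name`."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return klass
    return None


class MyTimestamp(datetime):
    """Stand-in subclass that does not override fromisoformat."""

    def isoformat(self, sep="T", timespec="auto"):
        # Overridden only to show the helper finding a method on the subclass itself.
        return super().isoformat(sep, timespec)


print(defining_class(MyTimestamp, "fromisoformat"))  # <class 'datetime.datetime'>
print(defining_class(MyTimestamp, "isoformat").__name__)  # MyTimestamp
```

Called as defining_class(pd.Timestamp, "fromisoformat"), it would be expected to return datetime.datetime if the method is inherited from the stdlib, as lithomas1 says. In that case inspect.getsource typically cannot show it either, because CPython implements that method in C.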
Comment From: j-hendricks
take