Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
# astype() works as expected for integer input
series = pd.Series([1, 2, 3])
series.astype(pd.SparseDtype("M8[ns]", pd.NaT))
# An AssertionError is raised if input is already M8[ns] (timezone-naive)
series = pd.to_datetime(series)
series.astype(pd.SparseDtype("M8[ns]", pd.NaT)) # ERROR HERE
# if input is timezone-aware, it works and a warning is issued as expected
series.dt.tz_localize("UTC").astype(pd.SparseDtype("M8[ns]", pd.NaT))
# the AssertionError can be avoided by doing astype("O") before sparsifying
series.astype("O").astype(pd.SparseDtype("M8[ns]", pd.NaT))
Issue Description
An unexpected AssertionError is raised when converting a timezone-naive datetime series into a sparse format. This is not the case for equivalent integer offsets, nor does it occur when the timestamps are timezone-aware to begin with. The error can be manually avoided by converting to an object data type before sparsifying.
Here's the traceback for the AssertionError:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/mnt/c/data/pdtypes/venv/lib/python3.10/site-packages/pandas/core/generic.py", line 6240, in astype
new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
File "/mnt/c/data/pdtypes/venv/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 448, in astype
return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
File "/mnt/c/data/pdtypes/venv/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 352, in apply
applied = getattr(b, f)(**kwargs)
File "/mnt/c/data/pdtypes/venv/lib/python3.10/site-packages/pandas/core/internals/blocks.py", line 526, in astype
new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
File "/mnt/c/data/pdtypes/venv/lib/python3.10/site-packages/pandas/core/dtypes/astype.py", line 299, in astype_array_safe
new_values = astype_array(values, dtype, copy=copy)
File "/mnt/c/data/pdtypes/venv/lib/python3.10/site-packages/pandas/core/dtypes/astype.py", line 218, in astype_array
return astype_dt64_to_dt64tz(values, dtype, copy, via_utc=True)
File "/mnt/c/data/pdtypes/venv/lib/python3.10/site-packages/pandas/core/dtypes/astype.py", line 362, in astype_dt64_to_dt64tz
assert values.tz is None and aware
AssertionError
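Until a fixed release is installed, the object-dtype workaround described above could be wrapped in a small helper. This is only a sketch: the name `to_sparse_datetime` is hypothetical and not part of pandas, and it assumes the failure always surfaces as an `AssertionError`, as in the traceback.

```python
import pandas as pd


def to_sparse_datetime(series: pd.Series) -> pd.Series:
    """Hypothetical helper: sparsify a datetime series with NaT as fill value."""
    sparse_dtype = pd.SparseDtype("M8[ns]", pd.NaT)
    try:
        # Direct conversion; raises AssertionError on the affected release.
        return series.astype(sparse_dtype)
    except AssertionError:
        # Workaround from the report: go through object dtype first.
        return series.astype("O").astype(sparse_dtype)


naive = pd.to_datetime(pd.Series([1, 2, 3]))  # timezone-naive input
sparse = to_sparse_datetime(naive)
print(sparse.dtype)
```

On a build where the direct conversion already works, the fallback branch is never taken, so the helper should be safe to leave in place across upgrades.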
Expected Behavior
Sparsifying a timezone-naive datetime series should work the same way as it does for a timezone-aware series or for a series of equivalent integer offsets. There is no need for a UserWarning about losing timezone information in this case, since there is no timezone to lose.
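A rough way to check this expectation, assuming a pandas build that includes the fix, is to escalate `UserWarning` to an error and run the conversion. This is an illustrative sketch, not a test taken from the pandas test suite.

```python
import warnings

import pandas as pd

naive = pd.to_datetime(pd.Series([1, 2, 3]))  # timezone-naive datetime series

with warnings.catch_warnings():
    # Escalate UserWarning so an unexpected timezone warning fails loudly.
    warnings.simplefilter("error", UserWarning)
    result = naive.astype(pd.SparseDtype("M8[ns]", pd.NaT))

# The result should carry a sparse dtype and keep its length.
assert isinstance(result.dtype, pd.SparseDtype)
assert len(result) == len(naive)
```

On the affected release the `astype` call is expected to raise the `AssertionError` shown in the traceback instead of reaching the assertions.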
Installed Versions
Comment From: jbrockmendel
This works for me on main. An RC should be coming out in a few hours. Can you check with that?
Comment From: eerkela
For sure. It might only be broken in the PyPI release. All I did was a pip upgrade before testing.
Comment From: eerkela
Confirmed the issue is resolved on pandas build 2.1.0.dev0+5.g8d2a4e11d1
Looks like it just hasn't been pushed to PyPI yet. I'll close this issue and revisit later if it crops up again.
Here's the output from pd.show_versions(), just to confirm: