Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# astype() works as expected for integer input
series = pd.Series([1, 2, 3])
series.astype(pd.SparseDtype("M8[ns]", pd.NaT))

# An AssertionError is raised if input is already M8[ns] (timezone-naive)
series = pd.to_datetime(series)
series.astype(pd.SparseDtype("M8[ns]", pd.NaT))  # ERROR HERE

# if input is timezone-aware, it works and a warning is issued as expected
series.dt.tz_localize("UTC").astype(pd.SparseDtype("M8[ns]", pd.NaT))

# the AssertionError can be avoided by doing astype("O") before sparsifying
series.astype("O").astype(pd.SparseDtype("M8[ns]", pd.NaT))

Issue Description

An unexpected AssertionError is raised when converting a timezone-naive datetime series into a sparse format. This is not the case for equivalent integer offsets, nor does it occur when the timestamps are timezone-aware to begin with. The error can be manually avoided by converting to an object data type before sparsifying.
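The workaround described above can be wrapped in a small helper that tries the direct conversion first and only falls back to the object-dtype route when the AssertionError appears. This is a minimal sketch, not part of pandas: the function name sparsify_naive_datetimes is hypothetical, and it assumes the failure always surfaces as an AssertionError, as in the report.

```python
import pandas as pd


def sparsify_naive_datetimes(series: pd.Series) -> pd.Series:
    """Convert a timezone-naive datetime Series to a sparse M8[ns] Series.

    Hypothetical helper for illustration; not a pandas API.
    """
    sparse_dtype = pd.SparseDtype("M8[ns]", pd.NaT)
    try:
        # Direct conversion, which is the behavior the report expects.
        return series.astype(sparse_dtype)
    except AssertionError:
        # Assumption: on affected builds the direct cast raises
        # AssertionError, so fall back to the object-dtype detour
        # described in the report.
        return series.astype("O").astype(sparse_dtype)


naive = pd.to_datetime(pd.Series([1, 2, 3]))
sparse = sparsify_naive_datetimes(naive)
print(sparse.dtype)
```

On a build where the direct cast succeeds, the except branch is never taken, so the helper adds no extra conversion cost there.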

Here's the traceback for the AssertionError:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/c/data/pdtypes/venv/lib/python3.10/site-packages/pandas/core/generic.py", line 6240, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "/mnt/c/data/pdtypes/venv/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 448, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "/mnt/c/data/pdtypes/venv/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 352, in apply
    applied = getattr(b, f)(**kwargs)
  File "/mnt/c/data/pdtypes/venv/lib/python3.10/site-packages/pandas/core/internals/blocks.py", line 526, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "/mnt/c/data/pdtypes/venv/lib/python3.10/site-packages/pandas/core/dtypes/astype.py", line 299, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "/mnt/c/data/pdtypes/venv/lib/python3.10/site-packages/pandas/core/dtypes/astype.py", line 218, in astype_array
    return astype_dt64_to_dt64tz(values, dtype, copy, via_utc=True)
  File "/mnt/c/data/pdtypes/venv/lib/python3.10/site-packages/pandas/core/dtypes/astype.py", line 362, in astype_dt64_to_dt64tz
    assert values.tz is None and aware
AssertionError

Expected Behavior

Sparsifying a timezone-naive datetime series should work the same way as it does for timezone-aware input or for equivalent integer offsets. No UserWarning about losing timezone information is needed in this case, since there is no timezone to lose.

Installed Versions

INSTALLED VERSIONS
------------------
commit : 2e218d10984e9919f0296931d92ea851c6a6faf5
python : 3.10.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.79.1-microsoft-standard-WSL2
Version : #1 SMP Wed Nov 23 01:01:46 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.5.3
numpy : 1.24.1
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 59.6.0
pip : 22.0.2
Cython : 0.29.33
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 10.0.1
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : 2022.7

Comment From: jbrockmendel

This works for me on main. An RC should be coming out in a few hours. Can you check with that?

Comment From: eerkela

For sure. It might only be broken in the PyPI release. All I did was a pip upgrade before testing.

Comment From: eerkela

Confirmed the issue is resolved on pandas build 2.1.0.dev0+5.g8d2a4e11d1

Looks like it just hasn't been pushed to PyPI yet. I'll close this issue and revisit later if it crops up again.

Here's the output from pd.show_versions() just to confirm:

INSTALLED VERSIONS
------------------
commit : 8d2a4e11d136af439de92ef1a970a1ae0edde4dc
python : 3.10.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.79.1-microsoft-standard-WSL2
Version : #1 SMP Wed Nov 23 01:01:46 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.1.0.dev0+5.g8d2a4e11d1
numpy : 1.23.5
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 67.3.3
pip : 23.0.1
Cython : 0.29.32
pytest : 7.2.1
hypothesis : 6.68.2
sphinx : None
blosc : 1.11.1
feather : None
xlsxwriter : 3.0.8
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : 1.0.2
psycopg2 : 2.9.5
jinja2 : 3.1.2
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : 1.3.6
brotli : None
fastparquet : None
fsspec : 2023.1.0
gcsfs : None
matplotlib : None
numba : 0.56.4
numexpr : 2.8.4
odfpy : None
openpyxl : 3.1.0
pandas_gbq : None
pyarrow : 10.0.1
pyreadstat : None
pyxlsb : 1.0.10
s3fs : None
scipy : 1.10.1
snappy :
sqlalchemy : 2.0.4
tables : None
tabulate : 0.9.0
xarray : None
xlrd : 2.0.1
zstandard : 0.20.0
tzdata : 2022.7
qtpy : None
pyqt5 : None