Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pyarrow as pa
import pandas as pd
from datetime import date, timedelta

# Create a Date32Array (date32[day]) in PyArrow
date_array = pa.array([date.today() - timedelta(days=i) for i in range(5)], type=pa.date32())

# Create a PyArrow Table
table = pa.Table.from_arrays([date_array], schema=pa.schema([('date', pa.date32())]))

# Convert PyArrow Table to pandas DataFrame using types_mapper
df = table.to_pandas(types_mapper={pa.date32(): pd.ArrowDtype(pa.date64())}.get)

# Display the dtypes of the pandas DataFrame
>>> df.dtypes
date    date32[day][pyarrow]
dtype: object

## ERROR: we expect a `date64[ms][pyarrow]` dtype here.

Issue Description

Here is the call stack of the above example, innermost frame first:

  • __init__ (pandas/core/arrays/arrow/array.py:208)
    def __init__(self, values: pa.Array | pa.ChunkedArray, *, dtype: ArrowDtype | None = None) -> None:
        ...
        elif isinstance(values, pa.ChunkedArray):
            self._data = values
        ...
        self._dtype = ArrowDtype(self._data.type)

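Here, self._data.type is DataType(date32[day]), so the resulting ArrowDtype is date32[day][pyarrow], although we expect it to be date64[ms][pyarrow]. The dtype is always derived from the data; the dtype requested through types_mapper is never consulted.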

  • __from_arrow__ (pandas/core/arrays/arrow/dtype.py:204)
    def __from_arrow__(self, array: pa.Array | pa.ChunkedArray):
        """
        Construct IntegerArray/FloatingArray from pyarrow Array/ChunkedArray.
        """
        array_class = self.construct_array_type()
        return array_class(array)

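Here, self is the date64[ms][pyarrow] ArrowDtype, while array still has the pyarrow type DataType(date32[day]). Only array is passed on to the array class, so the requested dtype is dropped at this point.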

  • _reconstruct_block (pyarrow/pandas_compat.py:780)
  • (pyarrow/pandas_compat.py:1171)
  • _table_to_blocks (pyarrow/pandas_compat.py:1171)
  • table_to_blockmanager (pyarrow/pandas_compat.py:820)
  • table.to_pandas(types_mapper={pa.date32(): pd.ArrowDtype(pa.date64())}.get)

Expected Behavior

Instead of deriving the ArrowDtype from the data, could ArrowDtype.__from_arrow__ pass itself to ArrowExtensionArray.__init__? That way, the data could be converted to the expected pandas dtype.

Installed Versions

INSTALLED VERSIONS
------------------
commit : fe415f553ff28c9e1a4ffca49f89030152e66a73
python : 3.9.16.final.0
python-bits : 64
OS : Linux
OS-release : 5.19.11-1rodete1-amd64
Version : #1 SMP PREEMPT_DYNAMIC Debian 5.19.11-1rodete1 (2022-10-31)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.1.0.dev0+444.gfe415f553f
numpy : 1.24.2
pytz : 2023.3
dateutil : 2.8.2
setuptools : 67.6.0
pip : 23.0.1
Cython : None
pytest : 7.2.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None