Pandas version checks

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pyarrow as pa
import cudf
import pandas as pd

In [19]: data = [{"text": "hello", "list_col": np.asarray([1, 2], dtype="uint32")}]

In [20]: df = cudf.DataFrame(data)

In [21]: arrow_types_pdf = df.to_pandas(arrow_type=True)

In [22]: arrow_types_pdf
Out[22]: 
    text list_col
0  hello    [1 2]

In [23]: arrow_types_pdf.dtypes
Out[23]: 
text                    string[pyarrow]
list_col    list<item: uint32>[pyarrow]
dtype: object

In [24]: pa_table = pa.Table.from_pandas(arrow_types_pdf)

In [25]: pa_table
Out[25]: 
pyarrow.Table
text: string
list_col: list<item: uint32>
  child 0, item: uint32
----
text: [["hello"]]
list_col: [[[1,2]]]

In [26]: pa_table.to_pandas()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[26], line 1
----> 1 pa_table.to_pandas()

File /datasets/pgali/envs/cudfdev/lib/python3.12/site-packages/pyarrow/array.pxi:887, in pyarrow.lib._PandasConvertible.to_pandas()

File /datasets/pgali/envs/cudfdev/lib/python3.12/site-packages/pyarrow/table.pxi:5132, in pyarrow.lib.Table._to_pandas()

File /datasets/pgali/envs/cudfdev/lib/python3.12/site-packages/pyarrow/pandas_compat.py:783, in table_to_dataframe(options, table, categories, ignore_metadata, types_mapper)
    780     table = _add_any_metadata(table, pandas_metadata)
    781     table, index = _reconstruct_index(table, index_descriptors,
    782                                       all_columns, types_mapper)
--> 783     ext_columns_dtypes = _get_extension_dtypes(
    784         table, all_columns, types_mapper)
    785 else:
    786     index = _pandas_api.pd.RangeIndex(table.num_rows)

File /datasets/pgali/envs/cudfdev/lib/python3.12/site-packages/pyarrow/pandas_compat.py:862, in _get_extension_dtypes(table, columns_metadata, types_mapper)
    857 dtype = col_meta['numpy_type']
    859 if dtype not in _pandas_supported_numpy_types:
    860     # pandas_dtype is expensive, so avoid doing this for types
    861     # that are certainly numpy dtypes
--> 862     pandas_dtype = _pandas_api.pandas_dtype(dtype)
    863     if isinstance(pandas_dtype, _pandas_api.extension_dtype):
    864         if hasattr(pandas_dtype, "__from_arrow__"):

File /datasets/pgali/envs/cudfdev/lib/python3.12/site-packages/pyarrow/pandas-shim.pxi:148, in pyarrow.lib._PandasAPIShim.pandas_dtype()

File /datasets/pgali/envs/cudfdev/lib/python3.12/site-packages/pyarrow/pandas-shim.pxi:151, in pyarrow.lib._PandasAPIShim.pandas_dtype()

File /datasets/pgali/envs/cudfdev/lib/python3.12/site-packages/pandas/core/dtypes/common.py:1645, in pandas_dtype(dtype)
   1640     with warnings.catch_warnings():
   1641         # GH#51523 - Series.astype(np.integer) doesn't show
   1642         # numpy deprecation warning of np.integer
   1643         # Hence enabling DeprecationWarning
   1644         warnings.simplefilter("always", DeprecationWarning)
-> 1645         npdtype = np.dtype(dtype)
   1646 except SyntaxError as err:
   1647     # np.dtype uses `eval` which can raise SyntaxError
   1648     raise TypeError(f"data type '{dtype}' not understood") from err

TypeError: data type 'list<item: uint32>[pyarrow]' not understood

In [27]: pa_table.to_pandas(ignore_metadata=True)
Out[27]: 
    text list_col
0  hello   [1, 2]

Issue Description

It looks like pa.Table.to_pandas is not yet compatible with the Arrow-backed extension dtypes that pandas supports: the conversion back to pandas fails inside the pandas_dtype API when it encounters the nested dtype string (list<item: uint32>[pyarrow]) stored in the table's pandas metadata.

Expected Behavior

The DataFrame should round-trip through pa.Table without errors.

Installed Versions

In [28]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit : 0691c5cf90477d3503834d983f69350f250a6ff7
python : 3.12.8
python-bits : 64
OS : Linux
OS-release : 5.4.0-182-generic
Version : #202-Ubuntu SMP Fri Apr 26 12:29:36 UTC 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.3
numpy : 2.0.2
pytz : 2024.1
dateutil : 2.9.0.post0
pip : 25.0.1
Cython : 3.0.12
sphinx : 8.1.3
IPython : 8.32.0
adbc-driver-postgresql : None
adbc-driver-sqlite : None
bs4 : 4.13.3
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2025.2.0
html5lib : None
hypothesis : 6.125.1
gcsfs : None
jinja2 : 3.1.5
lxml.etree : None
matplotlib : None
numba : 0.60.0
numexpr : None
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 18.1.0
pyreadstat : None
pytest : 7.4.4
python-calamine : None
pyxlsb : None
s3fs : 2025.2.0
scipy : 1.15.1
sqlalchemy : 2.0.38
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
xlsxwriter : None
zstandard : 0.23.0
tzdata : 2025.1
qtpy : None
pyqt5 : None

Comment From: mroeschke

The canonical way to do this, if you know the original pandas DataFrame started entirely with ArrowDtype columns, is to specify .to_pandas(types_mapper=pd.ArrowDtype)

And it appears that as of pyarrow 19 (https://github.com/apache/arrow/pull/44720), this now supports round-tripping nested arrow types:

In [4]: df = pd.DataFrame({"a": pd.arrays.ArrowExtensionArray(pa.array([[1]]))})

In [5]: pa.Table.from_pandas(df).to_pandas(types_mapper=pd.ArrowDtype)
Out[5]: 
     a
0  [1]

In [6]: pa.__version__
Out[6]: '19.0.0'