Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
$ import pandas as pd
$ import pyarrow as pa
$ pd.api.types.pandas_dtype(pd.ArrowDtype(pa.json_(pa.string()))).type
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
Cell In[2], line 4
1 import pandas as pd
2 import pyarrow as pa
----> 4 pd.api.types.pandas_dtype(pd.ArrowDtype(pa.json_(pa.string()))).type
File ~/src/bigframes/venv/lib/python3.12/site-packages/pandas/core/dtypes/dtypes.py:2169, in ArrowDtype.type(self)
2167 elif isinstance(pa_type, pa.ExtensionType):
2168 return type(self)(pa_type.storage_type).type
-> 2169 raise NotImplementedError(pa_type)
NotImplementedError: extension<arrow.json>
Issue Description
Apache Arrow v19.0 introduced the pa.json_ extension type (doc). Currently, pandas.ArrowDtype.type does not handle this new type and raises NotImplementedError, as shown in the traceback above.
The ArrowDtype.type property is crucial for various pandas dtype APIs, including pd.api.types.pandas_dtype() and pd.api.types.is_timedelta64_dtype(). When used with pd.ArrowDtype(pa.json_(pa.string())), these APIs produce unexpected results.
The underlying problem is that ArrowDtype.type does not fall back to the storage type of the Arrow JSON type, as it should.
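The fallback described above can be sketched as follows. This is a minimal illustration, not the pandas implementation: it assumes only that an extension type exposes a storage_type attribute, as pa.json_ does. The stand-in classes FakeArrowType and FakeExtensionType, the SCALAR_TYPES mapping, and resolve_scalar_type are all hypothetical names, used so the sketch runs without pyarrow installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeArrowType:
    """Hypothetical stand-in for a plain Arrow data type (not a pyarrow class)."""
    name: str


@dataclass(frozen=True)
class FakeExtensionType:
    """Hypothetical stand-in for an Arrow extension type such as arrow.json."""
    name: str
    storage_type: object


# Hypothetical mapping from storage type names to Python scalar types.
SCALAR_TYPES = {"string": str, "int64": int, "double": float}


def resolve_scalar_type(pa_type):
    """Return the Python scalar type, unwrapping extension types first."""
    # Unwrap (possibly nested) extension types until a plain type remains.
    while hasattr(pa_type, "storage_type"):
        pa_type = pa_type.storage_type
    try:
        return SCALAR_TYPES[pa_type.name]
    except KeyError:
        # Mirror the error raised in the traceback for unsupported types.
        raise NotImplementedError(pa_type) from None


json_type = FakeExtensionType("extension<arrow.json>", FakeArrowType("string"))
print(resolve_scalar_type(json_type))  # <class 'str'>
```

Checking for the storage_type attribute rather than for a specific extension class is a design choice of this sketch; the traceback suggests that the existing isinstance(pa_type, pa.ExtensionType) branch is not taken for pa.json_, so an actual fix in pandas may need a broader check.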
Expected Behavior
pa.json_ is a standard Arrow extension type. ArrowDtype.type should resolve it through its storage type, mirroring the behavior of other Arrow extension types.
Specifically, pd.ArrowDtype(pa.json_(pa.string())).type should reflect the storage type, which is pa.string(), as shown below.
Code showing the Arrow storage type for pa.json_:
$ import pyarrow as pa
$ pa.json_(pa.string()).storage_type
DataType(string)
Installed Versions
Comment From: rhshadrach
Thanks for the report. As the error message says, this is not implemented yet. Reclassifying as an enhancement and not a bug.
Comment From: abhigyan631
I am trying to run test_arrow.py inside the arrays folder (to test a fix I implemented), but I get this problem every time.
Is there a way to fix this, please? I thought this was a VS Code issue, so I tested in Jupyter too, but it still gives the same problem. Maybe I am doing something wrong?
Thank you