Pandas ENH: Support pa.json_ in arrow extension type

Pandas version checks

[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of pandas.
[x] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> import pandas as pd
>>> import pyarrow as pa
>>> pd.api.types.pandas_dtype(pd.ArrowDtype(pa.json_(pa.string()))).type

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[2], line 4
      1 import pandas as pd
      2 import pyarrow as pa
----> 4 pd.api.types.pandas_dtype(pd.ArrowDtype(pa.json_(pa.string()))).type

File ~/src/bigframes/venv/lib/python3.12/site-packages/pandas/core/dtypes/dtypes.py:2169, in ArrowDtype.type(self)
   2167 elif isinstance(pa_type, pa.ExtensionType):
   2168     return type(self)(pa_type.storage_type).type
-> 2169 raise NotImplementedError(pa_type)

NotImplementedError: extension<arrow.json>

Issue Description

Apache Arrow v19.0 introduced the pa.json_ extension type. pandas.ArrowDtype.type does not handle this type yet and raises NotImplementedError, as the traceback above shows.

Several pandas dtype APIs rely on ArrowDtype.type, including pd.api.types.pandas_dtype() and pd.api.types.is_timedelta64_dtype(). When used with pd.ArrowDtype(pa.json_(pa.string())), these APIs do not behave as expected.

ArrowDtype.type should instead resolve pa.json_ through its underlying storage type, as it already does for other Arrow extension types.

Expected Behavior

pa.json_ is a standard Arrow extension type. ArrowDtype.type should accurately return its storage type, mirroring the behavior of other Arrow extension types. Specifically, pd.ArrowDtype(pa.json_(pa.string())).type should reflect the storage type, which is pa.string() as shown below.

Code showing the Arrow storage type for pa.json_:

>>> import pyarrow as pa
>>> pa.json_(pa.string()).storage_type
DataType(string)

Installed Versions

INSTALLED VERSIONS
------------------
commit                : 0691c5cf90477d3503834d983f69350f250a6ff7
python                : 3.12.1
python-bits           : 64
OS                    : Linux
OS-release            : 6.10.11-1rodete2-amd64
Version               : #1 SMP PREEMPT_DYNAMIC Debian 6.10.11-1rodete2 (2024-10-16)
machine               : x86_64
processor             :
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : en_US.UTF-8
pandas                : 2.2.3
numpy                 : 2.2.2
pytz                  : 2025.1
dateutil              : 2.9.0.post0
pip                   : 23.2.1
Cython                : None
sphinx                : None
IPython               : 8.32.0
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
blosc                 : None
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : 2025.2.0
html5lib              : None
hypothesis            : None
gcsfs                 : 2025.2.0
jinja2                : None
lxml.etree            : None
matplotlib            : 3.10.0
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : None
pandas_gbq            : 0.27.0
psycopg2              : None
pymysql               : None
pyarrow               : 19.0.0
pyreadstat            : None
pytest                : 8.3.4
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : 1.15.1
sqlalchemy            : 2.0.38
tables                : None
tabulate              : 0.9.0
xarray                : None
xlrd                  : None
xlsxwriter            : None
zstandard             : None
tzdata                : 2025.1
qtpy                  : None
pyqt5                 : None

Comment From: rhshadrach

Thanks for the report. As the error message says, this is not implemented yet. Reclassifying as an enhancement and not a bug.

Comment From: abhigyan631

I am trying to run test_arrow.py in the arrays folder to test a fix I implemented, but I get this problem every time.

Is there a way to fix this, please? I assumed this was a VS Code issue, so I also tested in Jupyter, but I still get the same problem. Maybe I am doing something wrong?

Thank you