Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

In [1]: import pandas as pd
   ...: df = pd.DataFrame({'a': [1, 2, 3], 'c': ['a', 'b', 'c']}).astype(dtype={'a': 'int64', 'c': 'string'}).set_index('a')

In [2]: with pd.option_context("mode.dtype_backend", "pyarrow"):
   ...:     df2 = df.convert_dtypes(infer_objects=False)
   ...:

In [3]: df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, 1 to 3
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   c       3 non-null      string
dtypes: string(1)
memory usage: 48.0 bytes

In [4]: df2.info()
<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, 1 to 3
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   c       3 non-null      string[pyarrow]
dtypes: string[pyarrow](1)
memory usage: 39.0 bytes

In [5]: df.index.dtype
Out[5]: dtype('int64')

In [6]: df2.index.dtype
Out[6]: dtype('int64')

In [7]: df2.index._data
Out[7]: array([1, 2, 3])

In [8]: df2.index._data_cls
Out[8]: (numpy.ndarray, pandas.core.arrays.base.ExtensionArray)

Issue Description

Calling convert_dtypes with mode.dtype_backend='pyarrow' converts the columns to pyarrow-backed dtypes but does not convert the index, which stays NumPy-backed int64.

Expected Behavior

I would expect the index to be converted to a pyarrow-backed dtype as well, giving the same result as converting it as a regular column and then setting it back as the index:

In [1]: import pandas as pd
   ...: df = pd.DataFrame({'a': [1, 2, 3], 'c': ['a', 'b', 'c']}).astype(dtype={'a': 'int64', 'c': 'string'}).set_index('a')

In [2]: with pd.option_context("mode.dtype_backend", "pyarrow"):
   ...:     df3 = df.reset_index().convert_dtypes(infer_objects=False).set_index('a')
   ...:

In [3]: df3.index
Out[3]: Index([1, 2, 3], dtype='int64[pyarrow]', name='a')

In [4]: df3.index._data_cls
Out[4]: (numpy.ndarray, pandas.core.arrays.base.ExtensionArray)

I've filed a few bug reports today about expected behavior with mode.dtype_backend='pyarrow', since we're hoping to switch to that option by default when pandas 2.0 ships. Let me know if these are unwelcome or a distraction.

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 6bb8f73e75bb5d25ca752f34f54705cdc533da54
python           : 3.10.4.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.15.90.1-microsoft-standard-WSL2
Version          : #1 SMP Fri Jan 27 02:56:13 UTC 2023
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : C.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 2.1.0.dev0+93.g6bb8f73e75
numpy            : 1.23.3
pytz             : 2022.2.1
dateutil         : 2.8.2
setuptools       : 58.1.0
pip              : 22.3
Cython           : 0.29.32
pytest           : 7.1.3
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.9.1
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.1.2
IPython          : 8.5.0
pandas_datareader: 0.10.0
bs4              : 4.11.1
bottleneck       : 1.3.6
brotli           : None
fastparquet      : 0.8.3
fsspec           : 2022.8.2
gcsfs            : None
matplotlib       : 3.5.3
numba            : 0.56.4
numexpr          : 2.8.4
odfpy            : None
openpyxl         : 3.0.10
pandas_gbq       : None
pyarrow          : 9.0.0
pyreadstat       : None
pyxlsb           : None
s3fs             : 2022.8.2
scipy            : 1.9.1
snappy           : None
sqlalchemy       : 1.4.41
tables           : 3.8.0
tabulate         : None
xarray           : 2023.2.0
xlrd             : 2.0.1
zstandard        : 0.18.0
tzdata           : None
qtpy             : 2.3.0
pyqt5            : None

Comment From: phofl

Hi, your reports are very welcome. This helps us a lot!

The current behavior is consistent with the default "pandas" backend, which also leaves the index as "int64" instead of converting it to "Int64". Maybe we could add another option to convert the index as well?

cc @mroeschke

Comment From: mroeschke

Yeah, the docs state:

Convert columns to best possible dtypes using dtypes supporting pd.NA.

Maybe Index should have a convert_dtypes method instead. We don't really have methods that operate on the values and the index at once.

Comment From: Rylie-W

Take

Comment From: phofl

Please wait until we have decided what to do here.