On 1.2 master:
In [15]: import pandas as pd
...: import pyarrow as pa
...:
...: df = pd.DataFrame({'a': pa.scalar(7)}, index=['a'])
...: print(type(df.to_dict(orient="records")[0]['a']))
...:
<class 'pyarrow.lib.Int64Scalar'>
This is inconsistent with the corresponding operation for NumPy-backed dtypes (xref #37571):
In [13]: import pandas as pd
...: import numpy as np
...:
    ...: df = pd.DataFrame({'a': np.int64(7)}, index=['a'])
...: print(type(df.to_dict(orient="records")[0]['a']))
...:
<class 'int'>
In general we'd like to_dict to return Python native types where possible.
Comment From: jorisvandenbossche
In this specific case, the column has object dtype (storing the actual pyarrow scalar; we didn't coerce to an integer dtype). I think that for object dtype we should simply return the element as stored in the array (in general we also don't have a way to know the "native" type for a generic Python object that can be stored in an object-dtype array).
Comment From: arw2019
> In this specific case, the column has object dtype (storing the actual pyarrow scalar, we didn't coerce to an integer dtype). I think that for object dtype, we should simply return the element as stored in the array (in general we also don't have a way to know the "native" type for a generic python object that can be stored in object dtype arrays)
That makes sense. Would we revisit this for pyarrow-backed arrays?
Comment From: jorisvandenbossche
Those won't use object dtype (like e.g. the ArrowStringArray in the works), so what I said above about object dtype doesn't apply. And for most Arrow types I think there is a clear "native" Python type.