On 1.2 master:
In [15]: import pandas as pd
...: import pyarrow as pa
...:
...: df = pd.DataFrame({'a': pa.scalar(7)}, index=['a'])
...: print(type(df.to_dict(orient="records")[0]['a']))
...:
<class 'pyarrow.lib.Int64Scalar'>
This is inconsistent with the corresponding operation for NumPy-backed dtypes (xref #37571):
In [13]: import pandas as pd
...: import numpy as np
...:
    ...: df = pd.DataFrame({'a': np.int64(7)}, index=['a'])
...: print(type(df.to_dict(orient="records")[0]['a']))
...:
<class 'int'>
In general we'd like to_dict to return Python native types where possible.
Comment From: jorisvandenbossche
In this specific case, the column has object dtype (storing the actual pyarrow scalar; we didn't coerce to an integer dtype). I think that for object dtype we should simply return the element as stored in the array (in general we also don't have a way to know the "native" type for a generic Python object that can be stored in an object-dtype array).
Comment From: arw2019
> In this specific case, the column has object dtype (storing the actual pyarrow scalar, we didn't coerce to an integer dtype). I think that for object dtype, we should simply return the element as stored in the array (in general we also don't have a way to know the "native" type for a generic python object that can be stored in object dtype arrays)
That makes sense. Would we revisit this for pyarrow-backed arrays?
Comment From: jorisvandenbossche
Those won't use object dtype (like e.g. the ArrowStringArray in the works), so what I said above about object dtype doesn't apply. And for most Arrow types I think there is a clear "native" Python type.