Pandas BUG: DataFrame.to_dict() converts Nullable Int types to numpy.int

Problem description

DataFrame.to_dict() does not cast nullable integer types (Int*Dtype) to the native Python int type. Instead, it unwraps them into numpy.int* types.

Possibly related to: #27616, #25969, #21256

Expected Output

Native Python int type.

Reproduction

Make some data:

import pandas as pd

df = pd.DataFrame({'id': range(5),
                   'coeff': [i * 0.1 for i in range(5)],
                   'is_hot': [True] * 2 + [False] * 3,
                   'value': [1, None, 2, 3, None]})
df

	id	coeff	is_hot	value
0	0	0.0	True	1.0
1	1	0.1	True	NaN
2	2	0.2	False	2.0
3	3	0.3	False	3.0
4	4	0.4	False	NaN

df.dtypes

id          int64
coeff     float64
is_hot       bool
value     float64
dtype: object

The value column needs to be a nullable int:

df['value'] = df['value'].astype(pd.Int64Dtype())
df.dtypes

id          int64
coeff     float64
is_hot       bool
value       Int64
dtype: object

Looks great. But now convert the DataFrame to a list of dicts:

dicts = df.to_dict(orient='records')
dicts

[{'id': 0, 'coeff': 0.0, 'is_hot': True, 'value': 1},
 {'id': 1, 'coeff': 0.1, 'is_hot': True, 'value': nan},
 {'id': 2, 'coeff': 0.2, 'is_hot': False, 'value': 2},
 {'id': 3, 'coeff': 0.30000000000000004, 'is_hot': False, 'value': 3},
 {'id': 4, 'coeff': 0.4, 'is_hot': False, 'value': nan}]

pd.DataFrame(
    [[type(v) for k, v in row.items()] for row in dicts], 
    columns=dicts[0].keys())

	id	coeff	is_hot	value
0	<class 'int'>	<class 'float'>	<class 'bool'>	<class 'numpy.int64'>
1	<class 'int'>	<class 'float'>	<class 'bool'>	<class 'float'>
2	<class 'int'>	<class 'float'>	<class 'bool'>	<class 'numpy.int64'>
3	<class 'int'>	<class 'float'>	<class 'bool'>	<class 'numpy.int64'>
4	<class 'int'>	<class 'float'>	<class 'bool'>	<class 'float'>

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.7.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 19.4.0
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : en_US.UTF-8

pandas           : 1.0.4
numpy            : 1.18.4
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 19.2.3
setuptools       : 41.2.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.15.0
pandas_datareader: None
bs4              : 4.8.1
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : 3.2.1
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : None
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : None

Comment From: jorisvandenbossche

@dm-logv thanks for the report. This is probably related to https://github.com/pandas-dev/pandas/issues/29738

Comment From: arw2019

Minimal reproducer:

In [3]: import pandas as pd 
   ...:  
   ...: df = pd.DataFrame({'A': [1, None, 2, 3, None]}) 
   ...: df['A'] = df['A'].astype('Int64') 
   ...: dicts = df.to_dict(orient="records") 
   ...: pd.DataFrame( 
   ...:     [[type(v) for k, v in row.items()] for row in dicts],  
   ...:     columns=dicts[0].keys())
Out[3]: 
                                       A
0                  <class 'numpy.int64'>
1  <class 'pandas._libs.missing.NAType'>
2                  <class 'numpy.int64'>
3                  <class 'numpy.int64'>
4  <class 'pandas._libs.missing.NAType'>

Comment From: arw2019

Even smaller:

In [2]: import pandas as pd 
   ...:  
   ...: df = pd.DataFrame({'A': [1, None]}) 
   ...: df['A'] = df['A'].astype('Int64') 
   ...: records_as_dicts = df.to_dict(orient="records") 
   ...: pd.DataFrame([[type(v) for v in row.values()] for row in records_as_dicts], columns=records_as_dicts[0].keys())
Out[2]: 
                                       A
0                  <class 'numpy.int64'>
1  <class 'pandas._libs.missing.NAType'>

Comment From: VikingPathak

I am using pd.read_sql(), which returns a DataFrame. Applying .to_dict() to it gave me a dictionary in which the value True had type numpy.bool_.

I could not find a better approach, so I used to_json() instead of to_dict() and wrapped the result in json.loads(). All the numpy types were converted to native Python objects.

json.loads(dataframe.iloc[0, :].to_json())
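
An alternative to the JSON round trip is to unwrap the NumPy scalars after calling to_dict(). The following is only a minimal sketch: to_native and records_to_native are hypothetical helper names, not part of pandas. It relies on NumPy scalar types (numpy.int64, numpy.bool_, ...) exposing an item() method that returns the equivalent built-in Python object, and it assumes nothing else in the records has a callable item attribute. Values without item(), such as None or built-in ints, pass through unchanged.

```python
def to_native(value):
    """Return a built-in Python object for NumPy-style scalars.

    Hypothetical helper: anything exposing a callable .item() is unwrapped;
    every other value is returned as-is.
    """
    item = getattr(value, "item", None)
    if callable(item):
        # numpy.int64(1).item() -> 1 (a plain Python int)
        return item()
    return value


def records_to_native(records):
    """Apply to_native to every value in a list of row dicts,
    i.e. the shape returned by df.to_dict(orient="records")."""
    return [{key: to_native(val) for key, val in row.items()} for row in records]
```

With the DataFrame from the report, the call would be records_to_native(df.to_dict(orient='records')). Unlike the JSON round trip, this does not serialize and re-parse the data, so values that JSON cannot represent are left as they are.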
