Code to reproduce the error:
import pandas as pd
from datetime import datetime

dfs = {
    'full_df': pd.DataFrame([
        {'int': 1, 'date': datetime.now(), 'str': 'foo', 'float': 1.0, 'bool': True},
    ]),
    'int_df': pd.DataFrame([
        {'int': 1},
    ]),
    'date_df': pd.DataFrame([
        {'date': datetime.now()},
    ]),
    'str_df': pd.DataFrame([
        {'str': 'foo'},
    ]),
    'float_df': pd.DataFrame([
        {'float': 1.0},
    ]),
    'bool_df': pd.DataFrame([
        {'bool': True},
    ])
}

for name, frame in dfs.items():
    print('Types in ' + name)
    for k, v in frame.to_dict('records')[0].items():
        print(type(v))
Output:
Types in full_df
<class 'bool'>
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
<class 'float'>
<class 'int'>
<class 'str'>
Types in int_df
<class 'numpy.int64'>
Types in date_df
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
Types in str_df
<class 'str'>
Types in float_df
<class 'numpy.float64'>
Types in bool_df
<class 'numpy.bool_'>
Problem description
One would expect to_dict() to return native Python types, or at least to treat columns of the same type consistently. However, as shown above, it behaves differently depending on the frame: type conversion does not seem to be invoked when the DataFrame has a single column.
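A quick way to see which values are affected is to test for NumPy scalar types directly. The sketch below is only an illustration: non_native_values is a hypothetical helper name, not part of pandas, and what it reports for each frame depends on the installed pandas version.

```python
import numpy as np
import pandas as pd


def non_native_values(record):
    """Return the keys whose values are NumPy scalars (hypothetical helper)."""
    # np.generic is the common base class of all NumPy scalar types,
    # including numpy.int64, numpy.float64 and numpy.bool_.
    return [key for key, value in record.items() if isinstance(value, np.generic)]


# Single-column frames, as in the report above.
for column, value in [('int', 1), ('float', 1.0), ('bool', True)]:
    frame = pd.DataFrame([{column: value}])
    record = frame.to_dict('records')[0]
    print(column, non_native_values(record))
```

An empty list for every frame means to_dict() already returned built-in types on that version.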
Expected Output
Native Python types wherever possible for int, float, bool and str columns and, if possible, a Python datetime object instead of pandas.Timestamp.
Output of pd.show_versions()
Comment From: hodossy
I have a temporary solution until it is fixed:
class NativeDict(dict):
    """
    Helper class to ensure that only native types are in the dicts produced by
    :func:`to_dict() <pandas.DataFrame.to_dict>`

    .. note::
        Needed until `#21256 <https://github.com/pandas-dev/pandas/issues/21256>`_ is resolved.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(((k, self.convert_if_needed(v)) for row in args for k, v in row), **kwargs)

    @staticmethod
    def convert_if_needed(value):
        """
        Converts `value` to native python type.

        .. warning::
            Only :class:`Timestamp <pandas.Timestamp>` and numpy :class:`dtypes <numpy.dtype>` are converted.
        """
        if pd.isnull(value):
            return None
        if isinstance(value, pd.Timestamp):
            return value.to_pydatetime()
        if hasattr(value, 'dtype'):
            mapper = {'i': int, 'u': int, 'f': float}
            _type = mapper.get(value.dtype.kind, lambda x: x)
            return _type(value)
        return value
This also replaces NaN and NaT objects with the native Python None. Please note that its only intended use is as the into argument of to_dict(); I have not tested it elsewhere. It can be used like so:
df.to_dict(orient='records', into=NativeDict)
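The workaround relies on the into parameter of to_dict(), which accepts a mapping class and uses it to build each returned dict. A minimal sketch of that mechanism, using collections.OrderedDict from the standard library in place of NativeDict (the frame contents are made up for illustration):

```python
from collections import OrderedDict

import pandas as pd

frame = pd.DataFrame([{'int': 1, 'str': 'foo'}])

# Each row is passed to the `into` class as key/value pairs, so every
# record comes back as an instance of that class instead of a plain dict.
records = frame.to_dict(orient='records', into=OrderedDict)
print(type(records[0]))
```

Any dict subclass whose constructor accepts an iterable of key/value pairs can be plugged in the same way, which is where a converting subclass can hook in.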
Comment From: arw2019
This is fixed on 1.2 master. Running the OP:
In [3]: import pandas as pd
   ...: from datetime import datetime
   ...:
   ...: dfs = {
   ...:     'full_df': pd.DataFrame([
   ...:         {'int': 1, 'date': datetime.now(), 'str': 'foo', 'float': 1.0, 'bool': True},
   ...:     ]),
   ...:     'int_df': pd.DataFrame([
   ...:         {'int': 1},
   ...:     ]),
   ...:     'date_df': pd.DataFrame([
   ...:         {'date': datetime.now()},
   ...:     ]),
   ...:     'str_df': pd.DataFrame([
   ...:         {'str': 'foo'},
   ...:     ]),
   ...:     'float_df': pd.DataFrame([
   ...:         {'float': 1.0},
   ...:     ]),
   ...:     'bool_df': pd.DataFrame([
   ...:         {'bool': True},
   ...:     ])
   ...: }
   ...:
   ...: for name, frame in dfs.items():
   ...:     print('Types in ' + name)
   ...:     for k, v in frame.to_dict('records')[0].items():
   ...:         print(type(v))
   ...:
Types in full_df
<class 'int'>
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
<class 'str'>
<class 'float'>
<class 'bool'>
Types in int_df
<class 'int'>
Types in date_df
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
Types in str_df
<class 'str'>
Types in float_df
<class 'float'>
Types in bool_df
<class 'bool'>
Comment From: hodossy
Hello! Thanks for fixing the integers, but it seems that date types are still using the internal type. Would it be possible to convert them to native type as well?
Comment From: arw2019
Do we want to reopen this?
xref https://github.com/pandas-dev/pandas/pull/37648#discussion_r571652150 I think we're not gonna act here but it does keep coming up
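For anyone who needs datetime objects in the meantime, one option is to post-process the records after calling to_dict(). The following is only a sketch under assumptions: to_native and to_native_records are hypothetical helper names, not pandas API, and mapping NaN and NaT to None is a choice carried over from the workaround above rather than pandas behaviour.

```python
import math
from datetime import datetime

import numpy as np
import pandas as pd


def to_native(value):
    """Convert one cell value to a built-in Python type where possible."""
    if value is None or value is pd.NaT:
        return None
    if isinstance(value, pd.Timestamp):
        # Timestamp subclasses datetime; this returns a plain datetime.
        return value.to_pydatetime()
    if isinstance(value, np.generic):
        # .item() unwraps NumPy scalars (int64, float64, bool_, ...).
        value = value.item()
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


def to_native_records(frame):
    """Like frame.to_dict('records'), with every value passed through to_native."""
    return [
        {key: to_native(value) for key, value in row.items()}
        for row in frame.to_dict('records')
    ]


frame = pd.DataFrame([
    {'int': 1, 'date': datetime(2020, 1, 1), 'str': 'foo', 'float': 1.0, 'bool': True},
])
records = to_native_records(frame)
for key, value in records[0].items():
    print(key, type(value))
```

Because the conversion runs on the output of to_dict(), it should not matter whether the installed pandas version already returns native int, float and bool values.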