I would expect this syntax to work without problems
import pandas as pd
d = [{'a':'1','b':'2'},{'a':'3','b':'4'}]
pd.DataFrame.from_dict(d, orient='columns', dtype={'a':int,'b':int})
Expected Output
Expected DataFrame:
   a  b
0  1  2
1  3  4
with dtypes:
a    int64
b    int64
dtype: object
Instead, the output is
/usr/lib/python2.7/dist-packages/numpy/core/_internal.pyc in _makenames_list(adict, align)
24 for fname in fnames:
25 obj = adict[fname]
---> 26 n = len(obj)
27 if not isinstance(obj, tuple) or n not in [2, 3]:
28 raise ValueError("entry not a 2- or 3- tuple")
TypeError: object of type 'type' has no len()
Output of pd.show_versions()
Comment From: jorisvandenbossche
From the docstring of from_dict:
dtype : dtype, default None
Data type to force, otherwise infer
So I think the dtype argument here only supports single dtypes, not dicts of dtypes (as some other pandas functions do). In the example case this is even simpler:
In [16]: pd.DataFrame.from_dict(d, orient='columns', dtype=int)
Out[16]:
   a  b
0  1  2
1  3  4
In [17]: pd.DataFrame.from_dict(d, orient='columns', dtype=int).dtypes
Out[17]:
a    int64
b    int64
dtype: object
But for the general case, it would be an enhancement for from_dict to accept dicts.
Comment From: JoaoAparicio
Alright, so imagine that I have one column int and one column float. Problem still stands, no?
So I think the dtype argument here only supports single dtypes, not dicts of dtypes (as some other pandas functions do).
Should this be improved?
Comment From: jorisvandenbossche
Yes, a PR to improve this would be welcomed.
Comment From: JoaoAparicio
Like this?
my_dtypes = { ( ... ) }
for k, v in my_dtypes.items():
    if k in df.columns:
        df[k] = df[k].apply(v)
:D
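For reference, a runnable variant of the loop above might look like the following sketch. The helper name cast_known_columns and the sample my_dtypes mapping are hypothetical (the original mapping was elided). It relies on DataFrame.astype with a dict and skips keys that are not columns, since astype raises a KeyError for unknown column names.

```python
import pandas as pd


def cast_known_columns(df, dtypes):
    """Cast only the columns that exist in df (hypothetical helper)."""
    present = {col: dtype for col, dtype in dtypes.items() if col in df.columns}
    return df.astype(present)


d = [{'a': '1', 'b': '2'}, {'a': '3', 'b': '4'}]
df = pd.DataFrame.from_dict(d, orient='columns')

# Illustrative mapping (an assumption); 'c' is not a column and is skipped.
my_dtypes = {'a': int, 'b': float, 'c': int}
result = cast_known_columns(df, my_dtypes)
print(result.dtypes)
```

This also covers the case of one int column and one float column, since each column gets its own target type.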
Comment From: jreback
duplicate issue: https://github.com/pandas-dev/pandas/issues/4464
Comment From: jorisvandenbossche
@JoaoAparicio basically, yes, but the dataframe constructor code is rather complex (many options / code paths), so you would have to see where this fits. There are possibly also ways to do it more efficiently during dataframe creation instead of with astype afterwards.
See also how pd.DataFrame().astype(dict) implements this. Which is, BTW, something you can also use at the moment:
pd.DataFrame.from_dict(d, orient='columns').astype({'a':int,'b':int})
This works fine with different dtypes (and it also makes it more explicit that the conversion happens after the dataframe creation).
Comment From: avnishbm
However, astype() fails if the column contains a missing value (np.nan), so converting such a column from float to int raises an error. Another problem is that if a row in the input is missing a value for some column, pandas fills it with np.nan and converts the entire column to float, even though the rest of the elements in the column were int. Converting it back to int with astype() then also fails, as mentioned (because of the np.nan value), i.e. the column remains float.
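The behaviour described here can be reproduced with a small sketch. One possible workaround, not mentioned earlier in this thread, is the nullable 'Int64' extension dtype; this assumes a pandas version that provides it (0.24 or later). The sample data is illustrative.

```python
import pandas as pd

# The second row has no 'b', so pandas fills it with NaN
# and the whole column is stored as float64.
d = [{'a': 1, 'b': 2}, {'a': 3}]
df = pd.DataFrame.from_dict(d, orient='columns')
print(df.dtypes)

# Casting to a plain NumPy integer fails: NaN has no integer representation.
try:
    df.astype({'b': int})
    cast_failed = False
except ValueError:
    cast_failed = True
print(cast_failed)

# Assumption: the nullable 'Int64' dtype is available (pandas >= 0.24).
# It keeps integer values and represents the gap as a missing value.
converted = df.astype({'b': 'Int64'})
print(converted.dtypes)
```

Note the capital "I": 'Int64' is the pandas extension dtype, whereas 'int64' is the NumPy dtype that cannot hold missing values.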