Code Sample
I've come across a peculiar case that needs two conditions to hold:
1. The dataframe contains NaT values (I've tried None instead and that seems to work just fine).
2. The applied function returns a dict.
import pandas as pd
sample = pd.DataFrame({'date': [pd.NaT, pd.NaT, pd.NaT, pd.NaT], 'period': [1, 1, 1, 1], 'parent_id': ['a', 'b', 'c', 'd']})
sample.apply(lambda x: {'parent_user_id': x.parent_id}, axis=1, reduce=True)
Problem description
This is flawed: the output should be a Series where each element is a dictionary, but instead apply returns a DataFrame full of NaNs.
Expected Output
Out[40]:
0 {'parent_user_id': 'a'}
1 {'parent_user_id': 'b'}
2 {'parent_user_id': 'c'}
3 {'parent_user_id': 'd'}
Output
Out[46]:
date parent_id period
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
Comment From: TomAugspurger
Nothing to do with NaT, as this same thing happens after you fill the values:
In [15]: sample.fillna(dict(date=pd.Timestamp('2017'))).apply(lambda x: {'parent_user_id': x.parent_id}, axis=1, reduce=True)
Out[15]:
date parent_id period
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
NaT and None might have behaved differently if using None forced an object dtype.
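The dtype difference can be checked directly. The following is a minimal sketch, not part of the original report; the names nat_frame and none_frame are made up for illustration. It builds the same column once from NaT and once from None and inspects the inferred dtype.

```python
import pandas as pd

# Hypothetical frames for illustration: the same column built from NaT and from None.
nat_frame = pd.DataFrame({'date': [pd.NaT] * 4})
none_frame = pd.DataFrame({'date': [None] * 4})

# A column of only NaT is inferred as a datetime column,
# while a column of only None stays an object column.
nat_dtype = nat_frame['date'].dtype
none_dtype = none_frame['date'].dtype

print('NaT column dtype: ', nat_dtype)
print('None column dtype:', none_dtype)
```

If the two dtypes differ, the rows handed to the applied function have a different mix of dtypes, which is one plausible reason the None variant appeared to work.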
This is more about the output shape inference that apply does. You'll be much better off avoiding .apply(..., axis=1) and just doing things directly:
In [20]: pd.Series([{'parent_user_id': x.parent_id} for x in sample.itertuples()])
Out[20]:
0 {'parent_user_id': 'a'}
1 {'parent_user_id': 'b'}
2 {'parent_user_id': 'c'}
3 {'parent_user_id': 'd'}
dtype: object
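A variant of the same workaround that keeps the original index may be useful when the frame is not indexed 0..n-1. This is only a sketch under that assumption; the helper name rows_to_dicts is hypothetical and not a pandas function.

```python
import pandas as pd

sample = pd.DataFrame({
    'date': [pd.NaT, pd.NaT, pd.NaT, pd.NaT],
    'period': [1, 1, 1, 1],
    'parent_id': ['a', 'b', 'c', 'd'],
})


def rows_to_dicts(frame):
    """Build one dict per row without going through DataFrame.apply.

    Hypothetical helper: it avoids apply's output shape inference by
    constructing the list of dicts first and wrapping it in a Series.
    """
    records = [
        {'parent_user_id': row.parent_id}
        for row in frame.itertuples(index=False)
    ]
    # Reuse the frame's index so the result lines up with the input rows.
    return pd.Series(records, index=frame.index)


result = rows_to_dicts(sample)
print(result)
```

Because the Series is constructed from an explicit list, pandas has no chance to reinterpret the dicts as rows of a DataFrame.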
Comment From: TomAugspurger
This falls under https://github.com/pandas-dev/pandas/issues/15628