I'm recoding multiple columns in a dataframe and have come across a strange result that I can't quite figure out. I'm probably not recoding in the most efficient manner possible, but it's mostly the error that I'm hoping someone can explain.

s1 = pd.DataFrame([np.nan, '1', '2', '3', '4', '5', columns='col1')
s2 = pd.DataFrame([np.nan, 1, 2, 3, 4, 5], columns='col1')
s1_dic = {np.nan: np.nan, '1': 1, '2':2, '3':3, '4':3, '5':3}
s2_dic = {np.nan: np.nan, 1: 1, 2:2, 3:3, 4:3, 5:3}
s1['col1'].apply(lambda x: s1_dic[x])
s2['col1'].apply(lambda x: s2_dic[x])

s1 works fine, but when I do the same thing with s2 (a list of integers and an np.nan), I get KeyError: nan, which is confusing. Any help would be appreciated.

edit @hayd: http://stackoverflow.com/q/33286748/1240268

Comment From: max-sixty

This may have to do with NaN not being equal to itself. But it works with s1, and NaN can still be used as a Python dictionary key (I'm not sure how that works, actually), so the code path taken must depend on the dtype.

In [164]: pd.np.nan==pd.np.nan
Out[164]: False

In [170]: s1_dic[pd.np.nan]
Out[170]: nan

In [171]: s2_dic[pd.np.nan]
Out[171]: nan

This is a separate point from the above, but there may be a much easier way of doing what you're trying to do. What's the wider goal?

FYI this is a corrected version of your code:

s1 = pd.DataFrame([np.nan, '1', '2', '3', '4', '5'], columns=['col1'])
s2 = pd.DataFrame([np.nan, 1, 2, 3, 4, 5], columns=['col1'])
s1_dic = {np.nan: np.nan, '1': 1, '2':2, '3':3, '4':3, '5':3}
s2_dic = {np.nan: np.nan, 1: 1, 2:2, 3:3, 4:3, 5:3}
s1['col1'].apply(lambda x: s1_dic[x])
s2['col1'].apply(lambda x: s2_dic[x])

I'm not sure what principles the maintainers use, but IMHO this sort of question may be better suited to SO.

Comment From: kawochen

I would caution against what you are doing, since even if that worked, something like this could happen.

In [1]: {1:1, 2:2, 3:3}[1.0000000000000001]
Out[1]: 1

@maximilianr in a plain dict it is generally OK that nan doesn't equal itself, because the key lookup checks identity (the is operator) before it checks equality.
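To make the identity-before-equality point concrete, here is a minimal sketch in plain Python (no pandas or NumPy). The names `nan`, `other_nan` and `lookup` are made up for illustration; the behaviour shown is that of CPython's built-in dict.

```python
# Two distinct float objects that both hold NaN.
nan = float("nan")
other_nan = float("nan")

lookup = {nan: "found"}

# NaN never compares equal, not even to itself.
nan_equals_itself = (nan == nan)

# Same object as the stored key: the identity check succeeds, so the lookup works.
same_object_result = lookup[nan]

# A different NaN object: identity fails and equality fails, so we get KeyError.
try:
    lookup[other_nan]
    different_object_found = True
except KeyError:
    different_object_found = False
```

So a NaN key is only retrievable with the very same object that was inserted, which is why it appears to work in the interactive examples above.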

Comment From: max-sixty

@kawochen Ah great, thanks

Comment From: hayd

I suggested that @brianhuey post here. I think this is actually a bug/edge case in map_infer, which seems to be where the error comes from. It works with object dtype:

In [11]: d1 = {1: 1, np.nan: np.nan}

In [12]: pd.lib.map_infer(np.array([np.nan, 1], dtype='object'), lambda x: d1[x], True)
Out[12]: array([ nan,   1.])

In [13]: pd.lib.map_infer(np.array([np.nan, 1]), lambda x: d1[x], True)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-13-d8d2e1f5f73a> in <module>()
----> 1 pd.lib.map_infer(np.array([np.nan, 1]), lambda x: d1[x], True)

pandas/src/inference.pyx in pandas.lib.map_infer (pandas/lib.c:58435)()

<ipython-input-154-d8d2e1f5f73a> in <lambda>(x)
----> 1 pd.lib.map_infer(np.array([np.nan, 1]), lambda x: d1[x], True)

KeyError: nan

Comment From: kawochen

@hayd then you have the answer right there already! np.float64(np.nan) is not a singleton: it doesn't equal np.float64(np.nan), is not np.nan, and doesn't equal np.nan. An object array, by contrast, holds the np.nan object itself, which is of type float.
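A small sketch of that difference, assuming only NumPy is available. The variable names and the `can_look_up` helper are hypothetical; `d1` mirrors the dict from the transcript above.

```python
import numpy as np

d1 = {1: 1, np.nan: np.nan}

float_arr = np.array([np.nan, 1.0])               # dtype float64
object_arr = np.array([np.nan, 1], dtype=object)  # stores the Python objects themselves

# Indexing a float64 array builds a fresh np.float64 scalar each time,
# so the result is never the np.nan object.
float_is_same = float_arr[0] is np.nan

# An object array hands back the original np.nan object.
object_is_same = object_arr[0] is np.nan


def can_look_up(key):
    """Return True if key is found in d1, False on KeyError."""
    try:
        d1[key]
        return True
    except KeyError:
        return False


float_lookup_ok = can_look_up(float_arr[0])
object_lookup_ok = can_look_up(object_arr[0])
```

The object-dtype element is found because it is identical to the key, while the float64 element is a new NaN that is neither identical nor equal to it.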

Comment From: ron819

Is this still an issue?

Comment From: rhshadrach

This is an issue with using np.nan as a key in a dictionary, which is unreliable at best: NaN does not equal itself, and you often get a different NaN object rather than np.nan itself, so an is check returns False as well. Closing.
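For anyone landing here with the same recoding task, one possible workaround is to leave NaN out of the dictionary and use Series.map, which fills values missing from the dict with NaN. This is only a sketch under that assumption, not a statement about how the original code should have been written; `recode` and `recoded` are illustrative names.

```python
import numpy as np
import pandas as pd

s2 = pd.DataFrame([np.nan, 1, 2, 3, 4, 5], columns=["col1"])

# No NaN key: values that are not in the dict (including NaN) map to NaN.
recode = {1: 1, 2: 2, 3: 3, 4: 3, 5: 3}

recoded = s2["col1"].map(recode)
```

Note that any other value missing from the dict also becomes NaN, so this only suits cases where that is the desired fallback.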