Pandas Refactor DataFrame.replace to dispatch on types

I'm having trouble figuring out how DataFrame.replace() is supposed to work. I'm not sure if this is a bug or a documentation issue.

In [1]: import pandas

In [2]: df = pandas.DataFrame({"col1":range(5), "col2":[0.5]*3+[1.0]*2})

In [3]: df
Out[3]: 
   col1  col2
0     0   0.5
1     1   0.5
2     2   0.5
3     3   1.0
4     4   1.0

In [4]: df.replace(1.0, "a")
Out[4]: 
  col1 col2
0    0  0.5
1    a  0.5
2    2  0.5
3    3    a
4    4    a

In [5]: df.replace(1.0, "a").replace(0.5, "b")
Out[5]: 
  col1 col2
0    0    b
1    a    b
2    2    b
3    3    a
4    4    a

So far, so good, everything makes sense. But I would have expected this to accomplish the same as above:


In [6]: df.replace({1.0:"a", 0.5:"b"})
Out[6]: 
  col1 col2
0    b    b
1    a    a
2    2    b
3    3    a
4    4    b

As you can see, I'm getting alternating "b" and "a". From a quick browse of the source code, it seems that the dictionary-replacement option should result in the same outcome as the following (which gives what I would have expected):

In [15]: df.replace([1.0, 0.5], ["a", "b"])
Out[15]: 
  col1 col2
a    0    b
b    a    b
c    2    b
d    3    a
e    4    a

I'm not sure what the to_replace=dict option is supposed to be doing, but (at least for pandas v0.12.0) it isn't doing what I would have expected.

Whether this is a bug or not, the df.replace() method needs better documentation. It's not enough to include a disclaimer that "This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works."

Comment From: cpcloud

What version of pandas are you using? I believe this is fixed in master.

Best, Phillip Cloud
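
For anyone checking this on their own install, here is a minimal sketch that compares the three call styles from the report. It assumes a recent pandas release (not 0.12.0, where the dict form misbehaves as shown above); the variable names `chained`, `via_dict`, and `via_lists` are illustrative only.

```python
import pandas as pd

# Same data as in the original report.
df = pd.DataFrame({"col1": range(5), "col2": [0.5] * 3 + [1.0] * 2})

# Three ways of asking for the same replacement.
chained = df.replace(1.0, "a").replace(0.5, "b")
via_dict = df.replace({1.0: "a", 0.5: "b"})
via_lists = df.replace([1.0, 0.5], ["a", "b"])

# Compare plain Python values so that dtype differences do not matter.
expected = chained.to_dict("list")
print(via_dict.to_dict("list") == expected)
print(via_lists.to_dict("list") == expected)
```

If both comparisons print `True`, the dict form agrees with the chained and list forms on that install.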

Comment From: nspies

You are correct. Let me amend the above, then, to concern only the documentation of what is clearly a very complicated replace interface. There should at least be examples of the different kinds of syntax (I couldn't find any in the docs).
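
As a starting point for such examples, here is a sketch of the column-scoped and regex call styles, which the report above doesn't touch. It assumes a recent pandas release; the frame, the `name` column, and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical frame: the report's two columns plus a string column.
df = pd.DataFrame({
    "col1": [0, 1, 2, 3, 4],
    "col2": [0.5, 0.5, 0.5, 1.0, 1.0],
    "name": ["foo", "bar", "foo", "baz", "bar"],
})

# Nested dict: {column: {old: new}} limits the replacement to one column.
nested = df.replace({"col2": {1.0: -1.0}})

# Dict plus value: {column: old} combined with a single replacement value.
scoped = df.replace({"col2": 0.5}, -0.5)

# Regex: treat to_replace as a pattern; numeric columns are left alone.
by_regex = df.replace(r"^ba.$", "X", regex=True)

print(nested["col2"].tolist())
print(scoped["col2"].tolist())
print(by_regex["name"].tolist())
```

In the nested and scoped forms `col1` should be left unchanged even though it contains a `1`, because the replacement is restricted to `col2`.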

Comment From: jankatins