Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd
di = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E", 6: "F", 7: "G", 8: "H"}
df = pd.DataFrame({"col1": np.random.choice(range(1, 9), 100000)})
%timeit df.replace({"col1": di}) # DSM answer #1 (df-style)
10 loops, best of 3: 57.1 ms per loop
%timeit df.col1.replace(di) # DSM answer #2 (series-style)
10 loops, best of 3: 57.1 ms per loop
%timeit df.col1.replace({"col1": di}) # hybrid of DSM #1 & #2
The slowest run took 98.89 times longer than the fastest. This could mean that
an intermediate result is being cached.
10000 loops, best of 3: 93.5 µs per loop
# Note: sometimes I got the "slowest run" message and sometimes I didn't.
# When I did, it generally ranged from 5x to 100x. But even adjusting by 100x,
# this still implies that the 3rd (hybrid syntax) is over 50x faster than the 1st or 2nd.
Stack Overflow version of this (scroll to the last answer): https://stackoverflow.com/questions/20250771/remap-values-in-pandas-column-with-a-dict
Problem description
The 3rd syntax variation is far faster, but it seems redundant because it essentially specifies the column twice (also, no one would think to write it this way). It seems that the 1st and 2nd syntaxes should be just as fast.
Output of pd.show_versions()
The show_versions output is from my work computer with pandas 0.19.2 (Windows), but I've also tested on my Mac with pandas 0.20.1 and the timings are very similar.
Comment From: chris-b1
Your third way isn't equivalent: the string "col1" will be the only thing searched for, so nothing is replaced.
In [32]: df.col1.replace({"col1": di}).head()
Out[32]:
0 7
1 4
2 8
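A quick way to catch this kind of mismatch before timing anything is to compare results first. The sketch below is only illustrative: the helper name `replaces_anything`, the fixed seed, and the smaller 1000-row frame are assumptions made here, not part of the original report. It also assumes that the nested-dict form on a Series may behave differently across pandas versions (returning the column unchanged, as in the output above, or possibly raising), so it treats an exception the same as "nothing replaced".

```python
import numpy as np
import pandas as pd

di = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E", 6: "F", 7: "G", 8: "H"}
rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
df = pd.DataFrame({"col1": rng.integers(1, 9, size=1000)})

# The two forms from the report that are expected to do the same work.
df_style = df.replace({"col1": di})["col1"]  # DataFrame-style
series_style = df["col1"].replace(di)        # Series-style

same_result = df_style.equals(series_style)
all_mapped = bool(series_style.isin(list(di.values())).all())


def replaces_anything(series, to_replace):
    """Return True only if series.replace(to_replace) changes some value."""
    try:
        result = series.replace(to_replace)
    except Exception:
        # Assumption: some pandas versions may reject this form on a Series.
        return False
    return not result.equals(series)


# Hybrid form from the report: nested dict passed to a Series.
hybrid_works = replaces_anything(df["col1"], {"col1": di})

print(same_result, all_mapped, hybrid_works)
```

If `hybrid_works` comes back false, the hybrid call is not doing the replacement at all, so its timing is not comparable to the other two.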
Comment From: johne13
Yeah, sorry, dumb mistake by me. Obviously that explains the speed.
Can I delete this? Or can you or someone else? (or else just mark as closed I guess)
Comment From: TomAugspurger
Closing is fine.