Pandas: Strange performance of Series.replace vs DataFrame.replace

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

di = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E", 6: "F", 7: "G", 8: "H"}
df = pd.DataFrame({'col1': np.random.choice(range(1, 9), 100000)})

%timeit df.replace({"col1": di})          # DSM answer #1 (df-style)
10 loops, best of 3: 57.1 ms per loop

%timeit df.col1.replace(di)               # DSM answer #2 (series-style)
10 loops, best of 3: 57.1 ms per loop

%timeit df.col1.replace({"col1": di})     # hybrid of DSM #1 & #2
The slowest run took 98.89 times longer than the fastest. This could mean that 
an intermediate result is being cached.
10000 loops, best of 3: 93.5 µs per loop

# Note: sometimes I got the "slowest run" message and sometimes I didn't.
# When I did, it generally ranged from 5x to 100x. But even scaling 93.5 µs up by 100x
# (about 9.4 ms), the 3rd (hybrid syntax) is still roughly 6x faster than the 1st or 2nd.

Stack Overflow version of this (scroll to the last answer): https://stackoverflow.com/questions/20250771/remap-values-in-pandas-column-with-a-dict
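
For readers without IPython's `%timeit`, here is a minimal sketch of timing the two documented forms with the standard-library `timeit` module. The helper name `best_ms` and the `number`/`repeat` values are assumptions chosen for illustration. Absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

di = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E", 6: "F", 7: "G", 8: "H"}
df = pd.DataFrame({"col1": np.random.choice(range(1, 9), 100000)})


def best_ms(func, number=3, repeat=3):
    """Hypothetical helper: best per-call time in milliseconds."""
    # timeit.repeat returns one total time (in seconds) per repeat.
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number * 1000


frame_ms = best_ms(lambda: df.replace({"col1": di}))   # df-style
series_ms = best_ms(lambda: df["col1"].replace(di))    # series-style
print(f"DataFrame.replace: {frame_ms:.1f} ms, Series.replace: {series_ms:.1f} ms")
```

Taking the minimum over several repeats is the usual way to reduce noise from one-off slow runs.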

Problem description

The 3rd syntax variation is far faster, yet it looks redundant because it essentially specifies the column twice (and no one would think to write it this way). It seems that the 1st and 2nd syntax should be just as fast.

Output of `pd.show_versions()`

The output of show_versions is from my work computer with pandas 0.19.2 (Windows), but I've also tested on my Mac with pandas 0.20.1 and the timings are very similar.

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: 2.45.0
pandas_datareader: None

Comment From: chris-b1

Your third form isn't equivalent - on a Series, the string "col1" will be the only thing searched for, so nothing is replaced.

In [32]: df.col1.replace({"col1": di}).head()
Out[32]: 
0    7
1    4
2    8
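
A quick way to avoid this kind of mistake is to check that a replacement actually changed the values before timing it. Below is a minimal sketch under some assumptions: the seeded generator and the smaller frame are illustrative and not part of the original report. It uses only the two documented forms, the nested dict for `DataFrame.replace` and the flat dict for `Series.replace`.

```python
import numpy as np
import pandas as pd

di = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E", 6: "F", 7: "G", 8: "H"}
rng = np.random.default_rng(0)  # fixed seed is an assumption, for repeatability
df = pd.DataFrame({"col1": rng.integers(1, 9, size=1000)})  # values 1..8

# DataFrame.replace takes a nested dict: {column name: {old: new}}.
via_frame = df.replace({"col1": di})["col1"]

# Series.replace takes the flat mapping: {old: new}.
via_series = df["col1"].replace(di)

# Both forms should give the same letters, with no integers left over.
same_result = via_frame.tolist() == via_series.tolist()
all_replaced = set(via_series.unique()) <= set(di.values())
print(same_result, all_replaced)
```

If `all_replaced` comes back false for some call, that call did less work than the others, so its timing is not comparable.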

Comment From: johne13

Yeah, sorry, dumb mistake by me. Obviously that explains the speed.

Can I delete this? Or can you or someone else? (or else just mark as closed I guess)

Comment From: TomAugspurger

Closing is fine.
