Code Sample, a copy-pastable example if possible

This SO question asks how to recode strings in a data frame as numerical categories: http://stackoverflow.com/questions/39475187/how-to-speed-up-recoding-into-integers

The pandas solution x = df.apply(lambda x: x.astype('category').cat.codes) is by far the fastest. However, it does not give a consistent answer if the data frame has more than one column: the same string can receive different codes in different columns.

E.g.

g,k
a,h
c,i
j,e
d,i
i,h
b,b
d,d
i,a
d,h

gets recoded to:

   0  1
0  4  6
1  0  4
2  2  5
3  6  3
4  3  5
5  5  4
6  1  1
7  3  2
8  5  0
9  3  4

Notice that 'd' is mapped to 3 in the first column but 2 in the second.
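
A minimal sketch that reproduces this, assuming the sample data is loaded into a two-column DataFrame with the default integer column labels. Building `df` from a list of pairs is an assumption made for illustration; the original loading code is not shown.

```python
import pandas as pd

# Sample data from above, one (first column, second column) pair per row.
rows = [("g", "k"), ("a", "h"), ("c", "i"), ("j", "e"), ("d", "i"),
        ("i", "h"), ("b", "b"), ("d", "d"), ("i", "a"), ("d", "h")]
df = pd.DataFrame(rows)

# apply() converts each column on its own, so each column gets its own
# set of categories and therefore its own integer codes.
codes = df.apply(lambda col: col.astype("category").cat.codes)
print(codes)

# 'd' appears in row 4 of column 0 and in row 7 of column 1.
print(codes.loc[4, 0], codes.loc[7, 1])  # 3 2
```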

It would be great if pandas could do this recoding consistently.

Expected Output

output of pd.show_versions()

Comment From: jorisvandenbossche

@lesshaste The fact that df.apply(lambda x: x.astype('category').cat.codes) works column by column is expected. But see https://github.com/pydata/pandas/issues/12860 for some discussion of how to do this on multiple columns at once (using the same categories for all columns).

The workaround listed over there is:

uniques = np.sort(pd.unique(df.values.ravel()))
df.apply(lambda x: x.astype('category', categories=uniques))
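
Note that newer pandas releases, as far as I know, no longer accept the `categories` keyword in `astype('category', ...)`; there the shared categories can be passed through a `pd.CategoricalDtype` instead. A minimal sketch of the same idea under that assumption, using the sample data from the issue and adding `.cat.codes` to get the integers (the names `rows`, `shared` and `codes` are only illustrative):

```python
import numpy as np
import pandas as pd

# Sample data from the issue, one (first column, second column) pair per row.
rows = [("g", "k"), ("a", "h"), ("c", "i"), ("j", "e"), ("d", "i"),
        ("i", "h"), ("b", "b"), ("d", "d"), ("i", "a"), ("d", "h")]
df = pd.DataFrame(rows)

# One sorted set of labels collected across all columns.
uniques = np.sort(pd.unique(df.values.ravel()))
shared = pd.CategoricalDtype(categories=uniques)

# Every column is converted with the same categories, so a given string
# gets the same integer code wherever it appears.
codes = df.apply(lambda col: col.astype(shared).cat.codes)
print(codes)
```

With the shared categories, 'd' should map to the same code in both columns.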

Comment From: lesshaste

That is very nice and I had no idea you could do that. Thank you.