The `categories` parameter of `pd.Categorical.from_codes` is documented as being index-like.

Does that include dictionaries?

If so, I got some unexpected results when working with integers and dictionaries in version 0.17.1.

The following sets out what I found:

```python
import pandas as pd

sex = [1, 2, 0, 1]
categories = {1: 'Male', 2: 'Female', 0: 'Unknown'}

pd.Categorical.from_codes(sex, categories=categories)
```

returns an array containing the keys instead of the values (though I suspect that was luck, given the outcomes below):

```
[1, 2, 0, 1]
Categories (3, int64): [0, 1, 2]
```
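A plain-Python sketch of why the first example looks correct: the reported categories are the keys in the order [0, 1, 2], so each code is also its own position and decoding hands the codes straight back. The second half assumes Python 3.7+, where `list()` of this dict gives the keys in insertion order [1, 2, 0], and the same positional lookup no longer matches.

```python
codes = [1, 2, 0, 1]
categories = {1: 'Male', 2: 'Female', 0: 'Unknown'}

# Order reported in the output above: keys only, as [0, 1, 2].
reported_keys = [0, 1, 2]
# Each code is a position; here position == key, so the codes come back unchanged.
print([reported_keys[c] for c in codes])   # [1, 2, 0, 1]

# Python 3.7+ keeps insertion order, so list() gives [1, 2, 0] instead
# and the same positional lookup returns different values.
insertion_keys = list(categories)
print([insertion_keys[c] for c in codes])  # [2, 0, 1, 2]
```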

Swapping the dictionary's keys and values around shows that it doesn't match values to keys, though:

```python
import pandas as pd

sex = [1, 2, 0, 1]
categories = {'Male': 1, 'Female': 2, 'Unknown': 0}

pd.Categorical.from_codes(sex, categories=categories)
```

returns an array that has incorrectly mapped codes and categories:

```
[Unknown, Male, Female, Unknown]
Categories (3, object): [Female, Unknown, Male]
```

Using non-sequential integers for the codes fails with a dictionary:

```python
import pandas as pd

sex = [1, 2, 9, 1]
categories = {'Male': 1, 'Female': 2, 'Unknown': 9}

pd.Categorical.from_codes(sex, categories=categories)
```

This fails with `ValueError: codes need to be between -1 and len(categories)-1`, raised at line 386 of categorical.py by the check `if len(codes) and (codes.max() >= len(categories) or codes.min() < -1)`, presumably because of the second condition, `codes.max() >= len(categories)`.

I'm guessing it's meant to be a check on whether the number of unique elements in codes is greater than or equal to the number of items in categories.
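Read literally, the condition quoted above bounds the range of the codes rather than counting them: every code must be a valid position into `categories`, or -1, which pandas uses for missing values. A minimal NumPy sketch of that condition; `codes_in_range` is a hypothetical helper written for illustration, not a pandas function.

```python
import numpy as np

def codes_in_range(codes, categories):
    """Hypothetical mirror of the quoted check: True when every code
    lies in [-1, len(categories) - 1]."""
    codes = np.asarray(codes)
    # Empty input passes; otherwise the largest code must be a valid
    # position and the smallest must not go below -1 (the missing-value code).
    return not (len(codes) and (codes.max() >= len(categories) or codes.min() < -1))

categories = {'Male': 1, 'Female': 2, 'Unknown': 9}
print(codes_in_range([1, 2, 9, 1], categories))  # False: 9 >= len(categories), which is 3
print(codes_in_range([1, 2, 0, 1], categories))  # True
```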

Comment From: jreback

A dict is not index-like (which is list-like). It will be coerced with `list(dict)`, which yields the keys. Not sure why you would pass that.
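A plain-Python sketch of the coercion described here, applied to the second example: `list(dict)` keeps only the keys, and the codes are then used as positions into that list, so the dict values never come into play. The first part assumes Python 3.7+ insertion-ordered dicts; the last lines reuse the category order shown in the reported output ([Female, Unknown, Male]) and reproduce that output with the same lookup.

```python
categories = {'Male': 1, 'Female': 2, 'Unknown': 0}
codes = [1, 2, 0, 1]

keys = list(categories)  # coercion keeps the keys; the values 1, 2, 0 are dropped
print(keys)              # ['Male', 'Female', 'Unknown'] on Python 3.7+

# Codes index into the key list by position.
decoded = [keys[c] for c in codes]
print(decoded)           # ['Female', 'Unknown', 'Male', 'Female']

# Same lookup using the category order from the reported output.
reported_order = ['Female', 'Unknown', 'Male']
print([reported_order[c] for c in codes])  # ['Unknown', 'Male', 'Female', 'Unknown']
```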

Comment From: ChristopherShort

Great, thanks, though the docs say 'index like'.

Perhaps I should try to make a correction to the docs, and put an example in too? (I'll see if I can figure it out.)

Comment From: jreback

Index-like is correct, but again, an index is not a dict; it is very much like a list.

Comment From: ChristopherShort

Ahh... a pandas Index. Silly me, thanks.

Comment From: jreback

@ChristopherShort as an aside, you generally shouldn't be providing codes. Yes, this is a public method, but you should only use it in cases where you already have the codes (e.g. say you are coding incrementally).
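For contrast, a minimal sketch of a call where the codes already are positions 0..n-1 into a plain list of categories, assuming a recent pandas. The list is built from the thread's first dictionary (`labels` is an illustrative name), which only works because its keys happen to be exactly 0, 1 and 2.

```python
import pandas as pd

labels = {1: 'Male', 2: 'Female', 0: 'Unknown'}
# Valid only because the keys are exactly 0..n-1, so key == position.
categories = [labels[k] for k in range(len(labels))]  # ['Unknown', 'Male', 'Female']

codes = [1, 2, 0, 1]
cat = pd.Categorical.from_codes(codes, categories=categories)
print(list(cat))             # ['Male', 'Female', 'Unknown', 'Male']
print(list(cat.categories))  # ['Unknown', 'Male', 'Female']
```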

Comment From: ChristopherShort

Thanks - makes sense.

My use case here was a dataset with 20 million observations, in which several categorical variables are already integer-coded. One variable in particular takes 358 different values on a range of integers from 10 to 998.

The method struck me as a way to map those integers to their categories, potentially for display purposes in a notebook (there are other ways to do what I want anyway).
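For sparse integer codes like the 10 to 998 range described above, one alternative is to map the integers to labels first and only then convert to a categorical, so no codes are supplied at all. A minimal sketch assuming a recent pandas; the names `raw` and `labels` and the tiny data set (reusing the thread's 1/2/9 example) are illustrative stand-ins for the real data.

```python
import pandas as pd

# Illustrative stand-in for integer-coded data: codes are sparse (1, 2, 9),
# not positions 0..n-1.
raw = pd.Series([1, 2, 9, 1])
labels = {1: 'Male', 2: 'Female', 9: 'Unknown'}

# Map each integer to its label (integers missing from `labels` become NaN),
# then store as a categorical with the categories listed in the dict's order.
dtype = pd.CategoricalDtype(categories=list(labels.values()))
sex = raw.map(labels).astype(dtype)

print(sex.tolist())              # ['Male', 'Female', 'Unknown', 'Male']
print(list(sex.cat.categories))  # ['Male', 'Female', 'Unknown']
```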

It was my silly error: I read 'index like' and thought of indexable Python objects instead of a pandas Index.

Again, thanks for taking the time here. And a quick note to mention my appreciation for your tutorials on pandas performance and developments; they have really helped me. (And thanks to all those who make pandas a fantastic tool.)

Comment From: jreback

@ChristopherShort gr8, glad it's working for you.