Code Sample

>>> import numpy as np
>>> import pandas as pd
>>> pd.Series(['Pandas', 'is', np.nan], dtype='category').map(lambda x: len(x) if x == x else -1)
0    6.0
1    2.0
2    NaN
dtype: category
Categories (2, int64): [6, 2]
>>> pd.Series(['Pandas', 'is', np.nan], dtype='category').astype(object).map(lambda x: len(x) if x == x else -1)
0    6
1    2
2   -1
dtype: int64

Problem description

Series.map calls its function argument once for each category of a categorical series, but never calls it on NaN, even when the series contains NaN. This is inconsistent with how Series.map behaves for other dtypes, and is very surprising!
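To make it easier to see which values actually reach the function, here is a minimal sketch that wraps the mapper in a recorder. The helper names recorded_inputs and length_or_minus_one are hypothetical and only for illustration. What gets printed for the category case depends on the installed pandas version; on the 0.23.1 install reported here I would expect NaN to be missing from it.

```python
import numpy as np
import pandas as pd


def recorded_inputs(series, func):
    """Return the list of values that Series.map passes to func (hypothetical helper)."""
    seen = []

    def spy(value):
        # Record every argument before delegating to the real function.
        seen.append(value)
        return func(value)

    series.map(spy)
    return seen


def length_or_minus_one(x):
    # NaN is the only value that is not equal to itself.
    return len(x) if x == x else -1


values = ['Pandas', 'is', np.nan]
object_inputs = recorded_inputs(pd.Series(values, dtype=object), length_or_minus_one)
categorical_inputs = recorded_inputs(pd.Series(values, dtype='category'), length_or_minus_one)

print("object dtype inputs:  ", object_inputs)
print("category dtype inputs:", categorical_inputs)
```

With object dtype the function sees all three values, including NaN; comparing the two printed lists shows whether the categorical path skips it.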

I'm raising this issue even though #15706 already exists because that issue asks for something different: it wants the argument to .map to be called once per value in the series, rather than once per unique value.

Another related issue is #20714.

Expected Output

Categorical map should give values equal to those obtained by first converting to object. For any series s and function f we should have the invariant that:

s.map(f).astype(object).equals(s.astype(object).map(f).astype(object))
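The invariant can be wrapped in a small check, sketched below. The function name map_matches_object_map is hypothetical, and the result for the category case depends on the pandas version in use, so no expected output is claimed for it.

```python
import numpy as np
import pandas as pd


def map_matches_object_map(s, f):
    """Check that mapping s directly equals mapping its object-dtype copy."""
    direct = s.map(f).astype(object)
    via_object = s.astype(object).map(f).astype(object)
    return direct.equals(via_object)


def length_or_minus_one(x):
    # NaN is the only value that is not equal to itself.
    return len(x) if x == x else -1


values = ['Pandas', 'is', np.nan]
object_ok = map_matches_object_map(pd.Series(values, dtype=object), length_or_minus_one)
categorical_ok = map_matches_object_map(pd.Series(values, dtype='category'), length_or_minus_one)

print("object dtype holds:  ", object_ok)
print("category dtype holds:", categorical_ok)
```

For an object series the check holds trivially, since both sides perform the same operation. The interesting case is the categorical one, which is where the code sample at the top of this report diverges.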

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.1
pytest: 3.1.2
pip: 18.0
setuptools: 39.0.1
Cython: 0.27.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: 0.9.0
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.11
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None