Currently, the Categorical NaN value is hard-coded to np.nan. I propose making CategoricalDtype.na_value take its value from CategoricalDtype.categories.dtype.na_value if the categories are backed by an ExtensionArray, and otherwise fall back to np.nan.
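A minimal sketch of the proposed lookup, written as a standalone function rather than as a change to pandas itself. The name proposed_na_value is hypothetical and only for illustration; the sketch assumes a pandas version in which an Index can hold extension dtypes such as Int64.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype


def proposed_na_value(cat_dtype: pd.CategoricalDtype):
    """Hypothetical helper: the NA sentinel the proposal would pick.

    This is not a pandas API, only an illustration of the rule above.
    """
    categories = cat_dtype.categories
    # Extension dtypes expose their own missing-value sentinel as `na_value`.
    if categories is not None and isinstance(categories.dtype, ExtensionDtype):
        return categories.dtype.na_value
    # NumPy-backed categories (or no categories at all) fall back to np.nan.
    return np.nan


# Categories backed by a masked extension array vs. a plain NumPy array.
ea_backed = pd.CategoricalDtype(pd.array([1, 2], dtype="Int64"))
np_backed = pd.CategoricalDtype(["a", "b"])

print(proposed_na_value(ea_backed))
print(proposed_na_value(np_backed))
```

Under this rule, the sentinel follows the categories: the Int64-backed dtype would report the sentinel of its extension dtype, while the object-backed one keeps np.nan.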
Various parts of the Categorical code presume that the NaN sentinel value is np.nan. Those will have to be changed to use Categorical.dtype.na_value instead.
Comment From: rhshadrach
Duplicate of #50711 (in particular https://github.com/pandas-dev/pandas/issues/50711#issuecomment-1422910080), I think.
Comment From: topper-123
Yes, it's the same idea as mentioned by @jorisvandenbossche. I think that's also the final conclusion for #50711, i.e. to follow the NaN behaviour of the underlying categories (np.nan for numpy arrays, pd.NA for extension arrays).
I'm ok with closing this to avoid the duplicate.