We allow for partially-initialized CategoricalDtypes to compare as equal to fully-initialized ones, leading to some surprising behaviors. The one that I stumbled over was this:
>>> import pandas as pd
>>> cat = pd.Categorical(["a", "b", "c"], ordered=True)
>>> dtype = pd.CategoricalDtype()
>>> cat.dtype == dtype
True
>>> cat2 = cat.astype(dtype)
>>> cat2.dtype == cat.dtype
False
We should stop special-casing categories=None for this purpose.
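To make the proposal concrete, here is a minimal sketch of what a comparison without the categories=None special case could look like. The helper name `strict_categorical_equal` is hypothetical and is not part of pandas. It only illustrates the idea that an uninitialized dtype should not compare equal to a fully-initialized one.

```python
import pandas as pd


def strict_categorical_equal(left, right):
    """Hypothetical helper, not a pandas API: compare two dtypes without
    treating categories=None as a wildcard."""
    if not isinstance(left, pd.CategoricalDtype) or not isinstance(
        right, pd.CategoricalDtype
    ):
        return False
    if left.categories is None or right.categories is None:
        # An uninitialized dtype matches only another uninitialized dtype.
        return left.categories is None and right.categories is None
    if bool(left.ordered) != bool(right.ordered):
        return False
    if left.ordered:
        # Ordered: the categories must match in the same order.
        return list(left.categories) == list(right.categories)
    # Unordered: the order of the categories is ignored.
    return set(left.categories) == set(right.categories)


cat = pd.Categorical(["a", "b", "c"], ordered=True)
print(strict_categorical_equal(cat.dtype, pd.CategoricalDtype()))  # False
print(strict_categorical_equal(cat.dtype, cat.dtype))  # True
```

Under this assumed rule, the first comparison in the example above would no longer report equality for two dtypes that behave differently in astype.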
Comment From: jreback
this was for back compat
iirc there should be some issue refs in the dtypes def
but agreed we should fix this
Comment From: jbrockmendel
The other surprise I found when going to fix this was that pandas_dtype("category") has ordered=None, while CategoricalDtype() has ordered=False.
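A short sketch for inspecting the two construction paths side by side. It assumes the registered dtype string "category" and a reasonably recent pandas. The values printed may differ between versions, so no output is shown.

```python
import pandas as pd
from pandas.api.types import pandas_dtype

# Two ways of getting a CategoricalDtype without categories.
from_string = pandas_dtype("category")
from_ctor = pd.CategoricalDtype()

# Both leave the categories unset ...
print(from_string.categories, from_ctor.categories)
# ... but the `ordered` attribute is where the comment above sees a difference.
print(repr(from_string.ordered), repr(from_ctor.ordered))
```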
Comment From: jorisvandenbossche
I think in general this "uninitialized" CategoricalDtype object is there to support the generic "category" dtype. So if we want to make the dtype more strict, we will need to come up with an alternative way to support this.
(That's not to say there isn't something buggy in the specific example you show, but that case is about astyping an array that is already categorical.)
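For context, a minimal sketch of the two situations being distinguished here: the generic "category" request on plain data, where the categories are inferred, and the same request on an array that is already categorical. The variable names are illustrative only, and the printed values are left unannotated because the behavior discussed in this thread has varied across pandas versions.

```python
import pandas as pd

# Generic use: no categories given, so they are inferred from the data.
ser = pd.Series(["a", "b", "a"]).astype("category")
print(list(ser.cat.categories))

# Already-categorical input: the generic request carries no categories of
# its own, so the existing ones are reused.
cat = pd.Categorical(["a", "b", "c"], ordered=True)
via_string = cat.astype("category")
via_empty_dtype = cat.astype(pd.CategoricalDtype())
print(via_string.ordered, via_empty_dtype.ordered)
```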
Comment From: jbrockmendel
Closed by #38516