Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
from pandas.api.types import CategoricalDtype

def ordered(categories):
    # Shortcut for an ordered categorical dtype
    return CategoricalDtype(categories=categories, ordered=True)

def unordered(categories):
    # Shortcut for an unordered categorical dtype
    return CategoricalDtype(categories=categories, ordered=False)

u1 = pd.Series(['c', 'b', 'b', 'a'], dtype=ordered(['a', 'b', 'c']))
x1 = pd.Series(['c', 'b', 'b', 'a'], dtype=unordered(['a', 'b', 'c']))

x1.dtype == "category", u1.dtype == "category"
## True, True

x1.dtype == CategoricalDtype(None, False), u1.dtype == CategoricalDtype(None, False)
## False, False

Issue Description

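The documentation states that all categorical dtypes compare equal to the string "category" and to CategoricalDtype(None, False). As the result above shows, the comparison with "category" returns True, but the comparison with CategoricalDtype(None, False) returns False.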

Expected Behavior

x1.dtype == CategoricalDtype(None, False), u1.dtype == CategoricalDtype(None, False)
## True, True

Installed Versions

INSTALLED VERSIONS
------------------
commit : 4bfe3d07b4858144c219b9346329027024102ab6
python : 3.8.13.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : Korean_Korea.949
pandas : 1.4.2
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
pip : 21.2.2
setuptools : 58.0.4
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.8.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.1
IPython : 8.2.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli :
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : 2.1.1
matplotlib : 3.5.1
numba : None
numexpr : 2.8.1
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : 1.0.9
s3fs : None
scipy : 1.8.0
snappy : None
sqlalchemy : None
tables : 3.7.0
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
zstandard : None

Comment From: kwhkim

I read the source and suspect the following code is the cause:

    def __eq__(self, other: Any) -> bool:
        """
        Rules for CDT equality:
        1) Any CDT is equal to the string 'category'
        2) Any CDT is equal to itself
        3) Any CDT is equal to a CDT with categories=None regardless of ordered
        4) A CDT with ordered=True is only equal to another CDT with
           ordered=True and identical categories in the same order
        5) A CDT with ordered={False, None} is only equal to another CDT with
           ordered={False, None} and identical categories, but same order is
           not required. There is no distinction between False/None.
        6) Any other comparison returns False
        """
        if isinstance(other, str):
            return other == self.name
        elif other is self:
            return True
        elif not (hasattr(other, "ordered") and hasattr(other, "categories")):
            return False
        elif self.categories is None or other.categories is None:
            # For non-fully-initialized dtypes, these are only equal to
            #  - the string "category" (handled above)
            #  - other CategoricalDtype with categories=None
            return self.categories is other.categories

The code is from lines 362-385 of pandas/pandas/core/dtypes/dtypes.py.
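To trace why the comparison comes out False, here is a minimal stand-alone sketch of the quoted branches. MiniCDT is a hypothetical stand-in written for illustration, not pandas code, and the final fallback for fully specified dtypes is a placeholder assumption because that part of the method is not quoted above.

```python
class MiniCDT:
    """Hypothetical stand-in for CategoricalDtype; not the pandas class."""

    name = "category"

    def __init__(self, categories=None, ordered=False):
        self.categories = None if categories is None else tuple(categories)
        self.ordered = ordered

    def __eq__(self, other):
        if isinstance(other, str):
            return other == self.name
        elif other is self:
            return True
        elif not (hasattr(other, "ordered") and hasattr(other, "categories")):
            return False
        elif self.categories is None or other.categories is None:
            # Same test as the quoted branch: True only when BOTH are None.
            return self.categories is other.categories
        # Placeholder assumption: the remaining rules are not modelled here.
        return (self.ordered, self.categories) == (other.ordered, other.categories)


full = MiniCDT(["a", "b", "c"], ordered=False)
bare = MiniCDT(None, False)

print(full == "category")            # True  (string branch)
print(full == bare)                  # False (a tuple is not None)
print(bare == MiniCDT(None, True))   # True  (None is None)
```

In this model the identity test on categories can only succeed when both sides have categories=None, which matches the False, False result in the report.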

According to rule 3 of the Rules for CDT equality ("Any CDT is equal to a CDT with categories=None regardless of ordered"), it should be changed to

        elif self.categories is None or other.categories is None:
            # For non-fully-initialized dtypes, these are only equal to
            #  - the string "category" (handled above)
            #  - other CategoricalDtype with categories=None
            return True

and the documentation needs to be modified to incorporate the fact that the comparison should be True when the categories of either of the two dtypes is None, regardless of ordered.
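A small sketch of what the proposed branch would do, using plain namespaces rather than real dtypes. The names proposed_eq and dtype are hypothetical helpers, and the fallback for fully specified dtypes is a simplifying assumption, not the pandas logic.

```python
from types import SimpleNamespace


def dtype(categories, ordered):
    # Hypothetical helper: a bare object carrying the two attributes compared.
    return SimpleNamespace(categories=categories, ordered=ordered)


def proposed_eq(a, b):
    """Model of the proposed None-categories branch only."""
    if a.categories is None or b.categories is None:
        return True  # proposed: either side unspecified means equal
    # Simplifying assumption for the remaining rules.
    return a.ordered == b.ordered and a.categories == b.categories


bare = dtype(None, False)
x = dtype(("a", "b", "c"), False)
u = dtype(("a", "b", "c"), True)

print(proposed_eq(x, bare), proposed_eq(u, bare))  # True True
print(proposed_eq(x, u))                           # False
```

Under this rule x and u both compare equal to the bare dtype while still differing from each other, so equality would no longer be transitive. That may be worth weighing when deciding whether to change the code or the docs.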

I am going to take

Comment From: simonjayhawkins

Thanks @kwhkim for the report. Can you update the code sample so that it is reproducible? ordered and unordered are not defined.

Comment From: kwhkim

@simonjayhawkins Oh, my bad. They are just shortcuts I defined. I have updated the code to include the functions needed.

Comment From: topper-123

I don't think there is any good reason why any CategoricalDtype should compare equal to CategoricalDtype(None, False).

So I'd say that docs snippet is just wrong here IMO.