Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
ser_1 = pd.Series([1, 2, 3, "b"], dtype="category")
ser_2 = pd.Series([1, 3, 3, "b"], dtype="category")
ser_2 = ser_2.cat.add_categories(2)
print(f"Ser_1 categories: {ser_1.cat.categories}")
print(f"Ser_2 categories: {ser_2.cat.categories}")
ser_1.eq(ser_2)
===============================================
Ser_1 categories: Index([1, 2, 3, 'b'], dtype='object')
Ser_2 categories: Index([1, 3, 'b', 2], dtype='object')
File "/.../site-packages/pandas/core/arrays/categorical.py", line 153, in func
raise TypeError(msg)
TypeError: Categoricals can only be compared if 'categories' are the same.
Issue Description
I believe this PR may have oversimplified categorical comparisons and prevented unordered categories from being compared.
Expected Behavior
I would expect that comparing two unordered categoricals would not raise an error when their sets of categories are equal, regardless of the order in which the categories are stored.
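Until this is fixed, one possible workaround is to give both series the same category order before comparing. The snippet below is a minimal sketch of that idea, not the fix that pandas itself adopts. It uses only the `cat.set_categories` accessor method, and the names `ser_2_aligned` and `result` are introduced here purely for illustration.

```python
import pandas as pd

ser_1 = pd.Series([1, 2, 3, "b"], dtype="category")
ser_2 = pd.Series([1, 3, 3, "b"], dtype="category")
ser_2 = ser_2.cat.add_categories(2)

# Same categories as a set, but stored in a different order.
same_category_set = set(ser_1.cat.categories) == set(ser_2.cat.categories)

# Workaround (an assumption, not the library's fix): reorder the
# categories of ser_2 to match ser_1. The values themselves do not change.
ser_2_aligned = ser_2.cat.set_categories(ser_1.cat.categories)

# With identical categories the elementwise comparison no longer raises.
result = ser_1.eq(ser_2_aligned)
print(result.tolist())
```

For this data the comparison should be elementwise, so only the second position (2 versus 3) compares as unequal.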
Installed Versions
Comment From: AlexKirko
Confirmed by reproducing. Since categories are unordered by default, I agree that there shouldn't be an error and we should just get False here.
@ParthivNaresh Would you be willing to take on making a PR to fix this?
Comment From: ParthivNaresh
@AlexKirko For sure, I'm on it!
Comment From: ParthivNaresh
take