Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

mi1 = pd.MultiIndex.from_frame(
    pd.DataFrame(dict(i1=pd.Series(["b", "a"]), i2=1)),
)
print(mi1)

# Same index, except that the first level (i1) has an ordered categorical dtype
cat_dt = pd.CategoricalDtype(["b", "a"], ordered=True)
mi2 = pd.MultiIndex.from_frame(
    pd.DataFrame(dict(i1=pd.Series(["b", "a"], dtype=cat_dt), i2=1))
)
print(mi2)

# These behave as expected.
print(mi1.intersection(mi1[1:]))
print(mi1.intersection(mi2[1:]))

# These do not.
print(mi2.intersection(mi1[1:]))
print(mi2.intersection(mi2[1:]))

Issue Description

The example shows that MultiIndex.intersection returns a wrong result when one of the levels has a categorical dtype (here ordered so that b < a). I think what happens is this: intersection sees that mi2 is monotonic, so it passes the tuple array [('b', 1), ('a', 1)] to inner_join_indexer(), which gets the wrong answer because those tuples are not actually sorted. mi2 is only monotonic with respect to the categorical order, not with respect to the tuple values themselves.
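The hypothesis above can be checked without touching pandas internals. The minimal sketch that follows (assuming a pandas version whose MultiIndex monotonicity check respects categorical order, as 1.5 does) rebuilds mi1 and mi2 from the example and compares is_monotonic_increasing with a plain lexicographic sort of the tuples. It does not call inner_join_indexer, which is a private pandas function; it only shows why a code path that trusts monotonicity could see unsorted tuples.

```python
import pandas as pd

mi1 = pd.MultiIndex.from_frame(
    pd.DataFrame(dict(i1=pd.Series(["b", "a"]), i2=1))
)
cat_dt = pd.CategoricalDtype(["b", "a"], ordered=True)
mi2 = pd.MultiIndex.from_frame(
    pd.DataFrame(dict(i1=pd.Series(["b", "a"], dtype=cat_dt), i2=1))
)

# With plain object labels, ("b", 1) followed by ("a", 1) is not increasing.
print(mi1.is_monotonic_increasing)  # False

# With the ordered categorical (b < a), the same labels count as increasing.
print(mi2.is_monotonic_increasing)  # True

# The raw tuples (what the reporter suspects reach inner_join_indexer)
# are still not in lexicographic order.
tuples = list(mi2)
print(tuples, tuples == sorted(tuples))
```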

Expected Behavior

All four intersections in the example should return the following.

MultiIndex([('a', 1)],
           names=['i1', 'i2'])

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 91111fd99898d9dcaa6bf6bedb662db4108da6e6
python           : 3.10.7.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.19.16-200.fc36.x86_64
Version          : #1 SMP PREEMPT_DYNAMIC Sun Oct 16 22:50:04 UTC 2022
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_CA.UTF-8
LOCALE           : en_CA.UTF-8

pandas           : 1.5.1
numpy            : 1.23.4
pytz             : 2022.5
dateutil         : 2.8.2
setuptools       : 65.5.0
pip              : 22.0.4
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : 8.5.0
pandas_datareader: None
bs4              : None
bottleneck       : None
brotli           : None
fastparquet      : None
fsspec           : None
gcsfs            : None
matplotlib       : 3.6.0
numba            : None
numexpr          : None
odfpy            : None
openpyxl         : 3.0.10
pandas_gbq       : None
pyarrow          : None
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : 1.9.3
snappy           : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : 2022.10.0
xlrd             : None
xlwt             : None
zstandard        : None
tzdata           : None

Comment From: deepers

I just checked out and built the main branch of pandas (2.0.0.dev0+466.g218ab0930e4, commit 218ab09), and the example no longer reproduces the bug. It seems to be fixed on the main branch.

Comment From: topper-123

Hi @deepers. Thanks for the bug report; it's much appreciated.

I've checked this with pandas v1.4, where it does give a wrong result, and with the current main branch, where it works correctly. So I agree that this works now but previously didn't.

Could you add a test case for this in the pandas test suite, so this won't pop up again later?
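A regression test along the lines requested could look like the following sketch. The function name is hypothetical and the body is modelled directly on the reproducible example in this issue; where it would live in the pandas test suite is an assumption. It compares the resulting tuples and names rather than doing an exact index comparison, because the level dtype of the result (categorical or object) may differ between the four combinations.

```python
import pandas as pd


def test_multiindex_intersection_with_categorical_level():
    # Hypothetical test name; mirrors the reproducible example in this issue.
    cat_dt = pd.CategoricalDtype(["b", "a"], ordered=True)
    mi1 = pd.MultiIndex.from_frame(
        pd.DataFrame(dict(i1=pd.Series(["b", "a"]), i2=1))
    )
    mi2 = pd.MultiIndex.from_frame(
        pd.DataFrame(dict(i1=pd.Series(["b", "a"], dtype=cat_dt), i2=1))
    )

    # All four left/right combinations should keep only ("a", 1)
    # and preserve the level names.
    for left in (mi1, mi2):
        for right in (mi1, mi2):
            result = left.intersection(right[1:])
            assert result.tolist() == [("a", 1)]
            assert list(result.names) == ["i1", "i2"]
```

pytest collects plain test_-prefixed functions like this one; the nested loop could equally be expressed with pytest.mark.parametrize over the four combinations.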

Comment From: deepers

Hi @topper-123. Thanks, I will do this. But in following the instructions for building the pandas development environment with mamba, I ran into the following error. Any ideas?

deepee@entropy ~/r/pandas (multiindex_categorical_intersection)> python -m pip install -e . --no-build-isolation --no-use-pep517
Obtaining file:///home/deepee/repo/pandas
  Preparing metadata (setup.py) ... done
Requirement already satisfied: python-dateutil>=2.8.2 in /home/deepee/.local/opt/mambaforge/envs/pandas-dev/lib/python3.8/site-packages (from pandas==2.0.0.dev0+512.geb69d8943f) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /home/deepee/.local/opt/mambaforge/envs/pandas-dev/lib/python3.8/site-packages (from pandas==2.0.0.dev0+512.geb69d8943f) (2022.5)
Requirement already satisfied: numpy>=1.20.3 in /home/deepee/.local/opt/mambaforge/envs/pandas-dev/lib/python3.8/site-packages (from pandas==2.0.0.dev0+512.geb69d8943f) (1.23.4)
Requirement already satisfied: six>=1.5 in /home/deepee/.local/opt/mambaforge/envs/pandas-dev/lib/python3.8/site-packages (from python-dateutil>=2.8.2->pandas==2.0.0.dev0+512.geb69d8943f) (1.16.0)
Installing collected packages: pandas
  Attempting uninstall: pandas
    Found existing installation: pandas 1.5.1
    Uninstalling pandas-1.5.1:
      Successfully uninstalled pandas-1.5.1
  Running setup.py develop for pandas
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
db-dtypes 1.0.4 requires pandas<2.0dev,>=0.24.2, but you have pandas 2.0.0.dev0+512.geb69d8943f which is incompatible.
Successfully installed pandas-2.0.0.dev0+512.geb69d8943f