Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
from calendar import month_abbr
import numpy as np
break_stuff = True
pd.show_versions()
midx = pd.MultiIndex.from_tuples(
    [(yr, mo) for yr in ['2010', '2021'] for mo in [month_abbr[i] for i in range(1, 13)]],
    names=['MI1', 'MI2'],
)
if break_stuff:
    # change outer level dtype to the nullable "string" dtype
    midx2 = midx.set_levels([midx.levels[0].astype('string'), midx.levels[1]])
else:
    midx2 = midx
idx = pd.Index(['a', 'b', 'c', 'd'], name='Category')
data = np.random.rand(len(idx), len(midx))
x = pd.DataFrame(data, index=idx, columns=midx)
y = pd.DataFrame(data, index=idx, columns=midx2)
z = y - x  # boom! RecursionError: maximum recursion depth exceeded
Issue Description
If two DataFrames are structured identically, except that some column labels have dtype object (Python str values) in one frame and the nullable "string" dtype in the other (in the example above, the outer level of the column MultiIndex), then any arithmetic operation using operators (confirmed with +, -, /, *) results in infinite recursion: RecursionError: maximum recursion depth exceeded while calling a Python object. This did not happen with 1.4.x.
From what I can see, the recursion happens in frame_arith_method_with_reindex, at line 366.
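To confirm that the two frames differ only in the dtype of a column level, and not in the labels themselves, a small diagnostic like the following can help. It is a sketch: the helper name mismatched_level_dtypes is hypothetical and not part of pandas, and the random data uses a seeded generator instead of np.random.rand so runs are reproducible.

```python
from calendar import month_abbr

import numpy as np
import pandas as pd


def mismatched_level_dtypes(left: pd.MultiIndex, right: pd.MultiIndex) -> list[int]:
    """Hypothetical helper: positions of levels whose dtypes differ."""
    return [
        i
        for i, (lv, rv) in enumerate(zip(left.levels, right.levels))
        if lv.dtype != rv.dtype
    ]


midx = pd.MultiIndex.from_tuples(
    [(yr, mo) for yr in ["2010", "2021"] for mo in month_abbr[1:]],
    names=["MI1", "MI2"],
)
# Same labels as midx, but the outer level uses the nullable "string" dtype.
midx2 = midx.set_levels([midx.levels[0].astype("string"), midx.levels[1]])

idx = pd.Index(["a", "b", "c", "d"], name="Category")
data = np.random.default_rng(0).random((len(idx), len(midx)))
x = pd.DataFrame(data, index=idx, columns=midx)
y = pd.DataFrame(data, index=idx, columns=midx2)

# The column labels compare equal tuple by tuple...
print(list(x.columns) == list(y.columns))
# ...but the dtype of the outer level does not.
print(mismatched_level_dtypes(x.columns, y.columns))
print([str(lvl.dtype) for lvl in y.columns.levels])
```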
Expected Behavior
Expected behaviour is that the operators work; in the example above, y - x should return a DataFrame of all zeros.
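One possible workaround (an assumption, not something proposed in this thread) is to cast each column level of one frame to the dtype of the matching level in the other before doing arithmetic, so both frames carry identical column indexes. A minimal sketch, where align_level_dtypes is a hypothetical helper built on MultiIndex.set_levels and Index.astype:

```python
from calendar import month_abbr

import numpy as np
import pandas as pd


def align_level_dtypes(target: pd.MultiIndex, reference: pd.MultiIndex) -> pd.MultiIndex:
    """Hypothetical helper: cast each level of `target` to the dtype of the
    matching level in `reference`; level values and codes stay the same."""
    return target.set_levels(
        [lvl.astype(ref.dtype) for lvl, ref in zip(target.levels, reference.levels)]
    )


midx = pd.MultiIndex.from_tuples(
    [(yr, mo) for yr in ["2010", "2021"] for mo in month_abbr[1:]],
    names=["MI1", "MI2"],
)
midx2 = midx.set_levels([midx.levels[0].astype("string"), midx.levels[1]])

idx = pd.Index(["a", "b", "c", "d"], name="Category")
data = np.random.default_rng(0).random((len(idx), len(midx)))
x = pd.DataFrame(data, index=idx, columns=midx)
y = pd.DataFrame(data, index=idx, columns=midx2)

# Give y the same level dtypes as x, then subtract as in the report.
y_aligned = y.copy()
y_aligned.columns = align_level_dtypes(y.columns, x.columns)
z = y_aligned - x
print((z == 0).all().all())
```

Because both frames hold the same data, the difference is exactly zero once the column indexes match; this only sidesteps the dtype mismatch and does not address the recursion itself.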
Installed Versions
Comment From: MarcoGorelli
Thanks @dk1010101 for the report
git bisect shows:
443f2b1bee0f58e41fce11a6f90dfd069495eda6 is the first bad commit
BUG: Multiindex.equals not commutative for ea dtype (#46047)
https://www.kaggle.com/code/marcogorelli/pandas-regression-example?scriptVersionId=111378614
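The commit title refers to MultiIndex.equals giving different answers depending on argument order when one side has an extension-array ("ea") dtype such as "string". A minimal sketch of checking that property on the two column indexes from the example; equals_is_commutative is a hypothetical helper, and the individual True/False values returned by equals may differ between pandas versions.

```python
from calendar import month_abbr

import pandas as pd


def equals_is_commutative(a: pd.Index, b: pd.Index) -> bool:
    """Hypothetical helper: True if a.equals(b) and b.equals(a) agree."""
    return a.equals(b) == b.equals(a)


midx = pd.MultiIndex.from_tuples(
    [(yr, mo) for yr in ["2010", "2021"] for mo in month_abbr[1:]],
    names=["MI1", "MI2"],
)
# Outer level cast to the nullable "string" extension dtype.
midx2 = midx.set_levels([midx.levels[0].astype("string"), midx.levels[1]])

print(midx.equals(midx2), midx2.equals(midx))
print(equals_is_commutative(midx, midx2))
```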
Comment From: MarcoGorelli
cc @phofl