Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
x = pd.DataFrame({"a": ["a1", "a1", "a2"], "b": ["b1", "b2", "b3"]}, dtype=pd.StringDtype())
x = x.set_index(["a", "b"])
y = pd.DataFrame({"a": ["a1"], "c": ["c1"]}, dtype=pd.StringDtype())
y = y.set_index(["a", "c"])
z = x.join(y, how="inner")
Issue Description
When joining two DataFrames whose indexes have StringDtype, an inner join gives incorrect results. The resulting z has a MultiIndex with only one row:
"a1", "b1", "c1"
It does work when I don't explicitly set the dtype. In that case the index dtypes are object.
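For comparison, here is a minimal sketch of the same join with the frames built as object dtype. It passes `dtype=object` explicitly instead of relying on default inference, which is an assumption made so the snippet behaves the same across versions. The name `z_obj` is introduced here for illustration only.

```python
import pandas as pd

# Same data as the reproducible example, but built with object dtype
# instead of pd.StringDtype().
x = pd.DataFrame(
    {"a": ["a1", "a1", "a2"], "b": ["b1", "b2", "b3"]}, dtype=object
).set_index(["a", "b"])
y = pd.DataFrame({"a": ["a1"], "c": ["c1"]}, dtype=object).set_index(["a", "c"])

# Inner join on the shared index level "a".
z_obj = x.join(y, how="inner")

# Show the resulting index entries and the row count.
print(z_obj.index.tolist())
print(len(z_obj))
```

With object dtype, both rows that share "a1" should survive the join, which is the behavior I expect for the StringDtype case as well.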
Expected Behavior
The expected result for DataFrame z is two rows (as in pandas 1.4.x), namely:
"a1", "b1", "c1"
"a1", "b2", "c1"
Installed Versions
Comment From: JeremyVriens
I couldn't really figure out which PR broke this, but maybe @jbrockmendel you know this? I've seen some merged PRs from you in 1.5.x that might be related to this.
Comment From: phofl
commit 4248b23371a70b339a2c16b8e5caca9c2e5897f8
Author: jbrockmendel <jbrockmendel@gmail.com>
Date: Tue Jan 25 09:31:24 2022 -0800
ENH: ExtensionEngine (#45514)
cc @jbrockmendel #45514
Comment From: phofl
This affects every EA (extension array) dtype, not only string.
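A rough way to check this for another extension dtype is sketched below, using integer keys and comparing NumPy `int64` against the nullable `Int64`. The helper `inner_join_row_count` and the integer data are hypothetical, made up for illustration and not taken from the original report. No expected output is given for the extension dtype because it depends on the installed pandas version.

```python
import pandas as pd


def inner_join_row_count(dtype):
    """Build two frames shaped like the report's example (hypothetical
    integer data) and return the row count of their inner join."""
    x = pd.DataFrame({"a": [1, 1, 2], "b": [10, 20, 30]}, dtype=dtype)
    x = x.set_index(["a", "b"])
    y = pd.DataFrame({"a": [1], "c": [100]}, dtype=dtype)
    y = y.set_index(["a", "c"])
    return len(x.join(y, how="inner"))


# "int64" is a plain NumPy dtype; "Int64" is the nullable extension dtype.
for dtype in ["int64", "Int64"]:
    print(dtype, inner_join_row_count(dtype))
```

If the two counts differ, the extension dtype path is affected on that version.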
Comment From: jbrockmendel
@phofl I'm just catching up on an email backlog. Does this still need my attention, or do you have it covered in #49284?
Comment From: phofl
This is covered, thanks for getting back to me.