Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
df = pd.DataFrame.from_dict({
("foo",): [1, 2, 3],
("bar",): [5, 6, 7],
(None,): [8, 9, 0],
})
df[[ ("missingKey",) ]] # returns the NaN-labeled column [8, 9, 0] instead of raising a KeyError
Issue Description
Given a DataFrame whose MultiIndex contains NaN values in its keys, selecting with a list of keys does not raise a KeyError for keys that are absent from the index. Instead, any key whose absent labels fall on the same level as the NaN values retrieves the NaN-labeled columns.
If multiple absent keys are passed, each of them retrieves the NaN-labeled column:
pd.DataFrame.from_dict({(None,): [8, 9, 0]}).loc[:, [("foo",), ("bar",)]] # returns a DF with two copies of the NaN-labeled [8, 9, 0] column
This behaviour occurs only when selection is done via a list of keys.
pd.DataFrame.from_dict({(None,): [8, 9, 0]}).loc[:, ("foo",)] # single key - raises KeyError
pd.DataFrame.from_dict({(None,): [8, 9, 0]}).loc[:, [("foo",)]] # key in a list - returns [8,9,0]
The same issue occurs for a multi-level MultiIndex, as long as the absent labels in the selector occur only on levels where the index has NaN values:
pd.DataFrame.from_dict({(None,None): [8, 9, 0]}).loc[:, [("a","a")]] # returns [8, 9, 0]
pd.DataFrame.from_dict({(None,None): [8, 9, 0]}).loc[:, [("a","b")]] # returns [8, 9, 0]
pd.DataFrame.from_dict({(None,"b"): [8, 9, 0]}).loc[:, [("a","b")]] # returns [8, 9, 0]
pd.DataFrame.from_dict({(None,"a"): [8, 9, 0]}).loc[:, [("a","b")]] # KeyError, because 2nd level didn't match
None can also be replaced with any of pd.NA, np.nan, or pd.NaT for the same effect.
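The statement about the other missing-value markers can be checked with a short loop. This is only a sketch: `list_lookup_outcome` is a hypothetical helper written for this report, not a pandas function, and what it prints depends on the installed pandas version, so no expected output is given.

```python
import numpy as np
import pandas as pd


def list_lookup_outcome(missing_label):
    """Describe what a list-based lookup of an absent key does.

    Hypothetical helper for this report; not part of pandas.
    """
    try:
        # Frame construction sits inside the try so that one unsupported
        # label does not stop the loop below.
        df = pd.DataFrame.from_dict({(missing_label,): [8, 9, 0]})
        result = df.loc[:, [("foo",)]]  # ("foo",) is not a column
    except Exception as exc:
        # The report expects a KeyError here; any other exception
        # is reported by name as well.
        return type(exc).__name__
    return f"returned {result.shape[1]} column(s)"


outcomes = {
    repr(label): list_lookup_outcome(label)
    for label in (None, pd.NA, np.nan, pd.NaT)
}
for label, outcome in outcomes.items():
    print(f"{label}: {outcome}")
```

On an affected version, every line should report a returned column rather than a KeyError.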
An additional, more complex example, with multiple None values in the index:
df = pd.DataFrame.from_dict({
(None,None): [1, 2, 3],
(None,"a"): [5, 6, 7],
("b",None): [8, 9, 0],
})
df[[("foo", "bar")]] # returns [1,2,3]
df[[("foo", "a")]] # returns [5,6,7]
df[[("b", "foo")]] # returns [8,9,0]
df[[("b", "a")]] # raises KeyError
Interestingly enough, if [("a", "b")] is used to select from this DF, a KeyError is raised instead of the key being matched to (None, None).
Expected Behavior
A KeyError should be raised, just as it is when the key with the absent label is not wrapped in a list.
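Until this is fixed, the expected behaviour can be approximated in user code. The following is a minimal sketch of one possible workaround, not the pandas implementation: `strict_select` is a hypothetical helper that checks every requested key against the column tuples before delegating to `.loc`.

```python
import pandas as pd


def strict_select(df, keys):
    """Select columns by a list of keys, raising KeyError if any key is absent.

    Hypothetical helper for this report; not part of pandas.
    """
    # Compare against plain Python tuples so that a NaN-labeled column
    # can never stand in for a key that is not actually present.
    present = df.columns.tolist()
    missing = [key for key in keys if key not in present]
    if missing:
        raise KeyError(f"{missing} not in columns")
    return df.loc[:, keys]


df = pd.DataFrame.from_dict({
    ("foo",): [1, 2, 3],
    ("bar",): [5, 6, 7],
    (None,): [8, 9, 0],
})

selected = strict_select(df, [("foo",)])  # existing key: normal selection

try:
    strict_select(df, [("missingKey",)])  # absent key: rejected up front
except KeyError as exc:
    print("KeyError:", exc)
```

One limitation of this sketch: because the check uses plain tuple equality, it also rejects a selector such as `(None,)` aimed at the NaN-labeled column itself, so that column has to be selected some other way.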