Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
df = pd.DataFrame.from_dict({
    ("foo",): [1, 2, 3],
    ("bar",): [5, 6, 7],
    (None,): [8, 9, 0], 
})
df[[("missingKey",)]]  # returns the NaN-labeled column [8, 9, 0] instead of raising a KeyError

Issue Description

When a DataFrame's columns are a MultiIndex that contains NaN labels, selecting with a list of keys silently returns the NaN-labeled columns for any key whose missing label falls on the same level as a NaN label, instead of raising a KeyError.

If multiple missing labels are passed, each of them retrieves the NaN-labeled column:

pd.DataFrame.from_dict({(None,): [8, 9, 0]}).loc[:, [("foo",), ("bar",)]]  # returns a DF with two copies of the NaN-labeled [8, 9, 0] column 

This behaviour occurs only when the selection is made with a list of keys:

pd.DataFrame.from_dict({(None,): [8, 9, 0]}).loc[:, ("foo",)]  # single key - raises KeyError
pd.DataFrame.from_dict({(None,): [8, 9, 0]}).loc[:, [("foo",)]]  # key in a list - returns [8,9,0]

The same issue occurs for a multi-level MultiIndex, as long as the missing labels in the selector occur only on levels where the column has a NaN label:

pd.DataFrame.from_dict({(None,None): [8, 9, 0]}).loc[:, [("a","a")]]  # returns [8, 9, 0]
pd.DataFrame.from_dict({(None,None): [8, 9, 0]}).loc[:, [("a","b")]]  # returns [8, 9, 0]
pd.DataFrame.from_dict({(None,"b"): [8, 9, 0]}).loc[:, [("a","b")]]  # returns [8, 9, 0]
pd.DataFrame.from_dict({(None,"a"): [8, 9, 0]}).loc[:, [("a","b")]]  # KeyError, because 2nd level didn't match

Replacing None with pd.NA, np.nan, or pd.NaT has the same effect.
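To compare the sentinels side by side, a small probe along the following lines may help. It is only a sketch: `probe` and `outcomes` are hypothetical names introduced here, and the code reports whatever happens on the installed pandas version rather than assuming a particular result.

```python
import numpy as np
import pandas as pd


def probe(sentinel, selector):
    """Select `selector` from a one-column frame labelled (sentinel,).

    Returns "raised <ExceptionName>" or "returned <values>", so the outcome
    can be compared across pandas versions without assuming either one.
    """
    try:
        df = pd.DataFrame.from_dict({(sentinel,): [8, 9, 0]})
        result = df.loc[:, selector]
    except Exception as exc:  # deliberately broad: we only report what happened
        return f"raised {type(exc).__name__}"
    return f"returned {result.to_numpy().tolist()}"


outcomes = {}
for sentinel in (None, np.nan, pd.NA, pd.NaT):
    outcomes[repr(sentinel)] = {
        "scalar key": probe(sentinel, ("foo",)),      # key passed on its own
        "list of keys": probe(sentinel, [("foo",)]),  # same key wrapped in a list
    }

for name, outcome in outcomes.items():
    print(name, outcome)
```

On an affected version, the "list of keys" entry would be expected to show returned values where the "scalar key" entry shows a raised KeyError.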

A more complex example, with several None labels in the index:

df = pd.DataFrame.from_dict({
    (None,None): [1, 2, 3],
    (None,"a"): [5, 6, 7],
    ("b",None): [8, 9, 0],
})

df[[("foo", "bar")]] # returns [1,2,3]
df[[("foo", "a")]] # returns [5,6,7]
df[[("b", "foo")]] # returns [8,9,0]
df[[("b", "a")]] # raises KeyError

Interestingly, selecting [("a", "b")] from this DataFrame raises a KeyError instead of matching (None, None).
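One way to look further into this case is to inspect how the column MultiIndex stores its labels. pandas keeps NaN-like labels out of `levels` and marks them with code -1 in `codes`. Whether that representation is related to the list-selection behaviour is an assumption here, not something this report establishes.

```python
import pandas as pd

df = pd.DataFrame.from_dict({
    (None, None): [1, 2, 3],
    (None, "a"): [5, 6, 7],
    ("b", None): [8, 9, 0],
})

# Each level lists only the real labels; a code of -1 marks a NaN-like entry.
level_labels = [list(level) for level in df.columns.levels]
level_codes = [codes.tolist() for codes in df.columns.codes]

for labels, codes in zip(level_labels, level_codes):
    print(labels, codes)
```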

Expected Behavior

A KeyError should be raised, just as when the key with the missing label is not wrapped in a list.
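Until list selection behaves this way, the expected behaviour can be approximated with a guard that checks every key individually before selecting. This is a minimal sketch: `select_columns_strict` is a hypothetical helper, not part of pandas, and it assumes that single-key lookup is unaffected, as reported above. Selecting the NaN-labeled columns themselves is not handled.

```python
import pandas as pd


def select_columns_strict(df, keys):
    """Select columns by a list of keys, raising KeyError if any key is absent."""
    # Membership tests look up one key at a time, which is the path that
    # raises correctly for missing labels according to this report.
    missing = [key for key in keys if key not in df.columns]
    if missing:
        raise KeyError(f"{missing} not found in columns")
    return df.loc[:, keys]


df = pd.DataFrame.from_dict({
    ("foo",): [1, 2, 3],
    ("bar",): [5, 6, 7],
    (None,): [8, 9, 0],
})

present = select_columns_strict(df, [("foo",)])  # existing key: selected as usual

try:
    select_columns_strict(df, [("missingKey",)])
    raised = False
except KeyError:
    raised = True  # missing key: rejected before any list-based selection runs
```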

Installed Versions

INSTALLED VERSIONS
------------------
commit : 06d230151e6f18fdb8139d09abf539867a8cd481
python : 3.8.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.13.0-27-generic
Version : #29~20.04.1-Ubuntu SMP Fri Jan 14 00:32:30 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.4.1
numpy : 1.22.2
pytz : 2021.3
dateutil : 2.8.2
pip : 20.0.2
setuptools : 44.0.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 8.0.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.4.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None