Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> df = pd.DataFrame({'a': [True, True, False, False], 'b': [True, False, True, False], 'c': [1,2,3,4]})
>>> df
       a      b  c
0   True   True  1
1   True  False  2
2  False   True  3
3  False  False  4
>>> df.set_index(['a']).loc[True]
          b  c
a
True   True  1
True  False  2
>>> df.set_index(['a', 'b']).loc[True]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kale/hacking/forks/pandas/pandas/core/indexing.py", line 1071, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/home/kale/hacking/forks/pandas/pandas/core/indexing.py", line 1306, in _getitem_axis
    self._validate_key(key, axis)
  File "/home/kale/hacking/forks/pandas/pandas/core/indexing.py", line 1116, in _validate_key
    raise KeyError(
KeyError: 'True: boolean label can not be used without a boolean index'

Issue Description

The above snippet creates a data frame with two boolean columns. When just one of those columns is set as the index, .loc[] can select rows based on a boolean value. When both columns are set as the index, though, the same value causes .loc[] to raise a KeyError. The error message makes it seems like pandas doesn't regard a multi-index as a "boolean index", even if the columns making up the index are all boolean.

In pandas 1.4.3, I can work around this by indexing with 1 and 0 instead of True and False. But in 1.5.0 it seems to be an error to provide a integer value to a boolean index.

Expected Behavior

I would expect to be able to use boolean index values even when those values are part of a multi-index:

>>> df.set_index(['a', 'b']).loc[True]
       c
b
True   1
False  2

Installed Versions

INSTALLED VERSIONS ------------------ commit : 74f4e81ca84a290e2ef617dcbf220776c21f6bca python : 3.10.0.final.0 python-bits : 64 OS : Linux OS-release : 5.18.10-arch1-1 Version : #1 SMP PREEMPT_DYNAMIC Thu, 07 Jul 2022 17:18:13 +0000 machine : x86_64 processor : byteorder : little LC_ALL : None LANG : en_US.UTF-8 LOCALE : en_US.UTF-8 pandas : 1.5.0.dev0+1134.g74f4e81ca8 numpy : 1.22.0 pytz : 2021.3 dateutil : 2.8.2 setuptools : 57.4.0 pip : 22.1.2 Cython : 0.29.30 pytest : 7.1.1 hypothesis : 6.39.4 sphinx : 4.3.2 blosc : None feather : None xlsxwriter : None lxml.etree : 4.8.0 html5lib : 1.1 pymysql : None psycopg2 : None jinja2 : 3.0.3 IPython : 8.0.1 pandas_datareader: None bs4 : 4.10.0 bottleneck : None brotli : None fastparquet : None fsspec : None gcsfs : None matplotlib : 3.5.1 numba : None numexpr : None odfpy : None openpyxl : 3.0.9 pandas_gbq : None pyarrow : None pyreadstat : None pyxlsb : None s3fs : None scipy : 1.8.0 snappy : None sqlalchemy : 1.4.31 tables : None tabulate : 0.8.9 xarray : None xlrd : None xlwt : None zstandard : None

Comment From: dicristina

As per the user guide, in the MultiIndex case the notation .loc[True] is shorthand for .loc[(True,)]. As a workaround use the tuple notation instead:

import pandas as pd

df = pd.DataFrame({'a': [True, True, False, False], 'b': [True, False, True, False], 'c': [1,2,3,4]})
df.set_index(['a', 'b']).loc[(True,)]

This gives the expected result in 1.4.3:

>>> df.set_index(['a', 'b']).loc[(True,)]
<stdin>:1: PerformanceWarning: indexing past lexsort depth may impact performance.
       c
b       
True   1
False  2

Comment From: kalekundert

Ah, ok, I suppose I shouldn't be surprised that integrating .loc[] with multi-indexing is complicated and that there are edge cases like this. Thanks for providing a work-around, and feel free to close the issue if this isn't something you think is worth fixing.

Comment From: phofl

This should work. Thanks for the report.

Comment From: Zahlii

Piggy-backing on this, the following simple scenario also fails:

import pandas as pd

x = pd.DataFrame({True: [1,2,3]})
print(x)
print(x.loc[:, True])
 # KeyError: 'True: boolean label can not be used without a boolean index'

This is because validate key doesn't use the axis argument.

Pandas BUG: can't use .loc with boolean values in MultiIndex

Comment From: batterseapower

Probably obvious, but this also doesn't work with when you try to select multiple boolean items i.e.:

df = pd.DataFrame({'a': [True, True, False, False], 'b': [True, False, True, False], 'c': [1,2,3,4]}).set_index(['a', 'b'])
df.loc[[True]]

Fails with:

IndexError: Boolean index has wrong length: 1 instead of 4

Even more surprisingly, this also fails when the thing you are indexing with is explicitly an Index -- where it feels like the intention is clearly to select by key rather than to mask the array:

df.loc[pd.Index([True])]

You can work around this by using Index.get_indexer_non_unique directly:

indexer, _missing = pd.Index([True]).get_indexer_non_unique(df.index.get_level_values(0))
df.iloc[indexer >= 0]

Comment From: phofl

@batterseapower could you open another issue about this? this is different than the op, not sure how I am leaning there yet