Pandas KeyError when indexing the MultiIndex of a DataFrame with ":" as first field

Code Sample, a copy-pastable example if possible

In [2]: df = pd.DataFrame(index=pd.MultiIndex.from_tuples([['a', 'b', 'c']]), columns=[1, 2])

In [3]: df.loc['a', 'b', 'c']
Out[3]: 
1    NaN
2    NaN
Name: (a, b, c), dtype: object

In [4]: df.loc['a', 'b', :]
Out[4]: 
     1    2
c  NaN  NaN

In [5]: df.loc[:, 'b', 'c']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/home/nobackup/repo/pandas/pandas/core/indexing.py in _has_valid_type(self, key, axis)
   1397                 if key not in ax:
-> 1398                     error()
   1399             except TypeError as e:

/home/nobackup/repo/pandas/pandas/core/indexing.py in error()
   1392                 raise KeyError("the label [%s] is not in the [%s]" %
-> 1393                                (key, self.obj._get_axis_name(axis)))
   1394 

KeyError: 'the label [b] is not in the [columns]'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-5-116a40d4994c> in <module>()
----> 1 df.loc[:, 'b', 'c']

/home/nobackup/repo/pandas/pandas/core/indexing.py in __getitem__(self, key)
   1295 
   1296         if type(key) is tuple:
-> 1297             return self._getitem_tuple(key)
   1298         else:
   1299             return self._getitem_axis(key, axis=0)

/home/nobackup/repo/pandas/pandas/core/indexing.py in _getitem_tuple(self, tup)
    790 
    791         # no multi-index, so validate all of the indexers
--> 792         self._has_valid_tuple(tup)
    793 
    794         # ugly hack for GH #836

/home/nobackup/repo/pandas/pandas/core/indexing.py in _has_valid_tuple(self, key)
    140             if i >= self.obj.ndim:
    141                 raise IndexingError('Too many indexers')
--> 142             if not self._has_valid_type(k, i):
    143                 raise ValueError("Location based indexing can only have [%s] "
    144                                  "types" % self._valid_types)

/home/nobackup/repo/pandas/pandas/core/indexing.py in _has_valid_type(self, key, axis)
   1404                 raise
   1405             except:
-> 1406                 error()
   1407 
   1408         return True

/home/nobackup/repo/pandas/pandas/core/indexing.py in error()
   1391                                     "key")
   1392                 raise KeyError("the label [%s] is not in the [%s]" %
-> 1393                                (key, self.obj._get_axis_name(axis)))
   1394 
   1395             try:

KeyError: 'the label [b] is not in the [columns]'

Expected Output

Either output analogous to that of the previous command, or all three commands should raise an error, I guess.

output of `pd.show_versions()`

In [6]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: a63bd12529ff309d957d714825b1753d0e02b7fa
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.5.0-2-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: it_IT.utf8
LOCALE: it_IT.UTF-8

pandas: 0.18.1+174.ga63bd12
nose: 1.3.7
pip: 1.5.6
setuptools: 18.4
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.16.0
statsmodels: 0.8.0.dev0+111ddc0
xarray: None
IPython: 5.0.0.dev
sphinx: 1.3.1
patsy: 0.3.0-dev
dateutil: 2.2
pytz: 2012c
blosc: None
bottleneck: 1.1.0dev
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: None
xlrd: 0.9.4
xlwt: 1.1.2
xlsxwriter: 0.7.3
lxml: None
bs4: 4.4.0
html5lib: 0.999
httplib2: 0.9.1
apiclient: 1.5.0
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.38.0
pandas_datareader: 0.2.1

Comment From: toobaz

(This reminds me of https://github.com/pydata/pandas/issues/12827, though I'm not sure it is really related.)

Comment From: jreback

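This is ambiguous indexing with a slicer, and the ambiguity is not possible to detect: .loc cannot tell whether 'b' is meant as a level of the row index or as a column label. See the warning: http://pandas.pydata.org/pandas-docs/stable/advanced.html#using-slicers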

You can do this:

In [8]: df.loc[(slice(None), 'b', 'c'), :]
Out[8]: 
         1    2
a b c  NaN  NaN

In [9]: idx = pd.IndexSlice

In [12]: df.loc[idx[:, 'b', 'c'], :]
Out[12]: 
         1    2
a b c  NaN  NaN
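
The two workarounds above can be tried on a slightly larger frame. This is only a minimal sketch: the eight-row frame and its values are made up for illustration and are not part of the original report.

```python
import pandas as pd

# Hypothetical frame: a sorted three-level MultiIndex with eight rows.
index = pd.MultiIndex.from_product([["a", "x"], ["b", "y"], ["c", "z"]])
df = pd.DataFrame({1: range(8), 2: range(8, 16)}, index=index)

# Wrap the whole row key in a tuple and give the column indexer explicitly,
# so that .loc has no reason to read 'b' as a column label.
by_slice = df.loc[(slice(None), "b", "c"), :]

# pd.IndexSlice builds the same row key with the familiar ':' syntax.
idx = pd.IndexSlice
by_idx = df.loc[idx[:, "b", "c"], :]

print(by_slice)
print(by_slice.equals(by_idx))
```

Both forms keep all three index levels in the result, as in the outputs shown above.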

Comment From: jorisvandenbossche

@toobaz An additional reason this is confusing is that df.loc['a', 'b', 'c'] is actually interpreted as df.loc[('a', 'b', 'c'), :], and df.loc['a', 'b', :] as df.loc[('a', 'b'), :]. That is why those two work.
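
The interpretation described here can be checked for the all-labels case. This is a minimal sketch on a made-up frame (the data are an assumption for illustration); it compares the shorthand with the explicit spelling, and also shows the explicit partial key.

```python
import pandas as pd

# Hypothetical frame: a sorted three-level MultiIndex with eight rows.
index = pd.MultiIndex.from_product([["a", "x"], ["b", "y"], ["c", "z"]])
df = pd.DataFrame({1: range(8), 2: range(8, 16)}, index=index)

# Shorthand: three scalar labels, all taken as levels of the row index.
shorthand = df.loc["a", "b", "c"]

# Explicit spelling: one tuple for the rows, ':' for the columns.
explicit = df.loc[("a", "b", "c"), :]

print(shorthand.equals(explicit))

# Explicit partial key: fixes the first two levels and keeps the rows under them.
partial = df.loc[("a", "b"), :]
print(len(partial))
```

Writing the tuple and the column indexer out in full avoids relying on how .loc guesses which axis each element belongs to.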

Comment From: toobaz

@jorisvandenbossche @jreback Yes, I see.

I reported the issue because it seemed to me inconsistent with the behaviour of (for instance) lists: df.loc[['a'], 'b', 'c'] works as I would expect (i.e. it is interpreted as df.loc[(['a'], 'b', 'c'), :] rather than as df.loc[(['a'],), ('b', 'c')]), and I tend to think of a slicer as equivalent to a list containing all values (and I am implicitly assuming that .loc doesn't interpret an indexer based on the content of the list).

But then, I just noticed that df.loc[['a'], 'b'] gives the same error (unlike df.loc['a', 'b']). To me this is even more unexpected (I would expect that every time I replace a label with a list containing it, I just raise the dimensionality of the returned object), and it suggests that even with lists I was oversimplifying.

Maybe the consistent behaviour I have in mind implies unwanted side effects which I am missing... still, the current behaviour is quite unexpected in several cases. But true, the documentation suggests one should not even try.
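
For what it's worth, the expectation described above (replacing a label with a one-element list raises the dimensionality of the result) does seem to hold once the row key is written as an explicit tuple with an explicit column indexer. This is a minimal sketch on a made-up frame, assuming a reasonably recent pandas; it says nothing about the ambiguous shorthand forms.

```python
import pandas as pd

# Hypothetical frame: a sorted three-level MultiIndex with eight rows.
index = pd.MultiIndex.from_product([["a", "x"], ["b", "y"], ["c", "z"]])
df = pd.DataFrame({1: range(8), 2: range(8, 16)}, index=index)

# All-scalar row key: a single row comes back as a Series.
scalar_key = df.loc[("a", "b", "c"), :]

# Same key with the first label wrapped in a list: a one-row DataFrame.
list_key = df.loc[(["a"], "b", "c"), :]

print(type(scalar_key).__name__)
print(type(list_key).__name__)
```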
