Code Sample, a copy-pastable example if possible
In [2]: df = pd.DataFrame(index=pd.MultiIndex.from_tuples([['a', 'b', 'c']]), columns=[1, 2])
In [3]: df.loc['a', 'b', 'c']
Out[3]:
1 NaN
2 NaN
Name: (a, b, c), dtype: object
In [4]: df.loc['a', 'b', :]
Out[4]:
     1    2
c  NaN  NaN
In [5]: df.loc[:, 'b', 'c']
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/home/nobackup/repo/pandas/pandas/core/indexing.py in _has_valid_type(self, key, axis)
1397 if key not in ax:
-> 1398 error()
1399 except TypeError as e:
/home/nobackup/repo/pandas/pandas/core/indexing.py in error()
1392 raise KeyError("the label [%s] is not in the [%s]" %
-> 1393 (key, self.obj._get_axis_name(axis)))
1394
KeyError: 'the label [b] is not in the [columns]'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-5-116a40d4994c> in <module>()
----> 1 df.loc[:, 'b', 'c']
/home/nobackup/repo/pandas/pandas/core/indexing.py in __getitem__(self, key)
1295
1296 if type(key) is tuple:
-> 1297 return self._getitem_tuple(key)
1298 else:
1299 return self._getitem_axis(key, axis=0)
/home/nobackup/repo/pandas/pandas/core/indexing.py in _getitem_tuple(self, tup)
790
791 # no multi-index, so validate all of the indexers
--> 792 self._has_valid_tuple(tup)
793
794 # ugly hack for GH #836
/home/nobackup/repo/pandas/pandas/core/indexing.py in _has_valid_tuple(self, key)
140 if i >= self.obj.ndim:
141 raise IndexingError('Too many indexers')
--> 142 if not self._has_valid_type(k, i):
143 raise ValueError("Location based indexing can only have [%s] "
144 "types" % self._valid_types)
/home/nobackup/repo/pandas/pandas/core/indexing.py in _has_valid_type(self, key, axis)
1404 raise
1405 except:
-> 1406 error()
1407
1408 return True
/home/nobackup/repo/pandas/pandas/core/indexing.py in error()
1391 "key")
1392 raise KeyError("the label [%s] is not in the [%s]" %
-> 1393 (key, self.obj._get_axis_name(axis)))
1394
1395 try:
KeyError: 'the label [b] is not in the [columns]'
Expected Output
Either something analogous to the previous command, or all 3 should raise an error I guess.
output of pd.show_versions()
In [6]: pd.show_versions()
INSTALLED VERSIONS
------------------
commit: a63bd12529ff309d957d714825b1753d0e02b7fa
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.5.0-2-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.utf8
LOCALE: it_IT.UTF-8
pandas: 0.18.1+174.ga63bd12
nose: 1.3.7
pip: 1.5.6
setuptools: 18.4
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.16.0
statsmodels: 0.8.0.dev0+111ddc0
xarray: None
IPython: 5.0.0.dev
sphinx: 1.3.1
patsy: 0.3.0-dev
dateutil: 2.2
pytz: 2012c
blosc: None
bottleneck: 1.1.0dev
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: None
xlrd: 0.9.4
xlwt: 1.1.2
xlsxwriter: 0.7.3
lxml: None
bs4: 4.4.0
html5lib: 0.999
httplib2: 0.9.1
apiclient: 1.5.0
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.38.0
pandas_datareader: 0.2.1
Comment From: toobaz
(This reminds me of https://github.com/pydata/pandas/issues/12827, though I am not sure it's really related.)
Comment From: jreback
This is ambiguous indexing from a slicer and is not possible to detect. See the warning: http://pandas.pydata.org/pandas-docs/stable/advanced.html#using-slicers
You can do this:
In [8]: df.loc[(slice(None), 'b', 'c'), :]
Out[8]:
         1    2
a b c  NaN  NaN
In [9]: idx = pd.IndexSlice
In [12]: df.loc[idx[:, 'b', 'c'], :]
Out[12]:
         1    2
a b c  NaN  NaN
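For anyone who wants to run the two workarounds above outside an IPython session, here is a minimal self-contained sketch. It rebuilds the one-row frame from the report; the variable names (by_tuple, by_index_slice) are only illustrative and not from the thread.

```python
import pandas as pd

# Same one-row frame as in the report: a 3-level MultiIndex on the rows.
index = pd.MultiIndex.from_tuples([("a", "b", "c")])
df = pd.DataFrame(index=index, columns=[1, 2])

# Spell out both axes so the row key cannot be mistaken for a column key.
by_tuple = df.loc[(slice(None), "b", "c"), :]

# pd.IndexSlice is a more readable way to write the same row key.
idx = pd.IndexSlice
by_index_slice = df.loc[idx[:, "b", "c"], :]

print(by_tuple)
print(by_index_slice)
```

Both selections pass the row key as a single tuple and give an explicit `:` for the columns, which is what removes the ambiguity.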
Comment From: jorisvandenbossche
@toobaz An additional reason this is confusing is that df.loc['a', 'b', 'c'] is actually interpreted as df.loc[('a', 'b', 'c'), :], and df.loc['a', 'b', :] as df.loc[('a', 'b'), :]. That is why those two work.
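A small sketch of the first equivalence, assuming the same one-row frame as in the report (the names shorthand and explicit are only illustrative). It covers the all-scalar key only; the form with a trailing slice is left out because its handling may differ between pandas versions.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([("a", "b", "c")])
df = pd.DataFrame(index=index, columns=[1, 2])

# Shorthand: all three labels are taken as one key into the row MultiIndex.
shorthand = df.loc["a", "b", "c"]

# Explicit: the row key is a tuple, and the columns get their own indexer.
explicit = df.loc[("a", "b", "c"), :]

print(shorthand.equals(explicit))
```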
Comment From: toobaz
@jorisvandenbossche @jreback Yes, I see.
I reported the issue because the behaviour seemed inconsistent with the use of (for instance) lists: df.loc[['a'], 'b', 'c'] works as I would expect (i.e. it is interpreted as df.loc[(['a'], 'b', 'c'), :] rather than as df.loc[(['a'],), ('b', 'c')]), and I tend to think of a slicer as equivalent to a list containing all values (implicitly assuming that .loc doesn't interpret an indexer based on the content of the list).
But then I just noticed that df.loc[['a'], 'b'] gives the same error (unlike df.loc['a', 'b']). That is even more unexpected to me (I would expect that every time I replace a label with a list containing it, I just raise the dimensionality of the returned object), and it suggests that I was oversimplifying even with lists.
Maybe the coherent behaviour I have in mind implies unwanted side-effects which I am missing... still the current behaviour is quite unexpected in several cases. But true, the documentation suggests one should not even try.
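For what it's worth, the "a list raises the dimensionality" expectation can be checked when both axes are spelled out explicitly. This is a sketch under the assumption of a reasonably recent pandas, not something verified on the build reported above; as_label and as_list are illustrative names.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([("a", "b", "c")])
df = pd.DataFrame(index=index, columns=[1, 2])

# All-scalar row key: selects a single row, returned as a Series.
as_label = df.loc[("a", "b", "c"), :]

# Same key with the first label wrapped in a list: the result stays a DataFrame.
as_list = df.loc[(["a"], "b", "c"), :]

print(as_label.ndim, as_list.ndim)
```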