Code Sample, a copy-pastable example if possible
import pandas
pandas.DataFrame({"a": [0], "b": [0], "c": [0]}).set_index(["a", "b"]).loc[(0, 1)]
Problem description
It should raise KeyError because (0, 1) doesn't exist, like Pandas used to. I had some old code that relied on this behavior.
Currently it raises a TypeError, which is confusing because it's telling me that I can't index using (int, int), even though it works just fine if I call .loc[(0, 0)].
Actual output
TypeError: cannot do label indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [1] of <class 'int'>
Expected Output
KeyError: 'the label [(0, 1)] is not in the [index]'
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.8-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.0.dev0+27.gcfa5ea696
pytest: None
pip: 9.0.1
setuptools: 38.2.5
Cython: 0.27.3
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.5
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.1.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Comment From: jreback
In [3]: df = DataFrame({"a": [0], "b": [0], "c": [0]}).set_index(["a", "b"])
In [4]: df
Out[4]:
     c
a b
0 0  0
In [6]: df.loc[(0,1), :]
KeyError: (0, 1)
Not a bug; rather, you are not qualifying what you are inputting, and it's impossible to tell the difference. __getitem__ unpacks tuples, so .loc[(0, 1)] is exactly equivalent to .loc[0, 1], which is why you get the TypeError (it's trying to index with the 1 on the columns, which doesn't work).
This is a limitation of how indexing works in Python itself. When indexing a MultiIndex, you should use a fully qualified form when selecting a single key: df.loc(axis=0)[(0,1)]
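The equivalence described above can be checked without pandas. The following is a minimal sketch; KeyProbe is a hypothetical helper written for this illustration (it is not part of pandas) that simply returns whatever key Python hands to __getitem__.

```python
class KeyProbe:
    """Hypothetical helper: returns the key exactly as Python passes it in."""

    def __getitem__(self, key):
        return key


probe = KeyProbe()

# Parentheses around the subscript do not change the key:
# both spellings deliver the same tuple (0, 1).
same_key = probe[(0, 1)] == probe[0, 1]

# Adding an explicit second axis nests the row key inside an outer tuple,
# so the receiver can tell "row key (0, 1)" apart from "row 0, column 1".
qualified_key = probe[(0, 1), :]

print(same_key)       # True
print(qualified_key)  # ((0, 1), slice(None, None, None))
```

Because the two spellings arrive as the same object, the receiving __getitem__ has no way to know which one the caller wrote.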
Comment From: Rufflewind
Thanks for the quick response. Seems like df.loc[(0, 1),] can also serve as a workaround for the ambiguity.
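For old code that relied on catching KeyError for a missing MultiIndex key, one possible pattern is to always pass the column axis explicitly, as in the df.loc[(0,1), :] call shown earlier in the thread. This is only a sketch: lookup_row is a hypothetical helper name, and the exact return type for a present key may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"a": [0], "b": [0], "c": [0]}).set_index(["a", "b"])


def lookup_row(frame, key):
    """Hypothetical helper: return the data for a full row key, or None if absent.

    The trailing ", :" marks `key` as a row label, so a missing key
    surfaces as KeyError instead of being read as (row, column).
    """
    try:
        return frame.loc[key, :]
    except KeyError:
        return None


present = lookup_row(df, (0, 0))  # key exists in the index
missing = lookup_row(df, (0, 1))  # key does not exist

print(present is not None)
print(missing is None)
```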