Pandas Intermittent KeyErrors for Series with duplicates in string index

Code Sample, a copy-pastable example if possible

import pandas as pd
s1 = pd.Series(data = [1,2,3,4,5], index = [1,2,3,4,4])
s2 = pd.Series(data = [1,2,3,4,5], index = ['1', '2', '3', '4', '4'])

s1[0] # KeyError
s2[0] # KeyError

s1.iloc[:4][0] # KeyError
s1.iloc[:5][0] # KeyError

s2.iloc[:4][0] # Returns 1!
s2.iloc[:5][0] # TypeError & KeyError

Problem description

s2[0] raises a KeyError because the key, 0, is not in the index. This behavior is expected. s2.iloc[:5][0] also fails as expected. However, once the row carrying the duplicate label is sliced off, as in s2.iloc[:4][0], no error is raised and the value at position 0 is returned instead.

This error is present with string type indexes, but does not occur with integer indexes (see examples involving s1 above). I have not tried other index types.
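As a quick check of the only difference between the two slices, the snippet below inspects index uniqueness. It is a minimal sketch that shows the duplicate / non-duplicate distinction only; it does not claim to show where inside pandas the behavior diverges.

```python
import pandas as pd

s2 = pd.Series(data=[1, 2, 3, 4, 5], index=['1', '2', '3', '4', '4'])

# The full Series carries the label '4' twice, so its index is not unique.
full_is_unique = s2.index.is_unique              # False

# Dropping the last row removes the second '4', leaving a unique index.
sliced_is_unique = s2.iloc[:4].index.is_unique   # True

print(full_is_unique, sliced_is_unique)
```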

Issue is present in a fresh conda environment with python version 3.6.2 and pandas version 0.21.1 (via conda-forge).

Expected Output

s2.iloc[:4][0] should fail with TypeError and KeyError.

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-696.13.2.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.1
pytest: None
pip: 9.0.1
setuptools: 36.4.0
Cython: None
numpy: 1.13.1
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Comment From: chris-b1

The fundamental problem here is #9595, namely [] indexing sometimes has fallback behavior to location based indexing. My general recommendation (and I think at least soft community consensus) is to only use [] indexing for DataFrame column selection and iloc / loc everywhere else.

@jorisvandenbossche's summary does a good job, although it doesn't cover this particular corner case (duplicate vs non-duplicate index)

Comment From: jreback

indexing with duplicated indices is quite tricky. you have to pay attention, so pretty much you should avoid this. coupled with fallback, things get tricky. use .loc for label selection and this problem goes away.
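To illustrate the recommendation, here is a minimal sketch using the Series from the report, with .iloc for positions and .loc for labels. The helper label_lookup_fails is a hypothetical name introduced only for this example. It catches both KeyError and TypeError because the exception raised for a missing integer key on a string index is assumed to differ between pandas versions.

```python
import pandas as pd

s2 = pd.Series(data=[1, 2, 3, 4, 5], index=['1', '2', '3', '4', '4'])

# Positional access: .iloc always means "the element at position 0".
first = s2.iloc[0]

# Label access: a duplicated label returns every matching row as a Series.
dupes = s2.loc['4']


def label_lookup_fails(series, key):
    """Hypothetical helper: True if series.loc[key] raises for a missing label."""
    try:
        series.loc[key]
    except (KeyError, TypeError):
        # The exception type is assumed to depend on the pandas version.
        return True
    return False


# The integer 0 is not a label in the string index, so .loc fails
# both with the duplicate present and with it sliced off.
fails_with_dupes = label_lookup_fails(s2, 0)
fails_without_dupes = label_lookup_fails(s2.iloc[:4], 0)
```

With explicit indexers there is no fallback from labels to positions, so the result no longer depends on whether the index happens to contain duplicates.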
