Problem description
I would like to subset a data frame with an indexer. When using a MultiIndex, the results do not follow the requested order.
Code Sample, a copy-pastable example if possible
This works as expected when the index is a simple Index; the returned entries are ordered as requested:
import pandas as pd

df = pd.DataFrame(
    {"i1": [1, 1, 1, 1, 2, 4, 4, 2, 3, 3, 3, 3],
     "i2": [1, 3, 2, 2, 1, 1, 2, 2, 1, 1, 3, 2],
     "d1": ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l']}
)
df.set_index(['d1'], inplace=True)
df.loc[["a", "c", "b"], :]
returns
#     i1  i2
# d1
# a    1   1
# c    1   2   <-- 'c' precedes 'b' as requested in query
# b    1   3
However, when querying a data frame with a MultiIndex, the behaviour is inconsistent:
df = pd.DataFrame(
{"i1":[1,1,1,1,2,4,4,2,3,3,3,3],
"i2":[1,3,2,2,1,1,2,2,1,1,3,2],
"d1":['a','b','c','d','e','f','g','h','i','j','k','l']}
)
df.set_index(['d1',"i1"], inplace=True)
df.loc[["a","c","b"],:]
returns
#        i2
# d1 i1
# a  1    1
# b  1    3
# c  1    2   <-- 'c' follows 'b' unlike in query
Expected Output
#        i2
# d1 i1
# a  1    1
# c  1    2
# b  1    3
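A possible workaround, as a minimal sketch only: select each key separately with `.loc` and concatenate the pieces, so the requested order is imposed explicitly instead of relying on how `.loc` orders a list lookup on a MultiIndex. The helper name `loc_in_order` is hypothetical and is not part of pandas; it assumes every requested key exists in the first index level.

```python
import pandas as pd

df = pd.DataFrame(
    {"i1": [1, 1, 1, 1, 2, 4, 4, 2, 3, 3, 3, 3],
     "i2": [1, 3, 2, 2, 1, 1, 2, 2, 1, 1, 3, 2],
     "d1": ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l']}
).set_index(["d1", "i1"])


def loc_in_order(frame, keys):
    """Hypothetical helper: return rows for each first-level key, in the order given.

    Each key is looked up on its own (frame.loc[[key]] keeps all index levels),
    and the pieces are concatenated in the order of `keys`.
    """
    return pd.concat([frame.loc[[key]] for key in keys])


result = loc_in_order(df, ["a", "c", "b"])
print(result)  # rows appear in the order a, c, b
```

This does one lookup per key, so it may be slow for long key lists; it is meant to illustrate the expected ordering rather than to replace a proper fix.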
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 34.4.1
Cython: 0.24
numpy: 1.12.1
scipy: 0.19.0
statsmodels: 0.6.1
xarray: None
IPython: 5.3.0
sphinx: 1.4.5
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 2.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
httplib2: 0.9.2
apiclient: 1.5.1
sqlalchemy: 1.0.14
pymysql: None
psycopg2: None
jinja2: 2.9.6
boto: 2.42.0
pandas_datareader: None
Comment From: jorisvandenbossche
@DSLituiev Thanks for the report! Yes, this is a known issue at the moment. Closing this as a duplicate of https://github.com/pandas-dev/pandas/issues/10710. Pull requests welcome!