I'm trying to trim a very big DataFrame with a MultiIndex using head, and I noticed that the new DataFrame was still ridiculously big because it kept all the level data from the entire original MultiIndex. This doesn't happen if the DataFrame has a regular Index.

In [1]: from pandas import DataFrame

In [2]: df = DataFrame({'a': [1, 2, 3, 4, 5, 6], 'b': ['q', 'w', 'e', 'r', 't', 'y'], 'c': ['a', 's', 'd', 'f', 'g', 'h']}).set_index(['a', 'b'])

In [3]: df.head(3)
Out[3]: 
     c
a b   
1 q  a
2 w  s
3 e  d

In [4]: df.head(3).index
Out[4]: 
MultiIndex(levels=[[1, 2, 3, 4, 5, 6], [u'e', u'q', u'r', u't', u'w', u'y']],
           labels=[[0, 1, 2], [1, 4, 0]],
           names=[u'a', u'b'])
#### Expected Output
MultiIndex(levels=[[1, 2, 3], [u'e', u'q', u'w']],
           labels=[[0, 1, 2], [1, 2, 0]],
           names=[u'a', u'b'])

Output of pd.show_versions()

INSTALLED VERSIONS
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 25.1.1
Cython: None
numpy: 1.11.1
scipy: 0.18.0
statsmodels: None
xarray: None
IPython: 5.0.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

Comment From: jreback

duplicate of this: https://github.com/pandas-dev/pandas/issues/11724

Indexing a MultiIndex deliberately does not drop unused levels: reconstructing them is somewhat expensive, and it is not clear when that should be done. Which levels are stored is an implementation detail that you only see by peering into the MultiIndex.