Code Sample, a copy-pastable example

import numpy as np
import pandas as pd

size = 1000 * 1000
idx = pd.Series([0, 1, 2, 3]).sample(size, replace=True)
df = pd.DataFrame({'idx': idx, 'prog': np.arange(size)}).set_index(['idx', 'prog'])
# With a simple (non-multi) index the problem does not occur:
# df = pd.DataFrame({'idx': idx, 'prog': np.arange(size)}).set_index('idx')
df.to_pickle('DELETEME.pkl')

df.memory_usage(deep=True, index=True) / 2**20                 # small
loaded_df = pd.read_pickle('DELETEME.pkl')
loaded_df.memory_usage(deep=True, index=True) / 2**20          # big
loaded_df.copy().memory_usage(deep=True, index=True) / 2**20   # small again

Problem description

When a DataFrame with a MultiIndex is pickled and then reloaded, its memory_usage grows several-fold (roughly 4x in the example above). Copying the DataFrame after reading brings memory_usage back to its original value. This does not happen with columns or with simple (non-multi) indices.
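The growth can be isolated to the index itself. As a minimal sketch (same construction as the code sample above; Index.memory_usage is the public API used for the measurement), compare the index's reported size before and after the round trip:

import numpy as np
import pandas as pd

size = 1000 * 1000
idx = pd.Series([0, 1, 2, 3]).sample(size, replace=True)
df = pd.DataFrame({'idx': idx, 'prog': np.arange(size)}).set_index(['idx', 'prog'])

before = df.index.memory_usage(deep=True)  # MultiIndex size before pickling
df.to_pickle('DELETEME.pkl')
after = pd.read_pickle('DELETEME.pkl').index.memory_usage(deep=True)

# Only the MultiIndex reports the growth; column memory is unchanged.
print(f'index before: {before / 2**20:.1f} MiB, after: {after / 2**20:.1f} MiB')

On the affected 0.25.2 the two numbers differ roughly fourfold; on the fixed versions noted in the comment below they match.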

Current Workaround

When I read pickles that I know contain a MultiIndex, I call .copy() immediately after reading, as sketched below.
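As a minimal sketch of that workaround (the helper name read_pickle_compact is invented here for illustration):

import pandas as pd

def read_pickle_compact(path):
    # Hypothetical wrapper for the workaround above: .copy() rebuilds
    # the MultiIndex, so memory_usage reports the expected size again.
    return pd.read_pickle(path).copy()

loaded_df = read_pickle_compact('DELETEME.pkl')
loaded_df.memory_usage(deep=True, index=True) / 2**20  # small again

The copy adds a one-time cost at load, but it keeps downstream memory accounting consistent.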

Expected Output

The size of a DataFrame should not change appreciably across a pickle round trip; if there is a reason it grows specifically with a MultiIndex, that should be mentioned in the documentation.

Output of pd.show_versions()

commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Darwin
OS-release : 18.6.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.2
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 41.6.0.post20191030
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None

Comment From: mroeschke

Looks like this has been fixed in a more recent version of pandas, so closing.

In [5]: loaded_df.memory_usage(deep=True, index=True)/2**20 # big
Out[5]:
Index    12.397897
dtype: float64

In [6]: loaded_df.copy().memory_usage(deep=True, index=True)/2**20 # small again
Out[6]:
Index    12.397897
dtype: float64