Code Sample

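import numpy as np
import pandas as pd

mi = pd.MultiIndex.from_tuples([('A0', 'B0'), ('A0', 'B1'),
                                ('A1', 'B0'), ('A1', 'B1'), ('A3', np.nan)],
                               names=['ia', 'ib'])
df = pd.DataFrame(np.arange(10).reshape(5, 2), index=mi, columns=['bar', 'foo'])
df2 = df.copy()
# relabel level 1 in reverse order (same effect as set_levels(..., inplace=True))
df2.index = df2.index.set_levels(['B1', 'B0'], level=1)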

df2
        bar  foo
ia ib
A0 B1     0    1
   B0     2    3
A1 B1     4    5
   B0     6    7
A3 NaN    8    9

# sort_index on df2 replaces the NaN in the (A3, NaN) row with B0; sort_index on df does not
df2.sort_index()
       bar  foo
ia ib
A0 B0    2    3
   B1    0    1
A1 B0    6    7
   B1    4    5
A3 B0    8    9

Problem description

I suspect that an unsorted level in the MultiIndex of the DataFrame causes sort_index to fill in the NaN label.
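To make that suspicion easier to inspect, here is a minimal sketch that looks at how the index stores its labels. It relies on the MultiIndex keeping NaN out of `levels` and marking it with the code -1, and it calls `set_levels` without `inplace` so that a new index is returned. It only shows the internal state; it does not prove where the bug is.

```python
import numpy as np
import pandas as pd

mi = pd.MultiIndex.from_tuples(
    [("A0", "B0"), ("A0", "B1"), ("A1", "B0"), ("A1", "B1"), ("A3", np.nan)],
    names=["ia", "ib"],
)

# NaN is not one of the level values; it is marked by the code -1.
print(mi.levels[1].tolist())   # ['B0', 'B1']
print(mi.codes[1].tolist())    # [0, 1, 0, 1, -1]

# Relabel level 1 in reverse order; the codes are left untouched.
mi2 = mi.set_levels(["B1", "B0"], level=1)
print(mi2.levels[1].tolist())  # ['B1', 'B0']
print(mi2.codes[1].tolist())   # [0, 1, 0, 1, -1]

# The level values are no longer sorted, which is the state suspected above.
print(mi2.levels[1].is_monotonic_increasing)  # False
```

If sorting rebuilds the levels in sorted order and remaps the codes without treating -1 specially, the NaN row could end up pointing at a real label, which would match the output shown in the code sample. That is an assumption about the cause, not something confirmed here.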

Note: I ended up with an unsorted level in a MultiIndex DataFrame after a broadcast operation. Here is the code:

def mklbl(prefix, n):
    return ["%s%s" % (prefix, i) for i in range(n)]

miindex = pd.MultiIndex.from_product([mklbl('A', 3),
                                      mklbl('B', 2),
                                      mklbl('C', 2),
                                      mklbl('D', 2)],names=['ia','ib','ic','id'])

micolumns = ['foo','bah']

dfmi = pd.DataFrame(np.arange(len(miindex) * len(micolumns))
                      .reshape((len(miindex), len(micolumns))),
                    index=miindex,
                    columns=micolumns).sort_index().sort_index(axis=1)

dfmi=dfmi.drop('A2')
# the first index level of dfmi now lists more values than the index actually uses,
# and this may cause the index level to change from ['A0', 'A1', 'A2'] to ['A1', 'A0']
bs=dfmi.loc[('A0','B0')].copy().rename({'D1':'D2'})
# the addition below produces NaN in some rows because of broadcasting
(dfmi+bs).sort_index()
               bah   foo
ic id ia ib
C0 D0 A0 B0    2.0   0.0
         B1   10.0   8.0
      A1 B0   18.0  16.0
         B1   26.0  24.0
   D1 A0 B0    NaN   NaN
         B1    NaN   NaN
      A1 B0    NaN   NaN
         B1    NaN   NaN
   D2 A0 NaN   NaN   NaN
C1 D0 A0 B0   10.0   8.0
         B1   18.0  16.0
      A1 B0   26.0  24.0
         B1   34.0  32.0
   D1 A0 B0    NaN   NaN
         B1    NaN   NaN
      A1 B0    NaN   NaN
         B1    NaN   NaN
   D2 A0 NaN   NaN   NaN # again, the labels of the (D2, NaN, NaN) row have been changed
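The leftover level value mentioned in the comments above can be checked directly. This is a minimal sketch on a smaller, two-level index (a simplification of the one in the report). I have not verified whether `remove_unused_levels()` also avoids the `sort_index` problem; the sketch only makes the stale level visible.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2"], ["B0", "B1"]], names=["ia", "ib"]
)
df = pd.DataFrame({"foo": np.arange(len(index))}, index=index)

dropped = df.drop("A2")

# Labels actually in use on the first level ...
print(dropped.index.get_level_values("ia").unique().tolist())
# ... versus the values the level still lists (may still include 'A2').
print(dropped.index.levels[0].tolist())

# remove_unused_levels() rebuilds the levels from the labels in use.
compact = dropped.index.remove_unused_levels()
print(compact.levels[0].tolist())
```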

Expected Output

sort_index should not change the contents of the DataFrame; it should only reorder the rows.
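The expectation can be written as a small check. This is a sketch only: `index_labels` and `sort_preserves_labels` are hypothetical helper names, not pandas functions, and NaN is mapped to None so that labels compare equal.

```python
import numpy as np
import pandas as pd


def index_labels(index):
    """Return the index tuples as a sorted list, with NaN replaced by None."""
    cleaned = [tuple(None if pd.isna(v) else v for v in tup) for tup in index]
    return sorted(cleaned, key=repr)


def sort_preserves_labels(df):
    """True if sort_index() only reorders rows and does not rewrite any label."""
    return index_labels(df.index) == index_labels(df.sort_index().index)


# A frame with ordinary, NaN-free labels: sorting must keep every label.
mi = pd.MultiIndex.from_tuples(
    [("A1", "B1"), ("A0", "B0"), ("A1", "B0"), ("A0", "B1")],
    names=["ia", "ib"],
)
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=mi, columns=["bar", "foo"])
print(sort_preserves_labels(df))  # True
```

Calling `sort_preserves_labels(df2)` on the frame from the code sample would be expected to return False on an affected version, since the (A3, NaN) label becomes (A3, B0).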

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.1
pytest: 4.1.0
pip: 10.0.1
setuptools: 39.0.1
Cython: None
numpy: 1.16.2
scipy: None
pyarrow: None
xarray: None
IPython: 7.3.0
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.3
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: 4.7.1
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Comment From: erezinman

Another, perhaps simpler example:

import pandas as pd
import numpy as np

idx = pd.MultiIndex(levels=[['a', 'b', 'c'], [16., 32., 4.]],
                    codes=[[0, 1, 2, 2], [-1, -1, 1, 1]])
df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=idx)
df
#      a    b     c
#    NaN  NaN  32.0  32.0
# 0    0    1     2     3
# 1    4    5     6     7
# 2    8    9    10    11
# 3   12   13    14    15

df.sort_index(axis=1)
#      a    b     c
#    4.0  4.0  32.0  32.0
# 0    0    1     2     3
# 1    4    5     6     7
# 2    8    9    10    11
# 3   12   13    14    15

The problem still persists 3 years later.