Pandas BUG: multi-index joining returns wrong multiindex

Code Sample, a copy-pastable example if possible

modified from TestMergeMulti.test_join_multi_levels

import pandas as pd

household = (
    pd.DataFrame(
        dict(A=[1, 2, 3],
             B=[0, 1, 0],
             C=[19.3, 31.7, 29]),
        columns=['A', 'B', 'C'])
        .set_index('A'))
portfolio = (
    pd.DataFrame(
        dict(A=[1, 2, 2, 3, 3, 3, 4],
             d=["nl0", "nl3", "gb0",
                "gb0", "lu4", "nl5", 'EMPTY'],
             e=["ABN", "Robeco", "Royal", "Royal",
                "AAB", "Postbank", 'EMPTY'],
             f=[1.0, 0.4, 0.6, 0.15, 0.6, 0.25, 1.0]),
        columns=['A', 'd', 'e', 'f'])
        .set_index(['A', 'd']))
result = household.join(portfolio, how='inner')

print household 
     B     C
  A         
  1  0  19.3
  2  1  31.7
  3  0  29.0

print portfolio
                  e     f
  A d                    
  1 nl0         ABN  1.00
  2 nl3      Robeco  0.40
    gb0       Royal  0.60
  3 gb0       Royal  0.15
    lu4         AAB  0.60
    nl5    Postbank  0.25
  4 EMPTY     EMPTY  1.00

print result
         B     C         e     f
  A d                           
  1 nl0  0  19.3       ABN  1.00
  2 nl3  1  31.7    Robeco  0.40
    gb0  1  31.7     Royal  0.60
  3 gb0  0  29.0     Royal  0.15
    lu4  0  29.0       AAB  0.60
    nl5  0  29.0  Postbank  0.25

print result.index
  MultiIndex(levels=[[1, 2, 3], [u'EMPTY', u'gb0', u'lu4', u'nl0', u'nl3', u'nl5']],
             labels=[[0, 1, 1, 2, 2, 2], [3, 4, 1, 1, 2, 5]],
             names=[u'A', u'd'])

Problem description

The joined rows look correct, but I think 'EMPTY' should be dropped from the levels of the resulting MultiIndex, since no row in the result uses it.

Expected Output

  MultiIndex(levels=[[1, 2, 3], [u'gb0', u'lu4', u'nl0', u'nl3', u'nl5']],
             labels=[[0, 1, 1, 2, 2, 2], [2, 3, 0, 0, 1, 4]],
             names=[u'A', u'd'])

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 16.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: 2.45.0
pandas_datareader: None

Comment From: TomAugspurger

See https://github.com/pandas-dev/pandas/issues/2770. This is a detail of how multiindexes (currently) work.
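To illustrate the detail being referred to, here is a minimal sketch that does not involve a join at all. The small index below is made up for illustration (it reuses a few keys from the report), and it assumes a pandas version where `MultiIndex.from_tuples` and `get_level_values` behave as documented. Slicing a MultiIndex keeps the stored levels unchanged, so a level value can remain even when no row uses it.

```python
import pandas as pd

# Hypothetical three-row index, reusing a few keys from the report above.
mi = pd.MultiIndex.from_tuples(
    [(1, "nl0"), (2, "nl3"), (4, "EMPTY")],
    names=["A", "d"],
)

# Drop the last row; only the per-row codes are sliced, not the levels.
subset = mi[:-1]

# Values stored in the 'd' level (still includes the unused 'EMPTY').
stored = list(subset.levels[1])

# Values of 'd' that rows in the subset actually use.
used = list(subset.get_level_values("d").unique())

print(stored)
print(used)
```

The `levels` attribute is a lookup table shared with the original index, while `get_level_values` reflects what the rows actually contain. This is the same distinction as in the report's output.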

You can call `MultiIndex.remove_unused_levels` (http://pandas-docs.github.io/pandas-docs-travis/generated/pandas.MultiIndex.remove_unused_levels.html?highlight=remove_unused#pandas.MultiIndex.remove_unused_levels) on the result's index to remove the unused levels afterwards.
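A sketch of that workaround applied to the example from the report, written for a pandas version that provides `MultiIndex.remove_unused_levels` (it is not available in 0.19.2, the version in the report). Whether the join itself leaves 'EMPTY' in the levels may depend on the pandas version; the cleanup call is harmless either way.

```python
import pandas as pd

household = pd.DataFrame(
    {"A": [1, 2, 3], "B": [0, 1, 0], "C": [19.3, 31.7, 29.0]}
).set_index("A")

portfolio = pd.DataFrame(
    {
        "A": [1, 2, 2, 3, 3, 3, 4],
        "d": ["nl0", "nl3", "gb0", "gb0", "lu4", "nl5", "EMPTY"],
        "e": ["ABN", "Robeco", "Royal", "Royal", "AAB", "Postbank", "EMPTY"],
        "f": [1.0, 0.4, 0.6, 0.15, 0.6, 0.25, 1.0],
    }
).set_index(["A", "d"])

result = household.join(portfolio, how="inner")

# Rebuild the index so its levels contain only values that rows actually use.
result.index = result.index.remove_unused_levels()

# Look up the 'd' level by name rather than assuming its position.
d_position = result.index.names.index("d")
d_levels = list(result.index.levels[d_position])
print(d_levels)
```

The row data is unchanged by this call; only the level lookup tables are trimmed.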
