Pandas combine_first breaks for MultiIndex

I am not getting the expected output of the command a.combine_first(b). All values are NaN. As you will see in my code, I want my index and columns to be the union of the indices and columns and the values to be of a and only of b if it does not exist in a or is NaN in a. Hence, I do not believe DataFrame.merge is appropriate for this task, but I may be wrong.

Here is the code to reproduce:

from __future__ import print_function
import pandas as pd
import dateutil.parser
import numpy as np

date = dateutil.parser.parse('2016-05-03').date()

a_values = [[100.0, 0.0, 0, 0, 100], [97.29, 2.70, np.nan, 0, 100], [97.33, 2.66, 0, 0, 100]]
a_index = pd.MultiIndex.from_tuples([('an', date), ('rc', date), ('avg/total', '')], names=['ls', 'date'])
a_columns = ['complete', 'no', 'over', 'partial', 'total']
a = pd.DataFrame(a_values, index=a_index, columns=a_columns)

b_values = np.arange(5 * 4).reshape(5, 4)
b_index = pd.MultiIndex.from_product([['ad', 'aj', 'an', 'rc', 'ra'], pd.Index([date])], names=['ls', 'date'])
b_columns = ['no', 'partial', 'complete', 'over']
b = pd.DataFrame(b_values, index=b_index, columns=b_columns)

c_values = [[2, 0, 3, 1, np.nan], [6, 4, 7, 5, np.nan], [100.0, 0.0, 0, 0, 100], [97.33, 2.66, 0, 0, 100], [14, 12, 15, 13, np.nan], [97.29, 2.70,\
 14, 0, 100]]
c_index = pd.MultiIndex.from_tuples([('ad', date), ('aj', date), ('an', date), ('avg/total', ''), ('ra', date), ('rc', date)], names=['ls', 'date'\
])
c_columns = ['complete', 'no', 'over', 'partial', 'total']
c = pd.DataFrame(c_values, index=c_index, columns=c_columns)

print('=========================')
print('DataFrame a:')
print(a)
print()
print('DataFrame b:')
print(b)
print()
print('=========================')
print()
print("a.index[0] == b.index[2]? {0}".format(a.index[0] == b.index[2]))
print("a.index[1] == b.index[3]? {0}".format(a.index[1] == b.index[3]))
print()
print('=========================')
print()
print("a.combine_first(b):")
print(a.combine_first(b))
print()
print("expected:")
print(c)

And here is the expected output, along with some intermediate steps:

=========================
DataFrame a:
                      complete    no  over  partial  total
ls        date                                            
an        2016-05-03    100.00  0.00     0        0    100
rc        2016-05-03     97.29  2.70   NaN        0    100
avg/total                97.33  2.66     0        0    100

DataFrame b:
               no  partial  complete  over
ls date                                   
ad 2016-05-03   0        1         2     3
aj 2016-05-03   4        5         6     7
an 2016-05-03   8        9        10    11
rc 2016-05-03  12       13        14    15
ra 2016-05-03  16       17        18    19

=========================

a.index[0] == b.index[2]? True
a.index[1] == b.index[3]? True

=========================

a.combine_first(b):
                      complete  no  over  partial  total
ls        date                                          
ad        2016-05-03       NaN NaN   NaN      NaN    NaN
aj        2016-05-03       NaN NaN   NaN      NaN    NaN
an        2016-05-03       NaN NaN   NaN      NaN    NaN
avg/total NaT              NaN NaN   NaN      NaN    NaN
ra        2016-05-03       NaN NaN   NaN      NaN    NaN
rc        2016-05-03       NaN NaN   NaN      NaN    NaN

expected:
                      complete     no  over  partial  total
ls        date                                             
ad        2016-05-03      2.00   0.00     3        1    NaN
aj        2016-05-03      6.00   4.00     7        5    NaN
an        2016-05-03    100.00   0.00     0        0    100
avg/total NaT            97.33   2.66     0        0    100
ra        2016-05-03     14.00  12.00    15       13    NaN
rc        2016-05-03     97.29   2.70    14        0    100

output of `pd.show_versions()`

In [270]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.11.0-26-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.16.2
nose: 1.3.3
Cython: 0.19.2
numpy: 1.8.0
scipy: 0.13.0
statsmodels: 0.5.0
IPython: 0.12.1
sphinx: None
patsy: 0.2.1
dateutil: 1.5
pytz: 2013.7
bottleneck: None
tables: 3.0.0
numexpr: 2.2.2
matplotlib: 1.4.2
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: 0.7.2
apiclient: None
sqlalchemy: 0.9.8
pymysql: None
psycopg2: None

Comment From: jreback

this is not a copy-pastable example, which means copying code not output.

this might be an error, but you should be using merge anyhow.

Comment From: bcm939

Apologies. I've updated the issue, hopefully adequately.

Comment From: jreback

So this doesn't align.

In [34]: a.align(b)
Out[34]: 
(                      complete  no  over  partial  total
 ls        date                                          
 ad        2016-05-03       NaN NaN   NaN      NaN    NaN
 aj        2016-05-03       NaN NaN   NaN      NaN    NaN
 an        2016-05-03       NaN NaN   NaN      NaN    NaN
 avg/total NaT              NaN NaN   NaN      NaN    NaN
 ra        2016-05-03       NaN NaN   NaN      NaN    NaN
 rc        2016-05-03       NaN NaN   NaN      NaN    NaN,
                       complete  no  over  partial  total
 ls        date                                          
 ad        2016-05-03       NaN NaN   NaN      NaN    NaN
 aj        2016-05-03       NaN NaN   NaN      NaN    NaN
 an        2016-05-03       NaN NaN   NaN      NaN    NaN
 avg/total NaT              NaN NaN   NaN      NaN    NaN
 ra        2016-05-03       NaN NaN   NaN      NaN    NaN
 rc        2016-05-03       NaN NaN   NaN      NaN    NaN)

The problem is you are using datetime.date, which is barely a supported type in pandas. It is non-performant and just makes a mess of things. just don't use it. This works correctly when Timestamps are used.

In [14]: date
Out[14]: datetime.date(2016, 5, 3)

In [15]: date = Timestamp(date)

In [16]: date
Out[16]: Timestamp('2016-05-03 00:00:00')

In [17]: a_index = pd.MultiIndex.from_tuples([('an', date), ('rc', date), ('avg/total', '')], names=['ls', 'date'])

In [18]: b_index = pd.MultiIndex.from_product([['ad', 'aj', 'an', 'rc', 'ra'], pd.Index([date])], names=['ls', 'date'])

In [19]: a = pd.DataFrame(a_values, index=a_index, columns=a_columns)

In [20]: b = pd.DataFrame(b_values, index=b_index, columns=b_columns)

In [21]: a.align(b)
Out[21]: 
(                      complete    no  over  partial  total
 ls        date                                            
 ad        2016-05-03       NaN   NaN   NaN      NaN    NaN
 aj        2016-05-03       NaN   NaN   NaN      NaN    NaN
 an        2016-05-03    100.00  0.00   0.0      0.0  100.0
 avg/total NaT            97.33  2.66   0.0      0.0  100.0
 ra        2016-05-03       NaN   NaN   NaN      NaN    NaN
 rc        2016-05-03     97.29  2.70   NaN      0.0  100.0,
                       complete    no  over  partial  total
 ls        date                                            
 ad        2016-05-03       2.0   0.0   3.0      1.0    NaN
 aj        2016-05-03       6.0   4.0   7.0      5.0    NaN
 an        2016-05-03      10.0   8.0  11.0      9.0    NaN
 avg/total NaT              NaN   NaN   NaN      NaN    NaN
 ra        2016-05-03      18.0  16.0  19.0     17.0    NaN
 rc        2016-05-03      14.0  12.0  15.0     13.0    NaN)

Pandas combine_first breaks for MultiIndex

output of pd.show_versions()

output of `pd.show_versions()`