I have two multi-index dataframes a_df and b_df. When I try to concat them, the operation fails, but it works if I first cut both down to a small size with head().

More specifically, note the following:

> pd.concat([a_df, b_df], 
            join='outer', axis=1, verify_integrity=True).dropna()
Empty DataFrame
Columns: [budget, default_bid]
Index: []

versus the following:

> pd.concat([a_df.head(100), b_df.head(100)], 
            join='outer', axis=1, verify_integrity=True).dropna()

                          budget  default_bid
flight_uid created_at                        
0F0092Ntoi 2015-03-28  306115.26         6.00
0F0099v8iI 2015-03-28   10984.27        41.00
0F01MfxxSI 2015-03-28    3000.00         0.90
0F01SZnTfs 2015-03-28    3000.00         1.60
0F01ddgkRa 2015-03-28    1414.71         2.52
0F01ee81fL 2015-03-28       0.00         6.00
0F01f2HpD3 2015-03-28     425.00        40.00
0F01f4aW8n 2015-03-28    1575.76         1.25
0F01o6T23a 2015-03-28       0.00         9.00
0F02BZeTr6 2015-03-28   50893.42         1.50

Here are the first few entries for a_df and b_df:

> a_df.head(10)
flight_uid  created_at
0F0092Ntoi  2015-03-28    306115.26
0F0099v8iI  2015-03-28     10984.27
0F01MfxxSI  2015-03-28      3000.00
0F01SZnTfs  2015-03-28      3000.00
0F01ddgkRa  2015-03-28      1414.71
0F01ee81fL  2015-03-28         0.00
0F01f2HpD3  2015-03-28       425.00
0F01f4aW8n  2015-03-28      1575.76
0F01o6T23a  2015-03-28         0.00
0F02BZeTr6  2015-03-28     50893.42
Name: budget, dtype: float64
> b_df.head(10)
flight_uid  created_at
0F0092Ntoi  2015-03-28     6.00
0F0099v8iI  2015-03-28    41.00
0F01MfxxSI  2015-03-28     0.90
0F01SZnTfs  2015-03-28     1.60
0F01ddgkRa  2015-03-28     2.52
0F01ee81fL  2015-03-28     6.00
0F01f2HpD3  2015-03-28    40.00
0F01f4aW8n  2015-03-28     1.25
0F01o6T23a  2015-03-28     9.00
0F02BZeTr6  2015-03-28     1.50
Name: default_bid, dtype: float64

Here are the types for both:

> a_df.reset_index().dtypes
flight_uid     object
created_at     object
budget        float64
dtype: object

More specifically:

> a_df.reset_index()['created_at'][0]
datetime.date(2015, 3, 28)

This is my configuration:

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Darwin
OS-release: 14.1.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.16.0
nose: 1.3.4
Cython: 0.22
numpy: 1.9.2
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 3.0.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.4.1
pytz: 2015.2
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.5.7
lxml: 3.4.0
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 0.9.8
pymysql: None
psycopg2: 2.6 (dt dec pq3 ext lo64)

Happy to post the data anywhere if that helps.

In case it helps, it seems to work if I pre-convert the field 'created_at' (in the index) to pd.DatetimeIndex(), i.e., doing the following with both a_df and b_df:

a_df = a_df.reset_index()
a_df['created_at'] = pd.DatetimeIndex(a_df['created_at'])
a_df = a_df.set_index(['flight_uid', 'created_at'])
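
For reference, here is a minimal sketch of the same workaround applied to both objects before the concat. The helper name `with_datetime_level` and the two-row sample data are made up for illustration, and `pd.to_datetime` stands in for `pd.DatetimeIndex`; I have not confirmed that this addresses the underlying cause.

```python
import datetime
import pandas as pd


def with_datetime_level(series, level="created_at"):
    """Return a copy of `series` whose `level` index level is datetime64.

    Hypothetical helper: it mirrors the reset_index / convert / set_index
    steps described above.
    """
    frame = series.reset_index()
    frame[level] = pd.to_datetime(frame[level])
    return frame.set_index(list(series.index.names))[series.name]


# Made-up sample data: the index level holds datetime.date objects.
idx = pd.MultiIndex.from_tuples(
    [
        ("0F0092Ntoi", datetime.date(2015, 3, 28)),
        ("0F0099v8iI", datetime.date(2015, 3, 28)),
    ],
    names=["flight_uid", "created_at"],
)
a_df = pd.Series([306115.26, 10984.27], index=idx, name="budget")
b_df = pd.Series([6.00, 41.00], index=idx, name="default_bid")

merged = pd.concat(
    [with_datetime_level(a_df), with_datetime_level(b_df)],
    join="outer", axis=1, verify_integrity=True,
).dropna()
print(merged)
```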

Comment From: jreback

the problem is that you are using datetime.date, which is a not-really-supported type. Use Timestamps and proper datetime64[ns] dtypes for datetimes. That will be orders of magnitude faster / better.
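
One way to spot this situation is to check which index levels hold plain `datetime.date` objects. The sketch below is only illustrative: `date_object_levels` is a hypothetical helper, not a pandas API, and it inspects values one by one, so it is meant for diagnosis rather than speed.

```python
import datetime
import pandas as pd


def date_object_levels(index):
    """Return the names of index levels made of datetime.date objects.

    Hypothetical diagnostic helper, not part of pandas.
    """
    flagged = []
    for position, name in enumerate(index.names):
        values = index.get_level_values(position)
        # Levels already stored as datetime64 (or any non-object dtype) are fine.
        if values.dtype != object or len(values) == 0:
            continue
        # datetime.datetime (and pd.Timestamp) subclass datetime.date,
        # so they are excluded explicitly.
        if all(
            isinstance(v, datetime.date) and not isinstance(v, datetime.datetime)
            for v in values
        ):
            flagged.append(name)
    return flagged


# Made-up one-row index shaped like the one in the report.
idx = pd.MultiIndex.from_tuples(
    [("0F0092Ntoi", datetime.date(2015, 3, 28))],
    names=["flight_uid", "created_at"],
)
print(date_object_levels(idx))
```

Any level this reports is a candidate for conversion with `pd.to_datetime` before joining or concatenating.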

Comment From: jreback

when you say 'operation fails', what do you mean? This should actually work the way you have it; can you show a traceback?

Comment From: amelio-vazquez-reina

Thank you @jreback. By failing I mean that it returns a dataframe with NaN. Sorry if that wasn't clear; at the top of my post I tried to show that if you do a dropna() you get an empty dataframe.

Comment From: jorisvandenbossche

@amelio-vazquez-reina Can you post a reproducible example? (so some runnable code that makes up some data and reproduces the problem)
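
Something along these lines could serve as a starting point. The data is made up (the uids are borrowed from the output above), `make_series` is a hypothetical helper, and on a sample this small the concat may well align correctly, so the sizes would probably need to grow until the problem shows up.

```python
import datetime
import pandas as pd

day = datetime.date(2015, 3, 28)


def make_series(uids, values, name):
    """Build a Series indexed by (flight_uid, created_at) with datetime.date values."""
    index = pd.MultiIndex.from_tuples(
        [(uid, day) for uid in uids], names=["flight_uid", "created_at"]
    )
    return pd.Series(values, index=index, name=name)


# Two series that share two keys and each have one key of their own.
a_df = make_series(
    ["0F0092Ntoi", "0F0099v8iI", "0F01MfxxSI"], [306115.26, 10984.27, 3000.00], "budget"
)
b_df = make_series(
    ["0F0092Ntoi", "0F0099v8iI", "0F01SZnTfs"], [6.00, 41.00, 1.60], "default_bid"
)

combined = pd.concat([a_df, b_df], join="outer", axis=1, verify_integrity=True)
shared = a_df.index.intersection(b_df.index)

# If alignment works, the rows surviving dropna() should match the shared keys.
print("pandas", pd.__version__)
print("shared keys:", len(shared))
print("rows after dropna:", len(combined.dropna()))
```

If the two counts differ, that mismatch is the thing to report, together with the pandas version.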

Closing for now, but feel free to re-open when you can update this.