Pandas DataFrame(recarray, columns=MultiIndex) disregards input data, gives empty DataFrame

I previously posted this as a question (not knowing it was a bug) here: http://stackoverflow.com/questions/37732403/pandas-dataframe-from-multiindex-and-numpy-structured-array-recarray

First I create a two-level MultiIndex:

import numpy as np
import pandas as pd

ind = pd.MultiIndex.from_product([('X','Y'), ('a','b')])

I can use it like this:

pd.DataFrame(np.zeros((3,4)), columns=ind)

Which gives:

     X         Y     
     a    b    a    b
0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0

Now I'm trying to do the same with a structured array:

dtype = [('Xa','f8'), ('Xb','i4'), ('Ya','f8'), ('Yb','i4')]
pd.DataFrame(np.zeros(3, dtype), columns=ind)

But that gives me an empty DataFrame!

Empty DataFrame
Columns: [(X, a), (X, b), (Y, a), (Y, b)]
Index: []

I expected it to do the same thing as this:

df = pd.DataFrame(np.zeros(3, dtype))
df.columns = ind
df

Which is:

     X       Y   
     a  b    a  b
0  0.0  0  0.0  0
1  0.0  0  0.0  0
2  0.0  0  0.0  0

INSTALLED VERSIONS

commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-86-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.0
pip: 8.1.1
setuptools: 20.7.0
numpy: 1.10.0
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 3.2.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.4.3

Comment From: jorisvandenbossche

This is a common pitfall: currently, passing columns to DataFrame() does a reindex (a selection by label) and does not overwrite the column labels.

If your data already has column name information, pd.DataFrame(np.zeros(3, dtype), columns=ind) behaves more like:

df = pd.DataFrame(np.zeros(3, dtype))
df = df.reindex(columns=ind)

rather than:

df = pd.DataFrame(np.zeros(3, dtype))
df.columns = ind

as you expected.
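
The selection semantics are easier to see with labels that do match the field names. The following is only a minimal sketch: the label 'Zz' and the names p, q, r, s are made up for illustration, and exact behaviour for non-matching labels may differ between pandas versions.

```python
import numpy as np
import pandas as pd

dtype = [('Xa', 'f8'), ('Xb', 'i4'), ('Ya', 'f8'), ('Yb', 'i4')]
rec = np.zeros(3, dtype)

# Labels that match field names: the constructor selects and reorders them.
selected = pd.DataFrame(rec, columns=['Ya', 'Xa'])

# reindex keeps matching labels and fills unknown ones with NaN.
# 'Zz' is a made-up label that matches no field.
reindexed = pd.DataFrame(rec).reindex(columns=['Ya', 'Zz'])

# Assigning to .columns relabels by position and keeps all of the data.
relabeled = pd.DataFrame(rec)
relabeled.columns = ['p', 'q', 'r', 's']
```

In both of the first two cases the labels are looked up in the existing field names; only the last form ignores the existing names.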

So knowing this, the output you see is correct: the reindex does not find any matching column names and so returns an empty DataFrame. There are some related issues about this, and some discussion on changing it (but the question is also whether it is worth the breaking change).
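
Until that changes, one possible workaround is to build the frame from the structured array first and assign the MultiIndex afterwards. The sketch below also derives the MultiIndex from the field names; the splitting rule (first character is the top level, the rest is the second level) is an assumption that happens to fit the names in this example, not something pandas does for you.

```python
import numpy as np
import pandas as pd

dtype = [('Xa', 'f8'), ('Xb', 'i4'), ('Ya', 'f8'), ('Yb', 'i4')]
rec = np.zeros(3, dtype)

# Assumed naming rule: 'Xa' -> ('X', 'a'), 'Yb' -> ('Y', 'b'), and so on.
ind = pd.MultiIndex.from_tuples([(name[0], name[1:]) for name in rec.dtype.names])

# Build from the structured array, then relabel the columns by position.
df = pd.DataFrame(rec)
df.columns = ind
```

This keeps the per-field dtypes of the structured array (float64 and int32 here), which a plain 2-D np.zeros((3, 4)) would not.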

Comment From: jorisvandenbossche

xref discussion in #9237