Pandas: adding two DataFrames with different columns fills the result with NaN values

Should adding dataframes yield a dataframe with all the columns of the added dataframes?

First, I create two DataFrames and add them:

import pandas as pd
import numpy as np
df1 = pd.DataFrame({'A':[4,5,6,7],'B':[10,20,30,40], 'C':[100,50,-30,-50]})
new_data = np.random.rand(4,3)
df2 = pd.DataFrame(new_data, columns=['I','II','III'], index=df1.index)
df_new = df1 + df2

I expected df1 + df2 to give a new DataFrame df_new with all six columns, but it is filled with NaN values:

In [63]: df_new
Out[63]: 
    A   B   C   I  II  III
0 NaN NaN NaN NaN NaN  NaN
1 NaN NaN NaN NaN NaN  NaN
2 NaN NaN NaN NaN NaN  NaN
3 NaN NaN NaN NaN NaN  NaN
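A minimal sketch of why every cell is NaN, assuming a recent pandas version (the issue used 0.19.2): arithmetic between two DataFrames first aligns both operands on the union of their row and column labels. Since df1 and df2 share no column labels, each column of the aligned pair is missing on one side, and NaN plus anything is NaN.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [4, 5, 6, 7], 'B': [10, 20, 30, 40], 'C': [100, 50, -30, -50]})
df2 = pd.DataFrame(np.random.rand(4, 3), columns=['I', 'II', 'III'], index=df1.index)

# Arithmetic aligns both operands on the union of their labels first;
# DataFrame.align exposes that step explicitly.
left, right = df1.align(df2, join='outer')
print(list(left.columns))  # ['A', 'B', 'C', 'I', 'II', 'III']

# After alignment, df1 has no values in I/II/III and df2 has none in A/B/C,
# so every element-wise sum involves at least one NaN.
df_new = df1 + df2
print(df_new.isna().all().all())  # True
```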

Problem description

Is this a bug? The df1.add(df2) method gives the same (apparently wrong) result.
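The + operator and DataFrame.add called with its default arguments perform the same aligned addition, so identical output is expected. A quick check, assuming a recent pandas version:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [4, 5, 6, 7], 'B': [10, 20, 30, 40], 'C': [100, 50, -30, -50]})
df2 = pd.DataFrame(np.random.rand(4, 3), columns=['I', 'II', 'III'], index=df1.index)

# DataFrame.equals treats NaNs in the same positions as equal,
# so two all-NaN results with the same labels compare as equal.
same = (df1 + df2).equals(df1.add(df2))
print(same)  # True
```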

Expected Output

I expected the same result as with .assign() or pd.concat(), namely:

# Using .assign():
dfnew = df1.assign(I=new_data[:,0])
dfnew = dfnew.assign(II=new_data[:,1])
dfnew = dfnew.assign(III=new_data[:,2])

# Using pd.concat():
dfnew = pd.concat((df1, df2), axis=1)
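A sketch confirming that the .assign() and pd.concat() approaches produce the same frame, assuming a recent pandas version. Neither adds anything element-wise: both place the new columns next to the existing ones, and pd.concat(..., axis=1) matches rows by index label.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [4, 5, 6, 7], 'B': [10, 20, 30, 40], 'C': [100, 50, -30, -50]})
new_data = np.random.rand(4, 3)
df2 = pd.DataFrame(new_data, columns=['I', 'II', 'III'], index=df1.index)

# Option 1: add each column explicitly (one assign call accepts several keywords).
via_assign = df1.assign(I=new_data[:, 0], II=new_data[:, 1], III=new_data[:, 2])

# Option 2: glue the frames side by side, matching rows by index label.
via_concat = pd.concat([df1, df2], axis=1)

print(via_assign.equals(via_concat))  # True
print(via_concat.shape)               # (4, 6)
```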

In both cases the output is what I expect, as far as I understand how pandas works:

In [107]: dfnew
Out[107]: 
   A   B    C         I        II       III
0  4  10  100  0.980611  0.492488  0.260895
1  5  20   50  0.607885  0.884491  0.689950
2  6  30  -30  0.068171  0.173972  0.890535
3  7  40  -50  0.348082  0.827598  0.081198

Other things I don't understand

Why do I get this output?

In [79]: df1 + df2['I']
Out[79]: 
    A   B   C   0   1   2   3
0 NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN
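A sketch of what happens here, assuming a recent pandas version: when a Series is combined with a DataFrame, pandas matches the Series index against the DataFrame's columns by default. df2['I'] is indexed by the row labels 0..3, so the result has columns A, B, C, 0, 1, 2, 3, none of which overlap. Passing axis=0 to the flex method DataFrame.add aligns the Series with the row index instead.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [4, 5, 6, 7], 'B': [10, 20, 30, 40], 'C': [100, 50, -30, -50]})
df2 = pd.DataFrame(np.random.rand(4, 3), columns=['I', 'II', 'III'], index=df1.index)

# Default: the Series index (row labels 0..3) is aligned with df1's columns,
# giving the union of labels and no overlapping values.
by_columns = df1 + df2['I']
print(sorted(map(str, by_columns.columns)))  # ['0', '1', '2', '3', 'A', 'B', 'C']
print(by_columns.isna().all().all())         # True

# axis=0 aligns the Series index with df1's row index instead,
# adding df2['I'] to each column of df1.
by_rows = df1.add(df2['I'], axis=0)
print(by_rows.shape)                         # (4, 3)
```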

I also tried the .append() method, but I get this:

In [86]: df1.append(df2)
Out[86]: 
     A     B      C         I        II       III
0  4.0  10.0  100.0       NaN       NaN       NaN
1  5.0  20.0   50.0       NaN       NaN       NaN
2  6.0  30.0  -30.0       NaN       NaN       NaN
3  7.0  40.0  -50.0       NaN       NaN       NaN
0  NaN   NaN    NaN  0.166225  0.883279  0.520274
1  NaN   NaN    NaN  0.358485  0.689133  0.757827
2  NaN   NaN    NaN  0.311398  0.015805  0.079101
3  NaN   NaN    NaN  0.948523  0.315923  0.888804

even though the indexes are the same!?

In [89]: df1.index==df2.index
Out[89]: array([ True,  True,  True,  True], dtype=bool)
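DataFrame.append (deprecated in pandas 1.4 and removed in 2.0) stacks rows: it concatenates along axis 0 and only aligns the columns, so a matching index does not merge rows. A sketch, assuming a recent pandas version, that reproduces the append output with pd.concat and contrasts it with axis=1:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [4, 5, 6, 7], 'B': [10, 20, 30, 40], 'C': [100, 50, -30, -50]})
df2 = pd.DataFrame(np.random.rand(4, 3), columns=['I', 'II', 'III'], index=df1.index)

# Row-wise (what append did): 8 rows with duplicate index labels 0..3,
# NaN wherever a row comes from a frame lacking that column.
stacked = pd.concat([df1, df2])
print(stacked.shape)       # (8, 6)

# Column-wise: rows with equal index labels are placed side by side.
side_by_side = pd.concat([df1, df2], axis=1)
print(side_by_side.shape)  # (4, 6)
```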

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Darwin
OS-release: 16.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.19.0
statsmodels: 0.8.0
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.2
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.5.3
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
boto: 2.46.1
pandas_datareader: 0.3.0.post

Comment From: TomAugspurger

Take a look at http://pandas-docs.github.io/pandas-docs-travis/dsintro.html#data-alignment-and-arithmetic

That should answer your first question about df1 + df2: pandas aligns on both the index and the columns. Because df1 and df2 share no column labels, every column of the result is missing on one side.

I think you may want df1.add(df2, fill_value=0). By default, alignment fills missing entries with NaN, and NaN + anything is NaN, so everything is behaving correctly here.
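A minimal sketch of the fill_value suggestion, assuming a recent pandas version: fill_value=0 replaces a value that is missing on only one side of the alignment with 0 before adding, so each column keeps its original values. The integer columns come back as float64 because the aligned-in missing values are NaN floats.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [4, 5, 6, 7], 'B': [10, 20, 30, 40], 'C': [100, 50, -30, -50]})
df2 = pd.DataFrame(np.random.rand(4, 3), columns=['I', 'II', 'III'], index=df1.index)

# Default fill is NaN, and NaN + anything is NaN.
print(df1.add(df2).isna().all().all())  # True

# fill_value=0: a value missing on one side only is treated as 0 before adding.
# (If both sides were missing, the result would still be NaN.)
summed = df1.add(df2, fill_value=0)

# Same numbers as concatenating side by side, but every column is float64.
expected = pd.concat([df1, df2], axis=1).astype(float)
print(summed.equals(expected))  # True
```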

Comment From: jorisvandenbossche

I think you actually want this: pd.concat([df1, df2], axis=1)
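Note that pd.concat(..., axis=1) also aligns, along the rows: it matches rows by index label, which is why the shared index in this example gives a clean result. A sketch, assuming a recent pandas version, with a deliberately shifted (hypothetical) index on the second frame:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [4, 5, 6, 7], 'B': [10, 20, 30, 40], 'C': [100, 50, -30, -50]})
df2 = pd.DataFrame(np.random.rand(4, 3), columns=['I', 'II', 'III'], index=df1.index)

# Shared index: rows line up, 4 rows x 6 columns, no NaN.
good = pd.concat([df1, df2], axis=1)
print(good.shape, good.isna().any().any())  # (4, 6) False

# Hypothetical shifted index (labels 2..5): rows are matched by label,
# so the union of labels gives 6 rows and NaN where a frame has no such row.
shifted = df2.set_axis([2, 3, 4, 5], axis=0)
misaligned = pd.concat([df1, shifted], axis=1)
print(misaligned.shape)  # (6, 6)
```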