Should adding dataframes yield a dataframe with all the columns of the added dataframes?
First I simply create two dataframes:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'A':[4,5,6,7],'B':[10,20,30,40], 'C':[100,50,-30,-50]})
new_data = np.random.rand(4,3)
df2 = pd.DataFrame(new_data, columns=['I','II','III'], index=df1.index)
df_new = df1 + df2
I thought I would get a nice new dataframe df_new when adding df1 + df2, but it is filled with null values:
In [63]: df_new
Out[63]:
A B C I II III
0 NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN
Problem description
Is this a problem? The add method, df1.add(df2), also gives the same wrong result.
Expected Output
I thought I would get the same result as with .assign() or pd.concat(), namely when:
dfnew=df1.assign(I=new_data[:,0])
dfnew=dfnew.assign(II=new_data[:,1])
dfnew=dfnew.assign(III=new_data[:,2])
or
dfnew =pd.concat((df1, df2), axis=1)
then the output is correct as far as I understand the workings of pandas:
In [107]: dfnew
Out[107]:
A B C I II III
0 4 10 100 0.980611 0.492488 0.260895
1 5 20 50 0.607885 0.884491 0.689950
2 6 30 -30 0.068171 0.173972 0.890535
3 7 40 -50 0.348082 0.827598 0.081198
Other things I don't get
Why this output:
In [79]: df1 + df2['I']
Out[79]:
A B C 0 1 2 3
0 NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN
I also tried the .append() method, but I get this:
In [86]: df1.append(df2)
Out[86]:
A B C I II III
0 4.0 10.0 100.0 NaN NaN NaN
1 5.0 20.0 50.0 NaN NaN NaN
2 6.0 30.0 -30.0 NaN NaN NaN
3 7.0 40.0 -50.0 NaN NaN NaN
0 NaN NaN NaN 0.166225 0.883279 0.520274
1 NaN NaN NaN 0.358485 0.689133 0.757827
2 NaN NaN NaN 0.311398 0.015805 0.079101
3 NaN NaN NaN 0.948523 0.315923 0.888804
although the index is the same!?
In [89]: df1.index==df2.index
Out[89]: array([ True, True, True, True], dtype=bool)
Comment From: TomAugspurger
Take a look at http://pandas-docs.github.io/pandas-docs-travis/dsintro.html#data-alignment-and-arithmetic
That should answer your first issue about what happened with df1 + df2. Pandas aligns on both the index and the columns. Since df1 and df2 share no column labels, every cell of the aligned result has a value on only one side.
I think that maybe you want df1.add(df2, fill_value=0). But by default when pandas aligns, the fill value is NaN and NaN+anything is NaN. So everything is behaving correctly here.
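To make the alignment concrete, here is a minimal sketch that rebuilds df1 and df2 from the report. The seeded generator and the names plain, filled, by_col and by_row are only for illustration and are not part of the original code. It compares plain addition with fill_value=0, and it also covers the df1 + df2['I'] case, where the Series index (0..3) is matched against the column labels of df1 unless axis=0 is passed.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [4, 5, 6, 7], 'B': [10, 20, 30, 40], 'C': [100, 50, -30, -50]})
rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df2 = pd.DataFrame(rng.random((4, 3)), columns=['I', 'II', 'III'], index=df1.index)

# Plain addition aligns on index AND columns. The result carries the union
# of the column labels, and every cell has a value on only one side, so
# every cell ends up NaN.
plain = df1 + df2
print(plain.isna().all().all())

# fill_value=0 substitutes 0 where exactly one side is missing, so each
# frame's values come through (as floats).
filled = df1.add(df2, fill_value=0)
print(filled)

# DataFrame + Series matches the Series index against the DataFrame's
# COLUMNS by default, which is why labels 0..3 show up as extra columns.
by_col = df1 + df2['I']
print(by_col.columns.tolist())

# axis=0 matches the Series index against the row index instead.
by_row = df1.add(df2['I'], axis=0)
print(by_row)
```

Note that filled happens to contain the same numbers as a side-by-side concatenation only because the two frames have no columns in common. With overlapping column names, the shared columns would be summed.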
Comment From: jorisvandenbossche
I think you actually want this: pd.concat([df1, df2], axis=1)
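As a rough sketch of that suggestion, again with df1 and df2 rebuilt as in the report (the seed and the names combined and stacked are illustrative), the axis argument decides whether the frames are placed side by side or stacked. Stacking along the default axis=0 produces the same 8-row shape as the .append() output shown earlier.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [4, 5, 6, 7], 'B': [10, 20, 30, 40], 'C': [100, 50, -30, -50]})
rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df2 = pd.DataFrame(rng.random((4, 3)), columns=['I', 'II', 'III'], index=df1.index)

# axis=1 puts the frames side by side, matching rows by index label.
# No arithmetic happens, so df1's integer columns keep their dtype.
combined = pd.concat([df1, df2], axis=1)
print(combined)
print(combined.dtypes)

# The default axis=0 stacks rows instead: index labels repeat and the
# columns missing from each frame are filled with NaN.
stacked = pd.concat([df1, df2])
print(stacked.shape)
```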