Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np
df1 = pd.DataFrame(columns=['Start','End','Duration'])
df2 = pd.DataFrame(np.array([None] * 3).reshape(-1,3),columns=['Start','End','Duration'])
df2['Start'] = 1
df2['End'] = 10
df2['Duration'] = 0.8
df1 = pd.concat([df1, df2], ignore_index=True)[df1.columns.tolist()]
Problem description
When concatenating the two DataFrames df1 and df2, the dtype of the "Start" and "End" columns changes from integer to float.
Expected Output
df1
Out[1]:
   Start  End  Duration
0      1   10       0.8
Output of pd.show_versions()
df1
Out[2]:
   Start   End  Duration
0    1.0  10.0       0.8
Comment From: jorisvandenbossche
What version of pandas are you using? (Please provide the output of pd.show_versions()
in the top post, as requested.)
I don't see this anymore in pandas 0.20.2, so possibly this has been fixed already.
Comment From: gabboshow
u'0.18.1'
Comment From: jreback
In [47]: df1 = pd.DataFrame(columns=['Start','End','Duration'])
...: df2 = pd.DataFrame(np.array([None] * 3).reshape(-1,3),columns=['Start','End','Duration'])
...: df2['Start'] = 1
...: df2['End'] = 10
...: df2['Duration'] = 0.8
...:
...: #df1 = pd.concat([df1, df2], ignore_index=True)[df1.columns.tolist()]
...:
In [48]: df1.dtypes
Out[48]:
Start object
End object
Duration object
dtype: object
In [49]: df2.dtypes
Out[49]:
Start int64
End int64
Duration float64
dtype: object
In [50]: pd.concat([df1, df2], ignore_index=True)[df1.columns.tolist()]
Out[50]:
Start End Duration
0 1 10 0.8
In [51]: pd.concat([df1, df2], ignore_index=True)[df1.columns.tolist()].dtypes
Out[51]:
Start object
End object
Duration float64
dtype: object
We are turning these into object columns in master. I am not sure there is a whole lot more we can do here. We don't normally do inference on the combined column; that is generally too expensive, so we simply find a common dtype.
If you are concatenating ints and empties, you are doing this purposefully. Maybe we could avoid the conversion in this case, if someone wants to have a look.
Will mark this as a usage question for now.
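To illustrate what "find a common dtype" means here, the sketch below uses NumPy's promotion rules. It is an approximation for illustration only: `common_dtype` is a hypothetical helper, and pandas' internal concat logic is not claimed to be implemented this way. It shows that an object column combined with an int64 column can only promote to object, while int64 combined with float64 promotes to float64.

```python
import numpy as np


def common_dtype(*dtypes):
    """Hypothetical helper: return the dtype NumPy promotes the inputs to.

    This only mimics the idea of picking a common dtype from the input
    dtypes without inspecting any values; it is not pandas' own code.
    """
    return np.result_type(*[np.dtype(d) for d in dtypes])


# Empty object column + int64 column: object is the only common dtype.
print(common_dtype(object, "int64"))     # object

# int64 + float64: promoted to float64.
print(common_dtype("int64", "float64"))  # float64

# Matching dtypes are preserved.
print(common_dtype("int64", "int64"))    # int64
```

Because only the dtypes are consulted, the fact that the object column is empty plays no part in the result.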
Comment From: TomAugspurger
"We are turning these into object columns in master. I am not sure there is a whole lot more we can do here."
Agreed. If you want to work with empty DataFrames, you'll need to specify the dtypes yourself:
In [7]: df1 = pd.DataFrame({"Start": pd.Series([], dtype=np.int64),
...: "End": pd.Series([], dtype=np.int64),
...: "Duration": pd.Series([], dtype=np.float64)})
...:
In [10]: pd.concat([df1, df2], ignore_index=True).dtypes
Out[10]:
Duration float64
End int64
Start int64
dtype: object
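An equivalent way to do this, sketched here as one possible variant, is to create the empty frame with the desired column order and then cast it with `DataFrame.astype` and a column-to-dtype mapping. The `columns` and `dtypes` names are introduced only for this example, and `df2` is built directly from a dict rather than as in the original report.

```python
import numpy as np
import pandas as pd

columns = ["Start", "End", "Duration"]
dtypes = {"Start": np.int64, "End": np.int64, "Duration": np.float64}

# Empty frame with explicit dtypes instead of the default object columns.
df1 = pd.DataFrame(columns=columns).astype(dtypes)

# One row of data with int64 / int64 / float64 columns.
df2 = pd.DataFrame({"Start": [1], "End": [10], "Duration": [0.8]})

# Both inputs agree on the dtypes, so no upcasting is needed.
result = pd.concat([df1, df2], ignore_index=True)[columns]
print(result.dtypes)
```

Since both frames carry the same dtypes per column, the concatenated "Start" and "End" columns stay int64.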