Pandas promotes ints to float when concatenating with an empty object DataFrame

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

df1 = pd.DataFrame(columns=['Start','End','Duration'])
df2 = pd.DataFrame(np.array([None] * 3).reshape(-1,3),columns=['Start','End','Duration'])
df2['Start'] = 1
df2['End'] = 10
df2['Duration'] = 0.8

df1 = pd.concat([df1, df2], ignore_index=True)[df1.columns.tolist()]

Problem description

When concatenating the two DataFrames df1 and df2, the dtype of the "Start" and "End" columns changes from integer to float.

Expected Output

df1
Out[1]:
   Start  End  Duration
0      1   10       0.8

Output of `pd.show_versions()`

Actual output:

df1
Out[2]:
   Start   End  Duration
0    1.0  10.0       0.8

pandas version: u'0.18.1'

Comment From: jorisvandenbossche

What version of pandas are you using? (Please provide the output of pd.show_versions in the top post, as requested.) I don't see this anymore in pandas 0.20.2, so possibly this has been fixed already.

Comment From: gabboshow

u'0.18.1'

Comment From: jreback

In [47]: df1 = pd.DataFrame(columns=['Start','End','Duration'])
    ...: df2 = pd.DataFrame(np.array([None] * 3).reshape(-1,3),columns=['Start','End','Duration'])
    ...: df2['Start'] = 1
    ...: df2['End'] = 10
    ...: df2['Duration'] = 0.8
    ...: 
    ...: #df1 = pd.concat([df1, df2], ignore_index=True)[df1.columns.tolist()]
    ...: 

In [48]: df1.dtypes
Out[48]: 
Start       object
End         object
Duration    object
dtype: object

In [49]: df2.dtypes
Out[49]: 
Start         int64
End           int64
Duration    float64
dtype: object

In [50]: pd.concat([df1, df2], ignore_index=True)[df1.columns.tolist()]
Out[50]: 
  Start End  Duration
0     1  10       0.8

In [51]: pd.concat([df1, df2], ignore_index=True)[df1.columns.tolist()].dtypes
Out[51]: 
Start        object
End          object
Duration    float64
dtype: object

We are turning these into object columns in master. I am not sure there is a whole lot more we can do here. We don't normally run inference on the combined column, since that is generally too expensive; we simply find a common dtype.

If you are concatenating ints and empties, you are doing this purposefully. Maybe we could avoid the conversion in this case, if someone wants to have a look.

Will mark this as a usage question for now.
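
The "common dtype" step and the skipped inference can be illustrated with a small sketch. It uses NumPy's promotion rules as a stand-in for what pandas does internally (an assumption for illustration, not pandas' actual code path), and shows that `infer_objects()` can run the inference explicitly on an object column that holds only integers.

```python
import numpy as np
import pandas as pd

# Common dtype of two numeric dtypes: int64 and float64 promote to float64.
int_float = np.result_type(np.int64, np.float64)

# Once object is involved, the common dtype falls back to object.
int_object = np.result_type(np.int64, np.object_)

# An object column holding only Python ints, similar to the concat result above.
start = pd.Series([1, 10], dtype=object)

# Explicit, opt-in inference recovers a numeric dtype for the column.
inferred = start.infer_objects()

print(int_float, int_object, start.dtype, inferred.dtype)
```

Calling `infer_objects()` on the concatenated frame is therefore one way to get integer columns back after the fact, at the cost of an extra pass over the data.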

Comment From: TomAugspurger

> We are turning these into object columns in master. I am not sure there is a whole lot more we can do here

Agreed. If you want to work with empty DataFrames, you'll need to specify the dtypes yourself:

In [7]: df1 = pd.DataFrame({"Start": pd.Series([], dtype=np.int64),
   ...:                     "End": pd.Series([], dtype=np.int64),
   ...:                     "Duration": pd.Series([], dtype=np.float64)})
   ...:

In [10]: pd.concat([df1, df2], ignore_index=True).dtypes
Out[10]:
Duration    float64
End           int64
Start         int64
dtype: object
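
If the empty frame is created elsewhere and its dtypes cannot be set at construction, one alternative is to cast it to the dtypes of the frame it will be combined with before concatenating. This is only a sketch of that idea, assuming both frames share the same column names; it is not something suggested in the thread itself, and `df1_typed` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

columns = ["Start", "End", "Duration"]
df1 = pd.DataFrame(columns=columns)  # empty, every column is object
df2 = pd.DataFrame({"Start": [1], "End": [10], "Duration": [0.8]})

# Cast the empty frame to df2's dtypes so both inputs agree column by column.
df1_typed = df1.astype(df2.dtypes.to_dict())

# With matching dtypes on both sides, no promotion is needed.
result = pd.concat([df1_typed, df2], ignore_index=True)[columns]
print(result.dtypes)
```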
