For example, I want to concatenate/append two DataFrames into one and only use the final DataFrame later on:
```python
import numpy as np
import pandas as pd

num_rows = 100000

def make_df(num_rows=10000):
    df = pd.DataFrame(np.random.rand(num_rows, 5), columns=list('abcde'))
    df['foo'] = 'foo'
    df['bar'] = 'bar'
    df['baz'] = 'baz'
    df['date'] = pd.date_range('20000101 09:00:00',
                               periods=num_rows,
                               freq='s')
    df['int'] = np.arange(num_rows, dtype='int64')
    return df

df1 = make_df(num_rows=num_rows)
df2 = make_df(num_rows=num_rows)
df = pd.concat([df1, df2])
```
During the execution of the last statement, memory usage doubles, because pd.concat copies both inputs into a newly allocated result.
Is it possible to have some method like df1.append_but_modify_lvalue(df2, ignore_index=True), so that we don't need to copy-assign the return value?
Comment From: chris-b1
No, this is essentially impossible, for the same reason you can't concatenate numpy arrays without a copy - see http://stackoverflow.com/a/7869472/3657742.
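A quick way to see this: NumPy arrays are contiguous buffers, so joining two of them has to allocate a new one. A minimal illustrative check (not from the linked answer, just a demo):

```python
import numpy as np

a = np.random.rand(3)
b = np.random.rand(3)
c = np.concatenate([a, b])

# The result lives in a freshly allocated buffer; it shares
# no memory with either input.
print(np.shares_memory(c, a))  # False
print(np.shares_memory(c, b))  # False
```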
If you know how big the result is going to be, your best bet is probably to pre-allocate NumPy array(s) of the full size, place the values into them, and construct the DataFrame from the single large array.
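A minimal sketch of that pre-allocation idea, shown only for the float columns (the make_df frames above mix dtypes, so real code would need one pre-allocated array per dtype block; the names part1/part2/out are just for this example):

```python
import numpy as np
import pandas as pd

num_rows = 100000

# Two float blocks standing in for df1/df2's numeric columns.
part1 = np.random.rand(num_rows, 5)
part2 = np.random.rand(num_rows, 5)

# Pre-allocate one buffer of the final size and fill it in place,
# so no concatenation copy is ever made.
out = np.empty((2 * num_rows, 5))
out[:num_rows] = part1
out[num_rows:] = part2

# Build the DataFrame from the single pre-filled array; for a
# homogeneous 2-D float array, pandas has historically been able
# to adopt the buffer rather than copy it (version-dependent).
df = pd.DataFrame(out, columns=list('abcde'))
```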