Prompted by this question (and others like it); related to #4808:
http://stackoverflow.com/questions/22934996/pandas-pytables-append-performance-and-increase-in-file-size
A convenience function to perform an on-disk concat from a list-like to a single table.
    def concat(self, key, generator, index=True, append=False):
        """
        Parameters
        ----------
        key : the resulting group for the table
        generator : a list-like / iterator / generator to source objects
            if string-like then read these from the same file name (and same key name)
        index : boolean, default True
            create an index on the resulting table
        append : boolean, default False
            if True, append to the resulting table, otherwise remove first
        """
        # remove the group if it exists (and not appending)
        if key in self and not append:
            self.remove(key)
        for obj in generator:
            # a string - treat as a file-name
            if isinstance(obj, compat.string_types):
                with get_store(obj, mode='r') as s:
                    obj = s.get(key)
            # skip per-append indexing; the index is built once at the end
            self.append(key, obj, index=False)
        if index:
            # create_table_index needs the key of the table to index
            self.create_table_index(key, columns=index)
HDF files
    with get_store('concat.hdf', mode='w') as store:
        store.concat('data', ['file1.hdf', 'file2.hdf'])
Objects
    with get_store('concat.hdf', mode='w') as store:
        store.concat('data', [DataFrame(...), DataFrame(...)])
Generator
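    def f():
        # loop variable renamed so it does not shadow the function name
        for path in files:
            df = read_csv(path)
            yield df
    with get_store('concat.hdf', mode='w') as store:
        # call f() so that a generator, not the function object, is passed
        store.concat('data', f())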
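Until something like this exists as a method, the same steps can be sketched as a free function against the public `HDFStore` API. This is only a sketch under stated assumptions, not the implementation proposed above: the names `concat_to_store` and `_open_readonly` are made up for illustration, `pd.HDFStore(path, mode="r")` is used in place of `get_store`, a plain `str` check replaces `compat.string_types`, and the final indexing step uses `HDFStore.create_table_index`.

```python
def _open_readonly(path):
    """Open an existing HDF5 file as a read-only store (hypothetical helper)."""
    # Imported here so pandas/PyTables are only needed when a file name is passed.
    import pandas as pd

    return pd.HDFStore(path, mode="r")


def concat_to_store(store, key, sources, index=True, append=False):
    """Append every object from `sources` to the table `key` in `store`.

    Hypothetical free-function version of the proposed method. `store` is
    assumed to be an open, writable HDFStore (or any object exposing the
    same remove/append/create_table_index methods).
    """
    # Start from a clean table unless the caller asked to append.
    if not append and key in store:
        store.remove(key)

    for obj in sources:
        # A string is treated as the name of an HDF5 file holding the same key.
        if isinstance(obj, str):
            with _open_readonly(obj) as src:
                obj = src.get(key)
        # Skip per-append indexing; the index is built once at the end.
        store.append(key, obj, index=False)

    # Build the index in a single pass, and only if a table was written.
    if index and key in store:
        store.create_table_index(key, columns=index)
```

Called as `concat_to_store(store, 'data', ['file1.hdf', 'file2.hdf'])` inside a `with pd.HDFStore('concat.hdf', mode='w') as store:` block, it should behave like the "HDF files" example above.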
Comment From: toddrjen
It might be good to have a **kwargs that passes any additional keyword args to get_store. That way users can specify things like compression for the resulting HDFStore.
Comment From: jreback
this already passes kwargs on - is there a problem?
Comment From: toddrjen
Sorry, I don't know what I was thinking. Please disregard.
Comment From: jreback
closing as not likely to be implemented.