Code Sample, a copy-pastable example if possible
result.to_hdf(op.join(args.output_dir, "merged.hd5"), "data", mode='w')
Problem description
Even though my data fits in memory, I run out of memory when trying to save it to disk, apparently because pandas makes a copy of the data during the write:
Traceback (most recent call last):
File "source/prof/merge_prof.py", line 144, in <module>
main()
File "source/prof/merge_prof.py", line 131, in main
result.to_hdf(op.join(args.output_dir, "merged.hd5"), "data", mode='w')
File "/home/jgarvin/.local/lib/python3.4/site-packages/pandas/core/generic.py", line 1138, in to_hdf
return pytables.to_hdf(path_or_buf, key, self, **kwargs)
File "/home/jgarvin/.local/lib/python3.4/site-packages/pandas/io/pytables.py", line 270, in to_hdf
f(store)
File "/home/jgarvin/.local/lib/python3.4/site-packages/pandas/io/pytables.py", line 264, in <lambda>
f = lambda store: store.put(key, value, **kwargs)
File "/home/jgarvin/.local/lib/python3.4/site-packages/pandas/io/pytables.py", line 873, in put
self._write_to_group(key, value, append=append, **kwargs)
File "/home/jgarvin/.local/lib/python3.4/site-packages/pandas/io/pytables.py", line 1315, in _write_to_group
s.write(obj=value, append=append, complib=complib, **kwargs)
File "/home/jgarvin/.local/lib/python3.4/site-packages/pandas/io/pytables.py", line 2877, in write
self.write_array('block%d_values' % i, blk.values, items=blk_items)
File "/home/jgarvin/.local/lib/python3.4/site-packages/pandas/io/pytables.py", line 2670, in write_array
self._handle.create_array(self.group, key, value)
File "/home/jgarvin/.local/lib/python3.4/site-packages/tables/file.py", line 1145, in create_array
obj=obj, title=title, byteorder=byteorder)
File "/home/jgarvin/.local/lib/python3.4/site-packages/tables/array.py", line 187, in __init__
byteorder, _log)
File "/home/jgarvin/.local/lib/python3.4/site-packages/tables/leaf.py", line 270, in __init__
super(Leaf, self).__init__(parentnode, name, _log)
File "/home/jgarvin/.local/lib/python3.4/site-packages/tables/node.py", line 266, in __init__
self._v_objectid = self._g_create()
File "/home/jgarvin/.local/lib/python3.4/site-packages/tables/array.py", line 196, in _g_create
nparr = array_as_internal(self._obj, flavor)
File "/home/jgarvin/.local/lib/python3.4/site-packages/tables/flavor.py", line 180, in array_as_internal
return array_of_flavor2(array, src_flavor, internal_flavor)
File "/home/jgarvin/.local/lib/python3.4/site-packages/tables/flavor.py", line 133, in array_of_flavor2
return convfunc(array)
File "/home/jgarvin/.local/lib/python3.4/site-packages/tables/flavor.py", line 373, in conv_to_numpy
nparr = nparr.copy() # copying the array makes it contiguous
MemoryError
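The last frame in the traceback is the telling one: PyTables copies the array "to make it contiguous", which temporarily needs a second full-size buffer. A minimal NumPy-only sketch of that contiguity copy (the array shape here is made up for illustration):

```python
import numpy as np

# A Fortran-ordered 2-D array, as pandas block values can be internally.
a = np.zeros((1000, 4), order="F")
assert not a.flags["C_CONTIGUOUS"]

# PyTables needs a C-contiguous array, so a full copy is made:
# peak memory is roughly 2x the array size during this call.
b = np.ascontiguousarray(a)
assert b.flags["C_CONTIGUOUS"]
```

With a frame that barely fits in memory, that transient 2x peak is enough to raise `MemoryError`.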
Expected Output
no exception ;)
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-85-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.19.2
nose: None
pip: 1.5.4
setuptools: 3.3
Cython: None
numpy: 1.12.0
scipy: 0.18.1
statsmodels: 0.8.0
xarray: None
IPython: None
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.6.2
matplotlib: 2.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: 0.8
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.5
boto: None
pandas_datareader: None
Comment From: jreback
Of course — the data must be re-arranged in order to write it in a format acceptable to HDF5. Simply chunk-write if this is a problem.
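One way to chunk-write, sketched below: use the appendable 'table' format and write the frame a slice at a time, so PyTables only ever copies one chunk. `write_in_chunks` and `chunksize` are names invented here, not pandas API; the 'table' format is somewhat slower and larger on disk than the default 'fixed' format, but it supports appending.

```python
import pandas as pd

def write_in_chunks(df, path, key, chunksize=100_000):
    """Write df to an HDF5 file in row slices to cap peak memory."""
    # format='table' (used implicitly by append) supports incremental
    # writes; the default 'fixed' format written by put/to_hdf does not.
    with pd.HDFStore(path, mode="w") as store:
        for start in range(0, len(df), chunksize):
            store.append(key, df.iloc[start:start + chunksize])
```

For the original snippet, the call would look something like `write_in_chunks(result, op.join(args.output_dir, "merged.hd5"), "data")`.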