Code Sample, a copy-pastable example if possible
result.to_hdf(op.join(args.output_dir, "merged.hd5"), "data", mode='w')
Problem description
Even though my data fits in memory, I run out of memory when trying to save it to disk, apparently because pandas makes a copy of the data during the write:
Traceback (most recent call last):
File "source/prof/merge_prof.py", line 144, in <module>
main()
File "source/prof/merge_prof.py", line 131, in main
result.to_hdf(op.join(args.output_dir, "merged.hd5"), "data", mode='w')
File "/home/jgarvin/.local/lib/python3.4/site-packages/pandas/core/generic.py", line 1138, in to_hdf
return pytables.to_hdf(path_or_buf, key, self, **kwargs)
File "/home/jgarvin/.local/lib/python3.4/site-packages/pandas/io/pytables.py", line 270, in to_hdf
f(store)
File "/home/jgarvin/.local/lib/python3.4/site-packages/pandas/io/pytables.py", line 264, in <lambda>
f = lambda store: store.put(key, value, **kwargs)
File "/home/jgarvin/.local/lib/python3.4/site-packages/pandas/io/pytables.py", line 873, in put
self._write_to_group(key, value, append=append, **kwargs)
File "/home/jgarvin/.local/lib/python3.4/site-packages/pandas/io/pytables.py", line 1315, in _write_to_group
s.write(obj=value, append=append, complib=complib, **kwargs)
File "/home/jgarvin/.local/lib/python3.4/site-packages/pandas/io/pytables.py", line 2877, in write
self.write_array('block%d_values' % i, blk.values, items=blk_items)
File "/home/jgarvin/.local/lib/python3.4/site-packages/pandas/io/pytables.py", line 2670, in write_array
self._handle.create_array(self.group, key, value)
File "/home/jgarvin/.local/lib/python3.4/site-packages/tables/file.py", line 1145, in create_array
obj=obj, title=title, byteorder=byteorder)
File "/home/jgarvin/.local/lib/python3.4/site-packages/tables/array.py", line 187, in __init__
byteorder, _log)
File "/home/jgarvin/.local/lib/python3.4/site-packages/tables/leaf.py", line 270, in __init__
super(Leaf, self).__init__(parentnode, name, _log)
File "/home/jgarvin/.local/lib/python3.4/site-packages/tables/node.py", line 266, in __init__
self._v_objectid = self._g_create()
File "/home/jgarvin/.local/lib/python3.4/site-packages/tables/array.py", line 196, in _g_create
nparr = array_as_internal(self._obj, flavor)
File "/home/jgarvin/.local/lib/python3.4/site-packages/tables/flavor.py", line 180, in array_as_internal
return array_of_flavor2(array, src_flavor, internal_flavor)
File "/home/jgarvin/.local/lib/python3.4/site-packages/tables/flavor.py", line 133, in array_of_flavor2
return convfunc(array)
File "/home/jgarvin/.local/lib/python3.4/site-packages/tables/flavor.py", line 373, in conv_to_numpy
nparr = nparr.copy() # copying the array makes it contiguous
MemoryError
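The last frame in the traceback is the telling one: PyTables copies the array "to make it contiguous", which temporarily needs a second full-size buffer. A minimal NumPy-only sketch of that contiguity copy (the array shape here is made up for illustration):

```python
import numpy as np

# A Fortran-ordered 2-D array, as pandas block values can be internally.
a = np.zeros((1000, 4), order="F")
assert not a.flags["C_CONTIGUOUS"]

# PyTables needs a C-contiguous array, so a full copy is made:
# peak memory is roughly 2x the array size during this call.
b = np.ascontiguousarray(a)
assert b.flags["C_CONTIGUOUS"]
```

With a frame that barely fits in memory, that transient 2x peak is enough to raise `MemoryError`.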
Expected Output
no exception ;)
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-85-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.19.2
nose: None
pip: 1.5.4
setuptools: 3.3
Cython: None
numpy: 1.12.0
scipy: 0.18.1
statsmodels: 0.8.0
xarray: None
IPython: None
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.6.2
matplotlib: 2.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: 0.8
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.5
boto: None
pandas_datareader: None
Comment From: jreback
Of course — the data must be re-arranged in order to write it in a format acceptable to HDF5. Simply chunk-write if this is a problem.
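One way to chunk-write, sketched below: use the appendable 'table' format and write the frame a slice at a time, so PyTables only ever copies one chunk. `write_in_chunks` and `chunksize` are names invented here, not pandas API; the 'table' format is somewhat slower and larger on disk than the default 'fixed' format, but it supports appending.

```python
import pandas as pd

def write_in_chunks(df, path, key, chunksize=100_000):
    """Write df to an HDF5 file in row slices to cap peak memory."""
    # format='table' (used implicitly by append) supports incremental
    # writes; the default 'fixed' format written by put/to_hdf does not.
    with pd.HDFStore(path, mode="w") as store:
        for start in range(0, len(df), chunksize):
            store.append(key, df.iloc[start:start + chunksize])
```

For the original snippet, the call would look something like `write_in_chunks(result, op.join(args.output_dir, "merged.hd5"), "data")`.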