Looks like I can append() a DataFrame with a MultiIndex to an HDFStore object, but not a Panel with a MultiIndex. Similarly, putting the Panel with format='fixed' works, but with format='table' it doesn't. So the problem appears to be with the table format.
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:45:13) [MSC v.1600 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.
IPython 2.2.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: from pandas import MultiIndex, DataFrame, Panel, HDFStore, show_versions
In [2]: mi = MultiIndex.from_tuples([('A','a'), ('A','b')])
In [3]: df = DataFrame([0., 1.], index=mi, columns=['X'])
In [4]: p = Panel([[[0, 1]]], items=['X'], major_axis=['Y'], minor_axis=mi)
In [5]: store = HDFStore('foo.h5')
In [6]: store.append('df', df)
In [7]: store.append('p', p)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-7-46ccb1da2f9f> in <module>()
----> 1 store.append('p', p)
C:\Python34\lib\site-packages\pandas\io\pytables.py in append(self, key, value, format, append, columns, dropna, **kwargs)
909 kwargs = self._validate_format(format, kwargs)
910 self._write_to_group(key, value, append=append, dropna=dropna,
--> 911 **kwargs)
912
913 def append_to_multiple(self, d, value, selector, data_columns=None,
C:\Python34\lib\site-packages\pandas\io\pytables.py in _write_to_group(self, key, value, format, index, append, complib, encoding, **kwargs)
1268
1269 # write the object
-> 1270 s.write(obj=value, append=append, complib=complib, **kwargs)
1271
1272 if s.is_table and index:
C:\Python34\lib\site-packages\pandas\io\pytables.py in write(self, obj, axes, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, dropna, **kwargs)
3603 self.create_axes(axes=axes, obj=obj, validate=append,
3604 min_itemsize=min_itemsize,
-> 3605 **kwargs)
3606
3607 if not self.is_exists:
C:\Python34\lib\site-packages\pandas\io\pytables.py in create_axes(self, axes, obj, validate, nan_rep, data_columns, min_itemsize, **kwargs)
3168 index_axes_map[i] = _convert_index(
3169 a, self.encoding, self.format_type
-> 3170 ).set_name(name).set_axis(i)
3171 else:
3172
C:\Python34\lib\site-packages\pandas\io\pytables.py in _convert_index(index, encoding, format_type)
4082
4083 if isinstance(index, MultiIndex):
-> 4084 raise TypeError('MultiIndex not supported here!')
4085
4086 inferred_type = lib.infer_dtype(index)
TypeError: MultiIndex not supported here!
In [13]: store.put('f', p, format='fixed')
In [14]: store.put('t', p, format='table')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-14-f6e096e264c1> in <module>()
----> 1 store.put('t', p, format='table')
C:\Python34\lib\site-packages\pandas\io\pytables.py in put(self, key, value, format, append, **kwargs)
816 format = get_option("io.hdf.default_format") or 'fixed'
817 kwargs = self._validate_format(format, kwargs)
--> 818 self._write_to_group(key, value, append=append, **kwargs)
819
820 def remove(self, key, where=None, start=None, stop=None):
C:\Python34\lib\site-packages\pandas\io\pytables.py in _write_to_group(self, key, value, format, index, append, complib, encoding, **kwargs)
1268
1269 # write the object
-> 1270 s.write(obj=value, append=append, complib=complib, **kwargs)
1271
1272 if s.is_table and index:
C:\Python34\lib\site-packages\pandas\io\pytables.py in write(self, obj, axes, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, dropna, **kwargs)
3603 self.create_axes(axes=axes, obj=obj, validate=append,
3604 min_itemsize=min_itemsize,
-> 3605 **kwargs)
3606
3607 if not self.is_exists:
C:\Python34\lib\site-packages\pandas\io\pytables.py in create_axes(self, axes, obj, validate, nan_rep, data_columns, min_itemsize, **kwargs)
3168 index_axes_map[i] = _convert_index(
3169 a, self.encoding, self.format_type
-> 3170 ).set_name(name).set_axis(i)
3171 else:
3172
C:\Python34\lib\site-packages\pandas\io\pytables.py in _convert_index(index, encoding, format_type)
4082
4083 if isinstance(index, MultiIndex):
-> 4084 raise TypeError('MultiIndex not supported here!')
4085
4086 inferred_type = lib.infer_dtype(index)
TypeError: MultiIndex not supported here!
In [9]: show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.14.1
nose: 1.3.4
Cython: 0.21
numpy: 1.9.0
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.2.0
sphinx: 1.2.3
patsy: 0.3.0
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.7
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None
Comment From: jreback
I don't think this is possible in the current impl.
indexes are flattened in 2-d (equivalent of panel.to_frame()),
not sure what a multi-index would be flattened to.
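To make the flattening concrete, here is a minimal sketch of the 2-d layout described above. Panel has since been removed from pandas, so the sketch emulates the result of panel.to_frame() from a plain NumPy array; the labels (`r0`, `c0`, and so on) and the level names `major` and `minor` are illustrative assumptions, not taken from the thread.

```python
import numpy as np
import pandas as pd

# A 3-d block shaped (items, major_axis, minor_axis), as a Panel would hold it.
items = ["X", "Y"]
major_axis = ["r0", "r1", "r2"]
minor_axis = ["c0", "c1"]
values = np.arange(12).reshape(len(items), len(major_axis), len(minor_axis))

# Flatten to 2-d: each row is one (major, minor) pair, each column is one item.
rows = pd.MultiIndex.from_product([major_axis, minor_axis], names=["major", "minor"])
flat = pd.DataFrame(
    {item: values[i].ravel() for i, item in enumerate(items)},
    index=rows,
)
print(flat)
```

Each of the two flattened axes becomes exactly one level of the row index. If minor_axis were itself a MultiIndex, it is not obvious which levels the row index should get, which seems to be the difficulty raised here.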
Comment From: seth-p
Perhaps Panel could support a multi-index on one or two of its axes, i.e. one(s) that are not flattened?
Comment From: jreback
you can try converting to a 4-d panel which is pretty much equivalent (though methods don't exist to do this automatically)
Comment From: seth-p
With the Panel p as above, with minor_axis being a multi-index, store.put('t', p.swapaxes(0,2), format='table') works, but store.put('t', p.swapaxes(1,2), format='table') doesn't. So it seems like it's ok to have a multi-index on the items, but not on the major_axis or minor_axis. Does this seem right?
Comment From: jreback
yep, like #5823. the minor axes in this case are the 'columns' (e.g. the fixed axis in the store). it's not natural to store these names as, say, tuples (could be done, I suppose), so no easy way to store this
Comment From: seth-p
Ah, I see. OK, will do a workaround... Ideally should be able to deal with multi-index on all axes...
Comment From: jreback
ok will leave open, but not sure there is a good fix here.
Comment From: seth-p
Can't the code just do something like the following (obviously for every axis)? It probably could store the multi-index axis directly, rather than creating a dummy Series as the code below does.
def put_panel_with_multi_index_on_minor_axis(p):
    # assumes `pd` (pandas) and an open HDFStore named `store` are in scope
    minor_axis = p.minor_axis  # save original minor_axis
    minor_axis_int_index = pd.Index(range(len(minor_axis)))
    minor_axis_series = pd.Series(minor_axis_int_index, index=minor_axis)
    store.put('minor_axis', minor_axis_series)
    p.minor_axis = minor_axis_int_index
    store.put('p', p)
    p.minor_axis = minor_axis  # restore original minor_axis
Then obviously one would need to restore the separately stored multi-index when loading.
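The same idea can be sketched with a DataFrame, since Panel is no longer available in current pandas. This is only an illustration of the "swap in an integer index, keep the labels separately" approach; the helper names `split_multiindex_columns` and `restore_multiindex_columns` are hypothetical, and the two pieces are kept in memory here rather than written with store.put().

```python
import pandas as pd


def split_multiindex_columns(df):
    """Return (frame with integer column labels, table of the original labels)."""
    # One row per original column, one column per MultiIndex level.
    labels = df.columns.to_frame(index=False)
    plain = df.copy()
    plain.columns = pd.RangeIndex(len(df.columns))
    return plain, labels


def restore_multiindex_columns(plain, labels):
    """Rebuild the original MultiIndex columns from the separately kept labels."""
    restored = plain.copy()
    restored.columns = pd.MultiIndex.from_frame(labels)
    return restored


mi = pd.MultiIndex.from_tuples([("A", "a"), ("A", "b")], names=["upper", "lower"])
df = pd.DataFrame([[0.0, 1.0], [2.0, 3.0]], columns=mi)

plain, labels = split_multiindex_columns(df)
roundtrip = restore_multiindex_columns(plain, labels)
pd.testing.assert_frame_equal(roundtrip, df)
```

Keeping the labels as an ordinary table (rather than a dummy Series) means each level is a plain column that could itself be stored and selected from.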
Comment From: jreback
well you can simply specify axes=[0,2,1] I think in that case. E.g. this would flatten the 0,2 axes and store them as columns, while the 1 axis are the other columns.
This is a tricky problem because:
a) you need to flatten n-1 dims; if a single axis has multiple dims itself, then what do you do?
b) you want the axes themselves to be easily searchable, so naming a column like ('A',1), e.g. a fully written-out tuple that is a multi-index, is very tricky to specify in a query-type statement.
Much easier to simply not allow it, and/or have people convert to 'real' n-dim structures, e.g. a Panel with a single axis that is a multi-index is de facto 4D.
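One possible reading of point (b), sketched with a DataFrame because Panel is gone from current pandas: if every level of a multi-index is moved into its own ordinary column (a "long" layout), a query can name each level directly instead of spelling out a tuple-valued column label. The level names `upper`, `lower`, and `t` are assumptions made for this example, and this is not necessarily the layout the HDFStore table format uses.

```python
import pandas as pd

mi = pd.MultiIndex.from_tuples(
    [("A", "a"), ("A", "b"), ("B", "a")], names=["upper", "lower"]
)
df = pd.DataFrame(
    [[0, 1, 2], [3, 4, 5]],
    index=pd.Index([10, 20], name="t"),
    columns=mi,
)

# Move every column level into the row labels, then turn the labels
# into ordinary columns: one row per (upper, lower, t) combination.
long = df.unstack().rename("value").reset_index()

# Each level is now a plain column, so no tuple-valued column name is needed.
subset = long.query("upper == 'A' and t == 20")
print(subset)
```

The cost is that the table grows to one row per cell, so whether this is practical depends on how the data is queried.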
Comment From: seth-p
A Panel supports a multiindex only on the first axis (items). So if you have only a single axis with a multiindex, you can swapaxes() to get it there. But that's hokey. What if more than one axis has a multiindex? Or you want to be able to update() along a multiindex (update() works only along the major_axis)? Why not allow all axes to have a multiindex, and then under the hood replace any multiindex with an integer index (and restore it/them when reading, obviously), as in my previous comment? Seems reasonably straightforward to me, and I would think it wouldn't be hard for someone familiar with the code and HDF5 (not me :-)).
Comment From: jreback
no, then you cannot search the axis (which is really the point of HDF5 in the first place). and currently the storage of this meta-data is not optimal (it's a list ATM, which has a limit).
It could be done if it's stored as a separate table-like. But seems quite complicated to me. The performance of these is VERY dependent on how the data is stored. It's a non-trivial problem, really depending on HOW one is querying.
Comment From: seth-p
OK, maybe searching along a multiindex is a problem. But simply storing a multiindex shouldn't be. For example, I have a panel with a fixed set of multiindex columns -- and I never need to select a subset of them. Why should I need to swapaxes(0,2) (both when saving and when loading)?
Or better yet, suppose I want to store the panel p = rolling_cov(df, 10, pairwise=True), where df.columns is a multiindex, as a table so that I can append() new values as I add rows to df. (Again, assume df.columns remains fixed, so that p.major_axis and p.minor_axis, which both equal df.columns, remain fixed.) Then first I need to do p.swapaxes(0,1) to switch the time axis from items to major_axis so that I can append() along it. OK. But since the 'table' format supports a multiindex on only one axis (items), and here I would have a multiindex along two axes (items and minor_axis, after swapping axes 0 and 1), I believe that currently p can't be saved at all using format='table', no matter what axes I swap.
Comment From: jreback
When I originally did this I didn't have much use for multi-indexes on Panels (I used Panel4D instead). It's certainly possible. Would need to have a nice 'way' to do it (e.g. storing the multi-indexes in separate sub-tables is very easy). Then you can select from them (with a bit of indirection which can be hidden).
You are welcome to take this up! It's a bit tricky, but doable. Need to have someone with a need and time to do it!
Comment From: seth-p
I have the need, but not sure about the time. :-)
Comment From: jreback
@seth-p hah.
other alternatives I have used: Panel4D, or simply use multiple sub-tables.
I would optimize my appending direction, e.g. think of it as a simple 2-d table. You want the rows to be the biggest dimension, everything else is sort of 'labeling'.
Comment From: jreback
if you would put up a couple of examples (of what you think should work), something that can be easily copy/pasted and made into a test would be great.
Comment From: seth-p
Something like this:
import pandas as pd
import numpy as np
from pandas.util.testing import assert_frame_equal, assert_panel_equal

# one two-level MultiIndex per axis
mi0 = pd.MultiIndex.from_tuples([('A','a'), ('B','b')], names=['UpperAB','lowerAB'])
mi1 = pd.MultiIndex.from_tuples([('U','u'), ('V','v')], names=['UpperUV','lowerUV'])
mi2 = pd.MultiIndex.from_tuples([('X','x'), ('Y','y')], names=['UpperXY','lowerXY'])

store = pd.HDFStore("foo.h5") # not sure how this is done in tests

# DataFrame with a MultiIndex on both index and columns
df = pd.DataFrame(np.arange(4).reshape((2,2)), index=mi0, columns=mi1)
store.put("df_as_table", df, format='table') # currently this doesn't work
result = store.get("df_as_table")
assert_frame_equal(result, df)

# Panel with a MultiIndex on all three axes
p = pd.Panel(np.arange(8).reshape((2,2,2)), items=mi0, major_axis=mi1, minor_axis=mi2)
store.put("p_as_table", p, format='table') # currently this doesn't work
result = store.get("p_as_table")
assert_panel_equal(result, p)

store.close()
Should add test for Panel4D, though for some reason I've never used that...
Comment From: jreback
closing as Panel deprecated