Code Sample, a copy-pastable example if possible
>>> import pandas as pd
>>> idx = pd.IndexSlice  # shorthand for multi-level slicing
>>> df = pd.DataFrame(data=[[1, 2, 3, 4], [5, 6, 7, 8]], index=[0, 1], columns=pd.MultiIndex.from_product([('a0', 'a1'), ('b0', 'b1')]))
>>> df.loc[:,idx[:,'b0']]
  a0 a1
  b0 b0
0  1  3
1  5  7
>>> df.loc[:,idx[:,'b1']]
  a0 a1
  b1 b1
0  2  4
1  6  8
>>> df.loc[:,idx[:,'b0']] * df.loc[:,idx[:,'b1']]
   a0      a1
   b0  b1  b0  b1
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN
Problem description
Since Panel was deprecated, I have had to do a lot of multi-dimensional array arithmetic using MultiIndex DataFrames, but the MultiIndex does not support simple reductive arithmetic (any operation that should return a lower-dimensional answer), because all operations return the original index and columns.
If any of the arithmetic functionality of the old Panel is to be retained by the MultiIndex DataFrame, it seems that reductive operations should be natively supported. Alternatively, maybe somebody knows a clean workaround?
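A minimal sketch of why the product in the code sample is all NaN, assuming the same df: pandas aligns the two operands on their column labels before multiplying, and ('a0', 'b0') never matches ('a0', 'b1'), so the result carries the union of both column sets with nothing to multiply. The names left and right are illustrative only.

```python
import pandas as pd

idx = pd.IndexSlice
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    index=[0, 1],
    columns=pd.MultiIndex.from_product([("a0", "a1"), ("b0", "b1")]),
)

left = df.loc[:, idx[:, "b0"]]   # columns ('a0', 'b0') and ('a1', 'b0')
right = df.loc[:, idx[:, "b1"]]  # columns ('a0', 'b1') and ('a1', 'b1')

# Arithmetic aligns on labels: no column label appears in both operands,
# so every cell of the aligned result is NaN.
result = left * right
shared = left.columns.intersection(right.columns)
print(len(shared))
print(result)
```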
Expected Output
>>> # Expected output would be *something* similar to this workaround.....
>>> pd.DataFrame(df.loc[:,idx[:,'b0']].values * df.loc[:,idx[:,'b1']].values, index = [0,1], columns = pd.MultiIndex.from_product([['a0','a1'],['b0*b1']]))
      a0    a1
   b0*b1 b0*b1
0      2    12
1     30    56
>>> # .....Or maybe returning a level-reduced set of axes.
>>> pd.DataFrame(df.loc[:,idx[:,'b0']].values * df.loc[:,idx[:,'b1']].values, index = [0,1], columns = ['a0','a1'])
   a0  a1
0   2  12
1  30  56
Output of pd.show_versions()
Comment From: gfyoung
cc @jreback
Comment From: jreback
You are trying to do something a bit unnatural here.
The point of a MultiIndex (MI) is to work with the data in long form.
In [11]: df
Out[11]:
  a0    a1
  b0 b1 b0 b1
0  1  2  3  4
1  5  6  7  8
In [12]: df2 = df.stack(0)
In [13]: df2
Out[13]:
      b0  b1
0 a0   1   2
  a1   3   4
1 a0   5   6
  a1   7   8
In [14]: df2['b0'] * df2['b1']
Out[14]:
0  a0     2
   a1    12
1  a0    30
   a1    56
dtype: int64
You can also do this:
In [18]: df2 = df.swaplevel(0, 1, axis=1).sort_index(axis=1)
In [19]: df2
Out[19]:
  b0    b1
  a0 a1 a0 a1
0  1  3  2  4
1  5  7  6  8
In [20]: df2['b0'] * df2['b1']
Out[20]:
   a0  a1
0   2  12
1  30  56
Comment From: joseortiz3
Isn't it strange that the top level of an axis is treated so differently from the sub-levels? Slicing by the top level completely removes that level from the result's index, but slicing by a sub-level does not, which leads to the all-NaN result mentioned above.
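A short sketch of that asymmetry, assuming the same df as in the issue. Selecting a top-level key with df['a0'] returns columns without that level, whereas the slicer form df.loc[:, idx[:, 'b0']] keeps both levels. droplevel is one way to remove the leftover level afterwards. The variable names are illustrative.

```python
import pandas as pd

idx = pd.IndexSlice
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    index=[0, 1],
    columns=pd.MultiIndex.from_product([("a0", "a1"), ("b0", "b1")]),
)

# Top-level key: the selected level disappears from the result's columns.
top = df["a0"]

# Sub-level slicer: both column levels are kept in the result.
sub = df.loc[:, idx[:, "b0"]]

# Dropping the inner level by hand leaves plain columns ['a0', 'a1'].
sub_dropped = sub.droplevel(1, axis=1)

print(top.columns.nlevels, sub.columns.nlevels, list(sub_dropped.columns))
```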
From the hierarchical index documentation, it was not made clear that the top level behaves differently than the sub-levels, and that slicing should really only be done on the top level. Nor does it state that "long form" will better-support arithmetic operations.
Edit: The realization that "levels are not made equal" has completely unlocked pandas for me. "Axes are not made equal" as well. For instance, a lot of Panel functions treat the "item" axis uniquely. The same goes for the "columns" of DataFrames and for the "top level" of MultiIndexes. I now feel the power of pandas, but it took a lot of personal exploration to figure this out; the documentation did not really convey this (from what I've read).