Pandas Multi-Index Dataframes do not cleanly support dimension-reducing arithmetic operations

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> idx = pd.IndexSlice
>>> df = pd.DataFrame(data=[[1, 2, 3, 4], [5, 6, 7, 8]], index=[0, 1], columns=pd.MultiIndex.from_product([('a0', 'a1'), ('b0', 'b1')]))
>>> df.loc[:,idx[:,'b0']]
  a0 a1
  b0 b0
0  1  3
1  5  7
>>> df.loc[:,idx[:,'b1']]
  a0 a1
  b1 b1
0  2  4
1  6  8
>>> df.loc[:,idx[:,'b0']] * df.loc[:,idx[:,'b1']]
   a0      a1    
   b0  b1  b0  b1
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN
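The all-NaN result follows from label alignment: binary arithmetic between DataFrames matches columns by their full tuples, and ('a0', 'b0') never matches ('a0', 'b1'). The sketch below (assuming a current pandas install) makes that visible by checking that the two slices share no column labels and that the product carries the union of both label sets.

```python
import pandas as pd

idx = pd.IndexSlice
df = pd.DataFrame(
    data=[[1, 2, 3, 4], [5, 6, 7, 8]],
    index=[0, 1],
    columns=pd.MultiIndex.from_product([("a0", "a1"), ("b0", "b1")]),
)

left = df.loc[:, idx[:, "b0"]]   # columns ('a0', 'b0'), ('a1', 'b0')
right = df.loc[:, idx[:, "b1"]]  # columns ('a0', 'b1'), ('a1', 'b1')

# The operands share no column tuple, so alignment produces the union of
# both column sets and fills every cell with NaN.
shared = left.columns.intersection(right.columns)
product = left * right
print(len(shared))                  # 0
print(product.isna().all().all())   # True
```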

Problem description

Since Panel was deprecated, I have had to do a lot of multi-dimensional array arithmetic with MultiIndex DataFrames. However, MultiIndex DataFrames do not support simple reductive arithmetic (any operation that should return a lower-dimensional answer), because every operation returns the original index and columns.

If the MultiIndex DataFrame is to retain any of the old Panel's arithmetic functionality, it seems that reductive operations should be natively supported. Alternatively, does anybody know a clean workaround?
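One possible workaround, sketched under the assumption of pandas 0.24 or later (where `DataFrame.droplevel` is available; the report above uses 0.20.1): drop the `b` level from each slice so both operands are labelled only by `a0`/`a1`, then multiply. This gives the level-reduced shape without going through `.values`.

```python
import pandas as pd

idx = pd.IndexSlice
df = pd.DataFrame(
    data=[[1, 2, 3, 4], [5, 6, 7, 8]],
    index=[0, 1],
    columns=pd.MultiIndex.from_product([("a0", "a1"), ("b0", "b1")]),
)

# Remove the second column level (b0 / b1) from each slice; both operands are
# then labelled only by 'a0' and 'a1', so alignment pairs them up correctly.
b0 = df.loc[:, idx[:, "b0"]].droplevel(1, axis=1)
b1 = df.loc[:, idx[:, "b1"]].droplevel(1, axis=1)
product = b0 * b1
print(product)
```

Unlike the `.values` approach, the multiplication here is still label-based, so it stays correct even if the two slices come back in different column orders.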

Expected Output

>>> # Expected output would be *something* similar to this workaround...
>>> pd.DataFrame(df.loc[:,idx[:,'b0']].values * df.loc[:,idx[:,'b1']].values, index = [0,1], columns = pd.MultiIndex.from_product([['a0','a1'],['b0*b1']]))
     a0    a1
  b0*b1 b0*b1
0     2    12
1    30    56
>>> # .....Or maybe returning a level-reduced set of axes.
>>> pd.DataFrame(df.loc[:,idx[:,'b0']].values * df.loc[:,idx[:,'b1']].values, index = [0,1], columns = ['a0','a1'])
   a0  a1
0   2  12
1  30  56
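The first expected shape (columns labelled `'b0*b1'`) can probably also be reached without `.values`: relabel the sub-level of both operands to the same name, so the column tuples match before multiplying. This is a sketch, assuming `DataFrame.rename` with `level=1` maps only the second-level labels; the `'b0*b1'` label is taken from the expected output above.

```python
import pandas as pd

idx = pd.IndexSlice
df = pd.DataFrame(
    data=[[1, 2, 3, 4], [5, 6, 7, 8]],
    index=[0, 1],
    columns=pd.MultiIndex.from_product([("a0", "a1"), ("b0", "b1")]),
)

# Give both operands the same sub-level label so ('a0', 'b0*b1') on the left
# matches ('a0', 'b0*b1') on the right; the top level is left untouched.
left = df.loc[:, idx[:, "b0"]].rename(columns={"b0": "b0*b1"}, level=1)
right = df.loc[:, idx[:, "b1"]].rename(columns={"b1": "b0*b1"}, level=1)
product = left * right
print(product)
```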

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Comment From: gfyoung

cc @jreback

Comment From: jreback

You are trying to do something a bit unnatural here.

The point of a MultiIndex (MI) is to work with the data in long form.

In [11]: df
Out[11]: 
  a0    a1   
  b0 b1 b0 b1
0  1  2  3  4
1  5  6  7  8

In [12]: df2 = df.stack(0)

In [13]: df2
Out[13]: 
      b0  b1
0 a0   1   2
  a1   3   4
1 a0   5   6
  a1   7   8


In [14]: df2['b0'] * df2['b1']
Out[14]: 
0  a0     2
   a1    12
1  a0    30
   a1    56
dtype: int64
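The long-form route above can be written as a standalone script. The final `unstack()` is an addition (not part of the comment) that pivots the product back to the wide, level-reduced shape requested in the issue. Recent pandas releases may emit a FutureWarning from `stack` about its new implementation; for this data the result should be the same either way.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[1, 2, 3, 4], [5, 6, 7, 8]],
    index=[0, 1],
    columns=pd.MultiIndex.from_product([("a0", "a1"), ("b0", "b1")]),
)

long_form = df.stack(0)                      # rows: (row, a); columns: b0, b1
product = long_form["b0"] * long_form["b1"]  # Series indexed by (row, a)
wide = product.unstack()                     # rows 0/1; columns a0/a1
print(wide)
```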

You can also do this:

In [18]: df2 = df.swaplevel(0, 1, axis=1).sort_index(axis=1)

In [19]: df2
Out[19]: 
  b0    b1   
  a0 a1 a0 a1
0  1  3  2  4
1  5  7  6  8

In [20]: df2['b0'] * df2['b1']
Out[20]: 
   a0  a1
0   2  12
1  30  56
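A related option, when the goal is to reduce over an entire level rather than combine two named members, is a level-wise groupby. This sketch transposes first, because recent pandas versions deprecate grouping along columns with `axis=1`. Note that it multiplies *all* members of the `b` level for each `a` label, so it equals `b0 * b1` here only because `b` has exactly two members; `sum`, `mean`, etc. can be swapped in for other reductions.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[1, 2, 3, 4], [5, 6, 7, 8]],
    index=[0, 1],
    columns=pd.MultiIndex.from_product([("a0", "a1"), ("b0", "b1")]),
)

# Transpose so the column levels become row levels, multiply together all
# rows that share the same 'a' label, then transpose back.
reduced = df.T.groupby(level=0).prod().T
print(reduced)
```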

Comment From: joseortiz3

Isn't it strange that the top level of an axis is treated so differently from the sub-levels? Selecting by a top-level label removes that level from the result's columns, but slicing by a sub-level label does not, which is what leads to the all-NaN product shown above.
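The asymmetry can be checked directly. In this sketch, `df['a0']` drops the top level, `.loc` with an `IndexSlice` on the sub-level keeps both levels, and `DataFrame.xs(..., level=1)` selects on the sub-level while dropping it (pass `drop_level=False` to keep it), which makes it the closer sub-level counterpart of top-level selection.

```python
import pandas as pd

idx = pd.IndexSlice
df = pd.DataFrame(
    data=[[1, 2, 3, 4], [5, 6, 7, 8]],
    index=[0, 1],
    columns=pd.MultiIndex.from_product([("a0", "a1"), ("b0", "b1")]),
)

top = df["a0"]                          # top-level selection drops that level
sub = df.loc[:, idx[:, "b0"]]           # sub-level slicing keeps both levels
cross = df.xs("b0", axis=1, level=1)    # xs drops the selected sub-level
print(top.columns.tolist())    # ['b0', 'b1']
print(sub.columns.nlevels)     # 2
print(cross.columns.tolist())  # ['a0', 'a1']
```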

The hierarchical indexing documentation does not make clear that the top level behaves differently from the sub-levels, or that slicing should really only be done on the top level. Nor does it state that long form better supports arithmetic operations.

Edit: Realizing that "levels are not created equal" has completely unlocked pandas for me, and "axes are not created equal" either. For instance, many Panel functions treat the items axis specially; DataFrames treat columns specially; MultiIndexes treat the top level specially. I now feel the power of pandas, but it took a lot of personal exploration to figure this out; from what I've read, the documentation does not really convey it.
