Pandas Binary operations don't broadcast across multiindex

related #6360

Consider the following two Series:

x = pd.DataFrame({'year':[1,1,1,1,2,2,2,2],
                  'country':['A','A','B','B','A','A','B','B'],
                  'prod':[1,2,1,2,1,2,1,2],
                  'val':[10,20,15,25,20,30,25,35]})
x = x.set_index(['year','country','prod']).squeeze()

y = pd.DataFrame({'year':[1,1,2,2],'prod':[1,2,1,2],
                  'mul':[10,0.1,20,0.2]})
y = y.set_index(['year','prod']).squeeze()

which look like:

    year  country  prod
    1     A        1       10
                   2       20
          B        1       15
                   2       25
    2     A        1       20
                   2       30
          B        1       25
                   2       35

year  prod
1     1       10.0
      2        0.1
2     1       20.0
      2        0.2

I find it to be an extremely common task, to perform binary operations by distributing the values of y over a particular level of x. For example, I'd like to multiply all values of product 1 in year 1 by 10.0, regardless of country.

The required result is therefore as follows:

    year  country  prod
    1     A        1       100.0
                   2       2.0
          B        1       150.0
                   2       2.5
    2     A        1       400.0
                   2       6.0
          B        1       500.0
                   2       7.0

The binary operation .mul() doesn't work as expected:

>>> x.mul(y, level=['year','prod'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/series.py", line 334, in f
    return self._binop(other, op, level=level, fill_value=fill_value)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/series.py", line 2075, in _binop
    this, other = self.align(other, level=level, join='outer')
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/series.py", line 2570, in align
    return_indexers=True)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/index.py", line 954, in join
    return_indexers=return_indexers)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/index.py", line 1058, in _join_level
    raise Exception('Join on level between two MultiIndex objects '
Exception: Join on level between two MultiIndex objects is ambiguous

To create the required result, the user currently has to do this:

x = x.reset_index('country').sort_index()
x.val = x.val * y
x = x.reset_index().set_index(['year',
                               'country',
                               'prod']).sortlevel()

Comment From: jreback

@roblevy can you put your solution in the top section as well (then its easy to create expected result)

Comment From: roblevy

Done!

Comment From: jreback

gr8! love for you to work on this! otherwise prob won't be tackled for a while. Its not too tricky and would get you a lot of exposure to the code.

Comment From: roblevy

:D Great

On 5 December 2013 23:18, jreback notifications@github.com wrote:

gr8! love for you to work on this! otherwise prob won't be tackled for a while. Its not too tricky and would get you a lot of exposure to the code.

— Reply to this email directly or view it on GitHubhttps://github.com/pydata/pandas/issues/5645#issuecomment-29949006 .

Comment From: ghost

I think the example result has editing errors in it, where did the 400 come from? doesn't match your description of the operation.

A very powerful operation to have added. Is there a sugary way to do it you had in mind? I like the idea anyway.

Comment From: roblevy

Hi @y-p. The example result is correct.

The 400 is x[year 2, country A, product 1] * y[year 2, product 1] = 20 * 20 = 400

Notice how the values of x are multiplied by different numbers from y depending on year and product.

As far as I'm concerned, this should "just work" when the user does x.mul(y)

Comment From: ghost

Thanks, I see now.

Comment From: 0o-de-lally

@roblevy @jreback This feature is an important component to a project i'm working on. Any chance we can get this implemented on the .14 release as well? I see that a related bug was fixed for the same release. Seems like they belong together. How can I help get this released sooner?

Comment From: jreback

@keyscores well, this is dependent on #6360 so that needs to be fixed first that said, you can simply use the soln above. Its just as efficient, if not syntax friendly, but you can wrap in a function

of course a PR would be welcom

Comment From: 0o-de-lally

@jreback I'll try the solution above. I can donate a small bounty as an incentive to move #6360 up to v0.14 , would that help? :)

Comment From: jreback

@keyscores unfortunately no....:) that IS being worked on though

Comment From: 0o-de-lally

Ok it was worth a try. At least you know you have a cheerleader for that fix. Would love to see it in .14 :)

Comment From: roblevy

This may or may not be relevant, but I've found a comparatively neat workaround for this problem: unstack the index level(s) which doesn't (don't) appear in the other Series.

In [81]: x.unstack('country').mul(y, axis=0).stack()
Out[81]: 
year  prod  country
1     1     A          100.0
            B          150.0
      2     A            2.0
            B            2.5
2     1     A          400.0
            B          500.0
      2     A            6.0
            B            7.0
dtype: float64

This is not the best possible solution since it changes the order of the index of x which is unexpected, but it should be pretty efficient.

Comment From: jreback

yes this would be quite efficient

In [31]: x.unstack('country').mul(y, axis=0).stack().reorder_levels(x.index.names)
Out[31]: 
year  country  prod
1     A        1       100.0
      B        1       150.0
      A        2         2.0
      B        2         2.5
2     A        1       400.0
      B        1       500.0
      A        2         6.0
      B        2         7.0
dtype: float64

Comment From: roblevy

Looks to me like you might (sometimes at least) need a .sortlevel() on the end of that command.

In [41]: z = pd.DataFrame(dict(country=['A', 'A', 'B', 'B'], prod=[1,2,1,2], val=[1,2,3,4]))
In [42]: z = z.set_index(['country', 'prod']).squeeze()
In [43]: x.unstack('year').mul(z, axis=0).stack().reorder_levels(x.index.names)
Out[43]: 
year  country  prod
1     A        1        10
2     A        1        20
1     A        2        40
2     A        2        60
1     B        1        45
2     B        1        75
1     B        2       100
2     B        2       140
dtype: int64

Comment From: znicholls

This appears to have been fixed somewhere, in the latest master you can do

import pandas as pd                                                                                                                                     

x = pd.DataFrame({'year':[1,1,1,1,2,2,2,2],
                  'country':['A','A','B','B','A','A','B','B'],
                  'prod':[1,2,1,2,1,2,1,2],
                  'val':[10,20,15,25,20,30,25,35]})
x = x.set_index(['year','country','prod']).squeeze()

y = pd.DataFrame({'year':[1,1,2,2],'prod':[1,2,1,2],
                  'mul':[10,0.1,20,0.2]})
y = y.set_index(['year','prod']).squeeze()

x.mul(y, axis=0)

# result below
year  prod  country
1     1     A          100.0
            B          150.0
      2     A            2.0
            B            2.5
2     1     A          400.0
            B          500.0
      2     A            6.0
            B            7.0
dtype: float64

Comment From: mroeschke

Closing as this seems addressed