First of all, if I missed a point, please feel free to comment.

Using arithmetic operations on pd.DataFrames is sometimes a mouthful. Take the following example, where columns a and b should be multiplied by the column c:

import numpy as np
import pandas as pd

np.random.seed(0)

df = pd.DataFrame(np.random.randn(3, 3), columns=list('abc'))

df[['a', 'b']] * df['c']

Apparently this doesn't work as expected. Instead, one has to use either pd.DataFrame.mul(), which reads poorly, or pd.DataFrame.values, which produces long lines and is therefore just as hard to read:

# using pd.DataFrame.mul()
df[['a', 'b']].mul(df['c'], axis='index')

# This is quite short, but does not work...
df[['a', 'b']] * df[['c']].values

# .. you have to use numpy arrays instead
df[['a', 'b']].values * df[['c']].values

Admittedly, the last call in this example returns a NumPy array, but in my case that's the only thing I'm interested in, since I'm rewrapping my data at a later stage.
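To make the problem concrete, here is a minimal sketch that rebuilds the frame from above and compares the plain `*` with the two workarounds. The variable names (`naive`, `by_mul`, `by_values`) are only illustrative and not part of the original report.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(3, 3), columns=list('abc'))

# Plain `*` matches the Series index (0, 1, 2) against the frame's
# column labels ('a', 'b'). No label is shared, so every cell is NaN.
naive = df[['a', 'b']] * df['c']
print(naive.isna().all().all())

# Row-wise multiplication that keeps the pandas labels.
by_mul = df[['a', 'b']].mul(df['c'], axis='index')

# The same numbers as a bare NumPy array: (3, 2) * (3, 1) broadcasts per row.
by_values = df[['a', 'b']].values * df[['c']].values

print(np.allclose(by_mul.values, by_values))
```

Both workarounds give the same numbers; they differ only in whether the result is still a labelled DataFrame.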

I'm proposing a new short indexer for operating on values, something like:

df.v[['a', 'b']] * df.v[['c']]

# which returns the same as
df[['a', 'b']].values * df[['c']].values

Or even more sophisticated:

df[['a', 'b']] * df.v[['c']]

# which returns the same as
df[['a', 'b']].mul(df['c'], axis='index')

Btw the same goes for all other arithmetic operators.
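For illustration only, the first form of the proposal can be approximated today with pandas' accessor registration hook. This is a rough sketch, not an existing pandas feature: the accessor name `v` and the class `ValuesIndexer` are hypothetical, and it only covers the array-times-array case, not the mixed `df[...] * df.v[...]` form.

```python
import numpy as np
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("v")
class ValuesIndexer:
    """Hypothetical `.v` indexer: select like df[...], return a NumPy array."""

    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def __getitem__(self, key):
        # Delegate the selection to pandas, then drop the labels.
        return self._obj[key].to_numpy()


np.random.seed(0)
df = pd.DataFrame(np.random.randn(3, 3), columns=list('abc'))

short = df.v[['a', 'b']] * df.v[['c']]
long = df[['a', 'b']].values * df[['c']].values
print(np.array_equal(short, long))
```

Because the result is a plain array, any index alignment is lost, which is exactly the trade-off the proposal accepts.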

Comment From: jreback

  • this would expose internal implementation detail (users would have to understand numpy )
  • make code more obscure / unreadable
  • make the api more complex (we have another indexer, what is the reason???)

Apparently this doesn't work as expected.

Why do you think this should work this way? The point is to align operations on the index by default.
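A small sketch of what label alignment means in practice, and what is lost when dropping down to raw values. The frame, the Series and their labels here are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, 3.0]}, index=['x', 'y', 'z'])
# Same labels as the frame, but stored in reverse order.
s = pd.Series([10.0, 20.0, 30.0], index=['z', 'y', 'x'])

# pandas pairs values by label: x with x, y with y, z with z.
aligned = df['a'] * s

# NumPy pairs values by position and ignores the labels entirely.
positional = df['a'].values * s.values

print(aligned)
print(positional)
```

The two results disagree for `x` and `z`, because the positional product multiplies values that belong to different labels.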

Comment From: TomAugspurger

This is basically the same as @shoyer's point in https://github.com/pandas-dev/pandas/issues/10000#issuecomment-236238297 right?

IIRC the current behavior of DataFrame * Series is meant to match NumPy, which broadcasts a 1-D operand along the last axis (columns)?
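The parallel with NumPy can be sketched as follows. The array and the weights are invented for the example, and the only claim is that both libraries line a 1-D operand up with the columns, with pandas doing so by label rather than by position.

```python
import numpy as np
import pandas as pd

arr = np.arange(6.0).reshape(3, 2)
weights = np.array([10.0, 100.0])

# NumPy: a 1-D operand is lined up with the last axis (the columns).
np_result = arr * weights

frame = pd.DataFrame(arr, columns=['a', 'b'])

# pandas: the Series index is matched against the frame's column labels.
pd_result = frame * pd.Series(weights, index=['a', 'b'])

# Because matching is by label, the order of the Series entries is irrelevant.
shuffled = frame * pd.Series([100.0, 10.0], index=['b', 'a'])

print(np.array_equal(pd_result.values, np_result))
```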

I think expecting

df[['a', 'b']] * df['c']

to return

In [20]: df[['a', 'b']].mul(df['c'], axis=0)
Out[20]:
          a         b
0  1.726545  0.391649
1 -2.189975 -1.825123
2 -0.098067  0.015623

is perfectly reasonable. That said, this would be a big API change, with no clear way of deprecation.

Comment From: shoyer

In my experience, the best way to write such arithmetic currently is something like (df[['a', 'b']].T * df['c']).T (which is hardly ideal).
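As a quick check, the transpose idiom can be compared against the explicit `mul(..., axis=0)` call, using the same seeded frame as in the original report. The names `transposed` and `explicit` are only for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(3, 3), columns=list('abc'))

# After .T the row labels (0, 1, 2) become column labels, so the default
# column alignment of `*` matches them against the index of df['c'].
transposed = (df[['a', 'b']].T * df['c']).T

# The explicit spelling of the same row-wise multiplication.
explicit = df[['a', 'b']].mul(df['c'], axis=0)

print(np.allclose(transposed.values, explicit.values))
```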

I think this would be reasonable behavior to change for pandas 2.0 but probably not before.

I'm not excited about the proposal here, which feels like a work-around for fundamentally broken broadcasting behavior rather than a fix of the root cause.

Comment From: jreback

@shoyer if you want to create an issue for pandas 2, that would be great.

closing this one as no-action in pandas 1.0

Comment From: shoyer

See https://github.com/pandas-dev/pandas2/issues/30

Comment From: skycaptain

Thanks for the discussion here.

I'm not excited about the proposal here, which feels like a work-around for fundamentally broken broadcasting behavior rather than a fix of the root cause.

My proposal was, afaik, a minor fix for a common problem that people like me have right now. But I've learned that even this addition would mean a lot of trouble and confusion for others. So I agree with @shoyer and @jreback that this issue is reasonable, but also too profound.