I propose adding a pipe method to GroupBy objects, having the same interface as DataFrame.pipe/Series.pipe.
The use case is reusing a GroupBy object when doing calculations; see the use case example below.
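To make "the same interface as DataFrame.pipe" concrete, here is a minimal sketch of the calling convention applied to a GroupBy object. The free function `pipe` below is a hypothetical stand-in for the proposed method, not existing pandas code, and the small DataFrame is made up for illustration.

```python
import pandas as pd


def pipe(obj, func, *args, **kwargs):
    """Hypothetical stand-in for the proposed GroupBy.pipe method.

    Follows the DataFrame.pipe convention: ``func`` is either a callable
    that takes ``obj`` as its first argument, or a ``(callable, keyword)``
    tuple naming the keyword argument that should receive ``obj``.
    """
    if isinstance(func, tuple):
        func, target = func
        if target in kwargs:
            raise ValueError(
                f"{target} is both the pipe target and a keyword argument"
            )
        kwargs[target] = obj
        return func(*args, **kwargs)
    return func(obj, *args, **kwargs)


# Illustrative data (an assumption, not taken from the issue).
df = pd.DataFrame({"Store": ["A", "A", "B"],
                   "Revenue": [10.0, 20.0, 9.0],
                   "Quantity": [1, 2, 3]})

grouped = df.groupby("Store")

# The GroupBy object is built once and reused for both aggregations.
prices = pipe(grouped, lambda grp: grp.Revenue.sum() / grp.Quantity.sum())
```

As a method, the same call would read `df.groupby("Store").pipe(...)`, which is what lets it sit inside a longer method chain.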
Use case
A pipe is useful for succinctly reusing GroupBy objects in calculations, for example calculating prices given a column of revenue and a column of quantity sold:
>>> import numpy as np
>>> import pandas as pd
>>> from numpy.random import choice
>>> n = 100_000
>>> df = pd.DataFrame({'Store': choice(['Store_1', 'Store_2'], n),
...                    'Year': choice(['Year_1', 'Year_2', 'Year_3', 'Year_4'], n),
...                    'Revenue': (np.random.random(n)*50+10).round(2),
...                    'Quantity': np.random.randint(1, 10, size=n)})
>>> df.head(2)
Quantity Revenue Store Year
0 2 14.69 Store_1 Year_1
1 9 25.89 Store_2 Year_4
Then, having .pipe, we could for example get prices per store/year like so:
>>> (df.groupby(['Store', 'Year'])
... .pipe(lambda grp: grp.Revenue.sum()/grp.Quantity.sum())
... .unstack().round(2))
Year Year_1 Year_2 Year_3 Year_4
Store
Store_1 6.99 6.99 7.01 6.92
Store_2 6.95 6.98 6.97 6.96
Note that the above is vectorized, and piping makes the code succinct and clear.
Alternatives to .pipe
The alternatives would be:
1. Use .apply
2. Create a function and call that with a GroupBy object as its argument
3. Create a price column
Option 1 is not good because it is slow: .apply calls the function once per group rather than using the vectorized GroupBy aggregations.
A pipe is just syntactic sugar for option 2, but piping would be more readable, especially if you're already piping other steps in the same chain.
Creating a concrete calculated column is in some instances the right approach, but in other cases it is better to calculate the values on the fly.
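For comparison, the alternatives might look roughly like this. It is a sketch on a tiny made-up DataFrame; the helper name `price` and the `Price` column are illustrative only.

```python
import pandas as pd

# Illustrative data (an assumption, not the data from the example above).
df = pd.DataFrame({
    "Store": ["Store_1", "Store_1", "Store_2", "Store_2"],
    "Year": ["Year_1", "Year_1", "Year_1", "Year_2"],
    "Revenue": [10.0, 20.0, 9.0, 8.0],
    "Quantity": [1, 2, 3, 4],
})
grouped = df.groupby(["Store", "Year"])

# Option 1: .apply runs the lambda once per group on a sub-DataFrame.
via_apply = grouped[["Revenue", "Quantity"]].apply(
    lambda g: g.Revenue.sum() / g.Quantity.sum()
)


# Option 2: a plain function that takes the GroupBy object and uses
# the vectorized aggregations; this is what a pipe would call.
def price(grp):
    return grp.Revenue.sum() / grp.Quantity.sum()


via_function = price(grouped)

# Option 3 (one possible reading): materialize a per-row price column.
# Note this is a per-row value, not the grouped ratio of sums.
with_price = df.assign(Price=df.Revenue / df.Quantity)
```

Options 1 and 2 give the same per-group prices here; the difference is that option 2 breaks the method chain, which is the part a pipe would smooth over.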
Comment From: TomAugspurger
There was an earlier issue about this in https://github.com/pandas-dev/pandas/issues/10353, and a stalled PR in https://github.com/pandas-dev/pandas/pull/10466.
Comment From: topper-123
Thanks for bringing that up, that PR would indeed solve this issue.
I can wait a bit to see if there are any opinions on this issue, but from the looks of the issue and PR you mentioned, it seems like there is positivity towards doing this.
#10466 looks quite good overall, do you agree? I could take it the last few steps (seems mostly to be doc and rebasing issues), if there are no objections.
Comment From: jreback
no objection here
Comment From: jreback
closing as a duplicate; @topper-123 impl would be great, you can use the other issue number.