I propose adding a pipe method to GroupBy objects, with the same interface as DataFrame.pipe/Series.pipe.

The use case is reusing a single GroupBy object across several calculations; see the example below.
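To make "the same interface" concrete, here is a minimal sketch of the calling convention, assuming the proposed method would follow DataFrame.pipe: func is either a callable that takes the object as its first argument, or a (callable, keyword) tuple. The helper name groupby_pipe and the toy data are hypothetical and used for illustration only; this is not the implementation from any PR.

```python
import pandas as pd


def groupby_pipe(grouped, func, *args, **kwargs):
    """Hypothetical stand-in for the proposed GroupBy.pipe.

    Mirrors the DataFrame.pipe calling convention: ``func`` is either a
    callable taking the object as its first argument, or a
    ``(callable, keyword)`` tuple naming the keyword that receives it.
    """
    if isinstance(func, tuple):
        func, target = func
        if target in kwargs:
            raise ValueError(f"{target} is both the pipe target and a keyword argument")
        kwargs[target] = grouped
        return func(*args, **kwargs)
    return func(grouped, *args, **kwargs)


# Toy data (made up for this sketch).
df = pd.DataFrame({'Store': ['Store_1', 'Store_1', 'Store_2'],
                   'Revenue': [10.0, 20.0, 9.0],
                   'Quantity': [1, 2, 3]})

# The GroupBy object is built once and reused for both aggregations.
prices = groupby_pipe(df.groupby('Store'),
                      lambda grp: grp.Revenue.sum() / grp.Quantity.sum())
```

As a method, the same call would read df.groupby('Store').pipe(...), which is what lets it slot into a longer method chain.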

Use case

A pipe is useful for succinctly reusing GroupBy objects in calculations, for example calculating prices given a column of revenue and a column of quantity sold:

>>> import numpy as np
>>> import pandas as pd
>>> from numpy.random import choice
>>> n = 100_000
>>> df = pd.DataFrame({'Store': choice(['Store_1', 'Store_2'], n),
...                    'Year': choice(['Year_1', 'Year_2', 'Year_3', 'Year_4'], n),
...                    'Revenue': (np.random.random(n)*50+10).round(2),
...                    'Quantity': np.random.randint(1, 10, size=n)})
>>> df.head(2)
   Quantity  Revenue    Store    Year
0         2    14.69  Store_1  Year_1
1         9    25.89  Store_2  Year_4

With .pipe, we could get prices per store/year like so:

>>> (df.groupby(['Store', 'Year'])
...    .pipe(lambda grp: grp.Revenue.sum()/grp.Quantity.sum())
...    .unstack().round(2))
Year     Year_1  Year_2  Year_3  Year_4
Store
Store_1    6.99    6.99    7.01    6.92
Store_2    6.95    6.98    6.97    6.96

Note that the above is vectorized, and piping makes the code succinct and clear.

Alternatives to .pipe

The alternatives would be: 1. use .apply, 2. create a function and call it with a GroupBy object as its argument, or 3. create a concrete price column.

Option 1 is slow, because .apply calls the function once per group instead of using the vectorized aggregations.

A pipe is just syntactic sugar for option 2, but piping would be more readable, especially if you're already piping other operations.

Creating a concrete calculated column is in some instances the right approach, but in other cases it is better to calculate the value on the fly from the grouped aggregates.
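For comparison, here is a small sketch of the three alternatives side by side. The toy data and the names via_apply, via_function, via_column and price are made up for illustration. Note that a per-row price column averaged per group is an unweighted mean of ratios, which is generally not the same quantity as total revenue divided by total quantity.

```python
import pandas as pd

# Toy data (made up for this sketch).
df = pd.DataFrame({'Store': ['Store_1', 'Store_1', 'Store_2', 'Store_2'],
                   'Revenue': [10.0, 30.0, 8.0, 12.0],
                   'Quantity': [1, 4, 2, 2]})

# Option 1: .apply runs the Python function once per group.
via_apply = (df.groupby('Store')[['Revenue', 'Quantity']]
               .apply(lambda g: g.Revenue.sum() / g.Quantity.sum()))


# Option 2: an ordinary function that receives the GroupBy object.
def price(grp):
    return grp.Revenue.sum() / grp.Quantity.sum()


via_function = price(df.groupby('Store'))

# Option 3: a concrete per-row price column, averaged per group.
via_column = (df.assign(Price=df.Revenue / df.Quantity)
                .groupby('Store').Price.mean())
```

Options 1 and 2 agree here (8.0 for Store_1), while option 3 gives 8.75 for Store_1 because each row counts equally regardless of quantity.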

Comment From: TomAugspurger

There was an earlier issue about this in https://github.com/pandas-dev/pandas/issues/10353, and a stalled PR in https://github.com/pandas-dev/pandas/pull/10466.

Comment From: topper-123

Thanks for bringing that up, that PR would indeed solve this issue.

I can wait a bit to see if there are any opinions on this issue, but judging from the issue and PR you mentioned, there seems to be support for doing this.

#10466 looks quite good overall, do you agree? I could take it the last few steps (it seems to be mostly doc and rebasing issues), if there are no objections.

Comment From: jreback

no objection here

Comment From: jreback

closing as a duplicate; @topper-123 impl would be great, you can use the other issue number.

Pandas ENH: Add .pipe to GroupBy objects
