Should we deprecate using as_index in groupby? As the docs say, using as_index=False should only impact reductions. Here, my expectation is that a line such as

df.groupby(keys, as_index=False).op(...)

should always be equivalent to

df.groupby(keys, as_index=True).op(...).reset_index()

It seems quite easy for the user to do so if they desire, and this would simplify the groupby internals.

Comment From: rhshadrach

cc @jbrockmendel, @WillAyd, @jreback Any thoughts here?

Comment From: jbrockmendel

this would simplify the groupby internals

Do you have a qualitative estimate of how much simplification we'd get? I'm definitely open to the idea in principal.

Comment From: jreback

in principal this makes groupby simpler but then again it makes a slightly more verbose syntax and to be honest i wish we wouldn't actually set the index and just make it a column

so either way this is a rather big change

so -0 on deprecating

Comment From: rhshadrach

@jbrockmendel Not sure what kind of qualitative estimate one can give here; here are some that I think are on the more quantitative side. I count 13 instances of branching created within core.groupby that depend solely on as_index; not much code itself (27 lines). Also, the method _insert_inaxis_grouper_inplace could be removed. Of course, it doubles the testing when it comes to reduction methods, plus the additional testing that it does not impact non-reductions. There is also the ambiguity on what the behavior should be when it comes to a UDF and a Series/DataFrame with unique groups (is it a reduction or a transform?), but this will continue to impact other behaviors of groupby and so I don't think it should be really considered.

@jreback If we were to leave the groups as columns, I think that would necessitate the index being a RangeIndex, is that right? I personally utilize meaningful indexes in my workflows and lean heavily on pandas's alignment when combining results from various computations, which is why I find as_index=True behavior most useful. But of course I understand if others' workflows/uses are different.

Comment From: jorisvandenbossche

I agree that as_index=False and .reset_index() are basically the same, and thus it's not really needed to provide both ways.

But on the other hand, I think that as_index=False is also quite massively used. So I am personally not sure it is worth disrupting their workflow and having all those users need to update their code for something relatively minor. I mean, from a user point of view, it's not that we consider it "harmful" or bad practice code, I think, so for me that gives less reason to actually deprecate.

Comment From: rhshadrach

But on the other hand, I think that as_index=False is also quite massively used.

Thanks @jorisvandenbossche. Working on some of the as_index=False bugs, I got the sense that perhaps it isn't used all that often, and this was especially the case when I saw some others express that it should be removed in an issue. @jreback's and your responses, along with some searching on SO, have me convinced otherwise.

Going to close this issue in a week if there is no futher discussion.

Comment From: WillAyd

Reopening as I think this is worthwhile to do.

Comment From: jbrockmendel

+1. The groupby tests are made really cumbersome by the combination of options.