Feature Type

  • [X] Adding new functionality to pandas

  • [ ] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

Some methods (e.g. groupby) have an option (as_index) that keeps the key column from ending up as the index.

This option could also be added to value_counts and pivot_table.

Feature Description

Add an as_index argument such that

df.value_counts().reset_index()

and

df.value_counts(as_index=False)

return the same result.

The default as_index would still be True, but it could be set to False under an option.
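For illustration, the proposed behaviour can be approximated today with a small wrapper. This is only a sketch: `value_counts_frame` is a hypothetical helper, not a pandas API, and the `count` column name is an assumption chosen so the output does not depend on the pandas version.

```python
import pandas as pd


def value_counts_frame(df, as_index=True):
    """Hypothetical helper emulating the proposed as_index argument."""
    counts = df.value_counts()
    if as_index:
        # Current behaviour: a Series indexed by the unique row values.
        return counts
    # Proposed as_index=False behaviour: move the index levels into
    # ordinary columns and name the counts column explicitly.
    return counts.reset_index(name="count")


df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})
indexed = value_counts_frame(df)              # Series with a MultiIndex (a, b)
flat = value_counts_frame(df, as_index=False)  # DataFrame with columns a, b, count
```

Note that the return type changes from Series to DataFrame when `as_index=False`, which is the concern raised later in the thread.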

Additional Context

No response

Comment From: phofl

Could you try to collect all methods where this would be necessary?

Comment From: MarcoGorelli

Wait, sorry @lucaslarios, I'm still not 100% sure we want this in pandas. I should've added the "needs discussion" label; I've added it now.

Comment From: lucaslarios

OK, I will take another issue instead.

Comment From: wany-oh

I think this is a big change: the return value of value_counts() would change from a Series to a DataFrame.

Comment From: MarcoGorelli

true, but the same thing already happens with e.g. df.groupby('col1', as_index=False)['col2'].sum()
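A minimal sketch of that existing behaviour, using made-up data (the column names follow the expression above; the values are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({"col1": ["a", "a", "b"], "col2": [1, 2, 3]})

# Default as_index=True: selecting one column yields a Series indexed by col1.
indexed = df.groupby("col1")["col2"].sum()

# as_index=False: the same expression yields a DataFrame with col1 as a column.
flat = df.groupby("col1", as_index=False)["col2"].sum()

print(type(indexed).__name__)
print(type(flat).__name__)
```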

Comment From: rhshadrach

What is the benefit of adding as_index to these methods? It seems to me this increases the surface area of the pandas API (and doubles the amount we need to test with these methods) without adding much value.

By comparison, in groupby the as_index argument is part of the groupby call, which produces a DataFrameGroupBy/SeriesGroupBy object that can be reused. E.g.

import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2], 'b': [3, 4, 5]})
gb = df.groupby('a', as_index=False)
print(gb.sum())
print(gb.mean())
print(gb.prod())

Without as_index=False, you would have to add .reset_index() to each of these calls. This reuse case does not exist for value_counts or pivot_table. And even so, some also think we should remove as_index from groupby: #35860
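To make the comparison concrete, here is a small sketch that checks that a reused `as_index=False` groupby matches the default groupby followed by `.reset_index()` on each call. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]})

gb_flat = df.groupby("a", as_index=False)  # as_index set once, reused below
gb_indexed = df.groupby("a")               # default as_index=True

matches = {}
for name in ("sum", "mean", "prod"):
    flat = getattr(gb_flat, name)()
    # The default groupby needs an explicit reset_index() per aggregation.
    manual = getattr(gb_indexed, name)().reset_index()
    matches[name] = flat.equals(manual)

print(matches)
```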

Comment From: MarcoGorelli

The benefit would be within the larger scope of having an option to make Index opt-in, as it would help fulfill the point of "nobody would get an Index unless they ask for one".

This wouldn't be the default pandas behaviour; it'd be behind an option, but it'd mean that someone could opt in to an "I don't want to think about indices" mode.

Comment From: rhshadrach

I see - thanks, I missed the idea of making as_index=False the default in the OP. Echoing https://github.com/pandas-dev/pandas/issues/48880#issuecomment-1265607444:

> But in general I think that pandas users are better off finding natural row labels for their data than thinking about how to get rid of it.

I'll even go stronger in saying that, in my opinion, natural row labels are what make pandas great. As such, I feel that moving toward "nobody would get an Index unless they ask for one" is moving in the wrong direction.

Comment From: MarcoGorelli

Thanks, I've reworded it a bit now. Rather than eventually making as_index=False the default (to which there's too much pushback), this would just enable the creation of an option under which that would be the default.

I also love row labels (I especially like having time series where the index is time, and each column represents a numeric quantity), but I do think there's a case to be made for a "no index by default" mode. I've tried putting some points together here: https://hackmd.io/JPWJqwc1SZKz_Zaxe9MZRQ#Why (you've seen the doc before, this is just an updated version, expanding on the "why?" part)

Comment From: rhshadrach

> but I do think there's a case to be made of a "no index by default" mode

My understanding is the reason for this issue would be to move toward that, but there is no consensus yet as to whether we should move toward that (e.g. #48880). Is this accurate?

Comment From: MarcoGorelli

yup, totally accurate. I'll put it on the agenda for the next call