Feature Type
- [X] Adding new functionality to pandas
- [ ] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
Problem Description
Some methods (e.g. groupby) have an option to return the keys as regular columns instead of moving them into the index. The same option could be added to value_counts and pivot_table.
Feature Description
Add an as_index argument such that df.value_counts().reset_index() and df.value_counts(as_index=False) return the same value.
The default as_index would still be True, but it could be set to False under an option.
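A minimal sketch of the proposed behaviour, for illustration only. value_counts does not accept as_index today, so the helper value_counts_flat below is hypothetical and simply wraps the existing reset_index() call. The name of the counts column depends on the pandas version.

```python
import pandas as pd


def value_counts_flat(df: pd.DataFrame, as_index: bool = True):
    """Hypothetical wrapper, not part of pandas: mimics the proposed as_index."""
    counts = df.value_counts()
    # as_index=True keeps today's behaviour: a Series indexed by the unique rows.
    # as_index=False moves those index levels back into ordinary columns.
    return counts if as_index else counts.reset_index()


df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})

indexed = value_counts_flat(df)               # Series with a MultiIndex (a, b)
flat = value_counts_flat(df, as_index=False)  # DataFrame: a, b and a counts column
```

Under this sketch the Series-to-DataFrame change in return type only happens when as_index=False is requested explicitly.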
Additional Context
No response
Comment From: phofl
Could you try to collect all methods where this would be necessary?
Comment From: MarcoGorelli
wait sorry @lucaslarios, still not 100% sure we want this in pandas, I should've added the "needs discussion" label - I've added it now
Comment From: lucaslarios
ok instead of that I will take another issue.
Comment From: wany-oh
I think it is a big change: the return value of value_counts() would change from Series to DataFrame.
Comment From: MarcoGorelli
true, but the same thing already happens with e.g. df.groupby('col1', as_index=False)['col2'].sum()
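For reference, a small example of the existing groupby behaviour mentioned here. The column names follow the comment; the data is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["x", "x", "y"], "col2": [1, 2, 3]})

# Default as_index=True: selecting one column and aggregating gives a Series
# whose index holds the group keys.
as_series = df.groupby("col1")["col2"].sum()

# as_index=False: the same call returns a DataFrame with col1 as a regular column.
as_frame = df.groupby("col1", as_index=False)["col2"].sum()
```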
Comment From: rhshadrach
What is the benefit of adding as_index to these methods? It seems to me this increases the surface area of the pandas API (and doubles the amount we need to test with these methods) without adding much value.
Comparing with groupby, the as_index argument is part of the groupby call, which produces a DataFrameGroupBy/SeriesGroupBy that can be reused. E.g.
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2], 'b': [3, 4, 5]})
# as_index=False is set once and applies to every aggregation below
gb = df.groupby('a', as_index=False)
print(gb.sum())
print(gb.mean())
print(gb.prod())
Without as_index=False, you would have to add .reset_index() to each of these calls. This reuse case does not exist for value_counts or pivot_table. And even so, some also think we should remove as_index from groupby: #35860
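To make the reuse argument concrete, here is a sketch comparing a single as_index=False groupby against a default groupby with a reset_index() on every aggregation. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]})

# One groupby object configured once; every aggregation returns flat columns.
gb_flat = df.groupby("a", as_index=False)
flat_results = [gb_flat.sum(), gb_flat.mean(), gb_flat.prod()]

# Default groupby: each aggregation needs its own reset_index() to match.
gb = df.groupby("a")
reset_results = [
    gb.sum().reset_index(),
    gb.mean().reset_index(),
    gb.prod().reset_index(),
]

# Pairwise comparison of the two approaches.
matches = [flat.equals(reset) for flat, reset in zip(flat_results, reset_results)]
```

With value_counts or pivot_table there is no intermediate object to reuse, so a single trailing .reset_index() covers the same ground.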
Comment From: MarcoGorelli
The benefit would be within the larger scope of having an option to make Index opt-in, as it would help fulfill the point of "nobody would get an Index unless they ask for one".
This wouldn't be the default pandas behaviour, it'd be behind an option, but it'd mean that someone could opt in to a "I don't want to think about indices" mode
Comment From: rhshadrach
I see - thanks, I missed the idea of making as_index=False the default in the OP. Echoing https://github.com/pandas-dev/pandas/issues/48880#issuecomment-1265607444:
But in general I think that pandas users are better off finding natural row labels for their data than thinking about how to get rid of it.
I'll even go stronger in saying that, in my opinion, natural row labels are what make pandas great. As such I feel moving toward "nobody would get an Index unless they ask for one" is moving in the wrong direction.
Comment From: MarcoGorelli
Thanks, I've reworded it a bit now - rather than eventually making as_index=False the default (to which there's too much pushback), this would just enable creation of an option where that would be the default
I also love row labels (I especially like having time series where the index is time, and each column represents a numeric quantity), but I do think there's a case to be made of a "no index by default" mode - I've tried putting some points together here https://hackmd.io/JPWJqwc1SZKz_Zaxe9MZRQ#Why (you've seen the doc before, this is just an updated version, expanding on the "why?" part)
Comment From: rhshadrach
but I do think there's a case to be made of a "no index by default" mode
My understanding is the reason for this issue would be to move toward that, but there is no consensus yet as to whether we should move toward that (e.g. #48880). Is this accurate?
Comment From: MarcoGorelli
yup, totally accurate. I'll put it on the agenda for the next call