Pandas version checks
- [X] I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#automatic-exclusion-of-nuisance-columns
Documentation problem
Since pandas 1.3, silently dropping nuisance columns in DataFrame groupby operations is deprecated: https://pandas.pydata.org/docs/whatsnew/v1.3.0.html#deprecated-dropping-nuisance-columns-in-dataframe-reductions-and-dataframegroupby-operations
However, the current groupby documentation still presents the exclusion of nuisance columns as a feature: https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#automatic-exclusion-of-nuisance-columns
Suggested fix for documentation
The documentation should clearly explain that users should pass the argument numeric_only=True to explicitly keep only numeric (float, int, or boolean) columns, and that the silent dropping of nuisance columns is deprecated and will raise in the future.
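As a rough illustration of what the rewritten section could show, here is a minimal sketch that uses a small made-up frame (the data and column names are assumptions, not taken from the user guide). The non-numeric column `B` is excluded because `numeric_only=True` is passed explicitly, not because of the deprecated silent drop.

```python
import pandas as pd

# Hypothetical example data: "B" plays the role of the nuisance column.
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar"],
        "B": ["one", "one", "two", "three"],
        "C": [1.0, 2.0, 3.0, 4.0],
        "D": [10, 20, 30, 40],
    }
)

# Ask for numeric columns explicitly instead of relying on the silent drop.
result = df.groupby("A").mean(numeric_only=True)
print(result)
```

The result should contain only the columns `C` and `D`, indexed by the groups of `A`.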
Comment From: Hashcode-Ankit
Working on this
Comment From: rhshadrach
Thanks for reporting this!
Comment From: Hashcode-Ankit
Hi @arnaudlegout, can you check this code? As written in the documentation, it only takes the numeric values and ignores all categorical or object values.

```python
import pandas as pd

A = ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar']
B = ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three']
C = [-0.575247, 0.254161, -1.143704, 0.215897, 1.193555, -0.077118, -0.408530, -0.862495]
D = [1.346061, 1.511763, 1.627081, -0.990582, -0.441652, 1.211526, 0.268520, 0.024580]
# Build the frame from the four lists above, one list per column
df = pd.DataFrame({'A': A, 'B': B, 'C': C, 'D': D})
# "B" is non-numeric, so it is the nuisance column for std()
df = df.groupby("A").std()
df
```

Output:
Comment From: rhshadrach
@AnkitKumar2698 - thanks for finding this. I do think this case should warn about the pending deprecation. Can you open a new issue for it?
Comment From: Hashcode-Ankit
can @rhshadrach sir guide me so that I can fix this one
Comment From: rhshadrach
> can @rhshadrach sir guide me so that I can fix this one
The section highlighted in the OP should be rewritten with
- a warning of the pending deprecation of automatic exclusion; and
- instructions for users on how to exclude nuisance columns manually.
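For the second point, one possible way to exclude nuisance columns manually is to select the columns to aggregate before calling the reduction. The sketch below assumes a small made-up frame; the names `numeric_cols` and `summed` are only illustrative.

```python
import pandas as pd

# Hypothetical example data: "B" is a non-numeric column we do not want to aggregate.
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar"],
        "B": ["one", "one", "two", "three"],
        "C": [1.0, 2.0, 3.0, 4.0],
        "D": [10, 20, 30, 40],
    }
)

# Option 1: name the columns to aggregate explicitly.
summed = df.groupby("A")[["C", "D"]].sum()

# Option 2: derive the numeric columns from the dtypes, then select them.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
summed_by_dtype = df.groupby("A")[numeric_cols].sum()

print(summed)
```

Both options should give the same result here, since `C` and `D` are the only numeric columns in this frame.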
Comment From: arnaudlegout
Just found another occurrence of "Operations in general exclude missing data." on aggregation functions here: https://pandas.pydata.org/docs/user_guide/10min.html#stats
Comment From: rhshadrach
> Just found another occurrence of "Operations in general exclude missing data." on aggregation functions here: https://pandas.pydata.org/docs/user_guide/10min.html#stats
Missing data is referring to NA values, which are by default skipped. I think this one is okay.
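To make the distinction concrete, here is a small sketch (with made-up values) of NA skipping, which is about missing values inside a column and not about dropping whole columns:

```python
import numpy as np
import pandas as pd

# Hypothetical series with one missing value.
s = pd.Series([1.0, np.nan, 3.0])

# By default the NaN is skipped: (1.0 + 3.0) / 2
mean_default = s.mean()

# With skipna=False the NaN propagates into the result.
mean_strict = s.mean(skipna=False)

print(mean_default, mean_strict)
```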
Comment From: arnaudlegout
> > Just found another occurrence of "Operations in general exclude missing data." on aggregation functions here: https://pandas.pydata.org/docs/user_guide/10min.html#stats
>
> Missing data is referring to NA values, which are by default skipped. I think this one is okay.
Right, my fault.