Pandas version checks
- [X] I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
https://pandas.pydata.org/docs/user_guide/groupby.html
Specifically, the line:
df.groupby('A') is just syntactic sugar for df.groupby(df['A']).
A list of any of the above things.
Documentation problem
Here is an example where, I think, it is not just syntactic sugar:
import pandas as pd

test_df = pd.DataFrame({'Category': {0: 'product-availability address-confirmation input',
                                     1: 'registration register-data-confirmation options',
                                     2: 'onboarding return-start input',
                                     3: 'registration register-data-confirmation input',
                                     4: 'decision-tree first-interaction-validation options'},
                        'Original_UserId': {0: '5511949551865@wa.gw.msging.net',
                                            1: '5511949551865@wa.gw.msging.net',
                                            2: '5511949551865@wa.gw.msging.net',
                                            3: '5511949551865@wa.gw.msging.net',
                                            4: '5511949551865@wa.gw.msging.net'}})
If I run
test_df['Category'].eq('onboarding return-start input').groupby(test_df['Original_UserId']).cummax()
it works and returns a result.
If I run
test_df['Category'].eq('onboarding return-start input').groupby('Original_UserId').cummax()
I get a KeyError.
My guess is that the KeyError comes from a check on the object being grouped, namely whether that object contains the given column.
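A minimal sketch of why the string form fails here, assuming a recent pandas: the object being grouped is the new boolean Series returned by `.eq()`, not `test_df`, so it has no `Original_UserId` column to look up. For a Series, a string key is matched against its index level names instead. The names `mask` and `indexed` are illustrative only, and the data is the issue's frame rewritten with list literals.

```python
import pandas as pd

# Same data as above, written with list literals instead of dicts
test_df = pd.DataFrame(
    {
        "Category": [
            "product-availability address-confirmation input",
            "registration register-data-confirmation options",
            "onboarding return-start input",
            "registration register-data-confirmation input",
            "decision-tree first-interaction-validation options",
        ],
        "Original_UserId": ["5511949551865@wa.gw.msging.net"] * 5,
    }
)

# .eq() returns a new boolean Series; it has no columns at all,
# so 'Original_UserId' is not something it can resolve as a column
mask = test_df["Category"].eq("onboarding return-start input")

try:
    mask.groupby("Original_UserId").cummax()
except KeyError as exc:
    print("KeyError:", exc)

# On a Series, a string key can name an index level, so putting the
# user id into a named index makes the string form work
indexed = mask.set_axis(pd.Index(test_df["Original_UserId"], name="Original_UserId"))
print(indexed.groupby("Original_UserId").cummax().tolist())
```

Passing the Series `test_df["Original_UserId"]` as the key (as in the working call above) avoids the lookup entirely, because the key values are supplied directly and aligned on the index.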
Suggested fix for documentation
I am not sure; perhaps just note that the difference is that one form checks whether the grouped object contains the given column, while the other does not.
Comment From: phofl
cc @jorisvandenbossche
Did we figure this out completely for CoW? Otherwise would just remove this line
Comment From: rhshadrach
@Sirmadeira
"df.groupby('A') is just syntactic sugar for df.groupby(df['A'])."
In this statement, the df inside and outside the groupby must be the same object; that is violated in the example you provided.
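To illustrate the point above, here is a small sketch assuming a toy frame with hypothetical columns `A` and `B`: when the same `df` is used inside and outside the call, the string key and the Series key give the same groups. The second half applies the idea to the issue's data by attaching the boolean mask to `test_df` as a hypothetical column `is_onboarding`, so that the string key refers to a column of the object actually being grouped.

```python
import pandas as pd

# Hypothetical toy frame; 'A' and 'B' are illustrative names
df = pd.DataFrame({"A": ["x", "y", "x", "y"], "B": [1, 2, 3, 4]})

# One object is both the grouped frame and the source of the key
by_name = df.groupby("A")["B"].sum()
by_series = df.groupby(df["A"])["B"].sum()
print(by_name.equals(by_series))

# The issue's data, with list literals instead of dicts
test_df = pd.DataFrame(
    {
        "Category": [
            "product-availability address-confirmation input",
            "registration register-data-confirmation options",
            "onboarding return-start input",
            "registration register-data-confirmation input",
            "decision-tree first-interaction-validation options",
        ],
        "Original_UserId": ["5511949551865@wa.gw.msging.net"] * 5,
    }
)
mask = test_df["Category"].eq("onboarding return-start input")

# Attach the mask so the string key and the grouped values share one frame
result = (
    test_df.assign(is_onboarding=mask)
    .groupby("Original_UserId")["is_onboarding"]
    .cummax()
)
print(result.tolist())
```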
Comment From: Sirmadeira
True, they are different objects.
Comment From: rhshadrach
@phofl - you good with this closed?
Comment From: phofl
yeah the CoW topic is already known and tracked, so no need to keep open here
Comment From: jorisvandenbossche
Did we figure this out completely for CoW? Otherwise would just remove this line
For future reference, the PR that looks further into this for CoW is https://github.com/pandas-dev/pandas/pull/50730 (it has the references to the original discussion).
The bottom line is that the "syntactic sugar" equivalence is a bit more costly to ensure under Copy-on-Write (CoW).