Pandas version checks

  • [X] I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/user_guide/groupby.html

Specifically, on the line

df.groupby('A') is just syntactic sugar for df.groupby(df['A']).

A list of any of the above things.

Documentation problem

Here is an example showing that, I think, it is not just syntactic sugar:

import pandas as pd

test_df = pd.DataFrame({'Category': {0: 'product-availability address-confirmation input',
  1: 'registration register-data-confirmation options',
  2: 'onboarding return-start input',
  3: 'registration register-data-confirmation input',
  4: 'decision-tree first-interaction-validation options'},
 'Original_UserId': {0: '5511949551865@wa.gw.msging.net',
  1: '5511949551865@wa.gw.msging.net',
  2: '5511949551865@wa.gw.msging.net',
  3: '5511949551865@wa.gw.msging.net',
  4: '5511949551865@wa.gw.msging.net'}})

If I run test_df['Category'].eq('onboarding return-start input').groupby(test_df['Original_UserId']).cummax()

This gives a result

If I run

test_df['Category'].eq('onboarding return-start input').groupby('Original_UserId').cummax() I get a KeyError.

I am guessing the KeyError comes from a check performed on the grouped object, namely whether that object contains the given column.
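If the string key is preferred, one possible workaround is to attach the boolean mask to the DataFrame first, so that the grouped object does contain the Original_UserId column. The sketch below uses shortened, made-up data, and the column name is_return_start is hypothetical, not something from the original report:

```python
import pandas as pd

# Shortened, made-up stand-in for test_df from the report.
test_df = pd.DataFrame({
    "Category": ["a input", "onboarding return-start input", "b options"],
    "Original_UserId": ["u1", "u1", "u2"],
})

mask = test_df["Category"].eq("onboarding return-start input")

# Grouping the Series by an external Series works, as in the report.
via_series = mask.groupby(test_df["Original_UserId"]).cummax()

# Grouping by label works once the mask is a column of a DataFrame
# that also has "Original_UserId" (is_return_start is a hypothetical name).
via_label = (
    test_df.assign(is_return_start=mask)
    .groupby("Original_UserId")["is_return_start"]
    .cummax()
)

print(via_series.tolist() == via_label.tolist())
```

Both forms should give the same per-user running maximum; only the name of the resulting Series differs.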

Suggested fix for documentation

I am not sure. Maybe just add that the difference is that the string form checks whether the grouped object contains the given column, while passing the Series does not.

Comment From: phofl

cc @jorisvandenbossche

Did we figure this out completely for CoW? Otherwise would just remove this line

Comment From: rhshadrach

@Sirmadeira

df.groupby('A') is just syntactic sugar for df.groupby(df['A']).

In the statement, the df inside and outside the groupby must be the same object; this is violated in the example you provided.
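A minimal sketch of that distinction, using a small made-up frame (the names df, ser and the data are illustrative only): when the same DataFrame appears inside and outside the groupby, the two spellings agree; when the outer object is a Series taken from that frame, the label lookup has no column to resolve.

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "y", "x", "y"], "B": [1, 2, 3, 4]})

# Same object inside and outside: the label form and the Series form agree.
by_label = df.groupby("A")["B"].sum()
by_series = df.groupby(df["A"])["B"].sum()
same = by_label.equals(by_series)

# Different objects: the Series `ser` has no column "A" (and no index
# level named "A"), so the label cannot be resolved.
ser = df["B"]
try:
    ser.groupby("A").sum()
    raised = False
except KeyError:
    raised = True

# Passing the key Series explicitly still works for the Series.
by_outer = ser.groupby(df["A"]).sum()

print(same, raised)
```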

Comment From: Sirmadeira

True, they are different objects.

Comment From: rhshadrach

@phofl - you good with this closed?

Comment From: phofl

yeah the CoW topic is already known and tracked, so no need to keep open here

Comment From: jorisvandenbossche

Did we figure this out completely for CoW? Otherwise would just remove this line

For future reference, the CoW PR that looks further into this is https://github.com/pandas-dev/pandas/pull/50730 (which has the references to the original discussion).

The bottom line is that the "syntactic sugar" is a bit more costly to ensure for CoW.