xref #9959
When subsetting the columns of a dataframe after a groupby to a series:
df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]}).set_index('a')
g = df.groupby('a')['b']
the public-facing groupby arguments axis and as_index are not passed on to the resulting SeriesGroupBy.
I think the following two examples should probably behave the same:
g = df.groupby('a', as_index=False)['b']
g2 = df['b'].groupby('a', as_index=False)
but the first produces
b
0 2
And the 2nd deliberately raises TypeError: as_index=False only valid with DataFrame.
Note that the arguments axis and as_index are only meant for DataFrames, and as such, I think a line such as
g = df.groupby('a', as_index=False)['b']
should raise.
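To check how the two spellings behave on a given pandas install, here is a minimal sketch. The helper name `outcome` and the use of `.sum()` as the aggregation are my own choices for illustration, not something from pandas or from this issue, and the result of the Series-first call may differ between pandas versions.

```python
import pandas as pd


def outcome(build):
    """Call build() and report 'ok' or the name of the exception raised.

    Hypothetical helper for this comparison only; not part of pandas.
    """
    try:
        build()
    except Exception as exc:  # broad on purpose: only the exception type matters here
        return f"raised {type(exc).__name__}"
    return "ok"


df = pd.DataFrame({"a": [1], "b": [2], "c": [3]}).set_index("a")

# Group the DataFrame first, then subset to column 'b'.
frame_first = outcome(lambda: df.groupby("a", as_index=False)["b"].sum())

# Select the Series first, then group it; this issue reports a TypeError here.
series_first = outcome(lambda: df["b"].groupby("a", as_index=False).sum())

print("df.groupby('a', as_index=False)['b'] ->", frame_first)
print("df['b'].groupby('a', as_index=False) ->", series_first)
```

If the two lines print different outcomes, the install still has the inconsistency described above.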
Comment From: rhshadrach
I updated the OP to only include the two attributes that would cause a potential API change.
Comment From: rhshadrach
After rethinking this, perhaps instead
df['b'].groupby('a', as_index=False)
should not raise, but rather behave the same as df.groupby('a', as_index=False)['b'].
I don't yet know how difficult this would be.
Comment From: jbrockmendel
@rhshadrach you've done a lot of work in this area since the OP. Any chance this has been resolved?
Comment From: rhshadrach
For
df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]}).set_index('a')
g = df.groupby([0, 1], axis=1)['b']
g2 = df['b'].groupby('a', axis=1)
both lines g and g2 raise correctly (I think this has always been the case). For as_index, SeriesGroupBy now fully (I believe) supports it, but we raise in Series.groupby if you try to pass as_index=False. I'm in favor of allowing as_index=False here, but that's #36507. Closing in favor of that issue.
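A rough way to confirm the axis=1 claim locally is sketched below. The helper `outcome` is a hypothetical name introduced just for this check. Warnings are silenced on the assumption that some pandas releases deprecate the axis keyword, and releases that no longer accept the keyword at all would presumably raise for that reason instead.

```python
import warnings

import pandas as pd


def outcome(build):
    """Call build() and report 'ok' or the name of the exception raised.

    Hypothetical helper for this check only; not part of pandas.
    """
    try:
        build()
    except Exception as exc:  # broad on purpose: only the exception type matters here
        return f"raised {type(exc).__name__}"
    return "ok"


df = pd.DataFrame({"a": [1], "b": [2], "c": [3]}).set_index("a")

with warnings.catch_warnings():
    # Assumption: axis=1 may emit a deprecation warning; ignore it for this check.
    warnings.simplefilter("ignore")
    # Column subsetting on a groupby created with axis=1.
    frame_axis = outcome(lambda: df.groupby([0, 1], axis=1)["b"])
    # axis=1 passed to a Series groupby, where there is no second axis.
    series_axis = outcome(lambda: df["b"].groupby("a", axis=1))

print("g  ->", frame_axis)
print("g2 ->", series_axis)
```

Both lines are expected to report an exception; the exact exception type is not asserted here because it may vary by version.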
Comment From: jbrockmendel
Looks like test_subsetting_columns_keeps_attrs has an xfail for SeriesGroupBy with axis=1, not bc the attribute is failing to be inherited but bc the subsetting operation is raising when it tries to do result = expected["b"]
Comment From: rhshadrach
I think this is expected behavior and shouldn't be an xfail. I'll move it to its own test.