Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
df = pd.DataFrame({"key": ["a", "b", "c", "b", "a"], "value": [False, False, True, False, True]}, index=[7, 2, 5, 3, 9])
df
Out[4]:
key value
7 a False
2 b False
5 c True
3 b False
9 a True
df.groupby("key", as_index=False).transform("max")
Out[5]:
key value
7 NaN NaN
2 NaN NaN
5 NaN NaN
3 NaN NaN
9 NaN NaN
df.groupby("key", as_index=True).transform("max")
Out[6]:
value
7 True
2 False
5 True
3 False
9 True
Issue Description
When using transform("max") after groupby with as_index=False, every entry of the result is NaN, including the "key" column. This is not limited to "max": "min" behaves the same way, as do "sum" and "mean" when the "value" column holds integers.
With as_index=True, the transform produces the expected values. However, the "key" column does not appear as the index; it simply disappears from the result.
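For reference, the broadcast values those other reducers should produce can be checked through the grouping path that does work (the default as_index=True). The integer column below is a hypothetical stand-in for the "integers in the value column" case; this is a sketch, not output captured from the affected version.

```python
import pandas as pd

# Hypothetical integer data with the same keys and index as the report
df_int = pd.DataFrame(
    {"key": ["a", "b", "c", "b", "a"], "value": [1, 2, 3, 4, 5]},
    index=[7, 2, 5, 3, 9],
)

# Each reducer is computed per group and broadcast back to every row of that group
grouped = df_int.groupby("key")["value"]
expected = {name: grouped.transform(name) for name in ["min", "sum", "mean"]}

for name, series in expected.items():
    print(name, series.tolist())
# min [1, 2, 3, 2, 1]
# sum [6, 6, 3, 6, 6]
# mean [3.0, 3.0, 3.0, 3.0, 3.0]
```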
Expected Behavior
- I would expect that when using as_index=False only the entries in the "value" column change, namely to True, False, True, False, True.
- I would expect that when using as_index=True the "key" column appears as an index.
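Both expectations can be met without relying on as_index in a transform. A minimal sketch, assuming it is acceptable to build the result in two explicit steps: overwrite only the "value" column via SeriesGroupBy.transform, then, if the keys are wanted in the index, move them there with set_index. The names result and keyed are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"key": ["a", "b", "c", "b", "a"], "value": [False, False, True, False, True]},
    index=[7, 2, 5, 3, 9],
)

# Expectation 1: keep "key" as a column and replace only "value" with its group max
result = df.assign(value=df.groupby("key")["value"].transform("max"))
print(result)

# Expectation 2: additionally expose the group keys in the index
# (appended to the original index so the row labels stay unique)
keyed = result.set_index("key", append=True)
print(keyed)
```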
Installed Versions
Comment From: phofl
This worked in 1.4.4, but I don't think that this should work. max is a reducer and should be used with agg and not transform.
cc @rhshadrach Can you weigh in?
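The distinction drawn here is mostly about output shape: agg returns one row per group, whereas transform returns one row per original row. A small sketch on the report's data (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {"key": ["a", "b", "c", "b", "a"], "value": [False, False, True, False, True]},
    index=[7, 2, 5, 3, 9],
)

# Aggregation: one value per group, indexed by the group key
aggregated = df.groupby("key")["value"].agg("max")
print(aggregated)  # index a, b, c

# Transform: one value per original row, aligned to the original index
transformed = df.groupby("key")["value"].transform("max")
print(transformed)  # index 7, 2, 5, 3, 9
```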
Comment From: asishm
related : https://github.com/pandas-dev/pandas/issues/37093
Comment From: rhshadrach
Thanks for the report! Haven't run a git bisect but it will undoubtedly point to one of my PRs. Should be able to get a fix in for 1.5.3.
@phofl: "I don't think that this should work. max is a reducer and should be used with agg and not transform."
When given a reducer, transform computes the reduced value for each group and then broadcasts it to every row of that group. E.g.
import pandas as pd
df = pd.DataFrame({"a": [1, 1, 2, 2, 2], "b": range(5)})
result = df.groupby("a", as_index=True).transform("max")
print(result)
# b
# 0 1
# 1 1
# 2 4
# 3 4
# 4 4
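To make "compute, then broadcast" concrete, the same result can be rebuilt by hand: aggregate once per group, then map each row's key to its group's value. This is an illustrative equivalence, not a description of how pandas implements transform internally.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 2, 2], "b": range(5)})

# Step 1: reduce each group to a single value (group 1 -> 1, group 2 -> 4)
per_group = df.groupby("a")["b"].max()

# Step 2: broadcast back by looking up each row's group key
manual = df["a"].map(per_group)

via_transform = df.groupby("a")["b"].transform("max")
print(manual.tolist())  # [1, 1, 4, 4, 4]
```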
Comment From: rhshadrach
@Verena1695: "I would expect that when using as_index=True the "key" column appears as an index."
as_index only has an effect (or at least, should) when doing an aggregation, not a transform; see the documentation here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html
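For contrast, here is where as_index does apply: with an aggregation it controls whether the group keys become the result's index or stay as a regular column. A short sketch on the report's data:

```python
import pandas as pd

df = pd.DataFrame(
    {"key": ["a", "b", "c", "b", "a"], "value": [False, False, True, False, True]},
    index=[7, 2, 5, 3, 9],
)

# as_index=True (the default): "key" becomes the index of the aggregated result
indexed = df.groupby("key", as_index=True).max()

# as_index=False: "key" is kept as a column and a default RangeIndex is used
flat = df.groupby("key", as_index=False).max()

print(indexed)
print(flat)
```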
The result you want can be achieved by:
import pandas as pd
df = pd.DataFrame({"key": ["a", "b", "c", "b", "a"], "value": [False, False, True, False, True]}, index=[7, 2, 5, 3, 9])
transformed = df.groupby("key").transform("max")
result = pd.concat([df["key"], transformed], axis=1)
Comment From: rhshadrach
"I would expect that when using as_index=True the "key" column appears as an index."
@Verena1695 - we've been discussing whether there should be an option in groupby for users to get the group keys of a transform in the index (or columns) of the result. The question we have is whether there are use cases for this; I've never had one in my experience. Do you have a use case where you want the keys in the result? If so, could you share any details?
Comment From: Verena1695
Thanks for your comments. I discussed with the team and we came to the conclusion that we don't have a use case for having the groups in the index/columns of the result.