Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
# An index whose labels are not integers
index = pd.date_range(start='2000-01-01', periods=3, freq='YS')
# An integer as the Series name
data = pd.Series([1, 2, 3], index=index, name=2)
# Emits: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
data.groupby(data)
Issue Description
The warning comes from this line: https://github.com/pandas-dev/pandas/blob/0691c5cf90477d3503834d983f69350f250a6ff7/pandas/core/groupby/grouper.py#L1015
Here, gpr.name is 2, which triggers the warning. If the index consists of integers, obj[gpr.name] will instead return the row labelled 2 (which then fails the comparison).
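To illustrate the integer-index case, a minimal sketch (the index `[2, 0, 1]` and the values are made-up illustration data): with an integer index, `ser[2]` is a label lookup, so it returns the element labelled 2 rather than the third element.

```python
import pandas as pd

# Integer index: indexing with an integer is a *label* lookup,
# so obj[gpr.name] picks the element labelled 2, not the third one.
int_ser = pd.Series([10, 20, 30], index=[2, 0, 1], name=2)

by_label = int_ser[int_ser.name]   # equivalent to int_ser.loc[2]
by_position = int_ser.iloc[2]      # explicit positional access

print(by_label, by_position)  # 10 30
```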
I have not checked on the main branch, but the same line appears to be present there: https://github.com/pandas-dev/pandas/blob/d72f165eb327898b1597efe75ff8b54032c3ae7b/pandas/core/groupby/grouper.py#L853
Expected Behavior
No FutureWarning should be issued.
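The expected behavior can be checked with a small regression-style sketch that uses only the standard `warnings` module. Matching on the substring "treating keys as positions" from the warning text above is an assumption about how to single out this particular FutureWarning. The grouped result is the same either way; only the warning flag depends on whether the installed pandas contains the fix.

```python
import warnings

import pandas as pd


def groupby_by_itself():
    """Run the reproducer; return the grouped sums and whether the positional-key warning fired."""
    index = pd.date_range(start="2000-01-01", periods=3, freq="YS")
    ser = pd.Series([1, 2, 3], index=index, name=2)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        result = ser.groupby(ser).sum()
    warned = any(
        issubclass(w.category, FutureWarning)
        and "treating keys as positions" in str(w.message)
        for w in caught
    )
    return result, warned


result, warned = groupby_by_itself()
print(result.tolist())  # [1, 2, 3]
print("FutureWarning emitted:", warned)  # expected to be False once fixed
```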
Installed Versions
Comment From: rhshadrach
Thanks for the report. It appears to me we should only be doing this check when obj is a DataFrame. PRs to fix are welcome. I think this can go in 2.3.
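A sketch of the suggested restriction, under stated assumptions: `column_for_grouper` is a hypothetical helper, not pandas' internal code or the merged fix. It looks up `gpr.name` only when `obj` is a DataFrame, so a Series is never indexed with the grouper's name.

```python
import pandas as pd


def column_for_grouper(obj, gpr):
    """Hypothetical helper: return the column of ``obj`` named by ``gpr``, else None.

    Only a DataFrame has columns, so a Series short-circuits to None and
    ``obj[gpr.name]`` is never evaluated against the Series index.
    """
    if not isinstance(obj, pd.DataFrame):
        return None
    name = getattr(gpr, "name", None)  # e.g. a lambda grouper has no .name
    try:
        present = name is not None and name in obj.columns
    except TypeError:  # unhashable name, such as a list
        return None
    return obj[name] if present else None


index = pd.date_range(start="2000-01-01", periods=3, freq="YS")
ser = pd.Series([1, 2, 3], index=index, name=2)
df = pd.DataFrame({"a": [1, 2, 3], 2: [4, 5, 6]})

print(column_for_grouper(ser, ser))               # None
print(column_for_grouper(df, df[2]).tolist())     # [4, 5, 6]
```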
Comment From: sanggon6107
take
Comment From: sanggon6107
Hi @rhshadrach,
I've found that Series.groupby doesn't show the warning on the main branch (70edaa0), since Series.__getitem__ now always treats an integer key as a label, as discussed in #50617.
Do we still need to do the check only when obj is a DataFrame? I personally think it might make sense, though, since a Series doesn't have columns.
Comment From: rhshadrach
> I've found that Series.groupby doesn't show the warning on the main branch
This is because that deprecation has been enforced on the main branch for the 3.0 release. However, we will be releasing 2.3 before that, and the warning still shows on the 2.3.x branch. This is where the fix should go.
Comment From: Anurag-Varma
This can be closed, right? The PR has been merged.
Comment From: TiborVoelcker
I originally thought that the implementation on main might be even more dangerous: if grouping a pd.Series, and gpr.name happens to be in the index of the Series, one could get some really weird results. The assignment in https://github.com/pandas-dev/pandas/blob/d72f165eb327898b1597efe75ff8b54032c3ae7b/pandas/core/groupby/grouper.py#L853 would then be valid.
I experimented a bit to trigger a bug (by making a Series containing Series, etc.), but I could never get https://github.com/pandas-dev/pandas/blob/d72f165eb327898b1597efe75ff8b54032c3ae7b/pandas/core/groupby/grouper.py#L857 to return True. Therefore, it seems to be fine. I was intentionally trying to break it anyway, and it would probably never happen naturally.
But the point I was trying to make might still be valid, just from a "cleaner code" standpoint: I believe line 853 assumes that obj is a pd.DataFrame without checking it. So I thought this check should be added, maybe simply by using obj_gpr_column = obj.loc[:, gpr.name].
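A sketch of how the proposed `obj.loc[:, gpr.name]` form behaves, assuming a flat (non-MultiIndex) index; `loc_column` is a hypothetical wrapper for illustration only. On a DataFrame it selects the column; on a 1-D Series the two-element key raises `pandas.errors.IndexingError`, so the DataFrame assumption is checked implicitly.

```python
import pandas as pd


def loc_column(obj, name):
    """Hypothetical wrapper: return obj.loc[:, name], or None if obj is 1-D."""
    try:
        return obj.loc[:, name]
    except pd.errors.IndexingError:
        # A Series with a flat index is 1-D, so a (rows, column) key
        # is rejected as "Too many indexers".
        return None


index = pd.date_range(start="2000-01-01", periods=3, freq="YS")
ser = pd.Series([1, 2, 3], index=index, name=2)

print(loc_column(ser.to_frame(), ser.name).tolist())  # [1, 2, 3]
print(loc_column(ser, ser.name))                       # None
```

With a MultiIndex, a Series can interpret a two-element `.loc` key as a selection across index levels, so in that case this form would not act as a DataFrame check by itself.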
I would gladly add this quick check, if wanted. But as I could not trigger a bug, I no longer think it is strictly necessary.
So in short: Yes, I believe this issue can be closed. Thanks @sanggon6107 for the fast fix!
Comment From: rhshadrach
Thanks @Anurag-Varma - agreed. Closing.