Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
```python
import pandas as pd

df = pd.DataFrame({"somegroup": ["A", "B", None], "someint": [1, 2, 3]})
df['someint'] = df['someint'].astype('Int64')
df['cumsum'] = df.groupby(['somegroup'])['someint'].transform('cumsum')
print(df.dtypes)
print(df.query("cumsum.isnull()"))

# Output:
# somegroup     object
# someint        Int64
# cumsum       Float64
# dtype: object
# somegroup someint cumsum
```
Issue Description
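I noticed that when the grouping column contains missing values and the column being transformed is an Int64, the resulting transform operation produces a Float64 column on which the isnull check finds no missing values, even though the row with the missing group key has no cumsum.
If I do the same operation with the regular int64 and float64 dtypes, the isnull check works.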
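A possible workaround, sketched under the assumption that casting the nullable column to a NumPy-backed float64 yields plain NaN for every missing entry (whether it was stored as pd.NA or as NaN), is to cast before running the null check. The names `missing` and `rows_without_group` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({"somegroup": ["A", "B", None], "someint": [1, 2, 3]})
df["someint"] = df["someint"].astype("Int64")
df["cumsum"] = df.groupby(["somegroup"])["someint"].transform("cumsum")

# Assumption: after the cast to NumPy float64, both pd.NA and any NaN held
# in the underlying data become plain NaN, which isna() detects.
missing = df["cumsum"].astype("float64").isna()

# Select the rows whose group key was missing and so received no cumsum.
rows_without_group = df[missing]
print(rows_without_group)
```

This only sidesteps the check; it does not change the dtype that transform returns.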
Expected Behavior
df = pd.DataFrame({"somegroup": ["A", "B", None], "someint": [1, 2, 3]})
df['someint'] = df['someint'].astype(int)
df['cumsum'] = df.groupby(['somegroup'])['someint'].transform('cumsum')
print(df.dtypes)
print(df.query("cumsum.isnull()"))

somegroup     object
someint        int64
cumsum       float64
dtype: object
  somegroup  someint  cumsum
2      None        3     NaN
Installed Versions
Comment From: abudis
Actually, I think this is already covered by this discussion: https://github.com/pandas-dev/pandas/issues/32265.
Closing.