The error occurs with
import numpy as np
import pandas as pd
df = pd.DataFrame({"A": [2, 1, 1, 1, 2, 2, 1], "B": pd.Series(np.full(7, np.nan), dtype="Int64")})
df.groupby("A").std()
whereas df.groupby("A").var() now works correctly.
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] (optional) I have confirmed this bug exists on the master branch of pandas.
Comment From: arw2019
This works on current master
Comment From: jreback
this is more complicated as it's currently wrong, we should be outputting
In [21]: df = pd.DataFrame(
    ...:     {"A": [2, 1, 1, 1, 2, 2, 1], "B": pd.Series(np.full(7, np.nan), dtype="Int64"), 'C': pd.array([1, 2, 3, 4, pd.NA, pd.NA, pd.NA], dtype='Int64')}
    ...: )

In [22]: df.groupby('A').std()
Out[22]:
     B    C
A
1  NaN  1.0
2  NaN  NaN

In [23]: df.groupby('A').sum()
Out[23]:
   B  C
A
1  0  9
2  0  1

In [24]: df.groupby('A').mean()
Out[24]:
      B    C
A
1  <NA>  3.0
2  <NA>  1.0
Comment From: GabrielSimonetto
@jreback What would be the desired behaviour? Is the problem only in the std function? Sorry if the question is a bit dumb, but I don't understand what your example is showing
Comment From: jreback
we need to actually handle pd.NA correctly and return it where appropriate
here the results should all contain pd.NA
this is pretty new, which is why it has not been built out yet
groupby supports EAs (extension arrays) for mean, sum, min and max, but not yet for std / var
and that's the issue
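One rough way to see which reductions keep the nullable dtype is to run each of them and compare the result dtypes. This is only a sketch: `result_dtypes` is a hypothetical helper written for this illustration, not part of pandas, and what it prints depends on the installed pandas version, so no expected output is shown.

```python
import numpy as np
import pandas as pd

# Same frame as in the session above: "B" is all-NA, "C" is partly NA.
df = pd.DataFrame(
    {
        "A": [2, 1, 1, 1, 2, 2, 1],
        "B": pd.Series(np.full(7, np.nan), dtype="Int64"),
        "C": pd.array([1, 2, 3, 4, pd.NA, pd.NA, pd.NA], dtype="Int64"),
    }
)


def result_dtypes(frame, ops=("sum", "mean", "min", "max", "var", "std")):
    """Hypothetical helper: run each groupby reduction and record result dtypes.

    If a reduction raises (as this issue reports for std), the exception
    type name is recorded in place of the dtypes.
    """
    report = {}
    for op in ops:
        try:
            result = getattr(frame.groupby("A"), op)()
            report[op] = {col: str(dtype) for col, dtype in result.dtypes.items()}
        except Exception as exc:
            report[op] = type(exc).__name__
    return report


for op, dtypes in result_dtypes(df).items():
    print(op, dtypes)
```

A reduction that handles the extension array properly should report a nullable dtype such as `Int64` or `Float64` for `B` and `C`. A plain `float64` suggests the values went through a NumPy fallback, which is where `NaN` appears instead of `<NA>`.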