The error occurs with

import numpy as np
import pandas as pd
df = pd.DataFrame({"A": [2, 1, 1, 1, 2, 2, 1], "B": pd.Series(np.full(7, np.nan), dtype="Int64")})
df.groupby("A").std()

while now we have df.groupby("A").var() working correctly.

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [ ] (optional) I have confirmed this bug exists on the master branch of pandas.


Comment From: arw2019

This works on current master

Comment From: jreback

this is more complicated, as the current output is wrong: we should be outputting pd.NA and ignoring it in the calculation.

In [21]: df = pd.DataFrame(
    ...:         {"A": [2, 1, 1, 1, 2, 2, 1], "B": pd.Series(np.full(7, np.nan), dtype="Int64"), 'C': pd.array([1,2,3, 4,pd.NA, pd.NA, pd.NA], dtype='Int64')}
    ...:
    ...:         )

In [22]: df.groupby('A').std()
Out[22]:
    B    C
A
1 NaN  1.0
2 NaN  NaN

In [23]: df.groupby('A').sum()
Out[23]:
   B  C
A
1  0  9
2  0  1

In [24]: df.groupby('A').mean()
Out[24]:
      B    C
A
1  <NA>  3.0
2  <NA>  1.0

Comment From: GabrielSimonetto

@jreback What would be the desired behaviour? Is the problem only in the std function? Sorry if the question is a bit dumb, but I don't understand what your example is showing

Comment From: jreback

we need to actually handle pd.NA correctly and return it as appropriate

here the results should all consist of pd.NA

this is pretty new and that's why it has not been built out yet

groupby supports extension arrays (EAs) for mean, sum, min and max, but not yet for std / var generally

and that's the issue
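
To make the requested behaviour concrete, here is a minimal sketch of a per-group standard deviation that skips pd.NA and returns pd.NA when too few valid values remain. It is not how pandas implements groupby std. The helper name na_aware_std is hypothetical, and returning pd.NA (rather than NaN) for a group with a single valid value is an assumption based on the comment above.

```python
import numpy as np
import pandas as pd


def na_aware_std(values, ddof=1):
    """Hypothetical helper: sample std that skips pd.NA.

    Returns pd.NA when the number of valid values is <= ddof
    (an assumption about the desired output, not pandas behaviour).
    """
    valid = values.dropna().to_numpy(dtype="float64")
    if len(valid) <= ddof:
        return pd.NA
    return float(np.std(valid, ddof=ddof))


df = pd.DataFrame(
    {
        "A": [2, 1, 1, 1, 2, 2, 1],
        "B": pd.Series([pd.NA] * 7, dtype="Int64"),
        "C": pd.array([1, 2, 3, 4, pd.NA, pd.NA, pd.NA], dtype="Int64"),
    }
)

# Iterate over the groups explicitly so the sketch does not depend on
# how groupby.agg / groupby.apply treat nullable dtypes.
result = {
    col: {key: na_aware_std(group[col]) for key, group in df.groupby("A")}
    for col in ["B", "C"]
}

print(result)
```

With this sketch, column B is pd.NA for both groups, and column C is 1.0 for group 1 (values 2, 3, 4) and pd.NA for group 2 (only one valid value).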