Pandas ddof for np.std in df.agg changes depending on how given & lambda expression does not work correctly in a list of functions

Code Sample, a copy-pastable example if possible

In [31]: import numpy as np

In [32]: import pandas as pd

In [33]: df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['A', 'B'])

In [34]: df
Out[34]:
   A  B
0  0  1
1  2  3
2  4  5

In [35]: df.agg(np.std)  # Behavior of ddof=0
Out[35]:
A    1.632993
B    1.632993
dtype: float64

In [36]: df.agg([np.std])  # Behavior of ddof=1
Out[36]:
       A    B
std  2.0  2.0

In [37]: # So how to get the ddof=0 behavior when giving a list of functions?

In [39]: df.agg([lambda x: np.std(x)])  # This gives a numerically unexpected result.
Out[39]:
         A        B
  <lambda> <lambda>
0      0.0      0.0
1      0.0      0.0
2      0.0      0.0

In [40]: df.agg([np.mean, lambda x: np.std(x)])  # This gives an error.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-40-52f4ec4195b5> in <module>()
----> 1 df.agg([np.mean, lambda x: np.std(x)])

/Users/ikeda/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/pandas/core/frame.py in aggregate(self, func, axis, *args, **kwargs)
   4740         if axis == 0:
   4741             try:
-> 4742                 result, how = self._aggregate(func, axis=0, *args, **kwargs)
   4743             except TypeError:
   4744                 pass

/Users/ikeda/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/pandas/core/base.py in _aggregate(self, arg, *args, **kwargs)
    537             return self._aggregate_multiple_funcs(arg,
    538                                                   _level=_level,
--> 539                                                   _axis=_axis), None
    540         else:
    541             result = None

/Users/ikeda/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/pandas/core/base.py in _aggregate_multiple_funcs(self, arg, _level, _axis)
    594         # if we are empty
    595         if not len(results):
--> 596             raise ValueError("no results")
    597
    598         try:

ValueError: no results

Problem description

When using, e.g., df.agg, the ddof (delta degrees of freedom) value used for np.std changes depending on how the function is passed (as a single function or inside a list of functions), which is likely to confuse many users. I believe the behavior should be made consistent.

Furthermore, I could not find a way to obtain the np.std result with ddof=0 when supplying it as a member of a list of functions. A lambda expression does not work properly in a list of functions (it gives numerically unexpected results or even raises an error). This prevents us from using convenient methods such as df.agg, df.apply, and df.describe when we want the ddof=0 behavior.
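
For reference, the two values seen in the session above (1.632993 and 2.0) are the population (ddof=0) and sample (ddof=1) standard deviations of the same data. A minimal sketch, assuming only that DataFrame.std takes a ddof argument (default 1) and that np.std defaults to ddof=0; passing ddof explicitly avoids depending on how df.agg dispatches the function:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['A', 'B'])

# Population standard deviation (divide by N), NumPy's default.
pop_std = df.std(ddof=0)

# Sample standard deviation (divide by N - 1), pandas' default.
sample_std = df.std(ddof=1)

# The same quantities computed directly on the underlying array, per column.
np_pop_std = np.std(df.to_numpy(), axis=0, ddof=0)
np_sample_std = np.std(df.to_numpy(), axis=0, ddof=1)

print(pop_std)
print(sample_std)
```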

From https://github.com/pandas-dev/pandas/issues/13344, I gather that the developers prefer the ddof=1 behavior in pandas, so the expected behavior should be as below.

Expected Output

In [35]: df.agg(np.std)  # Behavior of ddof=1
Out[35]:
A    2.0
B    2.0
dtype: float64

In [38]: df.agg([lambda x: np.std(x)])  # To obtain the ddof=0 results
Out[38]:
                     A             B
<lambda>      1.632993      1.632993

In [41]: df.agg([np.mean, lambda x: np.std(x)])
Out[41]:
                     A             B
mean          2.0           3.0
<lambda>      1.632993      1.632993

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
pandas: 0.21.0
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.3
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Comment From: mroeschke

This appears to work on master. It could use a test.

In [15]: df.agg(np.std)
Out[15]:
A    2.0
B    2.0
dtype: float64

Comment From: jcrvz

It works by doing: df.agg([lambda x: np.mean(x), lambda x: np.std(x, ddof=0)])
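
If the list form still misbehaves in a given pandas version, one alternative is to build the summary with df.apply, which hands each whole column to the function. This is only a sketch, not the fix adopted in pandas; `summarize` and the row label `std_ddof0` are made-up names for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['A', 'B'])

def summarize(col):
    """Return several statistics for one column (hypothetical helper)."""
    values = col.to_numpy()
    return pd.Series({
        'mean': np.mean(values),
        'std_ddof0': np.std(values, ddof=0),  # population standard deviation
    })

# df.apply calls summarize once per column; each returned Series
# becomes one column of the resulting DataFrame.
summary = df.apply(summarize)
print(summary)
```

Because the function receives the full column as a Series, np.std is never evaluated on individual scalars, which is what produced the rows of 0.0 in the original report.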

Comment From: sappersapper

I encountered a related inconsistency:

import pandas as pd
import numpy as np

df = pd.DataFrame({'id': ['a', 'a', 'b', 'b'], 'value': [1, 3, 5, 7]})

df.groupby('id').agg(np.var)  # calc with ddof=1

id	value
a	2
b	2

df.groupby('id').apply(np.var)  # calc with ddof=0

id	value
a	1
b	1
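
In the groupby case as well, stating ddof explicitly removes the ambiguity. A sketch on the same data, assuming the groupby var method's ddof parameter (default 1):

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'a', 'b', 'b'], 'value': [1, 3, 5, 7]})
grouped = df.groupby('id')['value']

sample_var = grouped.var(ddof=1)      # divides by N - 1
population_var = grouped.var(ddof=0)  # divides by N

print(sample_var)
print(population_var)
```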
