Pandas BUG: expanding/rolling_skew/kurt() inconsistent with Series.skew/kurt()

Note that for a constant series, Series.skew() returns 0, while rolling/expanding_skew() return NaN.

In [532]: s = Series([1]*4)

In [533]: s.skew()
Out[533]: 0

In [534]: expanding_skew(s)
Out[534]:
0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64

In [535]: rolling_skew(s, 4)
Out[535]:
0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64

While rolling/expanding_kurt() similarly return NaN for a constant series, Series.kurt() produces an error.

In [541]: expanding_kurt(s)
Out[541]:
0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64

In [542]: rolling_kurt(s, 4)
Out[542]:
0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64

In [543]: s.kurt()
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-543-e41f2ec435bd> in <module>()
----> 1 s.kurt()

C:\Python34\lib\site-packages\pandas\core\generic.py in stat_func(self, axis, skipna, level, numeric_only, **kwargs)
   3783                                               skipna=skipna)
   3784                 return self._reduce(f, axis=axis,
-> 3785                                     skipna=skipna, numeric_only=numeric_only)
   3786             stat_func.__name__ = name
   3787             return stat_func

C:\Python34\lib\site-packages\pandas\core\series.py in _reduce(self, op, axis, skipna, numeric_only, filter_type, **kwds)
   2007                 filter_type=None, **kwds):
   2008         """ perform a reduction operation """
-> 2009         return op(_values_from_object(self), skipna=skipna, **kwds)
   2010
   2011     def _reindex_indexer(self, new_index, indexer, copy):

C:\Python34\lib\site-packages\pandas\core\nanops.py in _f(*args, **kwargs)
     46                                 'this dtype'.format(f.__name__.replace('nan',
     47                                                                        '')))
---> 48             return f(*args, **kwargs)
     49         return _f
     50

C:\Python34\lib\site-packages\pandas\core\nanops.py in nankurt(values, axis, skipna)
    504     D = _zero_out_fperr(D)
    505
--> 506     result = (((count * count - 1.) * D / (B * B) - 3 * ((count - 1.) ** 2)) /
    507               ((count - 2.) * (count - 3.)))
    508     if isinstance(result, np.ndarray):

ZeroDivisionError: float division by zero

Comment From: seth-p

My 2c: there really shouldn't be two different implementations of skew/kurtosis functions.

Comment From: jreback

@seth-p maybe edit the top section with a code-reference so that when this is addressed the section can be fixed (as you have commented for #7926)

Comment From: seth-p

@jreback, afraid I'm not sure I follow your last comment. Do you mean the following?

As a test for a fix, un-comment the following lines in test_moments.py:

        #(mom.expanding_skew, lambda v: Series(v).skew(), 3), # restore once GH 8086 is fixed
        #(mom.expanding_kurt, lambda v: Series(v).kurt(), 4), # restore once GH 8086 is fixed

        #(mom.rolling_skew, lambda v: Series(v).skew(), 3), # restore once GH 8086 is fixed
        #(mom.rolling_kurt, lambda v: Series(v).kurt(), 4), # restore once GH 8086 is fixed

Comment From: jreback

@seth-p I realize now why you commented these out

Comment From: jaimefrio

The solution to this is to modify the behavior of Series.skew() and Series.kurt() so that they return np.nan instead of 0, right?

Comment From: seth-p

I think so, assuming they are supposed to return unbiased estimates. Though to be honest I haven't examined them closely beyond observing the inconsistencies noted above.

Comment From: jaimefrio

I was looking into this to remove the commented tests for the rolling versions. There are a bunch of tests making sure that the return value is 0, and the code itself explicitly sets things to 0. So even if it is a bug, it was clearly considered a feature when the code was written. Not sure whether changing the behavior can be happily done without breaking someone's code...

Comment From: jreback

relatd to this: https://github.com/pydata/pandas/pull/7928

I wouldn't return nan unless you have all missing values or its empty. 0 is the calculated value no?

Comment From: seth-p

I think, ideally, the commented-out consistency tests (https://github.com/pydata/pandas/issues/8086#issuecomment-55687801) should be there. Obviously some of the cases are 0 / NaN, but others aren't.

Separately, I think should add to _test_moments_consistency() additional parameters skew_unbiased=None, kurt_unbiased=None and skew_biased=None, kurt_biased=None, and consistency checks between them and the corresponding lower-order moments (biased or unbiased, as appropriate), i.e. comparable to this test for biased variance estimates:

                if var is var_biased:
                    # check that biased var(x) == mean(x^2) - mean(x)^2
                    mean_x2 = mean(x * x)
                    assert_equal(var_x, mean_x2 - (mean_x * mean_x))

Comment From: seth-p

Regarding 0 vs NaN, even if for now we are keeping the disparate behavior ofrolling/expanding_kurt/skew() and Series.skew/kurt(), I would leave those commented-out tests there (still commented out), as an aspirational goal...

Comment From: seth-p

@jreback, if need to divide by a denominator that is 0, then I think should return NaN (analogous to calculating an unbiased standard deviation where need to divide by N-1 when N=1).

Comment From: jreback

@seth-p yep that too! I believe var/std do that as well shouldn't be filled by 0 by default (though is ATM)

Comment From: seth-p

@jreback, no, at least in master, calculating unbiased (ddof=1, the default) var/std will produce NaN for a single non-NaN value (since it wants to divide by N-1 = 0). This should be the case for rolling/expanding_var(), ewmvar() (with bias=False, the default), and Series.var(). If that's not the case, and any of these produces 0 for a single non-NaN value, then I missed something in my tests...

Comment From: jreback

@seth-p what I mean is that var/std DO provide a nan when dividing by 0. These should all be consistent (hence this issue).

Comment From: seth-p

Ah, sorry, I misunderstood you. Yes, all the var/std functions do produce a NaN when dividing by 0...

Comment From: jreback

as far as the tests go, yes I think will have to change those tests and test for nan in those edge cases.

Comment From: TomAugspurger

In [133]: s.rolling(4).kurt()
Out[133]:
0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64

In [134]: s.kurt()
Out[134]: 0