Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
data =[-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,7729964027,7729965641,7729965103,7729966179,7729968330,7729969406,7729971020,7729969406,7729969944,7729969406,7729969406,7729969944,7729971558,7729972633,7729973171,7729973171,7729973171,7729969406]
df = pd.DataFrame(data, columns=['data'])
df.data.rolling(10).skew()
43 -1.778781e+00
44 -3.162278e+00
45 -1.311680e+03
46 2.681674e+04
47 5.244346e+03
48 5.565501e+03
49 4.143785e+04
50 1.921654e+04
51 -7.723187e+03
52 1.171920e+04
53 1.171920e+04
df.data.tail(10).skew()
0.16889762763904356
#Same problem for kurt()!
df.data.rolling(10).kurt()[-1:]
53 -4.518855e+10
df.data.tail(10).kurt()
-2.198110595961745
Issue Description
When the values in the data span a large range of magnitudes (here, values near ±2 followed by values near 7.7e9), rolling(10).skew() and tail(10).skew() return different results for the same last 10 values.
Expected Behavior
The last value of rolling(10).skew() must equal tail(10).skew(), and likewise for kurt().
Installed Versions
Comment From: phofl
Hi, thanks for your report.
Probably an overflow. Could you trim your example to use only as many values as necessary? This looks a bit large for a minimal example.
Comment From: alex7088
I reduced the data in the example to 54 elements and also found the same error with kurt().
Comment From: GYHHAHA
A refined example:
>>> s = pd.Series([-2., 200000., 200001., 200005.])
>>> s.tail(3).skew()
1.457862967321305
>>> s.rolling(3).skew()
0 NaN
1 NaN
2 -1.732051
3 2.071599
dtype: float64
When the last three numbers grow, the last element in the rolling result deviates from the correct skew value.
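To make the "correct skew value" concrete, here is a minimal pure-Python sketch that recomputes every window from scratch with a two-pass (mean first, then centered sums) formula. The helper names skew_two_pass and rolling_reference are hypothetical, and the bias-corrected formula is an assumption chosen because it reproduces the s.tail(3).skew() value quoted above. It is not the pandas implementation.

```python
import math


def skew_two_pass(window):
    """Bias-corrected sample skewness, computed from centered values."""
    n = len(window)
    if n < 3:
        return math.nan
    mean = math.fsum(window) / n
    deviations = [v - mean for v in window]
    m2 = math.fsum(d * d for d in deviations)      # sum of squared deviations
    m3 = math.fsum(d * d * d for d in deviations)  # sum of cubed deviations
    if m2 == 0:
        return math.nan
    return n * math.sqrt(n - 1) / (n - 2) * m3 / m2 ** 1.5


def rolling_reference(values, size):
    """Recompute each window independently (slow, but avoids running sums)."""
    out = []
    for i in range(len(values)):
        if i + 1 < size:
            out.append(math.nan)  # not enough observations yet
        else:
            out.append(skew_two_pass(values[i + 1 - size:i + 1]))
    return out


s = [-2.0, 200000.0, 200001.0, 200005.0]
reference = rolling_reference(s, 3)
print(reference)
```

Under these assumptions the last reference value agrees with s.tail(3).skew(), so the 2.071599 reported by rolling(3).skew() is the outlier. Because the values are centered before being squared and cubed, adding a constant to the whole window leaves the deviations, and hence the result, essentially unchanged.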
Comment From: GYHHAHA
The problem is caused by limited double precision rather than overflow. I have fixed the refined example by using the C long double type and by multiplying instead of dividing when computing the moments in the following lines.
# original
A = x / dnobs
B = xx / dnobs - A * A
C = xxx / dnobs - A * A * A - 3 * A * B
# new
B = (xx * dnobs - x * x) / dnobs / dnobs
C = (xxx * dnobs * dnobs - 3. * x * xx * dnobs + 2. * x * x * x) / dnobs / dnobs / dnobs
In @alex7088's example, however, the result is still not accurate, though numerically much better. So I'm not sure whether I should open a PR. @phofl
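As a rough check of the diagnosis, the sketch below evaluates the original and the rearranged formulas on the last 10 values of the 54-element example, once with Python floats (IEEE doubles) and once with exact rationals from fractions.Fraction. The function names are hypothetical, and the final step from B and C to the skewness is an assumption (the standard bias-corrected formula), since only A, B and C are quoted above. Integer literals replace 3. and 2. so the same code also runs exactly on Fraction.

```python
from fractions import Fraction
import math

# Last 10 values of the 54-element example, i.e. the final rolling(10) window.
window = [7729969944, 7729969406, 7729969406, 7729969944, 7729971558,
          7729972633, 7729973171, 7729973171, 7729973171, 7729969406]


def power_sums(values, number_type):
    """Return (sum x, sum x^2, sum x^3) accumulated in the given number type."""
    vals = [number_type(v) for v in values]
    zero = number_type(0)
    return (sum(vals, zero),
            sum((v * v for v in vals), zero),
            sum((v * v * v for v in vals), zero))


def moments_original(x, xx, xxx, dnobs):
    A = x / dnobs
    B = xx / dnobs - A * A
    C = xxx / dnobs - A * A * A - 3 * A * B
    return B, C


def moments_rearranged(x, xx, xxx, dnobs):
    B = (xx * dnobs - x * x) / dnobs / dnobs
    C = (xxx * dnobs * dnobs - 3 * x * xx * dnobs + 2 * x * x * x) / dnobs / dnobs / dnobs
    return B, C


def skew_from_moments(B, C, nobs):
    """Assumed final step: bias-corrected skewness from B (variance) and C."""
    if B <= 0 or nobs < 3:
        return math.nan
    R = math.sqrt(B)
    return math.sqrt(nobs * (nobs - 1)) * C / ((nobs - 2) * R * R * R)


for number_type in (float, Fraction):
    x, xx, xxx = power_sums(window, number_type)
    dnobs = number_type(len(window))
    for formula in (moments_original, moments_rearranged):
        B, C = formula(x, xx, xxx, dnobs)
        print(number_type.__name__, formula.__name__,
              skew_from_moments(float(B), float(C), len(window)))
```

In exact arithmetic the two formulas are algebraically identical and both reproduce the tail(10).skew() value, which is consistent with the problem being rounding in the power sums rather than the formulas themselves. The float rows are printed without expected values because they depend on how the rounding errors happen to fall.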