Pandas BUG: Rolling std() error

Pandas version checks

[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
data=[1,-1,0,1,3,2,-2,10000000000,1,2,0,-2,1,3,0,1]
df=pd.DataFrame(data,columns=['data'])

df.data.rolling(6).std()[-1:]
15    57.250852

df.data.tail(6).std()
1.6431676725154984

Issue Description

When the data contain a large outlier, rolling().std() and tail().std() return different results for the same six values.

Expected Behavior

The two results should be equal (up to floating-point rounding).
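For reference, the expected value can be checked by hand. The last six values are 0, -2, 1, 3, 0, 1, and Series.std() computes the sample (ddof=1) standard deviation by default:

```latex
\[
\bar{x} = \frac{0 - 2 + 1 + 3 + 0 + 1}{6} = 0.5,
\qquad
\sum_{i=1}^{6} (x_i - \bar{x})^2 = 0.25 + 6.25 + 0.25 + 6.25 + 0.25 + 0.25 = 13.5,
\]
\[
s = \sqrt{\frac{13.5}{6 - 1}} = \sqrt{2.7} \approx 1.6432.
\]
```

This matches the tail().std() output, so 57.250852 is the value that is off.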

Installed Versions

INSTALLED VERSIONS
------------------
commit : e8093ba372f9adfe79439d90fe74b0b5b6dea9d6
python : 3.9.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : AMD64 Family 23 Model 1 Stepping 1, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : Russian_Russia.1251
pandas : 1.4.3
numpy : 1.23.1
pytz : 2022.1
dateutil : 2.8.2
setuptools : 62.6.0
pip : 22.1.2
Cython : 0.29.27
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.7.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 8.0.1
pandas_datareader: None
bs4 : 4.10.0
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : 2.0.1
matplotlib : 3.5.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.7.3
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
zstandard : None

Comment From: GYHHAHA

The current rolling method updates the window's moments incrementally, so it cannot maintain high precision when the data span an abnormally wide range. This is the same problem as #47461.
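Why incremental updates lose precision can be shown with a deliberately simplified sketch. The hypothetical naive_rolling_std below keeps a running sum and running sum of squares, adding each incoming value and subtracting the outgoing one. This is an assumption for illustration, not pandas' own kernel (which uses a more careful online update), so the exact symptom differs: here the final window comes out as 0.0 rather than 57.25. The shared point is that low-order bits absorbed while 10000000000 sat in the window are not restored when it leaves.

```python
import math
import statistics


def naive_rolling_std(values, window):
    """Sample std over a sliding window using running totals.

    Hypothetical illustration only: each step adds the incoming value and
    subtracts the outgoing one, so any precision lost while a huge value
    was inside the window stays lost after it leaves.
    """
    values = [float(v) for v in values]
    out = []
    total = 0.0     # running sum of the current window
    total_sq = 0.0  # running sum of squares of the current window
    for i, x in enumerate(values):
        total += x
        total_sq += x * x
        if i >= window:
            old = values[i - window]
            total -= old
            total_sq -= old * old
        if i >= window - 1:
            var = (total_sq - total * total / window) / (window - 1)
            # Cancellation can push the variance below zero; clamp it.
            out.append(math.sqrt(max(var, 0.0)))
        else:
            out.append(math.nan)
    return out


data = [1, -1, 0, 1, 3, 2, -2, 10000000000, 1, 2, 0, -2, 1, 3, 0, 1]
print(naive_rolling_std(data, 6)[-1])  # 0.0, polluted by the outlier
print(statistics.stdev(data[-6:]))     # about 1.6432, computed directly
```

Before the outlier arrives the running totals are exact (the first full window gives sqrt(2)); once 1e20 has been added to the sum of squares, small squares added or removed in later steps are rounded away.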

Comment From: sappersapper

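Maybe an ad hoc workaround for rolling().std() when the data range is abnormally wide is rolling().agg(lambda x: np.std(x, ddof=1)), which computes each window's standard deviation directly:

import numpy as np
df.rolling(6).agg(lambda x: np.std(x, ddof=1))[-1:]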

15  1.643168
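A possibly faster variant of the same idea passes each window to NumPy as a plain array with raw=True and forwards ddof=1 through the kwargs parameter of Rolling.apply. This is a minimal sketch on the example data; the trade-off is O(window) work per row instead of an O(1) update, in exchange for not carrying state between windows.

```python
import numpy as np
import pandas as pd

data = [1, -1, 0, 1, 3, 2, -2, 10000000000, 1, 2, 0, -2, 1, 3, 0, 1]
df = pd.DataFrame(data, columns=["data"])

# raw=True hands each window to np.std as an ndarray; kwargs forwards
# ddof=1 so the result is the sample std, like Series.std().
per_window_std = df["data"].rolling(6).apply(np.std, raw=True, kwargs={"ddof": 1})

print(per_window_std.iloc[-1])
print(df["data"].tail(6).std())
```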

Comment From: mahmoudmarayef

The rolling().std() and tail().std() methods in pandas return different results because they use different window sizes to calculate the standard deviation.

In the case of df.data.rolling(6).std()[-1:], a rolling window of size 6 is used to calculate the standard deviation, which includes the outlier value of 10 billion. This causes the standard deviation to be much larger than when the outlier is excluded.

On the other hand, df.data.tail(6).std() only considers the last 6 values in the DataFrame, excluding the outlier value. This results in a smaller standard deviation compared to when the outlier is included.
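The rows each call covers can be listed directly. A minimal sketch on the example data, assuming a pandas version where Rolling objects are iterable (1.1 and later), so that the last yielded window is the one behind rolling(6).std()[-1:]:

```python
import pandas as pd

data = [1, -1, 0, 1, 3, 2, -2, 10000000000, 1, 2, 0, -2, 1, 3, 0, 1]
s = pd.Series(data, name="data")

# Last window produced by rolling(6), i.e. the rows behind the final value.
last_window = list(s.rolling(6))[-1]
# Rows used by tail(6).std().
tail_rows = s.tail(6)
# Row label of the outlier.
outlier_row = s.idxmax()

print(last_window.index.tolist())
print(tail_rows.index.tolist())
print(outlier_row)
```

With this data the outlier sits at row 7, so it falls outside the final six-row window (rows 10 to 15) used by both calls.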

To ensure that the results are equal, you can use the same window size for both methods by using df.data.tail(6).rolling(6).std()[-1:]. This calculates the standard deviation of the last 6 values of the DataFrame using a rolling window of size 6, so the rolling computation only ever sees the same six values as tail(6).
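Slicing with tail(6) before rolling means the rolling computation starts fresh on those six values and never accumulates state from the outlier. A minimal check on the example data:

```python
import pandas as pd

data = [1, -1, 0, 1, 3, 2, -2, 10000000000, 1, 2, 0, -2, 1, 3, 0, 1]
s = pd.Series(data, name="data")

# Only one full window exists here, built from the last six values.
from_tail = s.tail(6).rolling(6).std().iloc[-1]
# Direct computation on the same six values.
direct = s.tail(6).std()

print(from_tail, direct)
```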

Alternatively, you can remove the outlier value from the DataFrame before calculating the standard deviation using the drop method, like so:

df_without_outlier = df.drop(df.loc[df['data'] == 10000000000].index)
df_without_outlier.rolling(6).std()[-1:]

This will remove the outlier value from the DataFrame and then calculate the standard deviation using a rolling window of size 6, resulting in a smaller standard deviation value that is more in line with the tail().std() method.