Code Sample, a copy-pastable example if possible
import pandas as pd

a = pd.Series([1e5, 0, 0, 0, 0])
b = pd.Series([9.45] * 5)  # constant series, so its standard deviation is 0

c1 = a.rolling(5).corr(b).iloc[4]  # rolling correlation over the full window
c2 = a.corr(b)                     # vanilla correlation
v1 = a.rolling(5).cov(b).iloc[4]   # rolling covariance over the full window
v2 = a.cov(b)                      # vanilla covariance

# Both assertions fail: v1 carries a tiny floating-point residue,
# and c1 comes out as inf while c2 is nan.
assert c1 == c2
assert v1 == v2
Problem description
I came across some strange behavior in pandas' rolling correlation. In the code snippet above, I would expect v1 == v2 to hold, but it turns out not to. This produces inf in the rolling correlation (c1 vs. c2, where c2 is fine but c1 is "wrong" in my opinion). Since the standard deviation of a constant sequence is 0, its correlation with any other sequence is a 0/0. Returning nan, as the vanilla corr does, is fine, but returning inf is annoying and misleading.
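To make the nan-versus-inf distinction concrete, here is a minimal sketch of the two divisions using plain NumPy scalars (not pandas internals): an exact 0/0 yields nan, while a tiny nonzero residue over 0 yields inf.

```python
import numpy as np

with np.errstate(divide="ignore", invalid="ignore"):
    # Exactly zero covariance over a zero denominator: 0/0 -> nan.
    exact = np.float64(0.0) / np.float64(0.0)
    # A tiny floating-point residue over a zero denominator: x/0 -> inf.
    residue = np.float64(1e-11) / np.float64(0.0)

print(exact, residue)  # nan inf
```

So any rounding residue left in the rolling covariance flips the result from nan to inf once the denominator is exactly zero.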
Expected Output
assertions pass
Output of pd.show_versions()
Comment From: navicor90
Hi, I was reading the implementation of the cov function, and at the end it computes:
(mean(X * Y) - mean(X) * mean(Y))
where X = a and Y = b in your situation.
You need to know that inside the cov function the a and b series are cast to float64. And since floating-point arithmetic in Python 3 has limited precision, the subtraction mean(X * Y) - mean(X) * mean(Y) is not exactly zero.
I thought that this other implementation would avoid this kind of situation:
(summarize(X * Y) - (n * mean(X) * mean(Y))) * bias_adj
I tested it and it returned zero.
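Both forms can be checked directly on the data from the report. This is a sketch in plain NumPy, not the actual pandas internals, and the bias adjustment is omitted since it only scales the result and cannot turn a nonzero residue into zero:

```python
import numpy as np

a = np.array([1e5, 0, 0, 0, 0])
b = np.array([9.45] * 5)
n = len(a)

# Single-pass form from the cov implementation: mean(X*Y) - mean(X)*mean(Y).
naive = (a * b).mean() - a.mean() * b.mean()

# Proposed alternative: sum(X*Y) - n*mean(X)*mean(Y).
alt = (a * b).sum() - n * a.mean() * b.mean()

print(naive)  # tiny nonzero residue from cancellation
print(alt)    # 0.0
```

For this particular data the alternative cancels exactly because both terms round to the same double, while the mean-based form leaves a residue of roughly one ulp.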
Comment From: jreback
yep would take a patch for this
Comment From: navicor90
I was trying to implement this, but I found many other situations where
(summarize(X * Y) - (n * mean(X) * mean(Y))) * bias_adj
still does not return exactly the expected value.
Why does this happen? Because we still have fractions (mean(X) and mean(Y)). I looked for other ways to do the math, but you always need to preserve at least one fraction, so eventually the imprecision appears.
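The residual-fraction problem is easiest to see with a large constant offset, where a single-pass formula can lose every significant digit. A minimal variance sketch in plain Python (made-up data, not the pandas code):

```python
xs = [1e9 + 4, 1e9 + 7, 1e9 + 13]
n = len(xs)
mean = sum(xs) / n  # 1000000008.0, exact for this data

# Single-pass: mean(X*X) - mean(X)**2. The squares are ~1e18, where
# doubles are spaced 128 apart, so the true variance (14) drowns in
# rounding error and the subtraction cancels catastrophically.
naive_var = sum(x * x for x in xs) / n - mean * mean

# Two-pass: subtract the mean first, then square. The deviations are
# small integers, exactly representable, so the result is exact.
two_pass_var = sum((x - mean) ** 2 for x in xs) / n

print(naive_var)     # not the true variance
print(two_pass_var)  # 14.0, the true population variance
```

A two-pass (demeaned) formula avoids the cancellation entirely here, which is why numerically careful variance/covariance code usually prefers it over any rearrangement of the single-pass moments.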
As a workaround, the user could use the round, floor, or ceil functions.