Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import numpy as np
df1 = pd.DataFrame(
{"A": [1, np.nan, 7, 8], "B": [False, True, True, False], "C": [10, 4, 9, 3]}
)
df2 = df1[["B", "C"]]
(df1 + 1).corrwith(df2.B, method="spearman")
Issue Description
The behavior of pandas.DataFrame.corrwith seems to have changed in pandas 1.5.0.
The result from the reproducible example in pandas 1.4.0:
>>> (df1 + 1).corrwith(df2.B, method="spearman")
A 0.0
B 1.0
C 0.0
dtype: float64
However, the result from the reproducible example in pandas 1.5.0:
>>> (df1 + 1).corrwith(df2.B, method="spearman")
A 0.5
B 1.0
C -0.2
dtype: float64
I see that https://github.com/pandas-dev/pandas/pull/46174 improved the performance, but maybe it also changed the behavior?
Expected Behavior
The expected behavior is the result from pandas 1.4.0:
>>> (df1 + 1).corrwith(df2.B, method="spearman")
A 0.0
B 1.0
C 0.0
dtype: float64
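As a cross-check on the expected values, here is a minimal sketch that computes Spearman by hand, as the Pearson correlation of average ranks over the rows where both inputs are present. The helper name `spearman_by_hand` is made up for this illustration and is not part of pandas; it is only meant to show where the 1.4.0 numbers come from, not how `corrwith` is implemented.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {"A": [1, np.nan, 7, 8], "B": [False, True, True, False], "C": [10, 4, 9, 3]}
)
left = df1 + 1
other = df1["B"]


def spearman_by_hand(x: pd.Series, y: pd.Series) -> float:
    """Hypothetical helper: Pearson correlation of tie-averaged ranks."""
    # Keep only the rows where both values are present (pairwise NaN drop).
    mask = x.notna() & y.notna()
    # Cast to float so booleans are ranked as 0.0 / 1.0; ties share a mean rank.
    x_ranks = x[mask].astype(float).rank(method="average")
    y_ranks = y[mask].astype(float).rank(method="average")
    return float(np.corrcoef(x_ranks, y_ranks)[0, 1])


reference = pd.Series(
    {col: spearman_by_hand(left[col], other) for col in left.columns}
)
print(reference)
```

Under these assumptions the hand computation gives approximately 0.0 for A, 1.0 for B, and 0.0 for C, which agrees with the pandas 1.4.0 output above.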
Installed Versions
Comment From: MarcoGorelli
Hey, thanks for the report.
> I see that https://github.com/pandas-dev/pandas/pull/46174 improved the performance, but maybe it also changed the behavior?
Yup, git bisect confirms it.
Comment From: MarcoGorelli
cc @fractionalhare (no blame, just in case you know how to fix it)
Comment From: fractionalhare
Thanks for flagging. I can look later today after work. This is strange. My guess is that it's an edge case that the tests didn't cover before. I think it involves the NaN not being ranked by argsort for the Spearman calculation. Will confirm later.
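One way to probe the ranking hypothesis is to compare two ranking schemes on the example data: ordinal ranks from a double argsort versus tie-averaged ranks from `Series.rank`. The sketch below is a hand-rolled comparison, not pandas' internal code; the helper names `ordinal_ranks`, `average_ranks`, and `corr` are made up for illustration, and the NaN row of column A is dropped by hand before ranking in both variants.

```python
import numpy as np
import pandas as pd

# Values of (df1 + 1) from the report, compared against df1["B"].
a = np.array([2.0, 8.0, 9.0])              # column A with the NaN row dropped
b_for_a = np.array([False, True, False])   # rows of B that pair with `a`
c = np.array([11.0, 5.0, 10.0, 4.0])       # column C
b_for_c = np.array([False, True, True, False])


def ordinal_ranks(values: np.ndarray) -> np.ndarray:
    """Double argsort: tied values get distinct ranks in order of appearance."""
    return values.argsort(kind="stable").argsort(kind="stable")


def average_ranks(values: np.ndarray) -> np.ndarray:
    """Tie-aware ranks: tied values share the mean of their positions."""
    return pd.Series(values).astype(float).rank(method="average").to_numpy()


def corr(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation of two equally long 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


ordinal_a = corr(ordinal_ranks(a), ordinal_ranks(b_for_a))
ordinal_c = corr(ordinal_ranks(c), ordinal_ranks(b_for_c))
average_a = corr(average_ranks(a), average_ranks(b_for_a))
average_c = corr(average_ranks(c), average_ranks(b_for_c))

print("ordinal ranks:", ordinal_a, ordinal_c)
print("average ranks:", average_a, average_c)
```

With ordinal ranks this gives roughly 0.5 for A and -0.2 for C, the same numbers pandas 1.5.0 reports, while tie-averaged ranks give roughly 0.0 for both, as in 1.4.0. Since the NaN row is removed before ranking in both cases, the tied values in B may be worth checking alongside the NaN handling.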
Comment From: phofl
Reverting would be an option if we cannot figure out what caused this.
Comment From: MarcoGorelli
Hi @itholic
Thanks for reporting this. There's now a release candidate for pandas 2.0.0, which you can install with
mamba install -c conda-forge/label/pandas_rc pandas==2.0.0rc0
or
python -m pip install --upgrade --pre pandas==2.0.0rc0
If you try it out and report any bugs, we can fix them before the final 2.0.0 release.