Pandas No way to access correlation significance value (Spearman, Kendall)

The corr() method attached to DataFrames is a great way to get a matrix of correlation coefficients. But for the Kendall and Spearman options, the p-value is discarded:

https://github.com/pydata/pandas/blob/8ac0e11b59b65f0ac898dca2beebde5f87836649/pandas/core/nanops.py#L510-L517

For a function called corr() this would certainly be expected behavior, but the problem remains that there is no other method (as far as I've been able to find) to get those significance values without going back to scipy and doing it myself.
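For reference, the "going back to scipy" workaround can be sketched roughly as follows. This is only a minimal illustration, not anything pandas provides: the helper name `corr_pvalues` is made up, and dropping NaNs pairwise before each test is an assumption about the desired missing-data handling.

```python
import numpy as np
import pandas as pd
from scipy import stats


def corr_pvalues(df, method="spearman"):
    """Hypothetical helper: return (coefficients, p-values) as two DataFrames."""
    func = {"spearman": stats.spearmanr, "kendall": stats.kendalltau}[method]
    cols = df.columns
    n = len(cols)
    coef = pd.DataFrame(np.eye(n), index=cols, columns=cols)
    # The diagonal is left as NaN: a column is not tested against itself.
    pval = pd.DataFrame(np.full((n, n), np.nan), index=cols, columns=cols)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            # Assumption: drop rows where either column is missing.
            pair = df[[a, b]].dropna()
            r, p = func(pair[a], pair[b])
            coef.loc[a, b] = coef.loc[b, a] = r
            pval.loc[a, b] = pval.loc[b, a] = p
    return coef, pval


df = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                   "y": [2, 1, 4, 3, 5],
                   "z": [5, 4, 3, 2, 1]})
coef, pval = corr_pvalues(df, method="spearman")
```

The coefficient matrix should agree with `df.corr(method="spearman")`; the second DataFrame holds the p-values that `corr()` discards.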

Comment From: cpcloud

significance testing is out of pandas domain, though i understand your sentiment.

check out statsmodels

personally, i would argue that corrcoef from numpy is inconsistent with the correlation functions in scipy.stats (or the other way around). they should all do the same thing: either all return a p-value or none of them do.
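A small sketch of the inconsistency being described, using made-up sample data: `numpy.corrcoef` returns only a coefficient matrix, while `scipy.stats.spearmanr` and `scipy.stats.kendalltau` each return a statistic together with a p-value.

```python
import numpy as np
from scipy import stats

# Illustrative data only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

# numpy: a 2x2 matrix of Pearson coefficients, no p-values.
matrix = np.corrcoef(x, y)

# scipy.stats: each call yields (statistic, p-value).
rho, p_rho = stats.spearmanr(x, y)
tau, p_tau = stats.kendalltau(x, y)
```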

pandas has chosen the latter so returning a p-value would be a backwards incompatible API change...

Comment From: smdabdoub

> significance testing is out of pandas domain

That is understandable. In this case, however, it's just so tantalizingly close. I'm not advocating for changing the behavior of corr, but perhaps a parallel method... although that could be a slippery slope to polluting the namespace.

Comment From: jreback

you could have a keyword return_p_values=False I guess (though I'm not usually in favor of these types of keywords; the problem is that the user of the function HAS to know to expect a tuple, or not)

Comment From: smdabdoub

A keyword solution would be somewhat ugly.

> the user of the function HAS to know to expect a tuple (or not)

With a default False, current behavior wouldn't change, so if a user is changing that value, they should be reading the documentation (I would assume)
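To make the trade-off concrete, here is a rough sketch of what such a keyword might look like for two Series. Everything here is hypothetical: `series_corr` is an invented name, `return_p_values` is just the keyword floated above, and the NaN handling (dropping incomplete pairs) is an assumption.

```python
import pandas as pd
from scipy import stats

_METHODS = {"spearman": stats.spearmanr, "kendall": stats.kendalltau}


def series_corr(a, b, method="spearman", return_p_values=False):
    """Hypothetical sketch: the return type depends on return_p_values."""
    # Assumption: align the two Series and drop incomplete pairs.
    pair = pd.concat([a, b], axis=1).dropna()
    coef, p_value = _METHODS[method](pair.iloc[:, 0], pair.iloc[:, 1])
    if return_p_values:
        return float(coef), float(p_value)  # caller must unpack a tuple
    return float(coef)                      # default: a single float


s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([2, 1, 4, 3, 5])

rho = series_corr(s1, s2)                            # a single float
rho2, p = series_corr(s1, s2, return_p_values=True)  # a (coef, p) tuple
```

The default call keeps returning a single value, so existing code would be unaffected; only callers who opt in have to handle the tuple.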

Comment From: jtratner

Is there a performance difference with statsmodels? Given that it works with pandas objects, seems like it would be better to not duplicate it. (and it isn't hard to install, right?)

Comment From: smdabdoub

statsmodels doesn't provide any means for calculating correlations

Comment From: josef-pkt

just a general comment: Very few functions of numpy or scipy.stats are duplicated in statsmodels. (There is covariance and correlation for data with frequency weights. There are some new correlation tests coming soon.) However, if it is useful then it would be possible to add wrapper functions for scipy.stats in statsmodels, that could use both pandas and scipy.stats. (I'm not sure about proper nan handling, besides dropna, if we don't want to duplicate parts of nanops and of scipy.stats.mstats, or just call scipy.stats.mstats if necessary.)

Comment From: jreback

@ojiisan If you can think of a nice API for this, that would be great... moving to 0.14

Comment From: smdabdoub

@jreback Certainly, I'll work on that

Comment From: dfrusdn

Added it as an issue for numpy to enhance corrcoef and cov

Both Kendall and Spearman calculations were also found in scipy.stats

https://github.com/numpy/numpy/issues/4147

Comment From: jorisvandenbossche

Closing this for now as out of scope for pandas. We can always reopen the discussion if someone is really interested in this and has a proposal for specific API.