For my own purposes, I am considering writing classes to represent PMFs and CDFs using Pandas Series as the implementation, and an API similar to what I did in the thinkstats2 library (but made more stylistically consistent with Pandas).
Has there been any discussion of adding something like this to Pandas? If I develop it, would you be interested in seeing a proposal to include it? (I ask now because targeting inclusion in Pandas might influence some design decisions.)
Comment From: TomAugspurger
I suspect it would be out of scope for pandas, since it's pretty stats-specific.
(Aside: statsmodels has an empirical CDF implementation at http://www.statsmodels.org/dev/generated/statsmodels.distributions.empirical_distribution.ECDF.html. Not sure whether that suits your needs.)
Comment From: AllenDowney
Thanks for the quick reply, Tom. Understood.
The ECDF in StatsModels is along the lines of what I have in mind, but there are a few more methods I'd like to provide (like random sampling).
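To make the idea concrete, here is a minimal sketch of an empirical CDF stored in a Series, with random sampling done by inverse-transform lookup. The function names `make_cdf` and `sample_cdf` are hypothetical, chosen for illustration; they are not the thinkstats2 or statsmodels API.

```python
import numpy as np
import pandas as pd


def make_cdf(values):
    """Empirical CDF as a Series: index = sorted unique values,
    data = cumulative probability. (Hypothetical helper.)"""
    pmf = pd.Series(values).value_counts(normalize=True).sort_index()
    return pmf.cumsum()


def sample_cdf(cdf, n, seed=None):
    """Draw n values from a CDF Series by inverse-transform sampling.
    (Hypothetical helper.)"""
    rng = np.random.default_rng(seed)
    u = rng.random(n)  # uniform draws in [0, 1)
    # First position whose cumulative probability is >= u.
    idx = np.searchsorted(cdf.to_numpy(), u, side="left")
    # Guard against round-off leaving the last cumulative value just below 1.
    idx = np.minimum(idx, len(cdf) - 1)
    return cdf.index.to_numpy()[idx]


cdf = make_cdf([1, 2, 2, 3, 5])
draws = sample_cdf(cdf, 1000, seed=0)
```

Keeping the values in the index and the probabilities in the data means ordinary Series operations (slicing, plotting, `cumsum`) work on the distribution without extra code.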
Comment From: chris-b1
See #14781, an idea from xarray. This feels like a use case that might be better suited to that accessor idea than to subclassing. The idea would be to be able to write something like:
@pd.register_series_accessor('stat')
class StatAccessor:
    def custom_function1(...):
        pass

s = pd.Series([1,2,3,4])
s.stat.custom_function1(...)
I do agree with @TomAugspurger that building this into pandas is probably out of scope.
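For reference, pandas has since gained a decorator of this kind, `pandas.api.extensions.register_series_accessor`. The sketch below shows how the `stat` accessor might look with it; the `pmf` and `cdf` methods are illustrative assumptions, not part of pandas or any existing package.

```python
import pandas as pd


@pd.api.extensions.register_series_accessor("stat")
class StatAccessor:
    """Illustrative accessor; method names are assumptions."""

    def __init__(self, series):
        # pandas passes the Series the accessor is attached to.
        self._series = series

    def pmf(self):
        # Probability of each distinct value, ordered by value.
        return self._series.value_counts(normalize=True).sort_index()

    def cdf(self):
        # Cumulative probability up to and including each value.
        return self.pmf().cumsum()


s = pd.Series([1, 2, 2, 3])
pmf = s.stat.pmf()
cdf = s.stat.cdf()
```

Because the accessor is registered rather than subclassed, it works on any Series, including ones returned by other pandas operations.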
Comment From: jreback
Closing. As indicated, this is out of scope. We recently removed all non-basic stats functionality. This should be done in statsmodels
or in a pandas plug-in package (just came up with that!).