Feature Type
- [ ] Adding new functionality to pandas
- [X] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
Problem Description
pandas.core.groupby.SeriesGroupBy.apply and pandas.core.groupby.DataFrameGroupBy.apply need a callable that takes a Series (or a DataFrame, for DataFrameGroupBy) as its first argument, and they are usually quite slow. Other apply functions, such as pandas.core.window.rolling.Rolling.apply and pandas.DataFrame.apply, have a raw argument that passes an ndarray object to the callable instead, which makes apply faster.
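To make the difference concrete, here is a minimal sketch of what raw changes in the existing DataFrame.apply API. The frame and the record_type helper are made up for illustration; only the raw parameter itself comes from pandas.

```python
import numpy as np
import pandas as pd

# Illustrative data, not taken from the issue.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

default_types = []
raw_types = []


def record_type(log):
    """Build a column-sum function that also logs the type it receives."""
    def func(col):
        log.append(type(col).__name__)
        return col.sum()  # .sum() exists on both Series and ndarray
    return func


# Default: each column is handed to the callable as a pandas Series.
default_result = df.apply(record_type(default_types))

# raw=True: each column is handed to the callable as a NumPy ndarray,
# skipping the per-call Series construction.
raw_result = df.apply(record_type(raw_types), raw=True)

print(set(default_types), set(raw_types))
```

Both calls give the same column sums. Only the object passed to the callable differs, and that is where the speedup comes from.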
Feature Description
Give the groupby apply methods a signature like this:
def apply(func, raw=False, engine=None, engine_kwargs=None, args=None, kwargs=None):
    ...
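Until such an argument exists, the intended behaviour can be approximated by hand. The sketch below uses a hypothetical helper, groupby_apply_raw, which is not part of pandas. It assumes raw=True would mean "pass each group's values as an ndarray" and ignores the engine and engine_kwargs parameters entirely.

```python
import numpy as np
import pandas as pd


def groupby_apply_raw(series, by, func, *args, **kwargs):
    """Hypothetical helper (not a pandas API): call func on each group's
    values as a NumPy ndarray and collect one result per group key."""
    results = {
        key: func(group.to_numpy(), *args, **kwargs)
        for key, group in series.groupby(by)
    }
    return pd.Series(results, name=series.name)


# Illustrative data, not taken from the issue.
s = pd.Series([1.0, 2.0, 3.0, 4.0], name="x")
keys = np.array(["a", "b", "a", "b"])

# The callable receives an ndarray, so plain NumPy functions work directly.
raw_out = groupby_apply_raw(s, keys, np.ptp)

# Reference result using the current Series-based groupby apply.
ref_out = s.groupby(keys).apply(lambda g: g.max() - g.min())

print(raw_out)
```

This still loops over the groups in Python, so it only illustrates the calling convention. A real implementation would presumably live inside the groupby machinery.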
Alternative Solutions
No
Additional Context
No response
Comment From: topper-123
I think it sounds reasonable, and I'd like the API to be as similar as possible across the apply methods. Performance examples showing improvements will also be needed to show this is worth adding.
Comment From: PaleNeutron
A simple performance test on raw mode:
import pandas as pd
import numpy as np
s = pd.Series(np.random.rand(10000))
%%timeit
s.rolling(60).apply(lambda s: np.all(s[-1] >= s), raw=True) # with raw
34.1 ms ± 1.81 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
s.rolling(60).apply(lambda s: (s.iloc[-1] >= s).all()) # without raw
1.03 s ± 8.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
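The same comparison can be sketched outside IPython with the standard timeit module, adding a check that both variants return identical results. The series length, the repetition count and the float() wrappers are choices made for this sketch, so the absolute timings will differ from the ones above.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller series than above so the script finishes quickly.
rng = np.random.default_rng(0)
s = pd.Series(rng.random(2000))


def last_is_max_raw(window):
    # window is an ndarray when raw=True
    return float(np.all(window[-1] >= window))


def last_is_max_series(window):
    # window is a Series when raw=False (the default)
    return float((window.iloc[-1] >= window).all())


raw_result = s.rolling(60).apply(last_is_max_raw, raw=True)
series_result = s.rolling(60).apply(last_is_max_series)

# Time each variant; number=3 is arbitrary and only keeps the run short.
raw_time = timeit.timeit(
    lambda: s.rolling(60).apply(last_is_max_raw, raw=True), number=3
)
series_time = timeit.timeit(
    lambda: s.rolling(60).apply(last_is_max_series), number=3
)

print(f"raw=True : {raw_time / 3:.4f} s per call")
print(f"raw=False: {series_time / 3:.4f} s per call")
```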
Comment From: topper-123
Ok, I think this would be a good idea to include then. I think this should be the same for all apply methods though, so also Series.apply and DataFrameGroupBy.apply (there may be some implementation differences between those methods, so achieving that may not be as simple as it sounds).
Comment From: rhshadrach
For DataFrame.apply, raw=True currently isn't fully supported for EAs (e.g. masked arrays with a null value). If we were to move to these as the default (which I think is the direction we're heading), it's not clear to me what would happen to the raw argument. Perhaps this is still worth adding for NumPy dtypes in the meantime?