Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this issue exists on the latest version of pandas.
- [X] I have confirmed this issue exists on the main branch of pandas.
Reproducible Example
Pandas v1.4.2, NumPy v1.22.3
import numpy as np
import pandas as pd

# Masking by a boolean vector
v = np.random.rand(100000)
# Re-create the `pd.Series` on every run, because the assignment modifies it in place
%timeit x = pd.Series(v, copy=True); x[x > 0.5] = 2 * x
%timeit x = pd.Series(v, copy=True); x.loc[x > 0.5] = 2 * x
1.18 ms ± 24 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
3.21 ms ± 42.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Usually, x.loc[...] is faster than x[...] when subsetting or masking by a scalar, but here it is roughly 2.7 times slower (3.21 ms vs 1.18 ms). So I wonder whether a fast path is missing here?
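For anyone who wants to rerun the comparison outside IPython, here is a minimal sketch using the standard-library `timeit` module instead of `%timeit`. The helper names `assign_getitem` and `assign_loc` are made up for illustration, and the absolute numbers will depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
v = rng.random(100_000)


def assign_getitem():
    # Fresh Series each call, since the assignment mutates it.
    x = pd.Series(v, copy=True)
    x[x > 0.5] = 2 * x
    return x


def assign_loc():
    # Same operation, spelled with .loc.
    x = pd.Series(v, copy=True)
    x.loc[x > 0.5] = 2 * x
    return x


# Both spellings should produce the same values; only the timing differs.
pd.testing.assert_series_equal(assign_getitem(), assign_loc())

calls = 20
for name, fn in [("x[mask]", assign_getitem), ("x.loc[mask]", assign_loc)]:
    # Take the best of several repeats to reduce noise.
    best = min(timeit.repeat(fn, number=calls, repeat=5)) / calls
    print(f"{name:12s} {best * 1e3:.3f} ms per call")
```

Each timed call includes the `pd.Series(v, copy=True)` construction, just like the `%timeit` lines above, so the two measurements carry the same fixed overhead.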
Installed Versions
Prior Performance
No response
Comment From: MarcoGorelli
Thanks for the report - seems like quite a bit of time is spent in
https://github.com/pandas-dev/pandas/blob/ddf2541df866e89150210d41c22e45eb2cf83e91/pandas/core/indexes/range.py#L399-L427
which can probably be improved
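To check where the time goes on a given install, one option is to profile the `.loc` assignment with the standard-library `cProfile`. This is only a sketch: `assign_loc` is a hypothetical helper wrapping the line from the report, and which pandas internals appear near the top of the output may differ from the commit linked above.

```python
import cProfile
import io
import pstats

import numpy as np
import pandas as pd

v = np.random.default_rng(0).random(100_000)


def assign_loc():
    # The slower spelling from the report.
    x = pd.Series(v, copy=True)
    x.loc[x > 0.5] = 2 * x
    return x


profiler = cProfile.Profile()
profiler.enable()
for _ in range(50):
    assign_loc()
profiler.disable()

# Collect the 15 entries with the highest cumulative time into a string.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(15)
report = buf.getvalue()
print(report)
```

Sorting by cumulative time puts the call chain under `assign_loc` at the top, which makes it easier to see which indexing helpers dominate.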
Comment From: MarcoGorelli
Had a look at this, and it's hard to get a consistent improvement without affecting some other benchmarks.
Do you have a use case where this makes a noticeable difference? If not, I'd be tempted to close; I'm not sure it's worth optimising something that already takes only a few milliseconds.
Comment From: phofl
This is consistent on main, so closing