Feature Type
- [X] Adding new functionality to pandas
- [ ] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
Problem Description
If statements are a pain in pandas. The .where method is confusing, and most folks resort to writing a function and using .apply so they can write old-school Python-style conditionals.
Also, I'm not a fan of using sequences of .loc assignments because they break chaining.
I've resorted to using np.select. It would be great to have this as native functionality.
Consider calculating On-Balance Volume (OBV). Many resort to this:
# naive
def calc_obv(df):
    df = df.copy()
    df["OBV"] = 0.0
    # Loop through the data and calculate OBV
    for i in range(1, len(df)):
        if df["Close"][i] > df["Close"][i - 1]:
            df["OBV"][i] = df["OBV"][i - 1] + df["Volume"][i]
        elif df["Close"][i] < df["Close"][i - 1]:
            df["OBV"][i] = df["OBV"][i - 1] - df["Volume"][i]
        else:
            df["OBV"][i] = df["OBV"][i - 1]
    return df

calc_obv(aapl)
Here's my attempt with .where:
# This is painful
(aapl
 .assign(close_prev=aapl.Close.shift(1),
         vol=0,
         obv=lambda adf: adf.vol.where(cond=adf.Close == adf.close_prev,
             other=adf.Volume.where(cond=adf.Close > adf.close_prev,
                 other=-adf.Volume.where(cond=adf.Close < adf.close_prev, other=0)
             )).cumsum()
 )
)
Here's my np.select version:
(aapl
 .assign(vol=np.select([aapl.Close > aapl.Close.shift(1),
                        aapl.Close == aapl.Close.shift(1),
                        aapl.Close < aapl.Close.shift(1)],
                       [aapl.Volume, 0, -aapl.Volume]),
         obv=lambda df_: df_.vol.cumsum(),
 )
)
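For anyone without the aapl data at hand, the np.select approach can be tried on a toy frame. This is only a minimal sketch: the `toy` DataFrame and its values are made up for illustration, and the equality branch is folded into the default of 0 rather than listed as its own condition.

```python
import numpy as np
import pandas as pd

# Hypothetical toy prices standing in for the `aapl` frame used above.
toy = pd.DataFrame(
    {
        "Close": [10.0, 11.0, 11.0, 9.0, 12.0],
        "Volume": [100, 200, 150, 300, 250],
    }
)

prev_close = toy["Close"].shift(1)

# The first row has no previous close (NaN), so both comparisons are False
# there and np.select falls back to the default of 0. Unchanged closes also
# fall through to the default.
signed_volume = np.select(
    [toy["Close"] > prev_close, toy["Close"] < prev_close],
    [toy["Volume"], -toy["Volume"]],
    default=0,
)

# OBV is the running total of the signed volume.
result = toy.assign(vol=signed_volume, obv=lambda df_: df_["vol"].cumsum())
print(result)
```

On this toy data the running total matches what the loop-based calc_obv computes by hand, since both start from 0 on the first row.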
Feature Description
New method (adapted from NumPy):
def select(self, condlist, choicelist, default=0):
    """
    Return an array drawn from elements in choicelist, depending on conditions.

    Parameters
    ----------
    condlist : list of bool ndarrays
        The list of conditions which determine from which array in `choicelist`
        the output elements are taken. When multiple conditions are satisfied,
        the first one encountered in `condlist` is used.
    choicelist : list of ndarrays
        The list of arrays from which the output elements are taken. It has
        to be of the same length as `condlist`.
    default : scalar, optional
        The element inserted in `output` when all conditions evaluate to False.

    Returns
    -------
    output : DataFrame/Series
        The output at position m is the m-th element of the array in
        `choicelist` where the m-th element of the corresponding array in
        `condlist` is True.
    """
    return np.select(condlist, choicelist, default)
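One detail worth noting: np.select returns a plain NumPy array, so a method built on it would presumably need to wrap the result to hand back a Series carrying the original index. A rough sketch of that idea as a standalone function is below; `select_series` and its parameters are hypothetical names for illustration, not pandas API.

```python
import numpy as np
import pandas as pd


def select_series(index, condlist, choicelist, default=0, name=None):
    """Hypothetical helper: run np.select and wrap the result in a Series."""
    values = np.select(condlist, choicelist, default=default)
    # Re-attach the caller's index so the labels survive the NumPy round trip.
    return pd.Series(values, index=index, name=name)


# Made-up data with a non-default index to show that labels are kept.
s = pd.Series([5, -3, 0, 8], index=list("abcd"))

sign = select_series(
    s.index,
    condlist=[s > 0, s < 0],
    choicelist=[1, -1],
    default=0,
    name="sign",
)
print(sign)
```

Passing the index explicitly keeps the sketch simple; an actual method would take it from `self`.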
Alternative Solutions
- np.select - https://numpy.org/doc/stable/reference/generated/numpy.select.html
- Polars when - https://pola-rs.github.io/polars-book/user-guide/dsl/expressions.html#binary-functions-and-modification
Additional Context
No response
Comment From: erfannariman
Duplicate of: https://github.com/pandas-dev/pandas/issues/39154
Comment From: phofl
Thx, let's keep the discussion focused there