Feature Type

  • [X] Adding new functionality to pandas

  • [ ] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

If statements are a pain in pandas. The .where method is confusing, and most folks resort to writing a function and calling .apply so they can use old-school Python-style conditionals.

Also, I'm not a fan of using sequences of .loc assignment because it breaks chaining.
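To make the chaining complaint concrete, here is a minimal sketch of the .loc-assignment style. The `prices` frame, its values, and the `vol` column name are made up for illustration; the point is only that every step is a separate statement that mutates `df`.

```python
import pandas as pd

# Hypothetical toy data, invented for this example.
prices = pd.DataFrame(
    {"Close": [10.0, 11.0, 11.0, 9.0, 12.0],
     "Volume": [100, 200, 150, 300, 250]}
)

# Signed volume: +Volume when Close rose, -Volume when it fell, else 0.
# Each line below is its own statement, so none of this fits in one chain.
df = prices.copy()
prev_close = df["Close"].shift(1)
df["vol"] = 0
df.loc[df["Close"] > prev_close, "vol"] = df["Volume"]
df.loc[df["Close"] < prev_close, "vol"] = -df["Volume"]
```

The first row has no previous close, so both comparisons are False there and `vol` stays 0.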

I've resorted to using np.select. It would be great to have this as native functionality.

Consider calculating On-balance Volume (OBV). [Screenshot]
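For reference, the recurrence that the code in this issue implements can be written as follows. The starting value OBV_0 = 0 is an assumption taken from how the snippets here initialize the column, not from an external definition.

```latex
\mathrm{OBV}_t = \mathrm{OBV}_{t-1} +
\begin{cases}
  \mathrm{Volume}_t  & \text{if } \mathrm{Close}_t > \mathrm{Close}_{t-1} \\
  0                  & \text{if } \mathrm{Close}_t = \mathrm{Close}_{t-1} \\
  -\mathrm{Volume}_t & \text{if } \mathrm{Close}_t < \mathrm{Close}_{t-1}
\end{cases}
\qquad \mathrm{OBV}_0 = 0
```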

Many resort to this:

# naive

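def calc_obv(df):
    df = df.copy()
    df["OBV"] = 0.0
    # Column positions, so rows can be read and written by position with .iat
    # (chained df["OBV"][i] = ... assignment is unreliable and depends on an integer index)
    close, volume, obv = (df.columns.get_loc(c) for c in ("Close", "Volume", "OBV"))

    # Loop through the data and calculate OBV
    for i in range(1, len(df)):
        if df.iat[i, close] > df.iat[i - 1, close]:
            df.iat[i, obv] = df.iat[i - 1, obv] + df.iat[i, volume]
        elif df.iat[i, close] < df.iat[i - 1, close]:
            df.iat[i, obv] = df.iat[i - 1, obv] - df.iat[i, volume]
        else:
            df.iat[i, obv] = df.iat[i - 1, obv]
    return df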

calc_obv(aapl)

Here's my attempt with .where:

# This is painful
(aapl
 .assign(close_prev=aapl.Close.shift(1),
         vol=0,
         obv=lambda adf: adf.vol.where(
             cond=adf.Close == adf.close_prev,
             other=adf.Volume.where(
                 cond=adf.Close > adf.close_prev,
                 other=-adf.Volume.where(cond=adf.Close < adf.close_prev, other=0),
             ),
         ).cumsum(),
        )
)

Here's my np.select version:

import numpy as np

(aapl
 .assign(vol=np.select([aapl.Close > aapl.Close.shift(1),
                        aapl.Close == aapl.Close.shift(1),
                        aapl.Close < aapl.Close.shift(1)],
                       [aapl.Volume, 0, -aapl.Volume]),
         obv=lambda df_: df_.vol.cumsum(),
        )
)
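For anyone who wants to try the np.select version without the AAPL data, here is a self-contained sketch. The toy `aapl` frame and its values are invented stand-ins; only the `Close` and `Volume` column names come from the snippets above, and `obv_select` is a hypothetical wrapper name.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real `aapl` frame.
aapl = pd.DataFrame(
    {"Close": [10.0, 11.0, 11.0, 9.0, 12.0],
     "Volume": [100, 200, 150, 300, 250]}
)

def obv_select(df):
    """Compute signed volume and its running total in one method chain."""
    prev_close = df["Close"].shift(1)
    return df.assign(
        vol=np.select(
            [df["Close"] > prev_close,
             df["Close"] == prev_close,
             df["Close"] < prev_close],
            [df["Volume"], 0, -df["Volume"]],
        ),
        obv=lambda df_: df_["vol"].cumsum(),
    )

result = obv_select(aapl)
print(result)
```

The first row compares against NaN, so no condition matches and np.select falls back to its default of 0.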

Feature Description

New method (adapted from NumPy):

def select(self, condlist, choicelist, default=0):
    """
    Return an array drawn from elements in choicelist, depending on conditions.

    Parameters
    ----------
    condlist : list of bool ndarrays
        The list of conditions which determine from which array in `choicelist`
        the output elements are taken. When multiple conditions are satisfied,
        the first one encountered in `condlist` is used.
    choicelist : list of ndarrays
        The list of arrays from which the output elements are taken. It has
        to be of the same length as `condlist`.
    default : scalar, optional
        The element inserted in `output` when all conditions evaluate to False.

    Returns
    -------
    output : DataFrame/Series
        The output at position m is the m-th element of the array in
        `choicelist` where the m-th element of the corresponding array in
        `condlist` is True.
    """
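    # np.select returns a NumPy array; wrapping the result back into a
    # DataFrame/Series aligned with `self` is left to the implementation.
    return np.select(condlist, choicelist, default)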
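As a rough illustration of what "native" could mean here, the sketch below wraps np.select in a free function that returns a Series sharing the caller's index. This is not an existing pandas API: the `select` helper, its `obj` parameter, and the toy `prices` frame are all hypothetical, and it only handles the one-dimensional case.

```python
import numpy as np
import pandas as pd

def select(obj, condlist, choicelist, default=0):
    """Hypothetical helper: np.select, returned as a Series indexed like `obj`."""
    values = np.select(condlist, choicelist, default)
    return pd.Series(values, index=obj.index)

# Hypothetical toy data with a non-default index to show that labels survive.
prices = pd.DataFrame(
    {"Close": [10.0, 11.0, 11.0, 9.0, 12.0],
     "Volume": [100, 200, 150, 300, 250]},
    index=pd.date_range("2023-01-02", periods=5),
)

result = prices.assign(
    vol=lambda df_: select(
        df_,
        [df_.Close > df_.Close.shift(1), df_.Close < df_.Close.shift(1)],
        [df_.Volume, -df_.Volume],
    ),
    obv=lambda df_: df_.vol.cumsum(),
)
```

Because the conditions are built inside the lambda from `df_`, the whole computation stays in a single chain, which is the ergonomic gain this request is after.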

Alternative Solutions

  • np.select - https://numpy.org/doc/stable/reference/generated/numpy.select.html

  • Polars when - https://pola-rs.github.io/polars-book/user-guide/dsl/expressions.html#binary-functions-and-modification

Additional Context

No response

Comment From: erfannariman

Duplicate of: https://github.com/pandas-dev/pandas/issues/39154

Comment From: phofl

Thx, let's keep the discussion focused there