Pandas ENH: A simple value filter that allows easy chaining

Feature Type

[X] Adding new functionality to pandas
[ ] Changing existing functionality in pandas
[ ] Removing existing functionality in pandas

Problem Description

I wish I could have a nicer mypy/flake8/pylint friendly way of doing super simple data filtering.

Right now if I want to do a simple filter, I'd normally do it like so:

val = "foo"
df = (
    df_raw.set_index("A")
    .sort_index()
    .unstack()
    .groupby("B")
    .mean()
)
df = df[df["C"] == val]
df = df.join(df2).reset_index()

Which frustrates my compulsive brain, because I'd love for this to be all chainable. If I use query instead of loc, that's, fine, by my linters will complain that val is not used. I can add comments to make the linters ignore this, but that feels unpythonic.

An alternative method would be to

val = "foo"
df = (
    df_raw.set_index("A")
    .sort_index()
    .unstack()
    .groupby("B")
    .mean()
    .loc[lambda x: x[x["C"] == val]]
    .join(df2)
    .reset_index()
)

However once more I feel like the syntax here could be shortened given how common a task this is.

Feature Description

Add a new method to dataframes, which allows for the simplest filtering, with loc lambdas and query still the preferred methods for complex filters:

val = "foo"
df = (
    df_raw.set_index("A")
    .sort_index()
    .unstack()
    .groupby("B")
    .mean()
    .select("C", val)
    .join(df2)
    .reset_index()
)

In terms of implementation, I'm happy to open a PR for this and write some tests. For a simple as possible proposal, something akin to the pseudocode below to add to the DataFrame class.

def select(self, col, val):
    if isinstance(val, (tuple, list)):
        # simplified, but essentially if we can pass to isin, do so
        return self[self[col].isin(val)]
    else:
        return self[self[col].eq(val)]
````

### Alternative Solutions

For similar methods, `df.xs` serves a similar purpose to what this is requesting, however it operates only on the index. Maybe we could follow its implementation or extend it, instead of adding a new method here?

### Additional Context

For other discussion on this topic, please see: https://stackoverflow.com/questions/11869910/pandas-filter-rows-of-dataframe-with-operator-chaining



**Comment From: samukweku**

Kindly provide sample reproducible data with Expected output dataframe, so that your question can be better understood

**Comment From: Samreay**

Sure thing. I've just copied the code above into a notebook and run it with a testing df, so unsure if this adds much information. Happy to elaborate on any point. But to restate, a single, chainable filter which doesn't need lambda functions or code-as-a-string would be great. 

Something like so:

<img width="611" alt="image" src="https://user-images.githubusercontent.com/6748505/200543048-58db4054-181d-49ef-96fe-aae501b82699.png">

```python
import pandas as pd

df = pd.util.testing.makeMixedDataFrame()
def select(self, col, val):
    if isinstance(val, (list, tuple, pd.Series, dict)):
        return self[self[col].isin(val)]
    else:
        return self[self[col].eq(val)]
pd.DataFrame.select = select

df.select("C", "foo2") # gives just the rows where column "C" is equal to "foo2"
df.select("C", ["foo1", "foo2"]) # gives the rows where column "C" is either "foo1" or "foo2"

Comment From: MarcoGorelli

If I use query instead of loc, that's, fine, by my linters will complain that val is not used

you can use pipe

Comment From: Samreay

Indeed you can, and in terms of all the different methods, I wrote this up (outside of this ticket):

# Download some data
repo = "https://github.com/PacktPublishing"
project = "Pandas-Cookbook/raw/master/data/flights.csv"
df_raw = pd.read_csv(f"{repo}/{project}")
df_raw = df_raw[["MONTH", "ORG_AIR", "DEST_AIR", "CANCELLED"]]

# Patch a custom method in
def select(df, col, val):
    return df[df[col] == val]

pd.DataFrame.select = select

# There are a few available methods:
df = (
    # You can use query
    df_raw.query("DEST_AIR == 'LAX'")
    # A custom patched method
    .select("MONTH", 1)
    # The same method via pipe
    .pipe(select, "ORG_AIR", "SFO")
    # Or pass a function to loc
    .loc[lambda x: x["CANCELLED"] == 0]
)

pipe with a custom function of course works just fine, but I still feel like this sort of basic filtering would be used by many pandas users and warrants potential inclusion into the provided method set.

In terms of which solution is easiest to read and more pythonic, I feel like having that provided method is the best choice.

Comment From: giordafrancis

Looks good I admit but overall I don't think it's necessary. I think you will soon run into complex syntax depending on the series type and accessor needed and will potentially end up with something similar to a query method.
query, pipe and trusty lambda in loc resolve it for me.

Comment From: Samreay

Agreed in that this wasn't meant to be (and shouldn't be) another kitchen sink alternative for query or a Callable loc. Instead, just meant to be a small utility method similar to the existing df.xs and df.filter methods. And by delegating to either isin or eq under the hood, the complexities of data type comparison should already be handled.

Comment From: MarcoGorelli

thanks for the suggestion

closing for now as the bar for the (already huge) API is very high, and the existing methods already allow you to do this