With grep in the shell you can print context lines around a match:
λ seq 1000 | grep -C 3 500
497
498
499
**500**
501
502
503
λ seq 1000 | grep -B 1 -A 3 500
499
**500**
501
502
503
I'd be glad to have something like
s.grep_index(pat_or_pred,A=1,B=3) # -> list of series/df , each of len(A+B) or C
s.grep_index(pat_or_pred,C=5)
also
s.grep_index(label_list,C=5)
in core.
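To make the request concrete, here is a rough sketch of what such a helper might look like as a plain function. Nothing here exists in pandas: the name `grep_index` and its parameters are hypothetical, and the sketch assumes grep semantics, where `A`, `B` and `C` count rows after, before, and on each side of the match, which may not be the exact window length intended above.

```python
import re
import pandas as pd


def grep_index(obj, pat_or_pred, A=0, B=0, C=None):
    """Hypothetical helper: one window of `obj` per matching index label."""
    if C is not None:
        A = B = C  # assumed grep-style: C rows on each side of the match
    if callable(pat_or_pred):
        # predicate applied to each index label
        hits = [bool(pat_or_pred(label)) for label in obj.index]
    elif isinstance(pat_or_pred, str):
        # regex searched in the stringified label, like grep on a line
        rx = re.compile(pat_or_pred)
        hits = [rx.search(str(label)) is not None for label in obj.index]
    else:
        # otherwise treat the argument as a collection of labels
        wanted = set(pat_or_pred)
        hits = [label in wanted for label in obj.index]
    windows = []
    for pos, hit in enumerate(hits):
        if hit:
            start = max(pos - B, 0)  # clip at the start of the data
            windows.append(obj.iloc[start:pos + A + 1])
    return windows


# Mirrors `seq 1000 | grep -B 1 -A 3 500`, with the numbers as string labels
s = pd.Series(range(1, 1001), index=[str(i) for i in range(1, 1001)])
windows = grep_index(s, "500", B=1, A=3)
print(windows[0].tolist())  # [499, 500, 501, 502, 503]
```

Windows are clipped at the edges of the data, and overlapping windows are returned separately rather than merged the way grep merges adjacent context.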
An example use case: looking for points of change in a time series, where you want to grab a window around each point of change:
s = some_timeseries
# pick out the index labels where a big change occurred
cps = s.index[abs(s.diff()) > threshold]
# cps is a list of labels for "interest points"
list_of_s = s.grep_index(cps,C=5)
list_of_s[0]
< 5 rows from s, with the middle one being the first "interest point" in cps >
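Until something like this exists, the use case above can be approximated with positional lookups. This is only a sketch on made-up data: the series, the threshold, and the choice of 2 rows on each side (5 rows in total) are all assumptions for illustration, and it relies on the index being unique.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: flat at 0, then a jump to 5 on 2013-01-11
idx = pd.date_range("2013-01-01", periods=20, freq="D")
s = pd.Series(np.r_[np.zeros(10), np.full(10, 5.0)], index=idx)

threshold = 1.0
# index labels where the step from the previous value exceeds the threshold
cps = s.index[s.diff().abs() > threshold]

side = 2  # rows kept on each side of an interest point
# translate labels to integer positions (requires a unique index)
positions = s.index.get_indexer(cps)
list_of_s = [s.iloc[max(p - side, 0): p + side + 1] for p in positions]

print(list_of_s[0])
```

Each element of `list_of_s` is a slice of `s` centred on one interest point, clipped where the window would run past either end of the series.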
The "return list of series" is really a sort of variant on groupby, and could conceivably be implemented on top of something like #3101, after de-warting.
Would also love to have this working with #2460 (if it ever makes it in) for the same (but slow) functionality on plain, unindexed data columns.
Comment From: petehuang
Hi,
This issue's last interaction was in 2013. I noticed you had paused work on the PR - are you aware of any other efforts to pick this up? Does this continue to represent a useful enhancement?
Comment From: jreback
things like this are contemplated in pandas 2.0
Comment From: MattFaus
I came up with this for now:
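def find_with_context(base_idx, before=0, after=0, context=0):
    # base_idx is a boolean Series; context overrides both before and after
    if context != 0:
        before = context
        after = context
    ret_idx = base_idx
    # use `|` rather than `|=` so the caller's mask is not modified in place
    for i in range(1, before + 1):
        # a match at row n also selects row n - i
        ret_idx = ret_idx | base_idx.shift(-i, fill_value=False)
    for i in range(1, after + 1):
        # a match at row n also selects row n + i
        ret_idx = ret_idx | base_idx.shift(i, fill_value=False)
    return ret_idx

df[find_with_context(df.my_column != 1.0, before=1)]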
Comment From: robrechtdr
Just want to give a big thumbs up for grep-like functionality.
For me the most useful part of it is a dataframe search across all columns (stringified values). This is a huge time saver. For now I'm using the following:
df[np.logical_or.reduce([df[col].astype(str).str.contains("somethingtosearchfor") for col in df.columns])]
It's a big time saver when debugging not to have to inspect the columns and/or values first and only then filter on a specific column name, as this need arises very often. It saves the time spent finding the right column and typing its name (as far as I'm aware, you can't always use auto-complete with the dot syntax, for example when a column name contains spaces, which happens often). Also, a debugging tool with auto-complete support is not always immediately accessible.
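The one-liner above can be wrapped in a small function for reuse. This is a sketch only: `grep_frame` is a made-up name, not a pandas API, and the sample frame is invented for illustration. It stringifies every cell, so missing values are searched as the text "nan".

```python
import pandas as pd


def grep_frame(df, pattern, regex=True):
    """Hypothetical helper: rows where any stringified cell matches `pattern`."""
    as_text = df.astype(str)  # stringify every column, whatever its dtype
    # one boolean column per original column, then "any column matched" per row
    mask = as_text.apply(
        lambda col: col.str.contains(pattern, regex=regex)
    ).any(axis=1)
    return df[mask]


df = pd.DataFrame({
    "user name": ["alice", "bob", "carol"],  # a column name containing a space
    "note": ["ok", "needs review", "ok"],
    "score": [10, 25, 31],
})

print(grep_frame(df, "review"))  # matched in the "note" column
print(grep_frame(df, "31"))      # matched in the numeric "score" column
```

No column name has to be typed, which is the point of the request; the cost is a full string conversion of the frame on every search.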