When chaining multiple calls, I often want to select certain elements. Unfortunately, getting only those values requires an intermediate assignment, which breaks the chain:
df_sub = df[df == x]
One hack is:
df.replace({x: np.nan}).dropna()
I think a drop_values() function would be a nice addition, where I could say:
df.drop_values(x).more().chaining()
Comment From: jreback
you mean like this?
In [18]: df = pd.DataFrame({'A': range(3), 'B': list('abc')})
In [19]: df.assign(bar=df.A+1).loc[lambda df: df.B=='a']
Out[19]:
   A  B  bar
0  0  a    1
Comment From: jreback
or dropping
In [20]: df.assign(bar=df.A+1).loc[lambda df: df.B != 'a']
Out[20]:
   A  B  bar
1  1  b    2
2  2  c    3
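For anyone who wants to run the two snippets above outside an IPython session, here is a minimal standalone sketch. It assumes a reasonably recent pandas, and it passes a callable to assign() as well, so that nothing in the chain refers back to the outer df; that detail is an addition for illustration and not part of the session above. The names kept and dropped are also just illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": range(3), "B": list("abc")})

# Keep only the rows where B == 'a'. The callable receives the
# intermediate frame produced by assign(), so no temporary is needed.
kept = df.assign(bar=lambda d: d.A + 1).loc[lambda d: d.B == "a"]

# Drop the rows where B == 'a' by inverting the condition.
dropped = df.assign(bar=lambda d: d.A + 1).loc[lambda d: d.B != "a"]

print(kept)
print(dropped)
```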
Comment From: twiecki
TIL you can supply functions to .loc[]. Yes, exactly like that. Since that works, I'm not sure more syntactic sugar is needed, although it would simplify the process a little bit.
Comment From: jreback
More API is not needed here; this is a pretty common pattern. Docs are here: http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-callable
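If the drop_values() spelling is still wanted, it can be approximated in user code with DataFrame.pipe. The sketch below is only an illustration: drop_values is a hypothetical helper, not a pandas method, and it assumes the intended behaviour is "drop every row that contains the value in any column", which is one reading of the replace/dropna hack from the original post.

```python
import pandas as pd


def drop_values(frame, value):
    """Hypothetical helper (not part of pandas): drop rows containing `value`."""
    # frame.eq(value) marks matching cells; any(axis=1) flags rows with a match.
    return frame.loc[~frame.eq(value).any(axis=1)]


df = pd.DataFrame({"A": [0, 1, 2], "B": [3, 4, 5]})

# pipe() lets the helper sit in the middle of a chain.
result = df.pipe(drop_values, 1).assign(total=lambda d: d.A + d.B)

print(result)
```

Unlike the replace/dropna hack, this version leaves rows that already contain NaN untouched.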
Comment From: twiecki
Thanks @jreback, and sorry for the distraction.