Code Sample, a copy-pastable example if possible
>>> import pandas as pd
>>> s = pd.Series([3, 4, 5])
>>> b = s > 3
>>> b # a boolean series
0 False
1 True
2 True
dtype: bool
Now, various indexing methods with b:
>>> s.loc[b] # ok
>>> s[b] # ok
>>> s.iloc[b.values] # ok
>>> s.iloc[b] # not ok
NotImplementedError: iLocation based boolean indexing on an integer type is not available
Problem description
As the indexes of s and b are the same and the b.values are booleans, .iloc should work here (unless I'm missing something non-obvious). That is, it should join up the indices as per the pandas rules, and then do a boolean filter.
I did some poking around the error frames, and the starting function is pandas.core.indexing._iLocIndexer._getitem_axis. The problem appears to be that the check in pandas.core.indexing._iLocIndexer._has_valid_type(key, axis) is overzealous. If we call pandas.core.indexing._iLocIndexer._getbool_axis(key, axis=axis) without doing the check in ._has_valid_type, everything comes out fine.
It seems to me that pandas.core.indexing._iLocIndexer._has_valid_type is too strict with boolean series. Is this correct?
Expected Output
s filtered by b.
Output of pd.show_versions()
Comment From: jreback
This is by definition not allowed. .iloc is purely positional, so it doesn't make sense to align with a passed Series (which is what all indexing operations do).
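To make the alignment point concrete, here is a minimal sketch (not taken from the pandas source) of how the same boolean Series can select different elements depending on whether it is aligned by label or applied by position. The name b_reversed is made up for illustration; the example assumes a pandas version where .loc aligns a boolean Series key to the indexed object.

```python
import pandas as pd

s = pd.Series([3, 4, 5])
b = s > 3  # index [0, 1, 2], values [False, True, True]

# Same mask, but with the index in reverse order: labels [2, 1, 0],
# values [True, True, False].
b_reversed = b.iloc[::-1]

# Label-based: the mask is aligned to s by index label before filtering.
by_label = s.loc[b_reversed]

# Position-based: the raw boolean array is applied in order, labels ignored.
by_position = s.iloc[b_reversed.values]

print(by_label.tolist())     # [4, 5]
print(by_position.tolist())  # [3, 4]
```

The two results differ, which is the ambiguity a Series passed to .iloc would have to resolve one way or the other.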
Comment From: jreback
http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-position indicates only a boolean array is allowed.
Comment From: nick-ulle
If the model is that .iloc is purely positional, then why can't it do a purely positional selection of elements with a Boolean Series of the same length?
Is there a technical limitation or a deeper design decision here? Is it to prevent users who don't understand the model for .iloc from using it with a Boolean Series and thinking it matched the indexes?
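For reference, the purely positional selection being asked about can be written explicitly today by dropping the mask's index before passing it to .iloc. The sketch below is only an illustration: positional_mask is a hypothetical helper, not a pandas function, and the length check is an assumption about what "same length" should enforce.

```python
import pandas as pd


def positional_mask(obj, mask):
    """Hypothetical helper: filter obj with a boolean Series by position only."""
    if len(mask) != len(obj):
        raise ValueError("mask length must match the object being indexed")
    # .values drops the mask's index, so no label alignment can take place.
    return obj.iloc[mask.values]


s = pd.Series([3, 4, 5], index=["a", "b", "c"])
mask = pd.Series([False, True, True])  # default integer index, unrelated to s
filtered = positional_mask(s, mask)

print(filtered.tolist())  # [4, 5]
```

Writing mask.values at the call site makes the positional intent visible, at the cost of one extra attribute access.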