With pandas 0.20.3, DataFrame loc and iloc have inconsistent and buggy indexing behaviors.
import pandas as pd
df = pd.DataFrame([dict(idx=idx) for idx in range(10)])
print(df.loc[list(range(3)) + list(range(-3, 0)), 'idx'])
returns NaN for the negative indices:
0 0.0
1 1.0
2 2.0
-3 NaN
-2 NaN
-1 NaN
(also note that somehow the int became float...)
whereas
print(df.iloc[list(range(3)) + list(range(-3, 0))])
returns the last rows:
0 0
1 1
2 2
7 7
8 8
9 9
Additionally, loc fails if only negative indices are passed:
df.loc[[-2, -1], 'idx']
but not if both positive and negative indices are passed:
df.loc[[0, -1], 'idx']
Comment From: jorisvandenbossche
@kingjr Have a look at the indexing docs (http://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing) on the different options to index (see also further down that page under "Selection by label" and "Selection by position").
The behaviours of loc and iloc are different on purpose because they serve different goals:
- loc is label based: the negative values are not present in the index labels, and hence you get missing values for them (loc returns a result as long as at least one existing label is present; in the case of df.loc[[-2, -1], 'idx'] no existing label is present and therefore it raises)
- iloc is position based: negative indices here mean 'start to count from the end', and therefore the shown result is exactly as expected
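A minimal sketch of the label/position distinction described above. The frame with index 100..109 and the small `shifted` Series are made up for illustration (they do not appear in the original report); they are chosen so that labels and positions no longer coincide.

```python
import pandas as pd

# Illustrative data: labels run 100..109, positions run 0..9
df = pd.DataFrame({"idx": range(10)}, index=range(100, 110))

# loc looks up index labels
by_label = df.loc[[100, 109], "idx"]

# iloc looks up positions; -1 counts from the end
by_position = df.iloc[[0, -1]]["idx"]

# A negative integer is a perfectly valid label when it is in the index
shifted = pd.Series(["a", "b", "c"], index=[-1, 0, 1])
label_hit = shifted.loc[-1]      # the row labelled -1 (the first row)
position_hit = shifted.iloc[-1]  # the last row by position
```

With the default 0..n-1 index, labels and positions happen to match for non-negative integers, which is probably why the two accessors look interchangeable until a negative value is used.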
(also note that somehow the int became float...)
This is currently a limitation of pandas: missing values can only be represented in float columns, not integer ones, so the column is upcast to float. See http://pandas.pydata.org/pandas-docs/stable/gotchas.html#support-for-integer-na
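A short sketch of the upcast mentioned above. The nullable `"Int64"` dtype in the last line is an addition not discussed in this thread: it assumes pandas 0.24 or later and is not available in 0.20.3.

```python
import numpy as np
import pandas as pd

# All-integer data keeps an integer dtype
ints = pd.Series([0, 1, 2])

# A single NaN forces the default dtype to float
with_nan = pd.Series([0, 1, np.nan])

# Assumption: pandas >= 0.24, where the opt-in nullable integer dtype exists
nullable = pd.Series([0, 1, None], dtype="Int64")

print(ints.dtype, with_nan.dtype, nullable.dtype)
```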
Comment From: jreback
pls read the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-label
.loc is for label based indexing. The neg indices are not found and are reindexed to NaN. An integer index is, by definition, label based when indexed with .loc.
.iloc is always positionally indexed.
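The "reindexed to NaN" behaviour can be reproduced explicitly with `reindex`, sketched below on the frame from the original report. The `try`/`except` reflects an assumption about newer releases: from pandas 1.0 on, `.loc` with a list containing any missing label raises `KeyError` instead of returning the reindexed result seen in 0.20.3.

```python
import pandas as pd

df = pd.DataFrame({"idx": range(10)})
labels = [0, 1, 2, -3, -2, -1]

# reindex looks up every label and fills the missing ones with NaN
reindexed = df["idx"].reindex(labels)

# Assumption: on pandas >= 1.0 this raises KeyError because -3, -2, -1
# are not labels; on 0.20.3 it returned the same values as `reindexed`.
try:
    via_loc = df.loc[labels, "idx"]
except KeyError:
    via_loc = None

print(reindexed)
```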
Comment From: kingjr
Thanks. Is there a reason for not allowing label based indexing to interpret negative integers?
Comment From: jorisvandenbossche
Is there a reason for not allowing label based indexing to interpret negative integers?
Because then it would not be label based anymore, but position based.
Comment From: kingjr
I understand. So to modify the last two lines of a particular column, what would you recommend? Something like this?
df.loc[df.index[-2:], 'column'] = range(2)
Comment From: jorisvandenbossche
yes, that is fine
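For reference, a runnable sketch of the approach agreed on above, next to a purely positional alternative. The frame, the second column `other`, and the assigned values are illustrative only; the `iloc` variant with `columns.get_loc` is a suggestion that was not made in this thread.

```python
import pandas as pd

df = pd.DataFrame({"column": range(10, 20), "other": range(10)})

# Label based: turn the last two positions into labels, then use loc
df.loc[df.index[-2:], "column"] = [0, 1]
after_loc = df["column"].tolist()

# Position based alternative: look up the column position, then use iloc
col_pos = df.columns.get_loc("column")
df.iloc[-2:, col_pos] = [7, 8]
after_iloc = df["column"].tolist()
```

One caveat with the label based form: if the index contains duplicate labels, `df.index[-2:]` may match more than two rows, so the positional form is likely the safer choice in that case.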