ser = pd.Series([3, 4, 5])
>>> ser[:2] # <- should this behave like loc or iloc?
>>> ser.loc[:2]
>>> ser.iloc[:2]
Focusing only on Integer-dtype indexes, i.e. Int64Index, UInt64Index, RangeIndex, and NumericIndex[inty].
ser[2]
and ser[[2]]
each treat the indexing as label-based instead of positional. But ser[:2]
treats the slice as positional instead of label-based.
We've deprecated some of these cases in #45162, but as I go to enforce the deprecation I think I screwed up and issued the warning for too few cases. In particular, to keep the number of warnings down I excluded some cases
elif start is None and stop is None:
# label-based vs positional is irrelevant
pass
elif isinstance(self, ABCRangeIndex) and self._range == range(
len(self)
):
# In this case there is no difference between label-based
# and positional, so nothing will change.
pass
elif (
self.dtype.kind in ["i", "u"]
and self._is_strictly_monotonic_increasing
and len(self) > 0
and self[0] == 0
and self[-1] == len(self) - 1
):
# We are range-like, e.g. created with Index(np.arange(N))
pass
The elif start is None and stop is None
case is correct, but the other two are wrong bc with .loc
slicing is right-inclusive.
If I enforce the deprecation for these extra two cases, I get 555 test failures, so this isn't trivial.
Comment From: jbrockmendel
@jreback
Comment From: jreback
@jbrockmendel your logic seems correct here. agree that s[:2] should be treated as location and warn that it's not positional
Comment From: jbrockmendel
@mroeschke thoughts on what to do here?
Comment From: mroeschke
I know you wanted to focus on int indexes, but It would be nice is this was positional to align with Series(..., index=non_numeric)[:int]
according to this example: https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html#series-is-ndarray-like
Unless you had come across this and wanted to change this too
Comment From: jbrockmendel
IIUC what you're suggesting would make/keep int-slicing-with-intindex inconsistent with other int-indexing-with-intindex, but possibly in a way that has fewer corner cases than the status quo?
(Long-term (i.e. 3.0) im increasingly thinking we should just deprecate Series.__getitem__
and Series.__setitem__
entirely and tell users to just use loc/iloc, so there's never any ambiguity.)
Comment From: mroeschke
IIUC what you're suggesting would make/keep int-slicing-with-intindex inconsistent with other int-indexing-with-intindex, but possibly in a way that has fewer corner cases than the status quo?
True, int-slicing
and int-indexing
would diverge in this case for this index type, but int-slicing
would be consistent regardless of index type if int-slicing
was always positional. I guess there are multiple ways to cut the consistency pie (by indexer, Index type, or both), and I haven't been keeping up well in which direction we should be/are headed so I don't have super strong opinions on what feels more right.
(Long-term (i.e. 3.0) im increasingly thinking we should just deprecate Series.getitem and Series.setitem entirely and tell users to just use loc/iloc, so there's never any ambiguity.)
+10.
Comment From: jbrockmendel
So I guess for 2.0 we just revert the deprecation?
Comment From: mroeschke
Probably good to get more thoughts from others who have looked closer at indexing behavior @phofl @jorisvandenbossche
Comment From: jbrockmendel
Actually just reverting wouldn't quite get to the consistent-ish behavior you suggested. we'd also need to change the behavior with Float64Index, which ATM always treats slicing as label-based. Doing that breaks 5 tests locally, though only one directly testing this behavior.