Pandas Index API proposal: unified axis label lookup

As the next step of separation-of-concerns plan (#6744) I'd like to propose adding a method (or several, actually) to Index class that would encapsulate the details of foo.loc[l1,l2,...] lookup.

Implementation Idea

Roughly, the idea is to make loc's getitem as simple as

def __getitem__(self, indexer):
    axes = self.obj.axes
    return self.obj.iloc[axes[0].lookup_labels_nd(indexer, axes[1:], typ='loc')]

Not quite, but hopefully you get the point. The default lookup_labels_nd implementation would then look something like this:

def lookup(self, indexer, other_axes, typ=None):
    if not isinstance(indexer, tuple):
        return self.lookup_labels(indexer, typ=typ)
    else:
        # ndim mismatch error handling is omitted intentionally
        return (self.lookup_labels(indexer[0]),) + \
               tuple(ax.lookup_labels(ix, typ=typ)
                     for ax, ix in zip(other_axes, indexer))

The result should be an object that could be fed to an underlying BlockManager to perform the requested operation. To support adding new rows with "setitem", it is only needed to agree that lookup_labels_nd will never return negative indices unless they reference newly appended items along that axis.

This would allow to hide Index-subclass-specific lookup peculiarities in their respective overrides of lookup_labels_nd and lookup_labels (proposals for better names are welcome), e.g.: - looking up str in DatetimeIndex/PeriodIndex - looking up int in FloatIndex - looking up per-level slices in MultiIndex

Benefits

no more confusing errors due to try .. catch block carpet-catching a logic error, because corner cases will be handled precisely where they are needed and nowhere else
no more relying on isinstance checks and exceptions to seek for alternative lookup scenarios, meaning more performance
the API will provide a contract that is simple to grasp, test, benchmark and, eventually, cythonize (as a side effect of this point I'd like to try putting up a wiki page with indexing API reference)

Comment From: immerrr

I was trying to mock up a proof of concept (while at ep2014) and decided to start with simple things, at and iat, but found something weird:

In [19]: s = pd.Series([1,2,3], index=list('abc'))

In [20]: s.at['a']
Out[20]: 1

In [21]: s.at['b']
Out[21]: 2

In [22]: s.at['c']
Out[22]: 3

In [23]: s.at[0]
Out[23]: 1

I couldn't find any reference to this feature in pandas doc, and I have a strong feeling it shouldn't be there. @jrebac, what do you think, do I have to maintain it?

Comment From: jreback

I think that is a bug, at should be a restricted version of loc, e.g. (no slicing or lists), and certainly no fallback. Prob just not checking that the label type is a valid type for the index. pls do a PR to test/fix this if you would. thanks

Comment From: jreback

https://github.com/pydata/pandas/issues/7814

Comment From: shoyer

Just stumbled across this -- it gets a bit +1 from me!

Comment From: jreback

cc @toobaz

closing as stale, but certainly would take a PR (or more) to simplify internal indexing :>

Comment From: toobaz

Just for the records, since I was cced: I'm -1 on this, for two reasons.

The first is conceptual: I don't like Index to have a method that work on multiple indexes. Despite the mess in core/indexing.py, which would certainly benefit from some decoupling, the general idea that indexing must be done at the object level, rather than at the index level, makes sense to me.

The second is practical: there are calls to .loc which cannot be expressed in calls to .iloc, such as when you do partial indexing on a MultiIndex.

By the way, the crucial step to allow real decoupling is to enable BlockManager.set_value for mixed dtypes.