What do people think about having pandas classes inherit from the collections.abc classes?

I think the main ones are DataFrame & Series inheriting from Mapping and Index from Sequence.

This would let other libraries infer a class's behavior more precisely. For example, to work out whether an object is list-like but not dict-like, they could check its type rather than testing hasattr(obj, '__iter__') and not hasattr(obj, 'values'). And it doesn't cost anything.

But as far as I'm aware, other libraries in the pydata ecosystem don't do this, so this would be out of the norm.

https://docs.python.org/3/library/collections.abc.html

Comment From: kawochen

I think you have misunderstood abc. These classes have a special __subclasshook__, so you don't have to inherit from them to be a subclass of them.
e.g.

In [172]: issubclass(pandas.Series, collections.Container)
Out[172]: True

So perhaps we can think about whether we want to make pandas objects satisfy the necessary contracts.
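
As a minimal sketch of the structural check described above, using only the standard library: Box is a hypothetical class, not a pandas type, and it inherits from no ABC. The one-method ABCs such as Container and Sized only look for the relevant dunder method.

```python
from collections.abc import Container, Sized


class Box:
    """Hypothetical class that does not inherit from any ABC."""

    def __init__(self, items):
        self._items = list(items)

    def __contains__(self, item):
        return item in self._items

    def __len__(self):
        return len(self._items)


# Container.__subclasshook__ looks for __contains__ in the MRO,
# Sized.__subclasshook__ looks for __len__.
box_is_container = issubclass(Box, Container)
box_is_sized = isinstance(Box([1, 2]), Sized)
print(box_is_container, box_is_sized)
```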

Comment From: max-sixty

Cheers @kawochen. It looks like Mapping doesn't implement that (its __subclasshook__ falls back to Sized). I'm not sure why that is. It could be because it has some mixin methods on top of its abstract methods? Whatever the reason, I'm fairly confident DataFrame satisfies the contracts:


In [21]: df = pd.DataFrame()

In [22]: isinstance(df, collections.Mapping)
Out[22]: False

In [23]: isinstance(df, collections.Sized)
Out[23]: True

Comment From: kawochen

@MaximilianR nope, it doesn't have items.

Edit: OK, actually I don't know why.

Comment From: kawochen

I agree hasattr isn't pretty. We can think about defining a few custom abstract base classes for pandas using __subclasshook__ to replace those and functions like is_dict_like. I don't know about performance, but it's probably just an extra indirection.
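
A rough sketch of what such a custom ABC could look like. DictLike and Frame are hypothetical names, and the choice of required methods (keys and __getitem__) is an assumption made for illustration, not the rule pandas actually uses in is_dict_like.

```python
from abc import ABCMeta


class DictLike(metaclass=ABCMeta):
    """Hypothetical ABC: a class counts as dict-like if it defines
    both keys and __getitem__ somewhere in its MRO."""

    @classmethod
    def __subclasshook__(cls, C):
        if cls is DictLike:
            required = ("keys", "__getitem__")
            if all(any(name in B.__dict__ for B in C.__mro__)
                   for name in required):
                return True
        return NotImplemented


class Frame:
    """Hypothetical stand-in for a DataFrame-like object."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def keys(self):
        return self._columns.keys()

    def __getitem__(self, key):
        return self._columns[key]


frame_is_dict_like = isinstance(Frame({"a": [1]}), DictLike)
dict_is_dict_like = isinstance({}, DictLike)
list_is_dict_like = isinstance([1, 2], DictLike)  # list has no keys
print(frame_is_dict_like, dict_is_dict_like, list_is_dict_like)
```

The isinstance call goes through ABCMeta, which caches results per class, so the repeated cost is likely small, though that is not measured here.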

Comment From: max-sixty

@kawochen have a look at Mapping.__subclasshook__. It is inherited from Sized and only returns True when cls is Sized, so it can never return True for Mapping:

In [27]: collections.Mapping.__subclasshook__??
Signature: collections.Mapping.__subclasshook__(C)
Source:
    @classmethod
    def __subclasshook__(cls, C):
        if cls is Sized:
            if any("__len__" in B.__dict__ for B in C.__mro__):
                return True
        return NotImplemented
File:      /Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/_collections_abc.py
Type:      method

So I think for these, you would need to inherit from Mapping. Still open to being wrong though

Comment From: kawochen

oh you are right. It looks like you'd need to call collections.Mapping.register(DataFrame)
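
A small sketch of the registration route, with a hypothetical Table class standing in for DataFrame. It suggests that register only changes the isinstance/issubclass answer: it does not check the contract and does not add the mixin methods.

```python
from collections.abc import Mapping


class Table:
    """Hypothetical class with Mapping's three abstract methods,
    but no inheritance from Mapping."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, key):
        return self._columns[key]

    def __iter__(self):
        return iter(self._columns)

    def __len__(self):
        return len(self._columns)


# No structural check exists for Mapping, so this is False.
before = issubclass(Table, Mapping)

# Register Table as a virtual subclass.
Mapping.register(Table)
after = issubclass(Table, Mapping)

# Virtual subclasses do not receive mixin methods such as items().
has_items = hasattr(Table, "items")
print(before, after, has_items)
```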

Comment From: max-sixty

Or add that as one of the classes DataFrame inherits from. That's fairly canonical, there's an example on the collections page: https://docs.python.org/3/library/collections.abc.html (although the benefits described are slightly orthogonal)
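
For comparison, a sketch of the inheritance route. ColumnStore is a hypothetical class, not anything in pandas. Implementing the three abstract methods is enough for Mapping to supply keys, items, values, get, __contains__ and __eq__ as mixins.

```python
from collections.abc import Mapping


class ColumnStore(Mapping):
    """Hypothetical mapping from column name to column values."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, key):
        return self._columns[key]

    def __iter__(self):
        return iter(self._columns)

    def __len__(self):
        return len(self._columns)


store = ColumnStore({"a": [1, 2], "b": [3, 4]})

# items() and get() are not defined above; they come from Mapping.
pairs = list(store.items())
missing = store.get("c", "no such column")
is_mapping = isinstance(store, Mapping)
print(pairs, missing, is_mapping)
```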

Comment From: kawochen

Hmm DataFrame.values is not callable though.

Comment From: max-sixty

That's true. It does break the contract a bit. I still think it's better than what exists now, albeit not perfect.

And .values -> .data in pandas 1.0??

Comment From: mitar

I also think that pandas.DataFrame should register with abstract classes.

I think the following would work:

Sequence.register(pandas.DataFrame)

To my understanding it exposes everything necessary.

Comment From: keijak

Making DataFrame a subclass of Mapping would be problematic due to the semantics of DataFrame.__len__. len(df) returns the number of rows (= len(df.index)). It would need to return the number of columns to match dict behavior.
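
To make the mismatch concrete, here is a sketch with a hypothetical RowLenFrame class that mimics the described behavior: it iterates over column names, but len() counts rows. It is a stand-in, not pandas code.

```python
from collections.abc import Mapping


class RowLenFrame(Mapping):
    """Hypothetical frame: keys are column names, len() is the row count."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, key):
        return self._columns[key]

    def __iter__(self):
        return iter(self._columns)

    def __len__(self):
        # Row count, taken from the first column (0 if there are none).
        first = next(iter(self._columns.values()), [])
        return len(first)


frame = RowLenFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

n_len = len(frame)              # rows
n_keys = len(list(frame))       # columns actually yielded by iteration
view_len = len(frame.keys())    # KeysView delegates to len(frame)
print(n_len, n_keys, view_len)
```

Code written against Mapping generally assumes len(m) equals the number of keys, so the view reporting 3 while yielding 2 keys is the kind of inconsistency at issue.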

Comment From: jreback

I don't think this is actually possible, as we have some named properties, e.g. index, whose meaning conflicts with the .index() method that sequences are expected to provide.
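
A sketch of the name clash, using a hypothetical Labelled class rather than a real pandas object. Sequence provides index(value) as a mixin method, and a property with the same name shadows it.

```python
from collections.abc import Sequence


class Labelled(Sequence):
    """Hypothetical sequence that also exposes its labels as .index."""

    def __init__(self, values, labels):
        self._values = list(values)
        self._labels = list(labels)

    def __getitem__(self, i):
        return self._values[i]

    def __len__(self):
        return len(self._values)

    @property
    def index(self):
        # Shadows the Sequence.index(value) mixin method.
        return self._labels


obj = Labelled([10, 20], ["x", "y"])

# Generic code expecting the Sequence API calls obj.index(value),
# but obj.index is now a list of labels, which is not callable.
try:
    obj.index(20)
    call_failed = False
except TypeError:
    call_failed = True

# The mixin still works when called on the ABC directly.
position = Sequence.index(obj, 20)
print(call_failed, position)
```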