What do people think about having pandas classes inherit from the collections.abc classes?
I think the main ones are DataFrame & Series inheriting from Mapping, and Index from Sequence.
This would let other libraries infer a class's behavior more precisely. For example, to work out whether an object is list-like but not dict-like, they could check the types rather than hasattr(obj, '__iter__') and not hasattr(obj, 'values'). And it doesn't cost anything.
But as far as I'm aware, other libraries in the pydata ecosystem don't do this, so this would be out of the norm.
https://docs.python.org/3/library/collections.abc.html
Comment From: kawochen
I think you have misunderstood abc. These classes have a special __subclasshook__ so that you don't have to inherit from them to be a subclass of them.
e.g.
In [172]: issubclass(pandas.Series, collections.Container)
Out[172]: True
So perhaps we can think about whether we want to make pandas objects satisfy the necessary contracts.
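As a minimal sketch of the structural check described above, the following uses a toy class (Bag is a hypothetical name, not a pandas object) that inherits from nothing in collections.abc. It uses only the standard library and the modern collections.abc import path.

```python
from collections.abc import Container, Mapping, Sized


class Bag:
    """Toy class (hypothetical) that inherits from no ABC."""

    def __init__(self, items):
        self._items = list(items)

    def __contains__(self, item):
        return item in self._items

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __getitem__(self, index):
        return self._items[index]


# Container and Sized each define a __subclasshook__ that looks for a
# single method, so Bag is recognised without inheriting or registering.
is_container = issubclass(Bag, Container)
is_sized = issubclass(Bag, Sized)

# Mapping has no hook of its own, so the structural check does not apply.
is_mapping = issubclass(Bag, Mapping)

print(is_container, is_sized, is_mapping)
```

The hook only applies to the small one-method ABCs; the larger ones such as Mapping and Sequence need inheritance or explicit registration.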
Comment From: max-sixty
Cheers @kawochen. It looks like Mapping doesn't implement that (its __subclasshook__ falls back to Sized).
I'm not sure why that is. It could be because it has some Mixin methods on top of its abstract methods? Whatever the reason, I'm fairly confident DataFrame satisfies the contracts:
In [21]: df = pd.DataFrame()
In [22]: isinstance(df, collections.Mapping)
Out[22]: False
In [23]: isinstance(df, collections.Sized)
Out[23]: True
Comment From: kawochen
@MaximilianR nope it doesn't have items
edit: OK actually I don't know why.
Comment From: kawochen
I agree hasattr isn't pretty. We can think about defining a few custom abstract base classes for pandas using __subclasshook__ to replace those and functions like is_dict_like. I don't know about performance, but it's probably just an extra indirection.
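A rough sketch of what such a custom abstract base class could look like. DictLike and Record are hypothetical names, and the choice of keys plus __getitem__ as the required methods is an assumption for illustration, not what pandas' is_dict_like actually checks.

```python
from abc import ABC


class DictLike(ABC):
    """Hypothetical ABC: any class defining keys() and __getitem__."""

    @classmethod
    def __subclasshook__(cls, C):
        if cls is DictLike:
            required = ("keys", "__getitem__")
            # Each required name must be defined somewhere in the MRO.
            if all(any(name in B.__dict__ for B in C.__mro__) for name in required):
                return True
        return NotImplemented


class Record:
    """Toy class that never mentions DictLike."""

    def __init__(self, **fields):
        self._fields = fields

    def keys(self):
        return self._fields.keys()

    def __getitem__(self, key):
        return self._fields[key]


record_ok = isinstance(Record(a=1), DictLike)
dict_ok = isinstance({}, DictLike)
list_ok = isinstance([1, 2], DictLike)  # has __getitem__ but no keys()

print(record_ok, dict_ok, list_ok)
```

With this, callers write isinstance(obj, DictLike) instead of chaining hasattr checks, and the rule lives in one place.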
Comment From: max-sixty
@kawochen have a look at the Mapping.__subclasshook__ - it can never return True:
In [27]: collections.Mapping.__subclasshook__??
Signature: collections.Mapping.__subclasshook__(C)
Source:
@classmethod
def __subclasshook__(cls, C):
    if cls is Sized:
        if any("__len__" in B.__dict__ for B in C.__mro__):
            return True
    return NotImplemented
File: /Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/_collections_abc.py
Type: method
So I think for these, you would need to inherit from Mapping. Still open to being wrong though.
Comment From: kawochen
oh you are right. It looks like you'd need to call collections.Mapping.register(DataFrame)
Comment From: max-sixty
Or add that as one of the classes DataFrame inherits from. That's fairly canonical, there's an example on the collections page: https://docs.python.org/3/library/collections.abc.html (although the benefits described are slightly orthogonal)
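To make the difference between the two options concrete, here is a small sketch with toy classes (Registered and Inherited are hypothetical names, not pandas types). Both implement only the three abstract methods of Mapping.

```python
from collections.abc import Mapping


class Registered:
    """Implements the abstract methods, then is registered as a virtual subclass."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


Mapping.register(Registered)


class Inherited(Mapping):
    """Same three methods, but Mapping is a real base class."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


r = Registered({"a": 1})
i = Inherited({"a": 1})

# Both pass the isinstance check.
both_mappings = isinstance(r, Mapping) and isinstance(i, Mapping)

# Only real inheritance supplies the mixin methods (items, keys, get, ...).
registered_has_items = hasattr(r, "items")
inherited_items = list(i.items())
inherited_default = i.get("missing", 0)
```

Registration only changes what isinstance and issubclass report; it neither adds the mixin methods nor verifies that the class honours the contract.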
Comment From: kawochen
Hmm, DataFrame.values is not callable though.
Comment From: max-sixty
That's true. It does break the contract a bit. I still think it's better than what exists now, albeit not perfect.
And .values -> .data in pandas 1.0??
Comment From: mitar
I also think that pandas.DataFrame should register with abstract classes.
I think the following would work:
Sequence.register(pandas.DataFrame)
To my understanding it exposes everything necessary.
Comment From: keijak
Making DataFrame a subclass of Mapping would be problematic due to the semantics of DataFrame.__len__. len(df) returns the number of rows (= len(df.index)). It would need to return the number of columns to match dict behavior.
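The mismatch can be illustrated without pandas. TinyFrame below is a hypothetical stand-in that copies only the two behaviours in question: iteration yields column names, and len returns the row count.

```python
from collections.abc import Mapping


class TinyFrame(Mapping):
    """Hypothetical stand-in for a DataFrame: keys are column names."""

    def __init__(self, columns):
        self._columns = {name: list(values) for name, values in columns.items()}

    def __getitem__(self, name):
        return self._columns[name]

    def __iter__(self):
        # Iterating yields column names, like a dict yields keys.
        return iter(self._columns)

    def __len__(self):
        # Row count, mirroring len(df) == len(df.index).
        first = next(iter(self._columns.values()), [])
        return len(first)


frame = TinyFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

reported_len = len(frame)                  # rows
key_count = len(list(frame))               # columns
items_len = len(frame.items())             # view length delegates to __len__
items_count = len(list(frame.items()))     # pairs actually produced
```

Here the length of the items view disagrees with the number of pairs it yields, which is the kind of inconsistency that code written against Mapping would not expect.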
Comment From: jreback
I don't think this is actually possible, as we have some named properties, e.g. index, whose meaning conflicts with the .index() method of iterables.