Revisiting some 6 month old class that subclasses Series
and overrides __finalize__
as:
def __finalize__(self, other, method=None, **kwargs):
pd.Series.__finalize__(self, other, method=method, **kwargs)
self.metadata = other.metadata.copy()
self.metadata.parent = self
return self
With 0.21.1 these objects can no longer be rendered:
>>> ser
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pandas/core/base.py", line 80, in __repr__
return str(self)
File "pandas/core/base.py", line 60, in __str__
return self.__bytes__()
File "pandas/core/base.py", line 72, in __bytes__
return self.__unicode__().encode(encoding, 'replace')
File "pandas/core/series.py", line 1066, in __unicode__
max_rows=max_rows, length=show_dimensions)
File "pandas/core/series.py", line 1109, in to_string
max_rows=max_rows)
File "pandas/io/formats/format.py", line 176, in __init__
self._chk_truncate()
File "pandas/io/formats/format.py", line 190, in _chk_truncate
series.iloc[-row_num:]))
File "pandas/core/reshape/concat.py", line 213, in concat
return op.get_result()
File "pandas/core/reshape/concat.py", line 377, in get_result
return cons(mgr, name=name).__finalize__(self, method='concat')
File "pdsm/core/series.py", line 125, in __finalize__
self.metadata = other.metadata.copy()
AttributeError: '_Concatenator' object has no attribute 'metadata'
I had assumed that other
would always be of the same class as self
. Is _Concatenator
a recent addition?
Comment From: jreback
can you demonstrate an actual example?
Comment From: jbrockmendel
class Metadata(object):
def __init__(self, parent):
self.parent = parent
def copy(self):
return Metadata(parent=self.parent)
class MetaSeries(pd.Series):
_metadata = pd.Series._metadata + ['metadata']
@property
def _constructor(self):
return MetaSeries
def __init__(self, *args, **kwargs):
super(MetaSeries, self).__init__(*args, **kwargs)
md = Metadata(parent=self)
object.__setattr__(self, 'metadata', md)
def __finalize__(self, other, method=None, **kwargs):
pd.Series.__finalize__(self, other, method=method, **kwargs)
md = other.metadata.copy()
object.__setattr__(self, 'metadata', md)
md.parent = self
return self
>>> ser = pd.Series(range(100))
>>> mser = MetaSeries(ser)
>>> mser[:5]
0 0
1 1
2 2
3 3
4 4
dtype: int64
>>> mser
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/pandas/core/base.py", line 80, in __repr__
return str(self)
File "/usr/local/lib/python2.7/site-packages/pandas/core/base.py", line 60, in __str__
return self.__bytes__()
File "/usr/local/lib/python2.7/site-packages/pandas/core/base.py", line 72, in __bytes__
return self.__unicode__().encode(encoding, 'replace')
File "/usr/local/lib/python2.7/site-packages/pandas/core/series.py", line 1066, in __unicode__
max_rows=max_rows, length=show_dimensions)
File "/usr/local/lib/python2.7/site-packages/pandas/core/series.py", line 1109, in to_string
max_rows=max_rows)
File "/usr/local/lib/python2.7/site-packages/pandas/io/formats/format.py", line 176, in __init__
self._chk_truncate()
File "/usr/local/lib/python2.7/site-packages/pandas/io/formats/format.py", line 190, in _chk_truncate
series.iloc[-row_num:]))
File "/usr/local/lib/python2.7/site-packages/pandas/core/reshape/concat.py", line 213, in concat
return op.get_result()
File "/usr/local/lib/python2.7/site-packages/pandas/core/reshape/concat.py", line 377, in get_result
return cons(mgr, name=name).__finalize__(self, method='concat')
File "subclass.py", line 27, in __finalize__
self.metadata = other.metadata.copy()
AttributeError: '_Concatenator' object has no attribute 'metadata'
Comment From: jorisvandenbossche
@jbrockmendel That is code that I recently rewrote (https://github.com/pandas-dev/pandas/pull/17728), so it is quite possible I introduces a bug.
However, I cannot reproduce the exact error you get. I get:
In [3]: mser
..
ValueError: All objects passed were None
and I already get to for 0.20.3 as well.
Comment From: jorisvandenbossche
Ah, there was a return self
missing in the __finalize__
implementation in the code example above. Now I get the error you were showing (and now it also works in older pandas version).
And I suppose that I meant to pass the first object from the list to concat instead of self
. Does that make sense?
However, it seems that I just did how it was already done before (using this __finalize__(self)
construct where self
is the _Concatenator
object). So not sure why it started failing.
Comment From: jorisvandenbossche
OK, I understand what changed the behaviour. Before my PR (https://github.com/pandas-dev/pandas/pull/17728), concat
of serieses always returned a Series, while my PR changed it to return an instance of the class of the first object.
So in practice, the __finalize__
of your MetaSeries
is now called instead of Series.__finalize__
, and the latter has a check that self
is NDFrame like:
https://github.com/pandas-dev/pandas/blob/8acdf801c501a0ce2ac14ddd676cd248f0f32180/pandas/core/generic.py#L4000-L4003
Given that, I would argue the current code might be correct, as passing through the metadata of the first object is maybe not the best default? (Eg that would ignore different metadata in the different objects you are concatting).
The subclass can always have specialized code in __finalize__
for checking such metadata
Comment From: jbrockmendel
Ah, there was a return self missing in the finalize implementation in the code example above. Now I get the error you were showing (and now it also works in older pandas version).
Good catch. I just edited the previous comment to fix that mistake.
Given that, I would argue the current code might be correct, as passing through the metadata of the first object is maybe not the best default? (Eg that would ignore different metadata in the different objects you are concatting).
I think you're right that the problem above would be fixed by having a isinstance(other, NDFrame)
check in the subclass's __finalize__
method. (In the non-toy version MetaSeries
, the metadata object is responsible for figuring out how to propagate itself under various operations.)
concat
is well outside the areas that I'm familiar with (and I was surprised that it was relevant for __repr__
). For my own understanding, why is _Concatenator
a reasonable object to pass to __finalize__
? I don't see how it could usefully propagate any metadata.
Comment From: jorisvandenbossche
I was surprised that it was relevant for
__repr__
Because it is a truncated repr, concatting first and last part.
For my own understanding, why is _Concatenator a reasonable object to pass to finalize? I don't see how it could usefully propagate any metadata.
That object holds more information, eg the list of all concatenated objects. So for example you could check for a certain metadata field if all objects have the same value, and only in that case set it on the resulting object.
Comment From: jbrockmendel
This is probably closable, possibly worth keeping around on some tracker for "subclassing pitfalls"
Comment From: jorisvandenbossche
@jbrockmendel want to do a PR to mention this in the subclassing docs?