import pandas as pd

class MyDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return MyDataFrame

    _metadata = ['new_property']

    def __init__(self, data, new_property, index=None, columns=None, dtype=None, copy=True):
        super(MyDataFrame, self).__init__(data=data,
                                          index=index,
                                          columns=columns,
                                          dtype=dtype,
                                          copy=copy)
        self.new_property = new_property

data1 = {'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [15, 25, 30], 'd': [1, 1, 2]}
df1 = MyDataFrame(data1, new_property='value')
df1[['a', 'b']]
Problem description
I'm working on a new data structure that subclasses pandas DataFrame. I want to enforce that my new data structure always has new_property, so that it can be processed safely later on. However, I'm running into an error when using it, because some internal pandas function calls the constructor without the required property. The code above produces the following error.
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "
Expected Output
I expect the slicing to happen without any error. Is there a fix for this, or an alternative design that enforces that my new data structure has new_property?
Output of pd.show_versions()
Comment From: chris-b1
Standard recommendation: avoid subclassing if you can! There will be some new accessor APIs in 0.23 that may make this easier in some cases (#18827).
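As a rough illustration of the accessor route mentioned above, the sketch below registers a custom accessor through pd.api.extensions.register_dataframe_accessor. The accessor name `checked`, the class `CheckedAccessor`, and the choice to keep the value in `DataFrame.attrs` (an experimental pandas feature) are assumptions made for this example, not something proposed in this thread.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("checked")  # hypothetical accessor name
class CheckedAccessor:
    """Expose a required piece of metadata without subclassing DataFrame."""

    def __init__(self, pandas_obj):
        # pandas passes the DataFrame the accessor is attached to.
        self._obj = pandas_obj

    @property
    def new_property(self):
        # Assumption: the value is stored in DataFrame.attrs under this key.
        if "new_property" not in self._obj.attrs:
            raise ValueError("new_property has not been set on this DataFrame")
        return self._obj.attrs["new_property"]


df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df.attrs["new_property"] = "value"
print(df.checked.new_property)
```

Downstream code can then read `df.checked.new_property` and fail loudly when the value is missing, instead of relying on a constructor signature.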
That said, here you need to make new_property an optional arg. DataFrame.__finalize__ will pass it along to the sliced instance, since it is listed in _metadata.
# omitted
def __init__(self, data, new_property=None, index=None, columns=None, dtype=None, copy=True):
    # omitted

df1 = MyDataFrame(data1, new_property='value')

In [93]: df1[['a', 'b']].new_property
Out[93]: 'value'
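For reference, a self-contained version of the snippet above might look like the following. It is a sketch that assumes a reasonably recent pandas; the `subset` and `copied` variable names are only for illustration. Because `new_property` is listed in `_metadata`, `__finalize__` copies it onto the objects produced by slicing and copying.

```python
import pandas as pd


class MyDataFrame(pd.DataFrame):
    # Names listed here are propagated by DataFrame.__finalize__.
    _metadata = ['new_property']

    @property
    def _constructor(self):
        return MyDataFrame

    def __init__(self, data, new_property=None, index=None, columns=None,
                 dtype=None, copy=True):
        super().__init__(data=data, index=index, columns=columns,
                         dtype=dtype, copy=copy)
        # Internal pandas calls leave this as None; __finalize__ then
        # overwrites it with the value from the source object.
        self.new_property = new_property


data1 = {'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [15, 25, 30], 'd': [1, 1, 2]}
df1 = MyDataFrame(data1, new_property='value')

subset = df1[['a', 'b']]   # column selection
copied = df1.copy()        # explicit copy

print(type(subset).__name__, subset.new_property)
print(type(copied).__name__, copied.new_property)
```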
Comment From: hlums
@chris-b1 Thanks for the reply! Yes, making new_property optional will avoid the error, but it doesn't enforce that MyDataFrame has new_property. The user will be able to create a MyDataFrame with df1 = MyDataFrame(data1), without passing a value for new_property, which will cause problems in downstream processing.
Comment From: chris-b1
In that case you could define an alternative constructor that only pandas uses, and enforce your API on __init__:
class MyDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return MyDataFrame._internal_ctor

    _metadata = ['new_property']

    @classmethod
    def _internal_ctor(cls, *args, **kwargs):
        kwargs['new_property'] = None
        return cls(*args, **kwargs)

    def __init__(self, data, new_property, index=None, columns=None, dtype=None, copy=True):
        super(MyDataFrame, self).__init__(data=data,
                                          index=index,
                                          columns=columns,
                                          dtype=dtype,
                                          copy=copy)
        self.new_property = new_property

data1 = {'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [15, 25, 30], 'd': [1, 1, 2]}
df1 = MyDataFrame(data1, new_property='value')

df1[['a', 'b']].new_property
Out[121]: 'value'

MyDataFrame(data1)
TypeError: __init__() missing 1 required positional argument: 'new_property'
Comment From: hlums
@chris-b1 Thank you so much! This is exactly what I needed!
Comment From: hlums
@chris-b1 I'm getting the following error when using pd.concat on my subclass.
Traceback (most recent call last):
File "C:\Anaconda\envs\myenv\lib\site-packages\IPython\core\interactiveshell.py", line 2910, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "
It seems like the _from_axes() class method in generic.py is calling the __init__() method of my class directly, without going through _constructor. This doesn't look like a solvable problem to me, but I still wanted to check with a pandas expert.
Thanks!
Comment From: chris-b1
Can you open a new issue with just a reproduction of that? I think it should be calling _constructor, but I'm not sure.
Here's a hacky workaround:
def __init__(self, data, new_property=None, index=None, columns=None, dtype=None, copy=True):
    if not isinstance(data, pd.core.internals.BlockManager) and not new_property:
        raise ValueError("arg new_property required")
    super(MyDataFrame, self).__init__(data=data,
                                      index=index,
                                      columns=columns,
                                      dtype=dtype,
                                      copy=copy)
    self.new_property = new_property
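One caveat with the workaround above is that `not new_property` also rejects legitimate falsy values such as 0 or an empty string. A possible variant, sketched below, checks `is None` instead. It also treats a `pd.DataFrame` input as an internal call, on the assumption that newer pandas releases may hand the subclass constructor a DataFrame rather than a BlockManager. That assumption and the `internal_call` variable are illustrative and not part of this thread. The trade-off is that a user can then pass a plain DataFrame without `new_property`.

```python
import pandas as pd


class MyDataFrame(pd.DataFrame):
    _metadata = ['new_property']

    @property
    def _constructor(self):
        return MyDataFrame

    def __init__(self, data, new_property=None, index=None, columns=None,
                 dtype=None, copy=True):
        # Assumption: internal pandas calls pass a BlockManager (older
        # releases) or an already-built DataFrame (newer releases).
        internal_call = isinstance(
            data, (pd.core.internals.BlockManager, pd.DataFrame)
        )
        # `is None` keeps falsy but valid values such as 0 or ''.
        if not internal_call and new_property is None:
            raise ValueError("arg new_property required")
        super().__init__(data=data, index=index, columns=columns,
                         dtype=dtype, copy=copy)
        self.new_property = new_property


data1 = {'a': [1, 2, 3], 'b': [4, 5, 6]}
df1 = MyDataFrame(data1, new_property=0)  # a falsy value is accepted
sliced = df1[['a']]                       # internal construction still works

try:
    MyDataFrame(data1)                    # user call without new_property
except ValueError as exc:
    print(exc)
```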
Comment From: hlums
@chris-b1 Thanks for the suggestion! I will open an issue about this. For now, I'm overriding _from_axes and making it call our internal constructor. It seems to be working fine.