Is your feature request related to a problem?
This might help with two things:
- A coordination point for 3rd-party libraries creating objects they'd like to turn into DataFrames, and users of those libraries
- Possibly, simplification of DataFrame.__init__
Describe the solution you'd like
A new top-level pd.dataframe function.
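    from functools import singledispatch
    from typing import Any
    import numpy as np
    from pandas import Index

    @singledispatch
    def dataframe(data: Any, index: Index, columns: Index, copy: bool = False):
        """
        Create a pandas DataFrame from data.
        """

    # Register on the generic function itself, under a different name so that
    # the name dataframe keeps pointing at the dispatcher.
    @dataframe.register(np.ndarray)
    def _dataframe_from_ndarray(data, index, columns, copy=False):
        pass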
API breaking implications
None
Describe alternatives you've considered
xref https://github.com/pandas-dev/pandas/pull/32844, which attempted this for DataFrame.__init__. That was a non-starter since it exposed our internal BlockManager too publicly (https://github.com/pandas-dev/pandas/pull/32844#issuecomment-601494850), so we'd need to do this on a top-level function instead.
Comment From: simonjayhawkins
xref #32908 for an alternative
Comment From: jbrockmendel
Just checking if I understand the idea:
Downstream library Foo has a class ModelData with something like a to_frame() method, and by writing

    @pd.dataframe.register(ModelData)
    def _dataframe_from_model_data(model_data, **kwargs):
        return model_data.to_frame(**kwargs)

they make pd.dataframe Just Work on ModelData objects?
Comment From: TomAugspurger
Yep. functools.singledispatch looks at the type of the first argument and dispatches off that (with a fallback default if desired). So when pd.dataframe encounters a model_data, it would call the function registered for it (which would be expected to return an initialized pandas DataFrame).
Comment From: jbrockmendel
The experience with _constructor has soured me on the cost/benefit tradeoff of adding customization hooks for downstream libraries.