Feature Type
- [X] Adding new functionality to pandas
- [ ] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
Problem Description
I'm currently using df.str.split(....) in pandas.
I tried converting my df to the pyarrow dtype, but this functionality is missing.
I get this error:
NotImplementedError: str.split not supported with pd.ArrowDtype(pa.string()).
Feature Description
Add support for str.split to the pyarrow dtype (pd.ArrowDtype(pa.string())).
Alternative Solutions
The alternative would be to use .apply, which is not what I want to do, or to stick with the classic NumPy dtype.
Additional Context
No response
Comment From: pstorozenko
I wanted to run some benchmarks and I landed on the same issue.
I see there are a number of methods that are not yet implemented for pd.ArrowDtype(pa.string())
strings, although at least some of them seem implementable using pyarrow.compute functions.
Has this been left out for a reason, or does it just require someone to do the coding?
Comment From: jgarba
Take
Comment From: mroeschke
Thanks for the report. The str methods that were not implemented generally fell into 2 groups:
- Do not have an efficient, equivalent pyarrow compute function
- Have tricky return types that require some gymnastics to integrate cleanly, e.g. split should ideally return pa.list_(pa.string()), but the internals make that tricky.
We most definitely want these implemented eventually, hence the NotImplementedError.