Feature Type
- [ ] Adding new functionality to pandas
- [X] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
Problem Description
IMO it would be an API improvement for pandas if creating dataframes/series/arrays using dtype=str (and dtype="str") returned a dataframe/series/array of dtype StringDtype instead of dtype object. The reason is that IMO in 99.9% of cases where users instantiate using dtype=str, they would have preferred dtype="string", which guarantees that the array actually contains only strings (and NAs).
This would be similar to how instantiating with dtype=int currently gives dtype np.int64 and dtype=float gives np.float64.
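For illustration, here is a minimal sketch of the constructor behavior described above. The helper accepts_non_string is a hypothetical name introduced only for this example. What dtype=str resolves to depends on the installed pandas version, so the snippet prints the dtypes rather than assuming them.

```python
import numpy as np
import pandas as pd

# dtype=str: object dtype on the pandas versions discussed in this issue
# (newer releases may resolve it differently).
s_str = pd.Series(["a", "b"], dtype=str)

# dtype="string": the dedicated StringDtype extension type.
s_string = pd.Series(["a", "b"], dtype="string")

# dtype=int and dtype=float resolve to concrete NumPy dtypes
# (typically int64 and float64).
s_int = pd.Series([1, 2], dtype=int)
s_float = pd.Series([1, 2], dtype=float)

for label, s in [("str", s_str), ('"string"', s_string),
                 ("int", s_int), ("float", s_float)]:
    print(f"dtype={label} -> {s.dtype!r}")


def accepts_non_string(series: pd.Series) -> bool:
    """Hypothetical helper: can an integer be assigned into a copy of `series`?"""
    s = series.copy()
    try:
        s.iloc[0] = 1
    except (TypeError, ValueError):
        # StringDtype rejects non-string values on assignment.
        return False
    return True


# An object-dtype Series accepts any Python object; a StringDtype Series does not.
obj_ok = accepts_non_string(pd.Series(["a", "b"], dtype=object))
string_ok = accepts_non_string(s_string)
print(obj_ok, string_ok)
```

The last two lines illustrate the guarantee mentioned above: an object-dtype Series silently accepts a non-string value, while a StringDtype Series refuses it.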
The above proposal would be backwards incompatible, and it is too late to introduce deprecations in pandas 1.x now. However, could it become a breaking change as part of the jump to version 2.0 of pandas, similar to the backwards-incompatible changes already listed in #44823?
Feature Description
Basically it would just change the dtype resolution function to return a StringDtype instead of the current behavior, so it should be reasonably simple to implement.
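As a rough sketch of the idea, and not pandas' actual internals, the proposed mapping could look like the wrapper below. The function name resolve_dtype is hypothetical; the only pandas APIs used are pd.StringDtype and the public pandas.api.types.pandas_dtype.

```python
import numpy as np
import pandas as pd
from pandas.api.types import pandas_dtype


def resolve_dtype(dtype):
    """Hypothetical resolver: map str / "str" to StringDtype, else defer to pandas."""
    if dtype is str or (isinstance(dtype, str) and dtype == "str"):
        return pd.StringDtype()
    # Everything else keeps the existing resolution rules.
    return pandas_dtype(dtype)


print(repr(resolve_dtype(str)))
print(repr(resolve_dtype("str")))
print(repr(resolve_dtype(float)))
```

This only covers dtype resolution in constructors; it says nothing about I/O paths or the behavior of follow-up operations on the resulting data.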
Alternative Solutions
The alternative would be to keep the current behavior in pandas 2.0.
Additional Context
No response
Comment From: phofl
I think this needs a more thorough investigation.
How would the behavior of follow-up operations change?
Would you also change the behavior of I/O operations? I don't think that we can do this without a deprecation cycle.
Comment From: mroeschke
I support dtype=str eventually mapping to StringDtype, but personally I think it would be better through a deprecation than a 2.0 breaking change.
Comment From: topper-123
Thanks for the reply. Yes, I hadn't considered IO; that makes it more challenging than I had thought when I wrote up the issue...
I could support a deprecation cycle, though if it would last the entire pandas 2.x cycle, it may be better to deprecate later in the cycle, e.g. pandas 2.3 or similar, IMO.
Unless there is a wish to do something now, I'll let this lie, and I (or someone else) can pick it up later, after pandas 2.0 has been released.
Comment From: phofl
We want to release 3.0 significantly faster than 2.0, so it would be ok to introduce in 2.0, I think. But we want to finish enforcing deprecations first.
Comment From: topper-123
Closing as superseded by #52429, where the discussion is more current.