Currently, in the IntegerArray or BooleanArray astype, we first convert to object dtype, and then convert to the requested dtype.
For example, in IntegerArray.astype:
https://github.com/pandas-dev/pandas/blob/7d7f885856b0c3d51eaf15beaac9d4f30c23797d/pandas/core/arrays/integer.py#L551-L553
where self._coerce_to_ndarray() creates an object ndarray.
When converting to integer (and there are no NaNs), or when converting to float, this roundtrip through object dtype is not needed: we can convert the _data directly and, in the float case, set the missing positions to NaN.
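
A minimal sketch of what the direct path could look like, written against plain NumPy rather than the pandas internals. The function name masked_astype and the data/mask arguments are hypothetical stand-ins for the array's _data and a boolean mask of missing positions; this is not the actual pandas implementation.

```python
import numpy as np


def masked_astype(data, mask, dtype):
    """Hypothetical helper: cast masked integer data, skipping object dtype where possible.

    data: integer ndarray (values at masked positions are arbitrary)
    mask: boolean ndarray, True where the value is missing
    """
    dtype = np.dtype(dtype)
    if dtype.kind == "f":
        # Float target: cast the raw values directly, then set missing slots to NaN.
        result = data.astype(dtype)  # int -> float always copies, so data is untouched
        result[mask] = np.nan
        return result
    if dtype.kind in "iu":
        # Integer target: only valid when nothing is missing.
        if mask.any():
            raise ValueError("cannot convert missing values to integer")
        return data.astype(dtype)
    # Any other target: fall back to the existing object-dtype roundtrip.
    obj = data.astype(object)
    obj[mask] = np.nan
    return obj.astype(dtype)


data = np.array([1, 2, 3], dtype="int64")
mask = np.array([False, True, False])

as_float = masked_astype(data, mask, "float64")
as_int = masked_astype(data, np.zeros(3, dtype=bool), "int32")
```

Only the float and no-missing-values integer cases avoid the object ndarray here; everything else still goes through the slower path.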
Comment From: TomAugspurger
> or in case of converting to float
Note that we need to have a discussion on the behavior of converting to float once we start using pd.NA. Do we want to implicitly convert NA to np.nan when the user does boolarray.astype('float')? Is requesting a float a large enough hint to imply they want NA converted to NaN?
Comment From: phofl
I think we can close here. This is optimised for the no-NA case now.