Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
>>> b = pd.Series([1.2, 2.2, 3.2], dtype="Float64")
>>> a = pd.Series([1,2,3], dtype="Int32")
>>> a[a < 2] = b
TypeError: Cannot cast array data from dtype('O') to dtype('int32') according to the rule 'safe'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\series.py", line 1162, in __setitem__
self._where(~key, value, inplace=True)
File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\generic.py", line 9733, in _where
new_data = self._mgr.putmask(mask=cond, new=other, align=align)
File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\internals\managers.py", line 407, in putmask
return self.apply(
File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\internals\managers.py", line 347, in apply
applied = getattr(b, f)(**kwargs)
File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\internals\blocks.py", line 1517, in putmask
values._putmask(mask, new)
File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\arrays\base.py", line 1523, in _putmask
self[mask] = val
File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\arrays\masked.py", line 237, in __setitem__
value, mask = self._coerce_to_array(value, dtype=self.dtype)
File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\arrays\numeric.py", line 258, in _coerce_to_array
values, mask, _, _ = _coerce_to_data_and_mask(
File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\arrays\numeric.py", line 214, in _coerce_to_data_and_mask
values = dtype_cls._safe_cast(values, dtype, copy=False)
File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\arrays\integer.py", line 57, in _safe_cast
raise TypeError(
TypeError: cannot safely cast non-equivalent object to int32
If I change b to integral values, such as b = pd.Series([2, 2, 2], dtype="Float64"), then that code succeeds, with the series dtype as Int32.
Issue Description
It's not clear why the Int32 series is not promoted to a Float64 series when using the extension dtypes, which would allow the assignment to succeed. The same code succeeds using numpy types (not the nullable extension types).
Expected Behavior
I expected the same behavior I get with numpy types:
>>> b = pd.Series([1.2, 2.2, 3.2], dtype="float64")
>>> a = pd.Series([1,2,3], dtype="int32")
>>> a[a < 2] = b
>>> a
0 1.2
1 2.0
2 3.0
dtype: float64
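For comparison, a minimal sketch of how the same result might be reached with the nullable dtypes by upcasting explicitly before the assignment. The name `a_float` is introduced here purely for illustration, and this is a workaround under the assumption that a Float64 result is acceptable, not a description of how pandas is meant to handle the original assignment:

```python
import pandas as pd

a = pd.Series([1, 2, 3], dtype="Int32")
b = pd.Series([1.2, 2.2, 3.2], dtype="Float64")

# Upcast explicitly first; astype returns a new Series and leaves `a` as Int32.
a_float = a.astype("Float64")

# Both sides are now Float64, so the masked assignment needs no promotion.
a_float[a_float < 2] = b

print(a_float)
```

Only the position where the mask is true (index 0) takes its value from `b`; the remaining values are the original integers represented as floats.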
Installed Versions
Comment From: phofl
This is the intended behavior for ea dtypes. There is also a proposal to change this for numpy dtypes.
There are a couple of issues where this is discussed
Comment From: rohanjain101
@phofl I looked at the issues where this is discussed, and the reasoning given is that setitem is an inplace operation, so the original dtype should be preserved. Then why does this also occur for fillna, which by default is not inplace and returns a new series? Is it expected that ea dtypes will not be promoted for copy operations either?
>>> z = pd.Series([1,2,pd.NA], dtype="Int8")
>>> z.fillna(1.2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\pd_latest\lib\site-packages\pandas\util\_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "C:\pd_latest\lib\site-packages\pandas\core\series.py", line 5298, in fillna
return super().fillna(
File "C:\pd_latest\lib\site-packages\pandas\core\generic.py", line 6842, in fillna
new_data = self._mgr.fillna(
File "C:\pd_latest\lib\site-packages\pandas\core\internals\managers.py", line 443, in fillna
return self.apply(
File "C:\pd_latest\lib\site-packages\pandas\core\internals\managers.py", line 352, in apply
applied = getattr(b, f)(**kwargs)
File "C:\pd_latest\lib\site-packages\pandas\core\internals\blocks.py", line 1586, in fillna
return super().fillna(value, limit=limit, inplace=inplace, downcast=downcast)
File "C:\pd_latest\lib\site-packages\pandas\core\internals\blocks.py", line 1200, in fillna
nbs = self.where(value, ~mask.T, _downcast=False)
File "C:\pd_latest\lib\site-packages\pandas\core\internals\blocks.py", line 1457, in where
res_values = arr._where(cond, other).T
File "C:\pd_latest\lib\site-packages\pandas\core\arrays\base.py", line 1547, in _where
result[~mask] = val
File "C:\pd_latest\lib\site-packages\pandas\core\arrays\masked.py", line 232, in __setitem__
value = self._validate_setitem_value(value)
File "C:\pd_latest\lib\site-packages\pandas\core\arrays\masked.py", line 223, in _validate_setitem_value
raise TypeError(f"Invalid value '{str(value)}' for dtype {self.dtype}")
TypeError: Invalid value '1.2' for dtype Int8
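A minimal sketch of one way to sidestep the error in the meantime, assuming a Float64 result is what is wanted: cast to the wider nullable dtype before filling. The variable name `filled` is illustrative only, and this does not reflect any change in how fillna itself behaves:

```python
import pandas as pd

z = pd.Series([1, 2, pd.NA], dtype="Int8")

# Cast to Float64 first so that the fill value 1.2 is valid for the dtype;
# fillna then returns a new Series and `z` keeps its Int8 dtype.
filled = z.astype("Float64").fillna(1.2)

print(filled)
```

The missing entry is replaced by 1.2 and the existing integers are kept as floats.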
Comment From: phofl
Yes, exactly: there is no upcasting right now. You can weigh in on the other issues if you like, but I want to keep the discussion in one place.
Comment From: rohanjain101
@phofl then why does concat do upcasting with ea dtypes?
>>> b = pd.Series([1.2, 2.2, 3.2], dtype="Float64")
>>> a = pd.Series([1,2,3], dtype="Int32")
>>> c = pd.concat([a,b])
>>> c
0 1.0
1 2.0
2 3.0
0 1.2
1 2.2
2 3.2
dtype: Float64
If the reason is that it returns a new df, then fillna also returns a new df, right? Why does concat upcast with ea dtypes but fillna does not, when both return a new df?