Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> b = pd.Series([1.2, 2.2, 3.2], dtype="Float64")
>>> a = pd.Series([1,2,3], dtype="Int32")
>>> a[a < 2] = b

TypeError: Cannot cast array data from dtype('O') to dtype('int32') according to the rule 'safe'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\series.py", line 1162, in __setitem__
    self._where(~key, value, inplace=True)
  File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\generic.py", line 9733, in _where
    new_data = self._mgr.putmask(mask=cond, new=other, align=align)
  File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\internals\managers.py", line 407, in putmask
    return self.apply(
  File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\internals\managers.py", line 347, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\internals\blocks.py", line 1517, in putmask
    values._putmask(mask, new)
  File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\arrays\base.py", line 1523, in _putmask
    self[mask] = val
  File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\arrays\masked.py", line 237, in __setitem__
    value, mask = self._coerce_to_array(value, dtype=self.dtype)
  File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\arrays\numeric.py", line 258, in _coerce_to_array
    values, mask, _, _ = _coerce_to_data_and_mask(
  File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\arrays\numeric.py", line 214, in _coerce_to_data_and_mask
    values = dtype_cls._safe_cast(values, dtype, copy=False)
  File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\arrays\integer.py", line 57, in _safe_cast
    raise TypeError(
TypeError: cannot safely cast non-equivalent object to int32


If I change b to integral values, such as b = pd.Series([2, 2, 2], dtype="Float64"), then that code succeeds, with the series dtype as Int32.

Issue Description

It's not clear why the Int32 series is not promoted to a Float64 series when using the extension dtypes, which would allow the assignment to succeed. The same code succeeds using numpy types (not the nullable extension types).
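For anyone hitting the same error, one possible workaround is to upcast explicitly before assigning. This is only a minimal sketch, assuming an explicit `astype("Float64")` is acceptable for the use case; it is not taken from the pandas maintainers' recommendations.

```python
import pandas as pd

b = pd.Series([1.2, 2.2, 3.2], dtype="Float64")
a = pd.Series([1, 2, 3], dtype="Int32")

# Assumption: we are free to change the dtype of `a` ourselves.
# setitem on a nullable Int32 Series does not promote the dtype,
# so the upcast to Float64 is done explicitly before assigning.
a = a.astype("Float64")

# The assignment aligns `b` on the index, so only the rows where
# the mask is True (here: index 0) take their value from `b`.
a[a < 2] = b

print(a.dtype)
print(a.tolist())
```

The values match what the numpy-dtype version of the example produces, but the result keeps the nullable Float64 dtype.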

Expected Behavior

I expected the same behavior I get with numpy types:

>>> b = pd.Series([1.2, 2.2, 3.2], dtype="float64")
>>> a = pd.Series([1,2,3], dtype="int32")
>>> a[a < 2] = b
>>> a
0    1.2
1    2.0
2    3.0
dtype: float64

Installed Versions

>>> pd.show_versions()
C:\pd_latest\lib\site-packages\_distutils_hack\__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS
------------------
commit           : 2e218d10984e9919f0296931d92ea851c6a6faf5
python           : 3.8.10.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
Version          : 10.0.22621
machine          : AMD64
processor        : Intel64 Family 6 Model 85 Stepping 7, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : English_United States.1252

pandas           : 1.5.3
numpy            : 1.24.2
pytz             : 2022.7.1
dateutil         : 2.8.2
setuptools       : 66.1.1
pip              : 22.3.1
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : None
pandas_datareader: None
bs4              : None
bottleneck       : None
brotli           : None
fastparquet      : None
fsspec           : None
gcsfs            : None
matplotlib       : None
numba            : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : None
snappy           : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
zstandard        : None
tzdata           : None

Comment From: phofl

This is the intended behavior for ea dtypes. There is also a proposal to change this for numpy dtypes.

There are a couple of issues where this is discussed

Comment From: rohanjain101

@phofl I looked at the issues where this is discussed. The reasoning given there is that setitem is an inplace operation, so the original dtype should be preserved. Then why does this also occur for fillna, which by default is not inplace and returns a new Series? Is it expected that ea dtypes will not be promoted even for operations that return a copy?

>>> z = pd.Series([1,2,pd.NA], dtype="Int8")
>>> z.fillna(1.2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\pd_latest\lib\site-packages\pandas\util\_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "C:\pd_latest\lib\site-packages\pandas\core\series.py", line 5298, in fillna
    return super().fillna(
  File "C:\pd_latest\lib\site-packages\pandas\core\generic.py", line 6842, in fillna
    new_data = self._mgr.fillna(
  File "C:\pd_latest\lib\site-packages\pandas\core\internals\managers.py", line 443, in fillna
    return self.apply(
  File "C:\pd_latest\lib\site-packages\pandas\core\internals\managers.py", line 352, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\pd_latest\lib\site-packages\pandas\core\internals\blocks.py", line 1586, in fillna
    return super().fillna(value, limit=limit, inplace=inplace, downcast=downcast)
  File "C:\pd_latest\lib\site-packages\pandas\core\internals\blocks.py", line 1200, in fillna
    nbs = self.where(value, ~mask.T, _downcast=False)
  File "C:\pd_latest\lib\site-packages\pandas\core\internals\blocks.py", line 1457, in where
    res_values = arr._where(cond, other).T
  File "C:\pd_latest\lib\site-packages\pandas\core\arrays\base.py", line 1547, in _where
    result[~mask] = val
  File "C:\pd_latest\lib\site-packages\pandas\core\arrays\masked.py", line 232, in __setitem__
    value = self._validate_setitem_value(value)
  File "C:\pd_latest\lib\site-packages\pandas\core\arrays\masked.py", line 223, in _validate_setitem_value
    raise TypeError(f"Invalid value '{str(value)}' for dtype {self.dtype}")
TypeError: Invalid value '1.2' for dtype Int8

Comment From: phofl

Yes, exactly: there is no upcasting right now. You can weigh in on the other issues if you like, but I want to keep the discussion in one place.
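Given that there is no upcasting, the fillna case can be handled with an explicit cast first. This is a hedged sketch only, assuming it is acceptable to convert the Int8 series to Float64 before filling; it does not describe any planned pandas behavior.

```python
import pandas as pd

z = pd.Series([1, 2, pd.NA], dtype="Int8")

# Assumption: converting to the nullable Float64 dtype up front is fine here.
# fillna does not upcast the Int8 series, so a fractional fill value such as
# 1.2 is only accepted once the series already has a floating dtype.
filled = z.astype("Float64").fillna(1.2)

print(filled.dtype)
print(filled.tolist())
```

The original `z` is left unchanged, since both `astype` and `fillna` return new objects here.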

Comment From: rohanjain101

@phofl then why does concat do upcasting with ea dtypes?

>>> b = pd.Series([1.2, 2.2, 3.2], dtype="Float64")
>>> a = pd.Series([1,2,3], dtype="Int32")
>>> c = pd.concat([a,b])
>>> c
0    1.0
1    2.0
2    3.0
0    1.2
1    2.2
2    3.2
dtype: Float64
>>>

If the reason is that concat returns a new object, then fillna also returns a new object, right? Why does concat upcast ea dtypes while fillna does not, when both return a new object?