- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] (optional) I have confirmed this bug exists on the master branch of pandas.
#### Problem description
When trying to replace all `pd.NA` values with `None` using `where(...)` on a `pd.Series` with dtype `pd.Int64Dtype`, `None` values are automatically coerced to `pd.NA` in a way that is inconsistent with other dtypes. Additionally, the `try_cast` flag doesn't control this behavior.
#### Expected Output
I expect that when `try_cast=False`, `None` will not be coerced into `pd.NA`, and the dtype of the series will instead be cast to `object` to accommodate the newly introduced `None` value, as occurs for other dtypes.
#### Example
In the following:
```python
s = pd.Series([1, 2, 3, pd.NA], dtype=pd.Int64Dtype())
s.where(pd.notnull(s), None)
```
the `None` I'm trying to replace `pd.NA` with is automatically cast back into `pd.NA`. This occurs even though `try_cast` is set to `False` by default, which I would expect to control this casting behavior.
```
0       1
1       2
2       3
3    <NA>
dtype: Int64
```
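The report above can be condensed into a standalone repro (a minimal sketch; the exact rendering of the output may vary slightly between pandas versions):

```python
import pandas as pd

# Nullable Int64 series containing a missing value.
s = pd.Series([1, 2, 3, pd.NA], dtype=pd.Int64Dtype())

# Attempt to replace pd.NA with None: the masked dtype treats None as
# just another missing-value sentinel, so it is coerced back to pd.NA
# and the dtype stays Int64 rather than upcasting to object.
result = s.where(pd.notnull(s), None)
print(result.dtype)             # Int64
print(result.iloc[3] is pd.NA)  # True
```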
This is in contrast to the behavior of other dtypes, like numpy's float dtype:
```python
s = pd.Series([1, 2, 3, np.nan], dtype=np.dtype("float64"))
s.where(pd.notnull(s), None)
```
which outputs the series with the dtype coerced to `object` after incorporating `None`:
```
0       1
1       2
2       3
3    None
dtype: object
```
#### Output of `pd.show_versions()`
Comment From: jorisvandenbossche
If you convert it to `object` dtype first, then it actually replaces NA with None:
```python
>>> s.astype(object).where(pd.notnull(s), None)
0       1
1       2
2       3
3    None
dtype: object
```
Personally, I might find it good behaviour that it tries to preserve the dtype by default. But it's certainly inconsistent with how it works for other dtypes.
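The workaround above can be wrapped in a small helper (a sketch; `replace_na_with_none` is an illustrative name, not a pandas API):

```python
import pandas as pd

def replace_na_with_none(s: pd.Series) -> pd.Series:
    # Cast to object *before* calling where(), so that None is stored
    # as-is instead of being coerced back to the dtype's NA sentinel.
    return s.astype(object).where(pd.notnull(s), None)

s = pd.Series([1, 2, 3, pd.NA], dtype="Int64")
result = replace_na_with_none(s)
print(result.dtype)    # object
print(result.iloc[3])  # None
```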
Comment From: jorisvandenbossche
The question is a bit: which values are considered "valid" for the current dtype, and which not? Because, for example, for the numpy integer dtype, we also coerce the `other` value where possible, to some extent. Here I pass a float, but it still results in an int series:
```python
In [51]: s = pd.Series([1, 2, 3])

In [52]: s.where(s == 1, 2.0)
Out[52]:
0    1
1    2
2    2
dtype: int64
```
And only when the value cannot be interpreted as an int does it get upcast to float or object dtype.
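That cutoff can be checked by passing a float with no exact integer representation (a minimal sketch against a plain numpy-backed series; exact upcasting rules can differ between pandas versions):

```python
import pandas as pd

s = pd.Series([1, 2, 3])  # plain numpy int64 series

# 2.5 cannot be represented as an int, so instead of coercing the
# replacement value, pandas upcasts the result to float64.
result = s.where(s == 1, 2.5)
print(result.dtype)     # float64
print(result.tolist())  # [1.0, 2.5, 2.5]
```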
Now, for the nullable integer dtype, we actually raise an error if the value doesn't fit in integers:
```python
In [62]: s = pd.Series([1, 2, 3], dtype="Int64")

In [63]: s.where(s == 1, 2.5)
...
TypeError: cannot safely cast non-equivalent object to int64
```
(so in that light coercing None to NA certainly makes sense)
Comment From: jorisvandenbossche
Related `where` issue: https://github.com/pandas-dev/pandas/issues/24144
Comment From: jbrockmendel
@jorisvandenbossche is right, the solution here is to cast to object dtype
Comment From: phofl
Closing as expected behavior: masked dtypes don't upcast when setting incompatible values.
Comment From: jbrockmendel
I think this is worth keeping open, as the lack of consistency with other dtypes is a pretty common complaint.
Comment From: phofl
I'd rather open an issue about documenting this? I was planning on doing this anyway after the RC for 2.0 is out.
Comment From: jbrockmendel
The issue isn't the docs, it's the desired behavior. I'd be open to changing/deprecating the non-nullable behavior so it doesn't cast, but I think we should be consistent throughout.