Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np

for extension_dtype in ["Int64", "Float64"]:
    pandas_float_data = pd.Series([1, None, 3], dtype=extension_dtype)
    numpy_float_data = pd.Series([1, 2, None], dtype="float64")

    subtracted_floats = pandas_float_data.subtract(numpy_float_data)

    # Resulting series is Float64 extension dtype
    assert str(subtracted_floats.dtype) == "Float64"

    # The null value from numpy_float_data is np.nan
    np_nan_value = subtracted_floats.iloc[-1]
    assert np.isnan(np_nan_value)
    assert np_nan_value is not pd.NA

    # The null value from pandas_float_data is pd.NA
    pd_na_value = subtracted_floats.iloc[-2]
    assert pd_na_value is pd.NA

    # Only the pd.NA null value is dropped - doesn't recognize np.nan
    dropped_nans = subtracted_floats.dropna()
    assert len(dropped_nans) == 2

Issue Description

There are two problems here:

1. Subtracting data with a pandas extension dtype from data with the numpy dtype float64 can leave np.nan in the resulting Float64 data if a null value was present in the original float64 data. This is incorrect, as I would expect pd.NA to be the only null value in a Float64 series.
2. Because there is an unexpected null value in the resulting Float64 series, Series.dropna doesn't recognize that null value and will not drop it from the Series.

Note: The behavior is the same whether we subtract the pandas dtype from the numpy dtype or vice versa. It is also the same whether you use a.subtract(b) or a - b.
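A minimal sketch of the four forms mentioned in the note, using the same toy data as the reproducible example. The names a, b and results are illustrative only. It prints the result dtype and the null mask reported by isna() for each form so they can be compared side by side; no particular output is claimed here, since it may differ between pandas versions.

```python
import pandas as pd

# Same toy data as the reproducible example: one extension-dtype
# operand and one numpy float64 operand, each with a single null.
a = pd.Series([1, None, 3], dtype="Float64")
b = pd.Series([1, 2, None], dtype="float64")

# The four forms covered by the note above.
results = {
    "a.subtract(b)": a.subtract(b),
    "a - b": a - b,
    "b.subtract(a)": b.subtract(a),
    "b - a": b - a,
}

for label, result in results.items():
    # isna() shows which positions pandas itself treats as null.
    print(label, result.dtype, result.isna().tolist())
```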

Expected Behavior

The following should work:

    for extension_dtype in ["Int64", "Float64"]:
        pandas_float_data = pd.Series([1, None, 3], dtype=extension_dtype)
        numpy_float_data = pd.Series([1, 2, None], dtype="float64")

        subtracted_floats = pandas_float_data - numpy_float_data

        # Resulting series is Float64 extension dtype
        assert str(subtracted_floats.dtype) == "Float64"

        # The null value from numpy_float_data is pd.NA
        np_nan_value = subtracted_floats.iloc[-1]
        # assert np.isnan(np_nan_value)
        assert np_nan_value is pd.NA

        # The null value from pandas_float_data is pd.NA
        pd_na_value = subtracted_floats.iloc[-2]
        assert pd_na_value is pd.NA

        # All null values are dropped
        dropped_nans = subtracted_floats.dropna()
        assert len(dropped_nans) == 1

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 2e218d10984e9919f0296931d92ea851c6a6faf5
python           : 3.8.2.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 21.6.0
Version          : Darwin Kernel Version 21.6.0: Wed Aug 10 14:25:27 PDT 2022; root:xnu-8020.141.5~2/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.5.3
numpy            : 1.22.4
pytz             : 2022.2.1
dateutil         : 2.8.2
setuptools       : 59.8.0
pip              : 22.2.2
Cython           : 0.29.32
pytest           : 7.1.2
hypothesis       : None
sphinx           : 4.5.0
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.9.1
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.1.2
IPython          : 8.5.0
pandas_datareader: None
bs4              : 4.11.1
bottleneck       : None
brotli           : None
fastparquet      : None
fsspec           : 2022.8.2
gcsfs            : None
matplotlib       : 3.5.3
numba            : 0.56.2
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : 1.8.1
snappy           : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
zstandard        : None
tzdata           : None

Comment From: phofl

Hi, thanks for your report. This is as expected for now; there is an ongoing discussion about NA and NaN in Float64.

Comment From: tamargrey

@phofl I can understand the presence of both NA and nan as being expected, but how is the dropna behavior expected here? Shouldn't it be able to recognize any and all null values and remove them all? I'd be happy to open up a more specific issue if needed.

Is there a better way for me to be removing nans in this situation? If it's converting to "float64" or replacing nan with NA and then dropping, I don't see how it could be the expected behavior.

Comment From: phofl

There are also open issues about that (I think more related to fillna). As I said, it’s still an ongoing discussion.

Comment From: tamargrey

Got it. I see this has been ongoing for quite a while in https://github.com/pandas-dev/pandas/issues/32265.
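For anyone landing here with the same workaround question, one possible approach is sketched below. It is not an official recommendation from the pandas maintainers. It assumes that casting a float64 Series to "Float64" with astype turns np.nan into pd.NA, so that both operands carry their nulls as pd.NA before the subtraction. The helper name subtract_as_nullable is hypothetical.

```python
import pandas as pd


def subtract_as_nullable(left, right):
    """Hypothetical helper: cast both operands to Float64, then subtract.

    Casting the numpy float64 operand first means its NaN values are
    already pd.NA when the arithmetic runs.
    """
    return left.astype("Float64") - right.astype("Float64")


for extension_dtype in ["Int64", "Float64"]:
    pandas_data = pd.Series([1, None, 3], dtype=extension_dtype)
    numpy_data = pd.Series([1, 2, None], dtype="float64")

    result = subtract_as_nullable(pandas_data, numpy_data)

    # Both null positions are pd.NA, so dropna removes both of them.
    assert str(result.dtype) == "Float64"
    assert result.iloc[-1] is pd.NA
    assert result.iloc[-2] is pd.NA
    assert len(result.dropna()) == 1
```

The trade-off is an extra copy of each operand, and any NaN that was meant to stay distinct from a missing value is lost in the cast.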