Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
# classic, numpy example
ser = pd.Series([0.0, 1.0, 2.0, 3.0])
other = pd.Series([True, False, True, False])
ser.mask(~ser.isna(), other).dtype # bool
# extension array example
ser = pd.Series([0.0, 1.0, 2.0, 3.0], dtype=pd.Float64Dtype())
other = pd.Series([True, False, True, False], dtype=pd.BooleanDtype())
ser.mask(~ser.isna(), other).dtype # TypeError: object cannot be converted to FloatingDtype
Issue Description
Hi,
When using mask in a way where other completely replaces self, the classic numpy approach works as expected (the resulting object has dtype bool). In the extension array example, however, a TypeError is thrown.
A more complicated case is one in which other does not completely replace self, for example:
ser = pd.Series([pd.NA, 1.0, 2.0, 3.0], dtype=pd.Float64Dtype())
other = pd.Series([True, False, True, False], dtype=pd.BooleanDtype())
ser.mask(~ser.isna(), other)
Here, in the numpy case, the result falls back to object, but in the extension array case the same TypeError is thrown. I can see how this could be a little trickier, but the result could also fall back to object or (in this example) still hold the pd.NA and have dtype pd.BooleanDtype().
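The second alternative (keeping the pd.NA and returning pd.BooleanDtype()) can be approximated by hand by casting ser before masking. This is only a sketch of the outcome described above, not the behaviour pandas implements; it assumes that casting Float64 to boolean is acceptable, which holds here only because every non-missing value is replaced by other anyway.

```python
import pandas as pd

ser = pd.Series([pd.NA, 1.0, 2.0, 3.0], dtype=pd.Float64Dtype())
other = pd.Series([True, False, True, False], dtype=pd.BooleanDtype())

# Compute the condition on the original values, before any cast.
cond = ~ser.isna()

# Assumption: the cast is harmless because all non-missing values get
# replaced below. Missing values stay pd.NA through the cast.
as_bool = ser.astype(pd.BooleanDtype())

# Replace every non-missing position with the corresponding value of other.
result = as_bool.mask(cond, other)

print(result.dtype)
print(result.tolist())
```

If cond left some non-missing floats in place, the cast would turn them into True/False, so this approach would not preserve the original values in that situation.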
Expected Behavior
ser = pd.Series([0.0, 1.0, 2.0, 3.0], dtype=pd.Float64Dtype())
other = pd.Series([True, False, True, False], dtype=pd.BooleanDtype())
ser.mask(~ser.isna(), other) # would expect to return pd.Series([True, False, True, False], dtype=pd.BooleanDtype()), not a TypeError
Installed Versions
Comment From: phofl
Hi, thanks for your report. This works on main, but it keeps the float dtype, which is what I would expect here. It probably needs tests, though.
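Since the outcome appears to differ between the released version and main, a small probe can show what a given installation does. The helper name describe_mask_outcome is hypothetical and only used for illustration; the sketch reports either the resulting dtype or the TypeError, without assuming which one occurs.

```python
import pandas as pd


def describe_mask_outcome(ser, other):
    """Return the result dtype of ser.mask(~ser.isna(), other) as a string,
    or a description of the TypeError if the call raises one."""
    try:
        return str(ser.mask(~ser.isna(), other).dtype)
    except TypeError as exc:
        return f"TypeError: {exc}"


# numpy-backed example from the report
numpy_outcome = describe_mask_outcome(
    pd.Series([0.0, 1.0, 2.0, 3.0]),
    pd.Series([True, False, True, False]),
)

# extension array example from the report
ea_outcome = describe_mask_outcome(
    pd.Series([0.0, 1.0, 2.0, 3.0], dtype=pd.Float64Dtype()),
    pd.Series([True, False, True, False], dtype=pd.BooleanDtype()),
)

print("pandas", pd.__version__)
print("numpy-backed:", numpy_outcome)
print("extension array:", ea_outcome)
```

A regression test could then pin whichever dtype is agreed to be correct for the extension array case.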
Comment From: ShashwatAgrawal20
Hey @phofl, I would like to work on this but don't know where to start. It would be a great help if you could point me to where I should begin.
Comment From: ShashwatAgrawal20
take
Comment From: ShashwatAgrawal20
ser = pd.Series([pd.NA, 1.0, 2.0, 3.0], dtype=pd.Float64Dtype())
other = pd.Series([True, False, True, False], dtype=pd.BooleanDtype())
ser.mask(~ser.isna(), other)
The above might raise an error because the mask method expects other to be a Series or scalar with the same dtype as the original Series.
The error might be avoided by casting other or ser so that both share the same dtype, for example:
ser = pd.Series([0.0, 1.0, 2.0, 3.0], dtype=pd.Float64Dtype())
other = pd.Series([True, False, True, False], dtype=pd.BooleanDtype())
ser = ser.astype(other.dtype)
result = ser.mask(~ser.isna(), other)
print(result)
Output
Comment From: luke396
take