Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
s = pd.Series(['a','b']) # dtype: object
s.mask([True, True], 1) # dtype: object but expected to be int64
Issue Description
NDFrame._where, which is used under the hood by .where and .mask, sometimes returns an unexpected dtype.
To illustrate, the final Series in the example below is expected to have int64 dtype, but it remains object.
>>> s = pd.Series(['a','b'])
>>> s
0    a
1    b
dtype: object
>>> s.mask([True, True], 1)
0    1
1    1
dtype: object
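Since .where and .mask share the same code path, the same dtype shows up through either method. The following is a minimal sketch, assuming an explicit dtype=object on construction (so the result does not depend on any default string dtype) and a NumPy boolean array for the condition so that it can be inverted with ~; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Explicit object dtype, matching the Series in the report.
s = pd.Series(["a", "b"], dtype=object)
cond = np.array([True, True])

# mask replaces values where cond is True;
# where replaces values where its condition is False.
via_mask = s.mask(cond, 1)
via_where = s.where(~cond, 1)

print(via_mask.equals(via_where))
print(via_mask.dtype, via_where.dtype)
```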
This issue came to our attention while working on this PR: https://github.com/pandas-dev/pandas/pull/50343 (comment)
Expected Behavior
The resulting object is expected to "refresh" its dtype. In the example above, the dtype should be int64.
Installed Versions
Comment From: rhshadrach
I disagree, I don't think we should be doing inference here. If you instead did s.mask([True, False], 1), then this would need to be object dtype. The dtype of the result should not depend on the values of the mask.
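To make the point above concrete, here is a minimal sketch. It is an illustration added to the comment, not code from the thread: the explicit dtype=object and the opt-in call to infer_objects() are assumptions, the latter shown only as one way a caller could ask for inference explicitly.

```python
import pandas as pd

# Explicit object dtype, matching the Series in the report.
s = pd.Series(["a", "b"], dtype=object)

# Only one value is replaced: the result holds an int and a str,
# so object is the only dtype that fits.
partial = s.mask([True, False], 1)

# Every value is replaced: the dtype is still object, so the result
# dtype is the same regardless of the mask values.
full = s.mask([True, True], 1)

# Callers who want a narrower dtype can request inference themselves.
converted = full.infer_objects()

print(partial.dtype, full.dtype, converted.dtype)
```

Keeping the result dtype independent of the mask means the same call returns the same dtype for any boolean input, and inference stays an explicit choice by the caller.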
Comment From: phofl
Agreed with @rhshadrach