Pandas: subtraction with UInt64 Series resulting in negative values raises TypeError

With the extension integer array (UInt64) the operation below raises an error; with a NumPy uint64 array it does not:

In [68]: pd.Series([1, 1, 1]) - pd.Series([1, 2, 3], dtype='UInt64')
...
TypeError: cannot safely cast non-equivalent float64 to uint64

In [69]: pd.Series([1, 1, 1]) - pd.Series([1, 2, 3], dtype='uint64')
Out[69]: 
0    0.0
1   -1.0
2   -2.0
dtype: float64

Comment From: hhuuggoo

So the error has changed into:

    def __init__(self, values, mask, copy=False):
        if not (isinstance(values, np.ndarray)
                and is_integer_dtype(values.dtype)):
>           raise TypeError("values should be integer numpy array. Use "
                            "the 'integer_array' function instead")
E           TypeError: values should be integer numpy array. Use the 'integer_array' function instead

But that is because of what happens in _maybe_mask_result.

https://github.com/pandas-dev/pandas/blob/0480f4c183a95712cb8ceaf5682c5b8dd02e0f21/pandas/core/arrays/integer.py#L532

That code has some logic for detecting when it should be outputting floats. Does anyone know why we do that instead of just checking the dtype of result?

If we don't want to rely on the dtype of result, then we would have to add operations between uint64 and int64 to the list of cases where we get floats back from numpy:

In [10]: (np.array([1], dtype='uint64') - np.array([1], dtype='int64')).dtype
Out[10]: dtype('float64')

In [11]: (np.array([1], dtype='uint32') - np.array([1], dtype='int64')).dtype
Out[11]: dtype('int64')
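The promotion rule behind those two outputs can be checked without doing any arithmetic. The following is only a sketch: `promotes_to_float` is a hypothetical helper name used for illustration, not something in pandas or NumPy. It relies on `np.result_type`, which reports the dtype NumPy would pick for a pair of array dtypes.

```python
import numpy as np


def promotes_to_float(left, right):
    """Return True if NumPy promotes this pair of dtypes to a float dtype.

    Hypothetical helper for illustration only.
    """
    return np.result_type(left, right).kind == "f"


# Compare each unsigned width against int64.
for unsigned in ("uint8", "uint16", "uint32", "uint64"):
    promoted = np.result_type(np.dtype(unsigned), np.dtype("int64"))
    print(f"{unsigned} with int64 -> {promoted}")
```

Only the uint64 and int64 pair has no common integer dtype wide enough for both ranges, which is why NumPy falls back to float64 there.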

Comment From: jorisvandenbossche

> That code has some logic for detecting when it should be outputting floats. Does anyone know why we do that instead of just checking the dtype of result?

I can't directly think of cases where numpy will return float but where we want to convert back to an integer dtype. But since it is written that way, there might be such cases (@jreback do you remember?). You could make the change and see whether any tests fail.

Comment From: hhuuggoo

If I just change it to return the float array (with the NaN mask applied) whenever the result dtype is a float, all tests pass.

https://github.com/hhuuggoo/pandas/tree/remove_float_logic

https://github.com/hhuuggoo/pandas/blob/remove_float_logic/pandas/core/arrays/integer.py#L532

Seems OK, but I don't completely understand what was there to begin with.
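As a rough illustration of the idea described above (decide by the dtype of the result alone), here is a minimal standalone sketch. It is not the code on the linked branch: `float_or_masked` is a hypothetical function name, and returning a plain `(values, mask)` pair for the integer case is an assumption made to keep the example free of pandas internals.

```python
import numpy as np


def float_or_masked(result, mask):
    """Hypothetical stand-in for the dtype-based decision.

    If NumPy already produced floats, return a float array with NaN at the
    masked positions. Otherwise keep the integer values and the mask together.
    """
    if result.dtype.kind == "f":
        out = result.copy()
        out[mask] = np.nan  # missing entries become NaN
        return out
    return result, mask


left = np.array([1, 1, 1], dtype="int64")
right = np.array([1, 2, 3], dtype="uint64")
mask = np.array([False, False, True])

# int64 - uint64 is promoted to float64 by NumPy, so the float branch is taken.
masked = float_or_masked(left - right, mask)
print(masked)
```

With this shape there is no need to enumerate which operations produce floats; the mixed uint64 and int64 case is handled by the same branch as true division.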

Comment From: dsaxton

Apparently the bug also exists for addition:

[ins] In [1]: import pandas as pd

[ins] In [2]: left = pd.Series([1, 1, 1])

[ins] In [3]: right = pd.Series([1, 2, 3], dtype="UInt64")

[ins] In [4]: left + right
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-ded06e34d5c6> in <module>
----> 1 left + right

~/pandas/pandas/core/ops/common.py in new_method(self, other)
     63         other = item_from_zerodim(other)
     64 
---> 65         return method(self, other)
     66 
     67     return new_method

~/pandas/pandas/core/ops/__init__.py in wrapper(left, right)
    341         lvalues = extract_array(left, extract_numpy=True)
    342         rvalues = extract_array(right, extract_numpy=True)
--> 343         result = arithmetic_op(lvalues, rvalues, op)
    344 
    345         return left._construct_result(result, name=res_name)

~/pandas/pandas/core/ops/array_ops.py in arithmetic_op(left, right, op)
    184     if should_extension_dispatch(lvalues, rvalues) or isinstance(rvalues, Timedelta):
    185         # Timedelta is included because numexpr will fail on it, see GH#31457
--> 186         res_values = op(lvalues, rvalues)
    187 
    188     else:

~/pandas/pandas/core/arrays/integer.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
    400 
    401         # for binary ops, use our custom dunder methods
--> 402         result = ops.maybe_dispatch_ufunc_to_dunder_op(
    403             self, ufunc, method, *inputs, **kwargs
    404         )

~/pandas/pandas/_libs/ops_dispatch.pyx in pandas._libs.ops_dispatch.maybe_dispatch_ufunc_to_dunder_op()
     89         else:
     90             name = REVERSED_NAMES.get(op_name, f"__r{op_name}__")
---> 91             result = getattr(self, name, not_implemented)(inputs[0])
     92             return result
     93     else:

~/pandas/pandas/core/ops/common.py in new_method(self, other)
     63         other = item_from_zerodim(other)
     64 
---> 65         return method(self, other)
     66 
     67     return new_method

~/pandas/pandas/core/arrays/integer.py in integer_arithmetic_method(self, other)
    659                 )
    660 
--> 661             return self._maybe_mask_result(result, mask, other, op_name)
    662 
    663         name = f"__{op.__name__}__"

~/pandas/pandas/core/arrays/integer.py in _maybe_mask_result(self, result, mask, other, op_name)
    588             return result
    589 
--> 590         return type(self)(result, mask, copy=False)
    591 
    592     @classmethod

~/pandas/pandas/core/arrays/integer.py in __init__(self, values, mask, copy)
    359     def __init__(self, values: np.ndarray, mask: np.ndarray, copy: bool = False):
    360         if not (isinstance(values, np.ndarray) and values.dtype.kind in ["i", "u"]):
--> 361             raise TypeError(
    362                 "values should be integer numpy array. Use "
    363                 "the 'pd.array' function instead"

TypeError: values should be integer numpy array. Use the 'pd.array' function instead

[ins] In [5]: pd.__version__
Out[5]: '1.2.0.dev0+520.gdca6c7f43'

Comment From: phofl

This works now, but may need some tests. The result dtype is Float64.
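A regression test for this could look roughly like the sketch below. The function name `check_mixed_sign_arithmetic` is made up for illustration, and the checks assume a pandas version in which the operation no longer raises, as reported in this comment. Only the values are compared; the Float64 dtype mentioned above is printed rather than asserted.

```python
import pandas as pd


def check_mixed_sign_arithmetic():
    """Hypothetical check: int64 Series combined with a UInt64 Series."""
    left = pd.Series([1, 1, 1])
    right = pd.Series([1, 2, 3], dtype="UInt64")
    diff = left - right    # the original report raised TypeError here
    total = left + right   # the addition case reported later in the thread
    return diff, total


diff, total = check_mixed_sign_arithmetic()
# This comment reports Float64 for the result dtype.
print(diff.dtype, total.dtype)
```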