With the extension integer array, the operation below raises an error; with a NumPy uint array it does not:
In [68]: pd.Series([1, 1, 1]) - pd.Series([1, 2, 3], dtype='UInt64')
...
TypeError: cannot safely cast non-equivalent float64 to uint64
In [69]: pd.Series([1, 1, 1]) - pd.Series([1, 2, 3], dtype='uint64')
Out[69]:
0 0.0
1 -1.0
2 -2.0
dtype: float64
Comment From: hhuuggoo
So the error has since changed to:
    def __init__(self, values, mask, copy=False):
        if not (isinstance(values, np.ndarray)
                and is_integer_dtype(values.dtype)):
>           raise TypeError("values should be integer numpy array. Use "
                            "the 'integer_array' function instead")
E           TypeError: values should be integer numpy array. Use the 'integer_array' function instead
But that is because of what happens in _maybe_mask_result.
https://github.com/pandas-dev/pandas/blob/0480f4c183a95712cb8ceaf5682c5b8dd02e0f21/pandas/core/arrays/integer.py#L532
That code has some logic for detecting when it should be outputting floats. Does anyone know why we do that instead of just checking the dtype of result?
If we don't want to rely on the dtype of result, then we would have to add operations between uint64 and int64 to the list of cases where we get floats back from numpy:
In [10]: (np.array([1], dtype='uint64') - np.array([1], dtype='int64')).dtype
Out[10]: dtype('float64')
In [11]: (np.array([1], dtype='uint32') - np.array([1], dtype='int64')).dtype
Out[11]: dtype('int64')
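To see which integer pairs are affected, here is a minimal sketch that uses `np.result_type` to ask NumPy which dtype each pair promotes to. The helper name `pairs_promoting_to_float` is hypothetical and only illustrates the point; it is not part of pandas or NumPy.

```python
import itertools

import numpy as np

INT_DTYPES = ["int8", "int16", "int32", "int64",
              "uint8", "uint16", "uint32", "uint64"]


def pairs_promoting_to_float(dtypes=INT_DTYPES):
    """Return the unordered integer dtype pairs that NumPy promotes to a float.

    Hypothetical helper, written for illustration only.
    """
    found = []
    for a, b in itertools.combinations(dtypes, 2):
        # result_type applies NumPy's dtype promotion rules to the pair.
        if np.result_type(np.dtype(a), np.dtype(b)).kind == "f":
            found.append((a, b))
    return found


float_pairs = pairs_promoting_to_float()
print(float_pairs)
```

Only uint64 paired with a signed integer dtype has no common integer type under NumPy's promotion rules, so those are the pairs a hard-coded list would need to cover.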
Comment From: jorisvandenbossche
> That code has some logic for detecting when it should be outputting floats. Does anyone know why we do that instead of just checking the dtype of result?
I can't immediately think of cases where numpy returns a float but where we would want to convert back to an integer dtype. But since it is written that way, there might be some (@jreback, do you remember?). You could make the change and see whether any tests fail.
Comment From: hhuuggoo
If I just change it to return the float array (with the NaN mask applied) whenever the result dtype is a float, all tests pass.
https://github.com/hhuuggoo/pandas/tree/remove_float_logic
https://github.com/hhuuggoo/pandas/blob/remove_float_logic/pandas/core/arrays/integer.py#L532
Seems OK, but I don't completely understand what was there to begin with.
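A rough sketch of the dtype-based branch described in this comment, using plain NumPy arrays. The function name `mask_result_sketch` and its return convention are assumptions made for illustration; this is not the code in the linked branch or in pandas.

```python
import numpy as np


def mask_result_sketch(result, mask):
    """Hypothetical stand-in for a dtype-based version of the masking step."""
    if result.dtype.kind == "f":
        # Float result: write NaN where the mask is set and return a plain float array.
        result = result.copy()
        result[mask] = np.nan
        return result
    # Integer result: keep values and mask together, as a masked integer array would.
    return result, mask


left = np.array([1, 1, 1], dtype="int64")
right = np.array([1, 2, 3], dtype="uint64")
mask = np.array([False, True, False])

# int64 - uint64 is promoted to float64 by NumPy, so the float branch is taken.
float_out = mask_result_sketch(left - right, mask)

# int64 - int64 stays integer, so values and mask are returned unchanged.
int_out, int_mask = mask_result_sketch(left - left, mask)
```

The branch depends only on `result.dtype`, so no list of operations or operand dtypes has to be maintained.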
Comment From: dsaxton
Apparently the bug also exists for addition:
[ins] In [1]: import pandas as pd
[ins] In [2]: left = pd.Series([1, 1, 1])
[ins] In [3]: right = pd.Series([1, 2, 3], dtype="UInt64")
[ins] In [4]: left + right
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-4-ded06e34d5c6> in <module>
----> 1 left + right
~/pandas/pandas/core/ops/common.py in new_method(self, other)
63 other = item_from_zerodim(other)
64
---> 65 return method(self, other)
66
67 return new_method
~/pandas/pandas/core/ops/__init__.py in wrapper(left, right)
341 lvalues = extract_array(left, extract_numpy=True)
342 rvalues = extract_array(right, extract_numpy=True)
--> 343 result = arithmetic_op(lvalues, rvalues, op)
344
345 return left._construct_result(result, name=res_name)
~/pandas/pandas/core/ops/array_ops.py in arithmetic_op(left, right, op)
184 if should_extension_dispatch(lvalues, rvalues) or isinstance(rvalues, Timedelta):
185 # Timedelta is included because numexpr will fail on it, see GH#31457
--> 186 res_values = op(lvalues, rvalues)
187
188 else:
~/pandas/pandas/core/arrays/integer.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
400
401 # for binary ops, use our custom dunder methods
--> 402 result = ops.maybe_dispatch_ufunc_to_dunder_op(
403 self, ufunc, method, *inputs, **kwargs
404 )
~/pandas/pandas/_libs/ops_dispatch.pyx in pandas._libs.ops_dispatch.maybe_dispatch_ufunc_to_dunder_op()
89 else:
90 name = REVERSED_NAMES.get(op_name, f"__r{op_name}__")
---> 91 result = getattr(self, name, not_implemented)(inputs[0])
92 return result
93 else:
~/pandas/pandas/core/ops/common.py in new_method(self, other)
63 other = item_from_zerodim(other)
64
---> 65 return method(self, other)
66
67 return new_method
~/pandas/pandas/core/arrays/integer.py in integer_arithmetic_method(self, other)
659 )
660
--> 661 return self._maybe_mask_result(result, mask, other, op_name)
662
663 name = f"__{op.__name__}__"
~/pandas/pandas/core/arrays/integer.py in _maybe_mask_result(self, result, mask, other, op_name)
588 return result
589
--> 590 return type(self)(result, mask, copy=False)
591
592 @classmethod
~/pandas/pandas/core/arrays/integer.py in __init__(self, values, mask, copy)
359 def __init__(self, values: np.ndarray, mask: np.ndarray, copy: bool = False):
360 if not (isinstance(values, np.ndarray) and values.dtype.kind in ["i", "u"]):
--> 361 raise TypeError(
362 "values should be integer numpy array. Use "
363 "the 'pd.array' function instead"
TypeError: values should be integer numpy array. Use the 'pd.array' function instead
[ins] In [5]: pd.__version__
Out[5]: '1.2.0.dev0+520.gdca6c7f43'
Comment From: phofl
This works now, but may need some tests. We get Float64.
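A test along these lines could pin down the behaviour. This is only a sketch: it assumes a pandas version that ships the nullable `Float64` dtype and in which both operations return it, as reported above. The variable names are illustrative.

```python
import pandas as pd

left = pd.Series([1, 1, 1])
right = pd.Series([1, 2, 3], dtype="UInt64")

# Both operations from this thread: subtraction and addition.
diff = left - right
total = left + right

# Assumed expectation, based on the report that the result is Float64.
expected_diff = pd.Series([0.0, -1.0, -2.0], dtype="Float64")
expected_total = pd.Series([2.0, 3.0, 4.0], dtype="Float64")

pd.testing.assert_series_equal(diff, expected_diff)
pd.testing.assert_series_equal(total, expected_total)
```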