xref #38746
The following runs fine:
In [30]: s = pd.Series([np.iinfo(np.uint64).max-1, 1], dtype="uint64")
In [31]: s
Out[31]:
0 18446744073709551614
1 1
dtype: uint64
but with the equivalent extension dtype the constructor throws:
In [29]: s = pd.Series([np.iinfo(np.uint64).max-1, 1], dtype="UInt64")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/repos/pandas/pandas/core/arrays/integer.py in safe_cast(values, dtype, copy)
100 try:
--> 101 return values.astype(dtype, casting="safe", copy=copy)
102 except TypeError as err:
TypeError: Cannot cast array data from dtype('float64') to dtype('uint64') according to the rule 'safe'
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
<ipython-input-29-7fde9ec5cff9> in <module>
----> 1 s = pd.Series([np.iinfo(np.uint64).max-1, 1], dtype="UInt64")
~/repos/pandas/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
339 data = data.copy()
340 else:
--> 341 data = sanitize_array(data, index, dtype, copy, raise_cast_failure=True)
342
343 data = SingleBlockManager.from_array(data, index)
~/repos/pandas/pandas/core/construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
485
486 if dtype is not None:
--> 487 subarr = _try_cast(data, dtype, copy, raise_cast_failure)
488 else:
489 subarr = maybe_convert_platform(data)
~/repos/pandas/pandas/core/construction.py in _try_cast(arr, dtype, copy, raise_cast_failure)
594 # SparseDtype does not
595 array_type = dtype.construct_array_type()._from_sequence
--> 596 subarr = array_type(arr, dtype=dtype, copy=copy)
597 return subarr
598
~/repos/pandas/pandas/core/arrays/integer.py in _from_sequence(cls, scalars, dtype, copy)
307 cls, scalars, *, dtype=None, copy: bool = False
308 ) -> "IntegerArray":
--> 309 values, mask = coerce_to_array(scalars, dtype=dtype, copy=copy)
310 return IntegerArray(values, mask)
311
~/repos/pandas/pandas/core/arrays/integer.py in coerce_to_array(values, dtype, mask, copy)
204 values = safe_cast(values, dtype, copy=False)
205 else:
--> 206 values = safe_cast(values, dtype, copy=False)
207
208 return values, mask
~/repos/pandas/pandas/core/arrays/integer.py in safe_cast(values, dtype, copy)
106 return casted
107
--> 108 raise TypeError(
109 f"cannot safely cast non-equivalent {values.dtype} to {np.dtype(dtype)}"
110 ) from err
TypeError: cannot safely cast non-equivalent float64 to uint64
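A possible workaround, not taken from this thread: build the NumPy array with an explicit `uint64` dtype first, then hand it to the constructor, so the values never go through list inference. This is a minimal sketch that assumes the data contains no missing values; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same values as in the report above.
values = [np.iinfo(np.uint64).max - 1, 1]

# Force uint64 up front so NumPy never infers a dtype from the Python ints.
as_uint64 = np.array(values, dtype="uint64")

# Casting an existing uint64 array to the nullable UInt64 dtype is a
# same-width integer conversion, so no float64 intermediate is involved.
s = pd.Series(as_uint64, dtype="UInt64")
print(s)
```

This only sidesteps the problem for inputs that NumPy can already represent as `uint64`; a list containing `None` or `pd.NA` would still need separate handling.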
Comment From: jorisvandenbossche
This is related to https://github.com/pandas-dev/pandas/issues/30268, in the sense that it has the same cause: we let numpy do the inference on the list of values first, and in this case of large integers, numpy apparently converts it to a float array (losing precision) instead of a uint64 array. So I think the solution will probably be similar. There was a PR, https://github.com/pandas-dev/pandas/pull/30282, but it was blocked by other issues.
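The inference step described above can be reproduced with NumPy alone. This is a small sketch using the values from the report; the variable names are illustrative and it only demonstrates the dtype inference, not the pandas code path itself.

```python
import numpy as np

# One value above the int64 range, one small value, as in the report.
values = [np.iinfo(np.uint64).max - 1, 1]

# With no dtype given, NumPy treats the first element as unsigned 64-bit
# and the second as a signed integer; their common type is float64.
inferred = np.array(values)
print(inferred.dtype)

# float64 has a 53-bit significand, so the large value is rounded.
print(int(inferred[0]) == values[0])

# Passing the dtype explicitly avoids the float64 intermediate.
explicit = np.array(values, dtype="uint64")
print(explicit.dtype, int(explicit[0]) == values[0])
```

Because the array handed to `safe_cast` is already `float64`, the "safe" cast to `uint64` is rejected, which matches the `TypeError` in the traceback.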
Comment From: arw2019
Thanks for linking these! I'll look into resuscitating #30282, probably by resolving the blocker first.
Comment From: jorisvandenbossche
I need to look back at the original PR, but I think the main blocker at the moment is https://github.com/pandas-dev/pandas/pull/38315
Comment From: arw2019
I think that's right
Comment From: fangchenli
This test has passed on arm64 #38905