Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import numpy as np
x1 = pd.Series([1, 2, None, 4])
x2 = pd.Series([1, 2, np.nan, 4])
x3 = pd.Series([1, 2, pd.NA, 4])
print(x1.to_numpy('int32', na_value=0))
# [1 2 0 4]
print(x2.to_numpy('int32', na_value=0))
# [1 2 0 4]
print(x3.to_numpy('int32', na_value=0))
# Traceback (most recent call last):
# File "<input>", line 14, in <module>
# print(x3.to_numpy('int32', na_value=0))
# File "C:\src\venv\w310\lib\site-packages\pandas\core\base.py", line 535, in to_numpy
# result = np.asarray(self._values, dtype=dtype)
# TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NAType'
print(x3.to_numpy('float64', na_value=0))
# Traceback (most recent call last):
# File "<input>", line 22, in <module>
# print(x3.to_numpy('float64', na_value=0))
# File "C:\src\venv\w310\lib\site-packages\pandas\core\base.py", line 535, in to_numpy
# result = np.asarray(self._values, dtype=dtype)
# TypeError: float() argument must be a string or a real number, not 'NAType'
Issue Description
A missing value in a Series created with either None or np.nan can be replaced by calling Series.to_numpy(dtype=, na_value=), but the same call on a Series created with pd.NA raises a TypeError. Both arguments must be specified to trigger the behavior.
Expected Behavior
Since all three values (None, np.nan, and pd.NA) represent missing values, all three should behave the same. In the reproducible example above, the first three print statements should report [1 2 0 4], and the fourth ('float64') case should report [1. 2. 0. 4.].
Installed Versions
Comment From: abatomunkuev
Is it because pd.NA is experimental? See Experimental new features.
The exception is raised on line 535. Is it a problem with numpy? https://github.com/pandas-dev/pandas/blob/87cfe4e38bafe7300a6003a1d18bd80f3f77c763/pandas/core/base.py#L535
Comment From: phofl
Could you post this in #48891?
Comment From: MarcoGorelli
Not sure this is the same as https://github.com/pandas-dev/pandas/issues/48891, I think this can already be considered a bug
The following should work:
In [5]: pd.Series([1, 2, pd.NA, 4]).to_numpy('int64', na_value=0)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In [5], line 1
----> 1 pd.Series([1, 2, pd.NA, 4]).to_numpy('int64', na_value=0)
File ~/pandas-dev/pandas/core/base.py:540, in IndexOpsMixin.to_numpy(self, dtype, copy, na_value, **kwargs)
535 bad_keys = list(kwargs.keys())[0]
536 raise TypeError(
537 f"to_numpy() got an unexpected keyword argument '{bad_keys}'"
538 )
--> 540 result = np.asarray(self._values, dtype=dtype)
541 # TODO(GH-24345): Avoid potential double copy
542 if copy or na_value is not lib.no_default:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NAType'
I think this is more related to https://github.com/pandas-dev/pandas/issues/48864
Comment From: MarcoGorelli
This works if you start with a nullable type:
In [1]: x3 = pd.Series([1, 2, pd.NA, 4], dtype='Int64')
In [2]: print(x3.to_numpy('int64', na_value=1))
[1 2 1 4]
In [3]: x3.to_numpy('int64', na_value=1)
Out[3]: array([1, 2, 1, 4])
The issue arises if you start with dtype=object.
Looking into this.
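To make the dtype point concrete, here is a minimal sketch showing how the three constructions from the original report are inferred, and how converting the object-dtype Series to a nullable dtype first sidesteps the error. The variable names are illustrative only, and the choice of 'Int64' assumes the non-missing values are all integers.

```python
import numpy as np
import pandas as pd

x_none = pd.Series([1, 2, None, 4])
x_nan = pd.Series([1, 2, np.nan, 4])
x_na = pd.Series([1, 2, pd.NA, 4])

# None and np.nan are both stored as NaN in a float64 Series,
# while pd.NA forces the Series to fall back to dtype=object.
print(x_none.dtype)  # float64
print(x_nan.dtype)   # float64
print(x_na.dtype)    # object

# Converting to the nullable integer dtype first lets to_numpy
# substitute na_value before casting to a plain NumPy dtype.
nullable = x_na.astype("Int64")
result = nullable.to_numpy("int64", na_value=0)
print(result)  # [1 2 0 4]
```

This only works around the problem for data that fits a nullable dtype; it does not change how to_numpy treats object-dtype input.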
Comment From: baloe
Oh yes, I was about to open an issue, too, and name it
to_numpy(): na_value ignored when converting object-type pandas data
floattypeseries = pd.Series([1, 2, None], dtype='Float64')
objecttypeseries = floattypeseries.astype('object')
floattypeseries.to_numpy(dtype=float, na_value=np.nan)  # → succeeds, 'array([ 1., 2., nan])'
objecttypeseries.to_numpy(dtype=float, na_value=np.nan)  # → fails with "TypeError: float() argument must be a string or a number, not 'NAType'"
Looking for a drop-in replacement as a workaround, my first idea was to use
.to_numpy(na_value=np.nan).astype(float)
but this fails for integer-type pandas data (not containing any <NA>).
This seems to work:
.astype('object').to_numpy(na_value=np.nan).astype(float)
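The workaround chain above can be wrapped in a small helper. This is only a sketch: the function name to_float_array is hypothetical and not part of pandas, and it assumes every non-missing value can be converted to float.

```python
import numpy as np
import pandas as pd


def to_float_array(series: pd.Series) -> np.ndarray:
    """Hypothetical helper: convert a Series to a float NumPy array,
    mapping missing values (None, np.nan, pd.NA) to np.nan."""
    # Casting to object first means np.nan can always be written into
    # the intermediate array, even when the source dtype is an integer.
    as_object = series.astype("object")
    # Missing entries are replaced with np.nan in an object array ...
    filled = as_object.to_numpy(na_value=np.nan)
    # ... which can then be cast to float without hitting NAType.
    return filled.astype(float)


# Usage with the kinds of input discussed in this thread.
print(to_float_array(pd.Series([1, 2, None], dtype="Float64")))
print(to_float_array(pd.Series([1, 2, 3])))
print(to_float_array(pd.Series([1, 2, pd.NA, 4])))
```

The extra pass through object dtype costs a copy, so this is a stopgap for affected versions, not a replacement for a fix in to_numpy itself.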