Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import numpy as np
x1 = pd.Series([1, 2, None, 4])
x2 = pd.Series([1, 2, np.nan, 4])
x3 = pd.Series([1, 2, pd.NA, 4])
print(x1.to_numpy('int32', na_value=0))
# [1 2 0 4]
print(x2.to_numpy('int32', na_value=0))
# [1 2 0 4]
print(x3.to_numpy('int32', na_value=0))
# Traceback (most recent call last):
# File "<input>", line 14, in <module>
# print(x3.to_numpy('int32', na_value=0))
# File "C:\src\venv\w310\lib\site-packages\pandas\core\base.py", line 535, in to_numpy
# result = np.asarray(self._values, dtype=dtype)
# TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NAType'
print(x3.to_numpy('float64', na_value=0))
# Traceback (most recent call last):
# File "<input>", line 22, in <module>
# print(x3.to_numpy('float64', na_value=0))
# File "C:\src\venv\w310\lib\site-packages\pandas\core\base.py", line 535, in to_numpy
# result = np.asarray(self._values, dtype=dtype)
# TypeError: float() argument must be a string or a real number, not 'NAType'
Issue Description
A missing value in a Series created with either None or np.nan can be replaced using Series.to_numpy(dtype=, na_value=), but the same call on a Series created with pd.NA raises a TypeError. Both arguments must be specified to trigger the behavior.
Expected Behavior
Since None, np.nan, and pd.NA all represent missing values, all three should behave the same. In the reproducible example above, every print statement should report [1 2 0 4] (or [1. 2. 0. 4.] for the fourth case, which uses 'float64').
Installed Versions
Comment From: abatomunkuev
Is it because pd.NA is experimental? See "Experimental new features".
The exception is raised on line 535. Is it a problem with numpy? https://github.com/pandas-dev/pandas/blob/87cfe4e38bafe7300a6003a1d18bd80f3f77c763/pandas/core/base.py#L535
Comment From: phofl
Could you post this in #48891?
Comment From: MarcoGorelli
I'm not sure this is the same as https://github.com/pandas-dev/pandas/issues/48891; I think this can already be considered a bug.
The following should work:
In [5]: pd.Series([1, 2, pd.NA, 4]).to_numpy('int64', na_value=0)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In [5], line 1
----> 1 pd.Series([1, 2, pd.NA, 4]).to_numpy('int64', na_value=0)
File ~/pandas-dev/pandas/core/base.py:540, in IndexOpsMixin.to_numpy(self, dtype, copy, na_value, **kwargs)
535 bad_keys = list(kwargs.keys())[0]
536 raise TypeError(
537 f"to_numpy() got an unexpected keyword argument '{bad_keys}'"
538 )
--> 540 result = np.asarray(self._values, dtype=dtype)
541 # TODO(GH-24345): Avoid potential double copy
542 if copy or na_value is not lib.no_default:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NAType'
I think this is more related to https://github.com/pandas-dev/pandas/issues/48864
Comment From: MarcoGorelli
This works if you start with a nullable type:
In [1]: x3 = pd.Series([1, 2, pd.NA, 4], dtype='Int64')
In [2]: print(x3.to_numpy('int64', na_value=1))
[1 2 1 4]
In [3]: x3.to_numpy('int64', na_value=1)
Out[3]: array([1, 2, 1, 4])
The issue arises if you start with dtype=object.
Looking into this
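To make the dtype difference concrete, here is a minimal sketch of how the Series from the original report are inferred when no dtype is passed. The variable names are only illustrative, and the dtypes in the comments are what I would expect from default inference; they are not taken from the report itself.

```python
import numpy as np
import pandas as pd

# None and np.nan are both inferred as NaN in a float64 Series.
from_none = pd.Series([1, 2, None, 4])
from_nan = pd.Series([1, 2, np.nan, 4])

# pd.NA without an explicit dtype leaves the Series as object dtype.
from_na = pd.Series([1, 2, pd.NA, 4])

# Requesting a nullable dtype up front avoids the object fallback.
from_na_nullable = pd.Series([1, 2, pd.NA, 4], dtype="Int64")

print(from_none.dtype)         # float64
print(from_nan.dtype)          # float64
print(from_na.dtype)           # object
print(from_na_nullable.dtype)  # Int64
```

So the first two cases never hold pd.NA at all, which would explain why only the third one reaches np.asarray with an NAType element.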
Comment From: baloe
Oh yes, I was about to open an issue, too, and name it
to_numpy(): na_value ignored when converting object-type pandas data
floattypeseries = pd.Series( [1,2,None], dtype='Float64')
objecttypeseries = floattypeseries.astype('object')
floattypeseries.to_numpy(dtype=float, na_value=np.nan) # → succeeds, 'array([ 1., 2., nan])'
objecttypeseries.to_numpy(dtype=float, na_value=np.nan) # → fails with 'TypeError: float() argument must be a string or a number, not 'NAType''
Looking for a drop-in replacement as a workaround, my first idea was to use
.to_numpy(na_value=np.nan).astype(float)
but this fails for integer-type pandas data (not containing any <NA>).
This seems to work:
.astype('object').to_numpy(na_value=np.nan).astype(float)
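The same idea can be wrapped in a small helper. This is only a sketch of a workaround, not how pandas implements to_numpy: the function name to_numpy_filled is hypothetical, and it assumes that going through an object array first is acceptable for the data size at hand. It replaces the missing values before casting, so the cast never sees pd.NA.

```python
import numpy as np
import pandas as pd


def to_numpy_filled(series, dtype, na_value):
    """Hypothetical helper: fill missing values, then cast to `dtype`."""
    # Go through an object array so that None, np.nan and pd.NA can all be held.
    values = series.to_numpy(dtype=object, copy=True)
    # pd.isna treats None, np.nan and pd.NA alike as missing.
    values[pd.isna(values)] = na_value
    # Cast only after no missing values remain.
    return values.astype(dtype)


x3 = pd.Series([1, 2, pd.NA, 4])
print(to_numpy_filled(x3, "int32", 0))
print(to_numpy_filled(x3, "float64", 0))
```

The trade-off is an extra copy through object dtype, which is slower than a direct cast, but it should behave the same for object, float, and nullable Series.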