- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
import pandas as pd
import numpy as np
df = pd.DataFrame({"A": [1, 2, np.nan], "B": [4, np.nan, 8]}, dtype="Int64")
df.fillna('nan')
Issue Description
The code above causes the following error:
TypeError Traceback (most recent call last)
<ipython-input-2-fcfc27adad85> in <module>
2 import numpy as np
3 df = pd.DataFrame({"A": [1, 2, np.nan], "B": [4, np.nan, 8]}, dtype="Int64")
----> 4 df.fillna('nan')
~/venv/lib/python3.8/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
309 stacklevel=stacklevel,
310 )
--> 311 return func(*args, **kwargs)
312
313 return wrapper
~/venv/lib/python3.8/site-packages/pandas/core/frame.py in fillna(self, value, method, axis, inplace, limit, downcast)
5174 downcast=None,
5175 ) -> DataFrame | None:
-> 5176 return super().fillna(
5177 value=value,
5178 method=method,
~/venv/lib/python3.8/site-packages/pandas/core/generic.py in fillna(self, value, method, axis, inplace, limit, downcast)
6380
6381 elif not is_list_like(value):
-> 6382 new_data = self._mgr.fillna(
6383 value=value, limit=limit, inplace=inplace, downcast=downcast
6384 )
~/venv/lib/python3.8/site-packages/pandas/core/internals/managers.py in fillna(self, value, limit, inplace, downcast)
408
409 def fillna(self: T, value, limit, inplace: bool, downcast) -> T:
--> 410 return self.apply(
411 "fillna", value=value, limit=limit, inplace=inplace, downcast=downcast
412 )
~/venv/lib/python3.8/site-packages/pandas/core/internals/managers.py in apply(self, f, align_keys, ignore_failures, **kwargs)
325 applied = b.apply(f, **kwargs)
326 else:
--> 327 applied = getattr(b, f)(**kwargs)
328 except (TypeError, NotImplementedError):
329 if not ignore_failures:
~/venv/lib/python3.8/site-packages/pandas/core/internals/blocks.py in fillna(self, value, limit, inplace, downcast)
1570 self, value, limit=None, inplace: bool = False, downcast=None
1571 ) -> list[Block]:
-> 1572 values = self.values.fillna(value=value, limit=limit)
1573 return [self.make_block_same_class(values=values)]
1574
~/venv/lib/python3.8/site-packages/pandas/core/arrays/masked.py in fillna(self, value, method, limit)
174 # fill with value
175 new_values = self.copy()
--> 176 new_values[mask] = value
177 else:
178 new_values = self.copy()
~/venv/lib/python3.8/site-packages/pandas/core/arrays/masked.py in __setitem__(self, key, value)
186 if _is_scalar:
187 value = [value]
--> 188 value, mask = self._coerce_to_array(value)
189
190 if _is_scalar:
~/venv/lib/python3.8/site-packages/pandas/core/arrays/integer.py in _coerce_to_array(self, value)
332
333 def _coerce_to_array(self, value) -> tuple[np.ndarray, np.ndarray]:
--> 334 return coerce_to_array(value, dtype=self.dtype)
335
336 def astype(self, dtype, copy: bool = True) -> ArrayLike:
~/venv/lib/python3.8/site-packages/pandas/core/arrays/integer.py in coerce_to_array(values, dtype, mask, copy)
202
203 elif not (is_integer_dtype(values) or is_float_dtype(values)):
--> 204 raise TypeError(f"{values.dtype} cannot be converted to an IntegerDtype")
205
206 if mask is None:
TypeError: <U3 cannot be converted to an IntegerDtype
Expected Behavior
df.fillna('nan') should return a new DataFrame in which the np.nan values are replaced with 'nan' strings.
Installed Versions
Comment From: gabrieldi95
On master I get the error ValueError: invalid literal for int() with base 10: 'nan', but is this really a bug? I guess it is expected: if you set the dtype to Int64, you shouldn't fillna with a string.
Comment From: the21st
> On master I get the error ValueError: invalid literal for int() with base 10: 'nan', but is this really a bug? I guess it is expected: if you set the dtype to Int64, you shouldn't fillna with a string.
I would expect pandas to be consistent across types, and Int64 is not consistent with the case when you don't specify a dtype:
df = pd.DataFrame({"A": [1, 2, np.nan], "B": [4, np.nan, 8]}) # column A is of type float64
df.fillna('nan') # column A is of type object, it contains both floats and strings
Comment From: phofl
This is similar to #45729 and related discussions. Closing here to keep the discussion together. As of right now, this is expected behavior.
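Since filling an Int64 column with a string is currently treated as expected to fail, one possible workaround is to cast to object before filling. This is a minimal sketch rather than an officially recommended approach. It assumes that losing the nullable integer dtype is acceptable, and the variable name `filled` is only illustrative.

```python
import numpy as np
import pandas as pd

# Same frame as in the report: nullable integer columns with missing values.
df = pd.DataFrame({"A": [1, 2, np.nan], "B": [4, np.nan, 8]}, dtype="Int64")

# Cast to object first so the columns can hold both integers and strings,
# then fill the missing entries with the string 'nan'.
filled = df.astype(object).fillna("nan")

print(filled)
print(filled.dtypes)
```

The original `df` is left unchanged and keeps its Int64 columns. The result has object columns, which matches what the float64 example in the thread ends up with.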