Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
df = pd.DataFrame(data=[[None, None], [None,None]], columns=['A', 'B'])
df.fillna({'A': 1, 'B': 2}, inplace=True)
Issue Description
With the most recent development version 1.5.0rc0, I see that the code example raises the following warning:
FutureWarning: In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
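The warning names two alternatives to df.iloc[:, i] = newvals. A minimal sketch of both, on a small numeric frame whose values are illustrative and not taken from the report (DataFrame.isetitem needs pandas 1.5 or later):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [np.nan, 2.0], "B": [3.0, np.nan]})

# Alternative 1 from the warning: assign by column label. This replaces the
# whole column with a new array, i.e. the "old behavior" the message describes.
df[df.columns[0]] = df.iloc[:, 0].fillna(1.0)

# Alternative 2: assign by integer position with DataFrame.isetitem.
df.isetitem(1, df.iloc[:, 1].fillna(2.0))

# isetitem is the option for duplicated labels: dup["x"] would select both
# columns, while isetitem targets exactly one position.
dup = pd.DataFrame([[1, 2]], columns=["x", "x"])
dup.isetitem(1, [20])
```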
The df.fillna() API should not use code that is ambiguous or deprecated.
Expected Behavior
df.fillna() should do its job, silently.
Installed Versions
Comment From: phofl
Hi, thanks for your report. This is fixed on main and also on 1.5.x, which will be released soonish; see #48254. This might need a test though.
Comment From: RaphSku
take
Comment From: marekro
Did this fix make it into 1.5.0? It seems to me that the recently released 1.5.0 still shows this FutureWarning.
Comment From: phofl
I cannot see the warning on 1.5.0. Can you provide your environment information?
Comment From: marekro
It is a mystery... when running pandas-1.5.0 on its own on macOS and Linux, I do not see the warning either. In the context of my larger program, however, the FutureWarning keeps appearing; I need to debug further. Maybe it is an issue with the venv. I will comment here again with new findings.
Comment From: marekro
For completeness, below are the installed versions, printed from within the same run that also emits the FutureWarnings. However, I noticed that, because of a mistake, I have been importing pandas twice: import pandas; ...; import pandas as pd. Maybe that causes issues? I have fixed this, am now running regressions, and will follow up.
INSTALLED VERSIONS
commit           : 87cfe4e38bafe7300a6003a1d18bd80f3f77c763
python           : 3.8.11.final.0
python-bits      : 64
OS               : Linux
OS-release       : 3.10.0-1062.1.2.el7.x86_64
Version          : #1 SMP Mon Sep 30 14:19:46 UTC 2019
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : en_US.UTF-8
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.5.0
numpy            : 1.23.3
pytz             : 2022.2.1
dateutil         : 2.8.2
setuptools       : 56.0.0
pip              : 22.2.2
Cython           : 0.29.23
pytest           : 7.1.3
hypothesis       : None
sphinx           : 1.8.6
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.9.1
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.1.2
IPython          : None
pandas_datareader: None
bs4              : None
bottleneck       : None
brotli           : None
fastparquet      : None
fsspec           : None
gcsfs            : None
matplotlib       : None
numba            : None
numexpr          : None
odfpy            : None
openpyxl         : 3.0.10
pandas_gbq       : None
pyarrow          : None
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : None
snappy           : None
sqlalchemy       : 1.4.41
tables           : None
tabulate         : 0.8.10
xarray           : None
xlrd             : None
xlwt             : None
zstandard        : None
tzdata           : None
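On the double-import question: Python caches imported modules in sys.modules, so import pandas followed by import pandas as pd loads pandas only once, and both names refer to the same module object. A quick check, which also prints the interpreter and the pandas installation actually in use (handy when a venv mix-up is suspected):

```python
import sys

import pandas
import pandas as pd

# Both statements bind the same cached module object; the second import
# only looks pandas up in sys.modules instead of loading it again.
print(pandas is pd)                 # True
print(sys.modules["pandas"] is pd)  # True

# Which interpreter and which pandas installation this process is using.
print(sys.executable)
print(pd.__version__, pd.__file__)
```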
Comment From: marekro
This is odd; I am running the corrected code, and I am still getting:
FutureWarning: In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
The code emitting the warning is this statement:
result_df.fillna({
    'key_vr_folder_id': '',
    'key_relationship': '',
    'key_parenttasks': '',
    'key_subtasks': ''
}, inplace=True)
Again, the corresponding version list from pandas is identical to the one in my previous comment (pandas 1.5.0, numpy 1.23.3, Python 3.8.11 on Linux). I cannot reproduce the FutureWarning with pandas-1.5.0 outside of the program. The warning seems to come out of _setitem_single_column, but the conditionals in there do not ring a bell with me. Any ideas?
Comment From: phofl
As long as there isn't anything we can debug, there is not much we can do.
Comment From: marekro
Found the corner case, a DataFrame with no rows:
df = pd.DataFrame(data=[], columns=['A', 'B'])
df.fillna({'A': '', 'B': ''}, inplace=True)
<input>:1: FutureWarning: In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
<input>:1: FutureWarning: In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
It seems that this case needs to be handled specifically. Sorry for the noise in the previous comments.
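Until the zero-row case is handled in pandas, one possible workaround is to skip the call when the frame is empty. A minimal sketch; fillna_inplace_nonempty is a hypothetical helper, not a pandas API:

```python
import pandas as pd


def fillna_inplace_nonempty(df, value):
    """Hypothetical helper: in-place fillna that skips empty frames.

    A frame with no rows has nothing to fill, so skipping the call loses
    nothing and never reaches the code path that warned on pandas 1.5.0.
    """
    if not df.empty:
        df.fillna(value, inplace=True)


empty = pd.DataFrame(data=[], columns=["A", "B"])
fillna_inplace_nonempty(empty, {"A": "", "B": ""})  # no-op

filled = pd.DataFrame({"A": [float("nan"), 1.0]})
fillna_inplace_nonempty(filled, {"A": 0.0})
```

Note that DataFrame.empty is also True for a frame with rows but no columns, which likewise has nothing to fill.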
Comment From: olsgaard
I'm having a similar problem: I apply a function to a grouped DataFrame, the function uses df.loc[i:ii, "colname"], and for some of the groups I get the warning.
Is there a way to break into pdb when the warning is emitted? I'm not sure how to go about debugging this.
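One way to locate the source, sketched with the standard warnings module: turn the FutureWarning into an exception so that it is raised at the exact warnings.warn() call inside pandas, then inspect the traceback or open pdb post-mortem. run_and_trap is a hypothetical helper name:

```python
import pdb
import sys
import traceback
import warnings


def run_and_trap(func, *args, category=FutureWarning, debug=False, **kwargs):
    """Call func(*args, **kwargs) with `category` warnings escalated to errors.

    Returns the formatted traceback of the first matching warning, or None if
    none was emitted. With debug=True, pdb opens at the raising frame first.
    """
    with warnings.catch_warnings():
        # "error" makes matching warnings raise at the warnings.warn() call
        # site, so the traceback ends inside the code that emitted them.
        warnings.simplefilter("error", category=category)
        try:
            func(*args, **kwargs)
        except category:
            if debug:
                # Post-mortem debugging in the frame that raised the warning.
                pdb.post_mortem(sys.exc_info()[2])
            return traceback.format_exc()
    return None
```

For a whole script, running python -W error::FutureWarning -m pdb script.py escalates the warning globally, and pdb enters post-mortem debugging on the resulting uncaught exception.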
Comment From: abedhammoud
I am getting the same warning from the following statement:
df.fillna(value={
    'ITEM': '<NA>',
    'TTYPE': '-1',
    'QTY': 0,
    'P': -1,
    'COUNTRY': '<NA>',
    'CITY': '<NA>'
}, inplace=True)
When I remove inplace=True and instead use:
df = df.fillna(...)
the warning disappears.
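A minimal sketch of that non-inplace workaround on a small frame; the columns mirror the statement above, but the data is made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ITEM": ["ITM1", np.nan], "QTY": [1.0, np.nan]})
fill = {"ITEM": "<NA>", "QTY": 0}

# Rebind the returned frame instead of mutating with inplace=True.
df = df.fillna(value=fill)
```

One difference to keep in mind: fillna without inplace returns a new object, so any other variable still pointing at the original frame keeps its missing values.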
I am running 1.5.1 with Python 3.10.6 on Windows.
Comment From: phofl
Can you provide something reproducible?
Comment From: abedhammoud
Even though I get the warning in my program every time, I cannot write a standalone snippet that reproduces it. My code does something like the following (where df has many more columns and rows):
import numpy as np
import pandas as pd
df = pd.DataFrame(
    {
        'ITEM': ['ITM1', 'ITM2', pd.NA, np.nan],
        'REF': ['R1', 'R2', 'R3', pd.NA],
        'TYPE': -1,
        'QTY': 0,
        'P': [0, 1, 'Y', 'X'],
    }
)
print(f'before:\n{df=}')
df['P'] = pd.to_numeric(df['P'], errors='coerce')
print(f'after to_numeric:\n{df=}')
df.fillna(value={
    'ITEM': '<NA>',
    'IREF': '<NA>',
    'P': -1}, inplace=True)
print(f'tidy:\n{df=}')
I will keep trying and report back.
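A side note on the snippet above: the fill dict uses 'IREF' while the frame's column is 'REF'. DataFrame.fillna skips dict keys that are not column labels, so that entry silently does nothing. A small sketch (made-up data) of the to_numeric step plus a check for unmatched keys:

```python
import pandas as pd

df = pd.DataFrame({"REF": ["R1", pd.NA], "P": [0, "Y"]})
fill = {"IREF": "<NA>", "P": -1}

# errors="coerce" turns unparseable entries such as "Y" into NaN.
df["P"] = pd.to_numeric(df["P"], errors="coerce")

# fillna ignores keys that are not column labels, so flag them up front.
unmatched = set(fill) - set(df.columns)
print("keys with no matching column:", unmatched)

df.fillna(value=fill, inplace=True)
```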
Comment From: caoyu-yiyue
I'm using pandas 1.5.2 on macOS 13.0.1 with Python 3.11.0 from Homebrew, and I can consistently reproduce the bug with the code below:
import pandas as pd
df = pd.DataFrame(data={'col1': ['a', pd.NA, 'c']}, index=range(3))
df = df.convert_dtypes()
df.fillna({'col1': '<NA>'}, inplace=True)
Three conditions together trigger the bug:
- 'col1' is of object dtype at first, not int or float.
- Call the convert_dtypes() method.
- Set inplace=True in fillna().
Maybe the problem is associated with the string dtype?
I also investigated other circumstances:
- Using np.nan instead of pd.NA, the problem is still there.
- Creating a Series from the same list and calling the Series fillna() method does not trigger the bug. Code below:
```python
import pandas as pd

# No bug for Series.
s = pd.Series(['a', pd.NA, 'b'])
s = s.convert_dtypes()
s.fillna('<NA>', inplace=True)
```
Hopefully this helps.
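To check the string-dtype hypothesis on a given installation, a sketch that prints the dtype before and after convert_dtypes() and records whether an in-place fillna emits a FutureWarning. fillna_inplace_warns is a hypothetical helper; per the report above, 1.5.2 warns only for the converted frame, while on later versions both lines may print False:

```python
import warnings

import pandas as pd


def fillna_inplace_warns(frame, value):
    """Return True if an in-place fillna on a copy of `frame` emits a FutureWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        frame.copy().fillna(value, inplace=True)
    return any(issubclass(w.category, FutureWarning) for w in caught)


df = pd.DataFrame(data={"col1": ["a", pd.NA, "c"]}, index=range(3))
converted = df.convert_dtypes()

# convert_dtypes() moves the column to a pandas StringDtype.
print(df["col1"].dtype, "->", converted["col1"].dtype)
print("original warns: ", fillna_inplace_warns(df, {"col1": "<NA>"}))
print("converted warns:", fillna_inplace_warns(converted, {"col1": "<NA>"}))
```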
Comment From: abedhammoud
I can confirm that in my case too (see above), calling df.convert_dtypes() reproduces the warning. However, my original code (which I tried to summarize) does not call convert_dtypes() explicitly, and I still get the warning.
import numpy as np
import pandas as pd
df = pd.DataFrame(
    {
        'ITEM': ['ITM1', 'ITM2', pd.NA, np.nan],
        'REF': ['R1', 'R2', 'R3', pd.NA],
        'TYPE': -1,
        'QTY': 0,
        'P': [0, 1, 'Y', 'X'],
    }
)
print(f'before:\n{df=}')
# This will reproduce the warning
df = df.convert_dtypes()
df['P'] = pd.to_numeric(df['P'], errors='coerce')
print(f'after to_numeric:\n{df=}')
df.fillna(value={
    'ITEM': '<NA>',
    'IREF': '<NA>',
    'P': -1}, inplace=True)
print(f'tidy:\n{df=}')
Comment From: MarcoGorelli
Hey @marekro
Thanks for reporting this; there's now a release candidate for pandas 2.0.0, which you can install with
mamba install -c conda-forge/label/pandas_rc pandas==2.0.0rc0
or
python -m pip install --upgrade --pre pandas==2.0.0rc0
If you try it out and report bugs, we can fix them before the final 2.0.0 release.
Comment From: abedhammoud
@MarcoGorelli the above warning does not occur with 2.0.0rc0. However, I am seeing other issues that I will report separately. Thank you.
Comment From: abedhammoud
After many tests, I can confirm that I no longer see the above warning. And once I took care of some warnings related to deprecated code, everything worked perfectly with 2.0.0rc0. Thank you.
Comment From: MarcoGorelli
thanks for checking!