Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
pd.__version__ # 2.0.0
df1 = pd.DataFrame(
    {
        "n": [11, 13, 17],
        "answers": ["A", "A", "B"],
    }
)
df1 = df1.pivot(columns="answers", values="n")
df1.loc[:, ["A", "B"]] = df1.loc[:, ["A", "B"]].fillna(0).astype("int")
# df1
# answers     A     B
# 0        11.0   0.0
# 1        13.0   0.0
# 2         0.0  17.0
df2 = pd.DataFrame(
    {
        "A": [11.0, 13.0, pd.NA],
        "B": [pd.NA, pd.NA, 17.0],
    }
)
df2.loc[:, ["A", "B"]] = df2.loc[:, ["A", "B"]].fillna(0).astype("int")
# df2
#     A   B
# 0  11   0
# 1  13   0
# 2   0  17
Issue Description
The expression df1.loc[:, ["A", "B"]].fillna(0).astype("int") returns the expected data: NAs are filled with 0 and the dtype is changed to int.
However, assigning that result back to df1.loc[:, ["A", "B"]] does not preserve the dtype change; the columns still display as floats, as in the df1 output above.
I have also verified that copying df2's column into df1's columns before the fillna and astype does not change the outcome.
The problem does not reproduce in a DataFrame constructed directly with the same values as the pivoted frame (df2).
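One way to make the difference visible is to compare dtypes before and after the assignment. The sketch below rebuilds df1 from the example above; the name converted is introduced here purely for illustration, and the dtypes printed after the .loc assignment may differ between pandas versions, so no expected output is shown for that step.

```python
import pandas as pd

df1 = pd.DataFrame({"n": [11, 13, 17], "answers": ["A", "A", "B"]})
df1 = df1.pivot(columns="answers", values="n")

# Right-hand side on its own: NaNs are filled and both columns are cast
# to an integer dtype.
converted = df1.loc[:, ["A", "B"]].fillna(0).astype("int")
print(converted.dtypes)

# Assign back through .loc and inspect the resulting dtypes.
# On 2.0.0 (this report) the columns are still displayed as floats.
df1.loc[:, ["A", "B"]] = converted
print(df1.dtypes)
```

Comparing the two printed dtype listings shows whether the cast survived the assignment, independently of how the values are displayed.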
NB. This MRE is a much-simplified version of the indexing/assignment in the code where I found the issue, which uses pd.IndexSlice and hierarchical columns.
I am aware that df1[["A", "B"]] = df1[["A", "B"]].fillna(0).astype("int") works, but .loc is necessary with pd.IndexSlice.
As a workaround, I currently build an exhaustive pd.MultiIndex and use it in place of ["A", "B"] in an assignment like the one above, rather than using .loc with pd.IndexSlice. I am happy to work on a hierarchical MRE if that's useful or necessary.
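For anyone hitting the same problem with hierarchical columns, here is a rough sketch of a related workaround. It is not the code from my project: the frame, the column labels ("x", "y", "A", "B", "C") and the name targets are all made up for illustration. The idea is to use .loc with pd.IndexSlice only to select the column labels, and then replace each column through plain df[col] = ... assignment, which swaps in the new column together with its dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with hierarchical columns (illustrative data only).
columns = pd.MultiIndex.from_tuples([("x", "A"), ("x", "B"), ("y", "C")])
df = pd.DataFrame(
    [[11.0, np.nan, 1.5], [13.0, np.nan, 2.5], [np.nan, 17.0, 3.5]],
    columns=columns,
)

# Use .loc + pd.IndexSlice only to pick out the target column labels.
targets = df.loc[:, pd.IndexSlice["x", :]].columns

# Replace each selected column via __setitem__ so the integer dtype is kept.
for col in targets:
    df[col] = df[col].fillna(0).astype("int")

print(df.dtypes)
```

The per-column loop avoids assigning through .loc entirely, at the cost of one assignment per selected column.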
Expected Behavior
The pivoted df1 should behave like df2: the dtype change made by astype should not be lost upon assignment.
Installed Versions
Comment From: ryanyao2002
take
Comment From: phofl
Hey, thanks for your report. This is a duplicate of #52593