Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
pd.__version__ # 2.0.0
df1 = pd.DataFrame(
    {
        "n": [11, 13, 17],
        "answers": ["A", "A", "B"],
    }
)
df1 = df1.pivot(columns="answers", values="n")
df1.loc[:, ["A", "B"]] = df1.loc[:, ["A", "B"]].fillna(0).astype("int")
# df1
# answers     A     B
# 0        11.0   0.0
# 1        13.0   0.0
# 2         0.0  17.0
df2 = pd.DataFrame(
    {
        "A": [11.0, 13.0, pd.NA],
        "B": [pd.NA, pd.NA, 17.0],
    }
)
df2.loc[:, ["A", "B"]] = df2.loc[:, ["A", "B"]].fillna(0).astype("int")
# df2
#     A   B
# 0  11   0
# 1  13   0
# 2   0  17
Issue Description
The expression df1.loc[:, ["A", "B"]].fillna(0).astype("int") gives the expected data, with NAs filled with 0 and the dtype changed to int.
But assigning that back to df1.loc[:, ["A", "B"]] does not keep the dtype change.
I have also verified that copying df2's columns into df1's columns before the fillna and astype does not change the outcome.
This is not reproducible in a DataFrame built directly with the same values as the pivoted data (df2).
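One thing worth checking when comparing the two frames is the column dtypes each one starts with. The following is a minimal sketch, assuming the same data as the example above; the name converted is introduced here only for illustration.

```python
import pandas as pd

# Same construction as df1 in the example: pivoting leaves NaN in the gaps.
df1 = pd.DataFrame({"n": [11, 13, 17], "answers": ["A", "A", "B"]})
df1 = df1.pivot(columns="answers", values="n")

# Same construction as df2 in the example: floats mixed with pd.NA.
df2 = pd.DataFrame({"A": [11.0, 13.0, pd.NA], "B": [pd.NA, pd.NA, 17.0]})

# Starting dtypes of the two frames, before any assignment.
print(df1.dtypes)  # float64 columns, because of the NaN introduced by pivot
print(df2.dtypes)  # object columns, because the lists mix floats and pd.NA

# The right-hand side of the assignment, inspected on its own.
converted = df1.loc[:, ["A", "B"]].fillna(0).astype("int")
print(converted.dtypes)  # integer columns
```

If the starting dtypes differ like this, an object column can hold integer values without changing its dtype, which could account for df2 displaying integers after the assignment. That is an assumption about the cause, not something confirmed in this report.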
NB.
This MRE is a much simplified version of the indexing/assignment used in the code where I found the issue, which uses pd.IndexSlice and hierarchical columns.
I am aware that df1[["A", "B"]] = df1[["A", "B"]].fillna(0).astype("int") works, but using .loc is necessary with pd.IndexSlice.
I am currently using a workaround similar to the above: creating an exhaustive pd.MultiIndex and using it in place of ["A", "B"], rather than using .loc + pd.IndexSlice. I'm happy to work on a hierarchical MRE if that's useful or necessary.
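The workaround described above might look roughly like the following with hierarchical columns. This is only a sketch under stated assumptions: the column labels q1/q2 and the data are hypothetical, and it is not the code in which the issue was found. It uses .loc with pd.IndexSlice only to read the matching column labels, then replaces each column through df[col] = ... instead of assigning through .loc.

```python
import numpy as np
import pandas as pd

# Hypothetical hierarchical columns; labels and data are illustrative only.
columns = pd.MultiIndex.from_product([["q1", "q2"], ["A", "B"]])
df = pd.DataFrame(
    [[11.0, np.nan, 1.0, np.nan], [np.nan, 17.0, np.nan, 2.0]],
    columns=columns,
)

# Use .loc + pd.IndexSlice only for reading, to resolve the slice
# into concrete column labels.
cols = df.loc[:, pd.IndexSlice["q1", :]].columns

# Replace each selected column through df[col] = ..., which swaps in
# the new column rather than writing into the existing float one.
for col in cols:
    df[col] = df[col].fillna(0).astype("int")

print(df.dtypes)
```

Columns outside the slice (q2 here) are left untouched and keep their float dtype.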
Expected Behavior
The pivoted df1 should behave like df2, and the dtype set by astype should not be lost upon assignment.
Installed Versions
Comment From: ryanyao2002
take
Comment From: phofl
Hey, thanks for your report. This is a duplicate of #52593