Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

df = pd.DataFrame({
    'VAR_NAME': {65: 'FIN4_0020', 66: 'FIN1_0021', 67: 'FIN3_0021', 68: 'FIN4_0021', 69: 'FIN1_0022', 70: 'FIN3_0022', 71: 'FIN4_0022', 72: 'FIN1_0023', 73: 'FIN3_0023', 74: 'FIN4_0023', 75: 'FIN1_0024'},
    'LYM1': {65: 1, 66: 1, 67: 1, 68: 1, 69: 1, 70: 1, 71: 1, 72: 1, 73: 1, 74: 1, 75: 1},
    'LYM2': {65: 1, 66: 1, 67: 1, 68: 1, 69: 1, 70: 1, 71: 1, 72: 1, 73: 1, 74: 1, 75: 1},
    'LYM3': {65: 1.0, 66: 1.0, 67: 1.0, 68: 1.0, 69: 1.0, 70: 1.0, 71: 1.0, 72: 1.0, 73: 1.0, 74: 1.0, 75: 1.0},
    'LYM4': {65: 2, 66: 1, 67: 'T', 68: 2, 69: 1, 70: 'T', 71: 2, 72: 1, 73: 'T', 74: 2, 75: 1},
})
Issue Description
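I can't explain why casting floats to int with the map function gives different results depending on the order of operations. The two statements below differ only in whether map is applied before or after query, yet they print different values for column LYM3. The problem only occurs when I read the data directly from the Excel file, as in the first example below. With the DataFrame built from a dict, both orders give the same result.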
Expected Behavior
>>> df = pd.read_excel(excel_file, sheet_name=sheet_name)
>>> print(df[['VAR_NAME','LYM1','LYM2','LYM3','LYM4']].map(lambda x: int(x) if isinstance(x,float) else x,na_action='ignore').query('VAR_NAME=="FIN3_0022"'))
     VAR_NAME  LYM1  LYM2  LYM3  LYM4
70  FIN3_0022     1     1   1.0     T
>>> print(df.query('VAR_NAME=="FIN3_0022"')[['VAR_NAME','LYM1','LYM2','LYM3','LYM4']].map(lambda x: int(x) if isinstance(x,float) else x,na_action='ignore'))
     VAR_NAME  LYM1  LYM2  LYM3  LYM4
70  FIN3_0022     1     1     1     T
>>> df=pd.DataFrame({'VAR_NAME': {65: 'FIN4_0020', 66: 'FIN1_0021', 67: 'FIN3_0021', 68: 'FIN4_0021', 69: 'FIN1_0022', 70: 'FIN3_0022', 71: 'FIN4_0022', 72: 'FIN1_0023', 73: 'FIN3_0023', 74: 'FIN4_0023', 75: 'FIN1_0024'}, 'LYM1': {65: 1, 66: 1, 67: 1, 68: 1, 69: 1, 70: 1, 71: 1, 72: 1, 73: 1, 74: 1, 75: 1}, 'LYM2': {65: 1, 66: 1, 67: 1, 68: 1, 69: 1, 70: 1, 71: 1, 72: 1, 73: 1, 74: 1, 75: 1}, 'LYM3': {65: 1.0, 66: 1.0, 67: 1.0, 68: 1.0, 69: 1.0, 70: 1.0, 71: 1.0, 72: 1.0, 73: 1.0, 74: 1.0, 75: 1.0}, 'LYM4': {65: 2, 66: 1, 67: 'T', 68: 2, 69: 1, 70: 'T', 71: 2, 72: 1, 73: 'T', 74: 2, 75: 1}})
>>> df
     VAR_NAME  LYM1  LYM2  LYM3  LYM4
65  FIN4_0020     1     1   1.0     2
66  FIN1_0021     1     1   1.0     1
67  FIN3_0021     1     1   1.0     T
68  FIN4_0021     1     1   1.0     2
69  FIN1_0022     1     1   1.0     1
70  FIN3_0022     1     1   1.0     T
71  FIN4_0022     1     1   1.0     2
72  FIN1_0023     1     1   1.0     1
73  FIN3_0023     1     1   1.0     T
74  FIN4_0023     1     1   1.0     2
75  FIN1_0024     1     1   1.0     1
>>> print(df[['VAR_NAME','LYM1','LYM2','LYM3','LYM4']].map(lambda x: int(x) if isinstance(x,float) else x,na_action='ignore').query('VAR_NAME=="FIN3_0022"'))
     VAR_NAME  LYM1  LYM2  LYM3  LYM4
70  FIN3_0022     1     1     1     T
>>> print(df.query('VAR_NAME=="FIN3_0022"')[['VAR_NAME','LYM1','LYM2','LYM3','LYM4']].map(lambda x: int(x) if isinstance(x,float) else x,na_action='ignore'))
     VAR_NAME  LYM1  LYM2  LYM3  LYM4
70  FIN3_0022     1     1     1     T
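Since the dict-based frame gives the same result both ways, the difference presumably lies in what `read_excel` returned. A small diagnostic along these lines could be run on the Excel-based frame to compare dtypes, missing counts, and the Python types present in each column. `column_report` and the `demo` frame are hypothetical names used only for illustration.

```python
import pandas as pd


def column_report(frame, columns):
    """Summarize dtype, missing count and Python value types per column."""
    rows = []
    for col in columns:
        s = frame[col]
        rows.append(
            {
                "column": col,
                "dtype": str(s.dtype),
                "n_missing": int(s.isna().sum()),
                # Names of the Python types of the non-missing values.
                "value_types": sorted({type(v).__name__ for v in s.dropna()}),
            }
        )
    return pd.DataFrame(rows).set_index("column")


# Small stand-in frame; replace with the frame returned by read_excel.
demo = pd.DataFrame({"LYM3": [1.0, None, 1.0], "LYM4": [2, 1, "T"]})
report = column_report(demo, ["LYM3", "LYM4"])
print(report)
```

A non-zero `n_missing` for LYM3 in the Excel-based frame, against zero in the dict-based one, would be consistent with the two orders of `map` and `query` producing different dtypes.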
Installed Versions
INSTALLED VERSIONS
------------------
commit : 0691c5cf90477d3503834d983f69350f250a6ff7
python : 3.10.11
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.26100
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252
pandas : 2.2.3
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
pip : 24.0
Cython : None
sphinx : None
IPython : 8.24.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.4
lxml.etree : 5.2.2
matplotlib : 3.9.0
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.4
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 19.0.0
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.4
sqlalchemy : 2.0.38
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlsxwriter : 3.2.0
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None
Comment From: preet545
Take