Pretty straightforward bug; excuse me if this has already been addressed.
print(info_df.dtypes)
snp_id object
rs_id object
position int32
a0 object
a1 object
exp_freq_a1 float64
info float32
certainty float32
type int8
info_type0 float32
concord_type0 float32
r2_type0 float32
dtype: object
print(info_df.where(info_df["info"]>0.5).dtypes)
snp_id object
rs_id object
position float64
a0 object
a1 object
exp_freq_a1 float64
info float32
certainty float32
type float64
info_type0 float32
concord_type0 float32
r2_type0 float32
dtype: object
From what I gather, this seems to be known behavior when working with numpy. I'll look for a workaround.
EDIT: As expected, this is avoided by filtering with .loc instead:
print(info_df.loc[lambda df: df["info"] > 0.5].dtypes)
snp_id object
rs_id object
position int32
a0 object
a1 object
exp_freq_a1 float64
info float32
certainty float32
type int8
info_type0 float32
concord_type0 float32
r2_type0 float32
dtype: object
Comment From: jorisvandenbossche
Please provide a reproducible, runnable example when reporting an issue (e.g. include some code that constructs info_df, or a similar dataframe with dummy data that shows the same problem).
In the case of info_df.where(info_df["info"]>0.5), missing values (NaN) are introduced into the dataframe, and it is a limitation of pandas that it cannot represent missing values in integer columns. Therefore those columns get converted to float. See http://pandas.pydata.org/pandas-docs/stable/gotchas.html#nan-integer-na-values-and-na-type-promotions
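For completeness, a minimal sketch with dummy data (column names taken from the dtype listing above, values made up, since the original info_df is not shown) that reproduces the promotion and shows why the .loc-based filter keeps the integer dtypes:

import numpy as np
import pandas as pd

# Hypothetical stand-in for info_df; a few columns and rows are enough
# to reproduce the dtype promotion.
df = pd.DataFrame({
    "position": np.array([100, 200, 300], dtype="int32"),
    "info": np.array([0.9, 0.3, 0.7], dtype="float32"),
    "type": np.array([0, 1, 2], dtype="int8"),
})

# .where() keeps every row and masks the non-matching ones with NaN, so the
# integer columns have to be promoted to float64 to hold the NaN values.
print(df.where(df["info"] > 0.5).dtypes)   # position and type become float64

# Boolean selection with .loc drops the non-matching rows instead of masking
# them, so no NaN is introduced and the original dtypes are preserved.
print(df.loc[df["info"] > 0.5].dtypes)     # position stays int32, type stays int8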
Comment From: jreback
Closing as known behavior.