Pretty straightforward bug; excuse me if this has already been addressed.
print(info_df.dtypes)
snp_id object
rs_id object
position int32
a0 object
a1 object
exp_freq_a1 float64
info float32
certainty float32
type int8
info_type0 float32
concord_type0 float32
r2_type0 float32
dtype: object
print(info_df.where(info_df["info"]>0.5).dtypes)
snp_id object
rs_id object
position float64
a0 object
a1 object
exp_freq_a1 float64
info float32
certainty float32
type float64
info_type0 float32
concord_type0 float32
r2_type0 float32
dtype: object
From what I gather, this seems to be known behavior when working with numpy. I'll look for a workaround.
EDIT: As expected, this is avoided by filtering with .loc instead:
print(info_df.loc[lambda df: df["info"] > 0.5].dtypes)
snp_id object
rs_id object
position int32
a0 object
a1 object
exp_freq_a1 float64
info float32
certainty float32
type int8
info_type0 float32
concord_type0 float32
r2_type0 float32
dtype: object
Comment From: jorisvandenbossche
Please provide a reproducible, runnable example when reporting an issue (e.g. include some code that constructs info_df, or a similar dataframe with dummy data that shows the same problem).
In the case of info_df.where(info_df["info"]>0.5), missing values (NaN) are introduced into the dataframe, and it is a limitation of pandas that it cannot represent missing values in integer columns. Therefore those columns get converted to float. See http://pandas.pydata.org/pandas-docs/stable/gotchas.html#nan-integer-na-values-and-na-type-promotions
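For completeness, a minimal sketch with dummy data (column names taken from the dtype listing above, values made up, since the original info_df is not shown) that reproduces the promotion and shows why the .loc-based filter keeps the integer dtypes:

import numpy as np
import pandas as pd

# Hypothetical stand-in for info_df; a few columns and rows are enough
# to reproduce the dtype promotion.
df = pd.DataFrame({
    "position": np.array([100, 200, 300], dtype="int32"),
    "info": np.array([0.9, 0.3, 0.7], dtype="float32"),
    "type": np.array([0, 1, 2], dtype="int8"),
})

# .where() keeps every row and masks the non-matching ones with NaN, so the
# integer columns have to be promoted to float64 to hold the NaN values.
print(df.where(df["info"] > 0.5).dtypes)   # position and type become float64

# Boolean selection with .loc drops the non-matching rows instead of masking
# them, so no NaN is introduced and the original dtypes are preserved.
print(df.loc[df["info"] > 0.5].dtypes)     # position stays int32, type stays int8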
Comment From: jreback
Closing as known behavior.