Pandas read_csv: Casting boolean columns as floats turns missing values into 1.0

Code Sample, a copy-pastable example if possible

In pandas v0.20.2, the following code

import pandas as pd
from io import StringIO
data = "c1,c2\nfalse,1\n,1"
pd.read_csv(StringIO(data), dtype={'c1': 'float32'})['c1']

gives output

0    0.0
1    1.0
Name: c1, dtype: float32

Problem description

In this example, the column of boolean data contains a missing value. If I read the column as booleans (either explicitly via dtype or by allowing pandas to infer the type), then the missing value is given as NaN, as it should be. If I force the column type to be a float (or an integer) via the dtype argument to read_csv, then the missing value is given as 1.0, the same as True.

Expected Output

The output of

pd.read_csv(StringIO(data), dtype={'c1': 'float32'})['c1']

should be the same as the output of

pd.read_csv(StringIO(data))['c1'].astype('float32')

which is

0    0.0
1    NaN
Name: c1, dtype: float32

I.e., the missing value in the input CSV should be cast to NaN rather than 1.0.
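Until the `dtype` path behaves this way, one possible workaround is to let pandas infer the column and cast afterwards, as in the expected-output expression above. The sketch below wraps that in a helper; the name `read_bool_column_as_float` and its parameters are hypothetical, chosen only for illustration, and are not part of pandas.

```python
from io import StringIO

import pandas as pd

# Same sample data as in the report: row 2 has a missing value in c1.
data = "c1,c2\nfalse,1\n,1"


def read_bool_column_as_float(csv_text, column, float_dtype="float32"):
    """Hypothetical helper: read with type inference, then cast to float.

    Avoids passing a float dtype for a boolean column to read_csv,
    which is the code path this issue reports as broken.
    """
    frame = pd.read_csv(StringIO(csv_text))
    # The missing entry is already NaN at this point, so the cast keeps it.
    return frame[column].astype(float_dtype)


result = read_bool_column_as_float(data, "c1")
print(result)
```

This costs an extra cast after parsing, but it keeps the missing value distinguishable from `True`.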

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.2
pytest: 3.0.7
pip: 9.0.1
setuptools: 33.1.1.post20170320
Cython: 0.25.2
numpy: 1.13.0
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: 1.5.5
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: None

Comment From: jreback

Yeah, this looks like a bug. A PR to fix it would be welcome.

Comment From: phofl

This was fixed in #44901, and tests cover this case.
