Pandas DataFrame constructor does not coerce data=[None] to np.nan

A small, complete example of the issue

As called out in #14381, if dtype is not specified, values of None should be coerced to np.nan. However, when a list of only None is passed to data, the None remains.

>>> pd.DataFrame([None])
      0
0  None
>>> pd.DataFrame([None, None])
      0
0  None
1  None

Expected Output

>>> pd.DataFrame([None])
      0
0  NaN
>>> pd.DataFrame([None, None])
      0
0  NaN
1  NaN

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-77-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.0
nose: 1.3.4
pip: 8.1.2
setuptools: 5.8
Cython: 0.21
numpy: 1.11.2
scipy: 0.16.1
statsmodels: 0.6.1
xarray: None
IPython: 4.0.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.5.0
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.7
lxml: 3.4.0
bs4: 4.3.2
html5lib: None
httplib2: 0.9.1
apiclient: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.7.3
boto: 2.32.1
pandas_datareader: None

Comment From: jreback

IIRC this was actually special-cased to allow this to be object dtype, with the passed None's remaining (further, Series([None]) is another case).

We don't coerce provided None's in a constructor for object dtype, e.g.

In [3]: Series(['a', None])
Out[3]: 
0       a
1    None
dtype: object

So I agree that we should be consistent, but it's difficult to know when to leave an actual None.

I last changed this here, though I recall this being the case for quite some time. We like to preserve an inferred dtype, in this case object.

So I am not sure this is incorrect. This is an explicit user action; we treat None and missing as slightly different.
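
To make the distinction concrete, here is a minimal sketch of how dtype inference decides whether None survives. It only uses public pandas calls; the variable names are illustrative, and the printed dtypes may vary between pandas releases, so treat it as something to run against your own version rather than a statement of guaranteed behaviour.

```python
import numpy as np
import pandas as pd

# An all-None list: per the discussion above, the inferred dtype is object
# and the None objects are kept as they were passed.
all_none = pd.DataFrame([None, None])
print(all_none[0].dtype)
print(all_none.iloc[0, 0] is None)

# None mixed with floats: the inferred dtype is float64, so None is stored as NaN.
mixed = pd.Series([1.0, None])
print(mixed.dtype)
print(np.isnan(mixed.iloc[1]))

# Either way, pandas still reports the value as missing.
print(all_none.isna().all().all(), mixed.isna().iloc[1])
```

In both cases `isna()` treats the entry as missing; what differs is only the stored object and the column dtype.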

Note that the other issue, #14381, is actually different, as there the entire value is missing (e.g. a scalar or an array-like), so the dtype is set to the default (meaning float).

I suppose one could argue that these are the same.

@shoyer @jorisvandenbossche @wesm

Comment From: brandonmburroughs

Okay, I see what you're saying. Your explanation makes sense and I could see it going either way. Thanks for the insight!

Comment From: jorisvandenbossche

I would also keep the current behaviour of keeping object dtype for now.

Comment From: jorisvandenbossche

> Note that the other issue, #14381, is actually different, as there the entire value is missing (e.g. a scalar or an array-like), so the dtype is set to the default (meaning float).

I would not make the distinction in this case. The scalar value just denotes a constant value for the full column, so None or [None, None, None, ...] should give the same result IMO. That is also what 0.18 does:

In [7]: pd.DataFrame(dict(a=[None]), index= [0])
Out[7]: 
      a
0  None

In [8]: pd.DataFrame(dict(a=None), index= [0])
Out[8]: 
      a
0  None
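
A small sketch for checking this comparison on whatever pandas version is installed. It prints the dtypes instead of asserting them, since the result of the scalar form has differed between releases (which is what #14381 is about); the variable names are illustrative only.

```python
import pandas as pd

# The same column expressed as a one-element list and as a scalar.
from_list = pd.DataFrame(dict(a=[None]), index=[0])
from_scalar = pd.DataFrame(dict(a=None), index=[0])

print(pd.__version__)
print(from_list["a"].dtype, from_scalar["a"].dtype)

# True when both spellings are treated consistently by this pandas version.
print(from_list["a"].dtype == from_scalar["a"].dtype)
```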

Comment From: jreback

ok, so closing this as no-change.

@jorisvandenbossche can you put that example on the spawning issue, #14381?
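
Since the behaviour stays as it is, anyone who needs NaN rather than None can ask for a float dtype explicitly. The following is a minimal sketch of two ways to do that, not an official recommendation from this thread; the variable names are illustrative.

```python
import pandas as pd

# Option 1: state the dtype up front, so None is stored as NaN.
s = pd.Series([None, None], dtype="float64")
print(s.dtype)

# Option 2: build the frame as-is, then cast the object column to float.
df = pd.DataFrame([None, None]).astype("float64")
print(df.dtypes[0])
```

Both options leave a float64 column whose entries are NaN, which is the output the original report expected.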
