A small, complete example of the issue
As called out in #14381, if dtype is not specified, values of None should be coerced to np.nan. However, when a list of only None is passed to data, the None remains.
>>> pd.DataFrame([None])
0
0 None
>>> pd.DataFrame([None, None])
0
0 None
1 None
Expected Output
>>> pd.DataFrame([None])
0
0 NaN
>>> pd.DataFrame([None, None])
0
0 NaN
1 NaN
Comment From: jreback
IIRC this was actually special-cased to allow this to be object dtype, with the passed None's remaining. (Further, Series([None]) is another case.)
We don't coerce provided None's in a constructor for object dtype, e.g.
In [3]: Series(['a', None])
Out[3]:
0 a
1 None
dtype: object
So I agree that we should be consistent, but it's difficult to know when to leave an actual None.
I last changed this here, though I recall this being the case for quite some time. We like to preserve an inferred dtype, in this case object.
So I am not sure this is incorrect. This is an explicit user action; we treat None and missing as slightly different.
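For anyone who wants to check the dtype-preservation point on their own install, here is a minimal sketch. The helper name summarize is made up for illustration, and what the all-None case prints depends on the pandas version, so no output is claimed for it here.

```python
import pandas as pd


def summarize(values):
    """Hypothetical helper: build a Series and return (dtype name, stored values)."""
    s = pd.Series(values)
    return str(s.dtype), s.tolist()


# A float next to None: the inferred dtype is float64, so None is stored as NaN.
float_dtype, float_vals = summarize([1.0, None])

# Only None: the case discussed in this issue. Print it to see what your version does.
none_dtype, none_vals = summarize([None])

print(float_dtype, float_vals)
print(none_dtype, none_vals)

# However the value is stored, isna() reports it as missing.
float_missing = pd.Series([1.0, None]).isna().tolist()
none_missing = pd.Series([None]).isna().tolist()
print(float_missing, none_missing)
```

Checking isna() rather than comparing against None or np.nan directly keeps downstream code independent of which of the two the constructor happened to keep.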
Note that the other issue, #14381, is actually different, as it's an entire missing value (e.g. a scalar or an array-like), so the dtype is set to the default (meaning float).
I suppose one could argue that these are the same.
@shoyer @jorisvandenbossche @wesm
Comment From: brandonmburroughs
Okay, I see what you're saying. Your explanation makes sense and I could see it going either way. Thanks for the insight!
Comment From: jorisvandenbossche
I would also keep the current behaviour of keeping object dtype for now.
Comment From: jorisvandenbossche
Note that the other issue, #14381, is actually different, as it's an entire missing value (e.g. a scalar or an array-like), so the dtype is set to the default (meaning float).
I would not make the distinction in this case. The scalar value just denotes a constant value for the full column, so None or [None, None, None, ..] should give the same result IMO.
That is also what 0.18 does:
In [7]: pd.DataFrame(dict(a=[None]), index= [0])
Out[7]:
a
0 None
In [8]: pd.DataFrame(dict(a=None), index= [0])
Out[8]:
a
0 None
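The same comparison can be scripted so it is easy to re-run on other versions. This is only a sketch: the variable names are illustrative, and it compares dtype and missingness instead of hard-coding an expected repr, since that may differ between releases.

```python
import pandas as pd

# The same single-row column built two ways: from a one-element list and from a scalar.
from_list = pd.DataFrame(dict(a=[None]), index=[0])
from_scalar = pd.DataFrame(dict(a=None), index=[0])

# Do both spellings end up with the same column dtype on this version?
same_dtype = from_list["a"].dtype == from_scalar["a"].dtype

# Whether None or NaN is stored, both columns should count as entirely missing.
both_missing = bool(from_list["a"].isna().all() and from_scalar["a"].isna().all())

print("list dtype:  ", from_list["a"].dtype)
print("scalar dtype:", from_scalar["a"].dtype)
print("same dtype:", same_dtype, "| both missing:", both_missing)
```

If same dtype ever prints False, the two spellings have diverged on that version, which is the inconsistency being discussed.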
Comment From: jreback
ok, so closing this as no-change.
@jorisvandenbossche can you put that example on the spawning issue, #14381?