I've boiled my problem down to this example. I am running Python 2.7 on Windows 7 and using pandas 0.17.1.
>>> import pandas as pd
>>> print pd.__version__
0.17.1
>>> x = pd.DataFrame({"a":["james"], "b":[True], "c":["john"]})
>>> x
a b c
0 james True john
>>> x.loc[1] = None
>>> x
a b c
0 james 1 john ##<< THE TYPE HAS CHANGED
1 NaN NaN NaN
>>> x.loc[1,"a"] = "james1"
>>> x.loc[1,"c"] = "john1"
>>> x.loc[1,"b"] = True
>>> x
a b c
0 james 1 john
1 james1 True john1 ## << BUT THE NEXT INSERT WAS OK
>>> x.loc[2] = None ## << AND THIS DOESN'T REPLICATE THE ISSUE
>>> x
a b c
0 james 1 john
1 james1 True john1
2 NaN NaN NaN
In the above I try to add an extra row to the DF. I fill the row "by hand", so to speak, because in my actual situation I don't know all the column values yet, and I might need to add an extra column, so I fill in the known data first.
The first enlargement, however, changes the True to a 1. Subsequent enlargements don't have this issue. But it does mean that I eventually end up with a DF whose column is a mix of 0s, 1s and bools, which is annoying.
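A minimal sketch of one way to keep such a column from ending up as a mix of 0s, 1s and bools, assuming pandas 1.0 or later (much newer than the 0.17.1 used in this report): store "b" in pandas' nullable "boolean" dtype, which can hold a missing value (pd.NA) without being converted. Enlarging with reindex instead of x.loc[1] = None is a substitution, not what the report does; the behaviour of .loc enlargement with None on this dtype is not assumed here.

```python
import pandas as pd

# Same starting frame as in the report, but "b" uses the nullable "boolean"
# dtype (assumes pandas >= 1.0), which can represent a missing entry.
x = pd.DataFrame({"a": ["james"], "b": [True], "c": ["john"]})
x["b"] = x["b"].astype("boolean")

# Enlarge by reindexing onto one extra label; the new row starts out missing.
x = x.reindex(list(x.index) + [1])
dtype_after_reindex = x["b"].dtype   # still "boolean"
b_was_missing = pd.isna(x.loc[1, "b"])  # the new entry is pd.NA

# Fill in the known values one by one, as in the report.
x.loc[1, "a"] = "james1"
x.loc[1, "c"] = "john1"
x.loc[1, "b"] = True

print(x)
print(x.dtypes)
```

Because the missing marker lives in a separate mask rather than in the values, the existing True in row 0 is never rewritten as 1.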
I've posted this on SO too: http://stackoverflow.com/questions/37431232/python-pandas-changes-column-value-when-setting-with-enlargement?noredirect=1#comment62368221_37431232
Some suggestions indicate this may well be a bug, so I'm posting it as an issue.
Interestingly, if I add the complete row at once in the above example, the problem doesn't occur...
>>> x = pd.DataFrame({"a":["james"], "b":[True], "c":["john"]})
>>> x
a b c
0 james True john
>>> x.loc[1] = {"a": "james", "b" : False, "c" : "henry"}
>>> x
a b c
0 james True john
1 james False henry
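Since assigning a complete row avoids the conversion, another option (an assumption, not something suggested in this thread) is to collect each row as a plain dict while its values are still being worked out, and build the DataFrame only once every row is complete. The extra column "d" below is hypothetical, standing in for a column that might have to be added later.

```python
import pandas as pd

# Each row is a plain dict until it is complete; nothing is converted yet.
rows = []
rows.append({"a": "james", "b": True, "c": "john"})

row = {"a": "james1", "c": "john1"}  # "b" not known yet
row["b"] = True                      # filled in later
row["d"] = 42                        # hypothetical column discovered later
rows.append(row)

# Build the frame once; rows lacking "d" get NaN in that column only,
# so "b" stays a plain bool column.
x = pd.DataFrame(rows, columns=["a", "b", "c", "d"])
print(x)
print(x.dtypes)
```

Only the column that actually has a gap ("d") needs a dtype that can hold NaN; the fully populated "b" column is inferred as bool.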
Comment From: jreback
This is as expected. np.nan is the missing value indicator, not None. We very rarely allow native None; it's non-performant, as it's a Python object.
Docs are here
In [1]: x = pd.DataFrame({"a":["james"], "b":[True], "c":["john"]})
In [2]: x
Out[2]:
a b c
0 james True john
In [3]: x.dtypes
Out[3]:
a object
b bool
c object
dtype: object
In [4]: x.loc[1] = None
In [5]: x
Out[5]:
a b c
0 james 1.0 john
1 NaN NaN NaN
In [6]: x.dtypes
Out[6]:
a object
b float64
c object
dtype: object
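The session above shows where the 1 comes from: a NumPy bool column has no way to store a missing value, so enlarging with an all-missing row forces "b" to a dtype that can (float64 here; other pandas versions may pick object instead). A minimal sketch, assuming the new row is eventually filled in completely, is to cast the column back to bool once no missing entries remain:

```python
import pandas as pd

x = pd.DataFrame({"a": ["james"], "b": [True], "c": ["john"]})

# Enlarging with an all-missing row moves "b" off bool
# (float64 in the session above; may be object in other pandas versions).
x.loc[1] = None
upcast_dtype = x["b"].dtype

x.loc[1, "a"] = "james1"
x.loc[1, "c"] = "john1"
x.loc[1, "b"] = True

# Cast back only when the column has no missing entries left:
# astype(bool) would silently turn NaN into True.
if not x["b"].isna().any():
    x["b"] = x["b"].astype(bool)

print(x)
print(x.dtypes)
```

The isna() guard matters: bool(float("nan")) is True, so casting too early would hide a value that was never filled in.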