Setting a new column to the sum of two existing columns does not work correctly for large DataFrames: X['C']=X['A']+X['B'] fails intermittently, on some runs but not others, when the frame has more than 10000 rows. X['A'] and X['B'] may contain any float values. This issue is absent in pandas 0.16.2. I am running 64-bit Python on Windows 10. See the attached notebook for details:

ReadDebug1.zip

To reproduce:

+++++++++++++++++++++++++++++++++++++

import pandas as pd
DataFrameSize=10001 ## will work with 10000 or less
XR = pd.DataFrame({'A' : pd.Series(1,index=list(range(DataFrameSize)),dtype='float32'),
                   'B' : pd.Series(2,index=list(range(DataFrameSize)),dtype='float32')})

def CleanDebug(X):
    X.loc[:,'C']=X.loc[:,'A']+X.loc[:,'B']
    #X['C']=X['A']+X['B']
    return X

for i in xrange(1000):
    print 'iteration ',i
    ry = CleanDebug(XR)
    assert abs(ry.C.sum()-30003)<1
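The script above is Python 2. For anyone who wants to rerun the check on Python 3, here is a minimal sketch of the same test. The names `make_frame`, `add_columns` and `count_bad_runs` are hypothetical helpers introduced for illustration, not part of the original report. It compares the new column against a plain NumPy sum instead of a hard-coded total, and it assumes a pandas version that provides `Series.to_numpy()`.

```python
import numpy as np
import pandas as pd

FRAME_SIZE = 10001  # the report says 10000 rows or fewer did not trigger the problem


def make_frame(size):
    """Build a frame with two constant float32 columns, as in the report."""
    return pd.DataFrame({
        "A": pd.Series(1, index=range(size), dtype="float32"),
        "B": pd.Series(2, index=range(size), dtype="float32"),
    })


def add_columns(frame):
    """Store A + B in a new column C and return the same frame."""
    frame["C"] = frame["A"] + frame["B"]
    return frame


def count_bad_runs(size=FRAME_SIZE, runs=100):
    """Return how many runs give a column C that differs from the NumPy sum."""
    frame = make_frame(size)
    # Adding the raw arrays bypasses pandas' expression evaluation entirely.
    expected = frame["A"].to_numpy() + frame["B"].to_numpy()
    bad = 0
    for _ in range(runs):
        result = add_columns(frame)
        if not np.array_equal(result["C"].to_numpy(), expected):
            bad += 1
    return bad


if __name__ == "__main__":
    print("runs with a wrong sum:", count_bad_runs())
```

On a healthy installation `count_bad_runs()` should return 0. A nonzero count would point to the kind of intermittent failure described above.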

Comment From: jorisvandenbossche

Can you show the output of pd.show_versions()? Maybe also put the notebook in a gist on GitHub, so it is easier to see the content.

Comment From: jreback

this is almost certainly the same as https://github.com/pydata/pandas/issues/12023

you prob have an older numexpr, upgrade to 2.4.6 (latest) and reconfirm.
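A quick way to check which numexpr is installed is sketched below. The helper names `parse_version` and `numexpr_is_outdated` are hypothetical, and the 2.4.6 threshold is simply the version recommended in this thread. Turning numexpr off through the `compute.use_numexpr` option is an assumed stopgap for cases where upgrading is not possible right away; it was not suggested in the thread, and the option name is the one used by current pandas.

```python
import pandas as pd

MIN_NUMEXPR = (2, 4, 6)  # version recommended in this thread


def parse_version(text):
    """Turn a string such as '2.4.4' into a tuple of ints, ignoring suffixes."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def numexpr_is_outdated(version_text, minimum=MIN_NUMEXPR):
    """Return True if the given version string is older than `minimum`."""
    return parse_version(version_text) < minimum


try:
    import numexpr
    installed = numexpr.__version__
except ImportError:
    installed = None  # pandas falls back to plain NumPy without numexpr

if installed is None:
    print("numexpr is not installed")
elif numexpr_is_outdated(installed):
    # Assumed workaround: stop pandas from using numexpr until it is upgraded.
    pd.set_option("compute.use_numexpr", False)
    print("numexpr", installed, "is older than 2.4.6; disabled it in pandas")
else:
    print("numexpr", installed, "is recent enough")
```

Upgrading numexpr, as suggested above, remains the actual fix. The option only keeps pandas from using numexpr in the meantime.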

Comment From: kirickt

Yes, it was an older numexpr, 2.4.4. After upgrading to 2.4.6 the bug went away. Thanks a lot!

Comment From: kirickt

Should I delete the post?

Comment From: jreback

nope it's good