Code Sample
import numpy as np
import pandas as pd

a = np.array([1, 2, 3, np.nan])
b = pd.DataFrame(a)
b.fillna(4, inplace=True)
print(b)
print(a)
Output
     0
0  1.0
1  2.0
2  3.0
3  4.0
[ 1. 2. 3. 4.]
Problem description
When a DataFrame is created from a numpy array, changes to the DataFrame alter the original numpy array. I did not expect this, and I'm not sure whether it is expected behaviour or a known issue.
I do know how to work around this, but my question is whether I have to.
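For reference, here is a minimal sketch of one such workaround: give pandas its own copy of the data so that in-place changes cannot reach the original array. It assumes Python 3 with recent numpy and pandas, and it is not necessarily the workaround the author had in mind.

```python
import numpy as np
import pandas as pd

a = np.array([1, 2, 3, np.nan])

# Pass a copy so the DataFrame never shares memory with `a`.
# pd.DataFrame(a, copy=True) is an equivalent spelling.
b = pd.DataFrame(a.copy())
b.fillna(4, inplace=True)

print(b)  # NaN replaced by 4.0 in the DataFrame
print(a)  # original array still ends in nan
```

The cost is one extra allocation of the array, which is usually negligible unless the data is very large.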
Expected Output
     0
0  1.0
1  2.0
2  3.0
3  4.0
[ 1. 2. 3. nan]
Output of pd.show_versions()
Comment From: jreback
So this is a 'feature', in that view propagation in numpy is a feature. As a user you have to be cognizant of it, and it can make things quite performant. Pandas does not own a passed-in numpy array, and thus the change IS externally visible.
In general, inplace=True ops are not idiomatic in pandas; virtually all operations return new data (which is copied).
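As a rough illustration of the non-inplace style (a sketch assuming Python 3 with recent numpy and pandas; the name `filled` is only illustrative), dropping inplace=True leaves both the original DataFrame and the source array alone:

```python
import numpy as np
import pandas as pd

a = np.array([1, 2, 3, np.nan])
b = pd.DataFrame(a)

# Without inplace=True, fillna returns a new DataFrame;
# neither `b` nor the source array `a` is modified.
filled = b.fillna(4)

print(filled)
print(a)
```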
Note that view propagation only happens in some cases: a single dtype, no prior modification, no dtype change by the op, and non-object types.
In [1]: a = np.array([1,2,3, np.nan])
   ...: b = pd.DataFrame(a)
   ...: b.fillna(4, inplace=True)
# this is the viewed array
In [2]: b.values.base
Out[2]: array([ 1., 2., 3., 4.])
In [3]: a2 = np.array(['a', 'b', 'c'])
# not true for object dtypes
In [4]: b2 = pd.DataFrame(a2)
In [5]: b2.loc[0, 0] = 'foo'
In [6]: b2
Out[6]:
     0
0  foo
1    b
2    c
In [7]: a2
Out[7]:
array(['a', 'b', 'c'],
      dtype='<U1')
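A possible way to check for sharing directly, rather than inspecting `.values.base`, is `np.shares_memory`. This is only a sketch: the names `shares_float` and `shares_str` are illustrative, and the result for the float case depends on the pandas version (newer releases may copy the array on construction), so no expected output is shown for it.

```python
import numpy as np
import pandas as pd

# Float array: may or may not be viewed by the DataFrame,
# depending on the pandas version and its copy semantics.
a = np.array([1, 2, 3, np.nan])
b = pd.DataFrame(a)
shares_float = np.shares_memory(a, b.to_numpy())

# String array: pandas converts it to its own representation,
# so the DataFrame does not share memory with the source.
a2 = np.array(['a', 'b', 'c'])
b2 = pd.DataFrame(a2)
shares_str = np.shares_memory(a2, b2.to_numpy())

print("float array shared:", shares_float)
print("string array shared:", shares_str)
```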
The next operation is in-place only at the pandas level; the change does not reach the numpy array.
In [8]: a = np.array([1,2,3, np.nan])
In [9]: b = pd.DataFrame(a)
In [10]: b +=1
In [11]: a
Out[11]: array([ 1., 2., 3., nan])
In [13]: b.values.base
Out[13]: array([[ 2., 3., 4., nan]])