Example 1 (.values returning a view)
import pandas as pd

df = pd.DataFrame([["A", "B"], ["C", "D"]])
df.values[0][0] = 0
df.values[0][1] = 5
# df
#    0  1
# 0  0  5
# 1  C  D
Example 2 (.values returning a copy) (taken from COLDSPEED's answer in this question)
In [755]: df = pd.DataFrame({'A' :[1, 2], 'B' : ['ccc', 'ddd']})
In [756]: df
Out[756]:
   A    B
0  1  ccc
1  2  ddd
In [757]: v = df.values
In [758]: v[0] = 123
In [759]: v[0, 1] = 'zzxxx'
In [760]: df
Out[760]:
   A    B
0  1  ccc
1  2  ddd
Problem description
This is based on a question I asked on SO, "Will changes in DataFrame.values always modify the values in the data frame?". As the two examples above show, the .values attribute sometimes returns a view and sometimes returns a copy.
I wonder whether this is safe.
I would advocate adding an option to force returning a view or a copy. Without one, someone can process df.values and change df by accident, or expect to change the values in df and find that nothing happens because a copy was returned. See this question for an example. People recommend modifying df.values directly, e.g. np.fill_diagonal(df.values, 0), to change the diagonal of a data frame.
I think users at least need to be warned. The documentation currently contains only one line on this: "Numpy representation of NDFrame".
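One way to sidestep the ambiguity is to ask for a copy explicitly, modify it, and build a new frame from the result. The sketch below assumes the installed pandas provides DataFrame.to_numpy with a copy parameter, which is not mentioned in this thread. The frame contents and variable names are purely illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1.0, 10.0).reshape(3, 3), columns=list("abc"))

# Assumption: DataFrame.to_numpy(copy=True) is available; the returned
# array does not share memory with df.
arr = df.to_numpy(copy=True)
np.fill_diagonal(arr, 0)

# Write the result back explicitly instead of relying on .values being a view.
result = pd.DataFrame(arr, index=df.index, columns=df.columns)

print(df.iloc[1, 1])      # original frame is untouched
print(result.iloc[1, 1])  # diagonal is zeroed in the new frame
```

This costs an extra copy of the data, but the outcome no longer depends on whether .values happens to return a view.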
Expected Output
Output of pd.show_versions()
Comment From: jreback
this is an implementation detail. since you are getting a single dtyped numpy array, it is upcast to a compatible dtype. if you have mixed dtypes, then you almost always will have a copy (the exception is mixed float dtypes will not copy I think), but this is a numpy detail.
I agree this is not great, but it has been there from the beginning and will not change in current pandas. If exporting to numpy you need to take care.
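The upcasting described above can be seen by inspecting the dtype of the returned array. This is only a small sketch with made-up frames (floats and mixed are illustrative names), not a statement about every pandas version or every dtype combination.

```python
import numpy as np
import pandas as pd

floats = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})
mixed = pd.DataFrame({"A": [1, 2], "B": ["ccc", "ddd"]})

print(floats.values.dtype)  # a single dtype throughout, so no upcast is needed
print(mixed.values.dtype)   # integer and string columns are upcast to object

# The mixed-dtype array is assembled fresh, so writing to it
# does not reach the frame.
v = mixed.values
v[0, 0] = 123
print(mixed.iloc[0, 0])     # the frame still holds the original value
```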
Comment From: jreback
all that said, if you want to update the docs to make this more clear that would be fine.
Comment From: tyrinwu
Thanks for the information. I appreciate the explanation.
Maybe add a line to the documentation such as "It could return either a view or a copy. Changes to DataFrame.values may or may not change the DataFrame itself"?
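Until the documentation says so, a caller can probe a particular frame before relying on either behaviour. The helper below, values_is_view, is a hypothetical name and only a heuristic: it assumes that if two successive .values calls return arrays overlapping in memory, they are views onto the same storage rather than fresh copies.

```python
import numpy as np
import pandas as pd


def values_is_view(frame: pd.DataFrame) -> bool:
    """Hypothetical helper: report whether two .values calls overlap in memory."""
    return bool(np.shares_memory(frame.values, frame.values))


single = pd.DataFrame(np.zeros((2, 2)))                   # one float dtype
mixed = pd.DataFrame({"A": [1, 2], "B": ["ccc", "ddd"]})  # int and string columns

print(values_is_view(single))
print(values_is_view(mixed))
```

The check never writes to the array, so it cannot modify the frame by accident.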