Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np
print "pd.__version__: %s" % pd.__version__
print "\n"
df = pd.DataFrame(np.arange(0, 3, 0.5), columns=["a"])
# copy the values of a columns
vals_copy = df.a.values
print "vals_copy max value before '-=' operation: %f" % vals_copy.max()
print "DataFrame Column max value before '-=' operation: %f " % df.a.max()
print "\n"
# '-=' operation on the values copy
vals_copy -= vals_copy.max()
print "vals_copy max value after '-=' operation: %f" % vals_copy.max()
print "DataFrame Column max value after '-=' operation: %f" % df.a.max()
print "\n"
if vals_copy.max() == df.a.max():
    print "This is a bug! The DataFrame column should not be affected."
"""
Output:
pd.__version__: 0.20.3
vals_copy max value before '-=' operation: 2.500000
DataFrame Column max value before '-=' operation: 2.500000
vals_copy max value after '-=' operation: 0.000000
DataFrame Column max value after '-=' operation: 0.000000
This is a bug! The DataFrame column should not be affected.
"""
Problem description
Changing the 'values' of a DataFrame column should not affect the original DataFrame.
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.0-23-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.3
pytest: 2.8.1
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.23.4
numpy: 1.12.1
scipy: 0.19.1
xarray: None
IPython: 4.0.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 1.5.0
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: None
Comment From: jreback
> # copy the values of a columns
> vals_copy = df.a.values
This is not a copy at all, just a reference to the original data.
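If an independent array is what's needed, one option is to ask for the copy explicitly. A minimal sketch, assuming the same frame as in the report: `.copy()` here is the ordinary NumPy `ndarray.copy()` called on the returned array, and the names `copy_max` and `column_max` are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(0, 3, 0.5), columns=["a"])

# Request an independent array explicitly instead of assuming .values copies.
vals_copy = df["a"].values.copy()

# In-place arithmetic now touches only the copy.
vals_copy -= vals_copy.max()

copy_max = vals_copy.max()
column_max = df["a"].max()
print("copy max: %f, column max: %f" % (copy_max, column_max))
```

Because the copy is made explicitly, the outcome does not depend on whether .values happens to hand back a view.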
Comment From: jreback
Please read the docstring of .values, which just returns a NumPy array of the Series; this is by definition a view on the original data. For a DataFrame it may be a view or a copy, depending on the dtypes. Whether .values returns a view or a copy is not guaranteed, so do not rely on either.
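The dtype dependence can be checked directly with `np.shares_memory`. This is a rough sketch rather than a statement about pandas internals: the two frames and their column names are made up for illustration, and only the mixed-dtype result is something I would count on, since columns of different dtypes have to be combined into one new array.

```python
import numpy as np
import pandas as pd

# Hypothetical frames, for illustration only.
homogeneous = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})
mixed = pd.DataFrame({"x": [1, 2], "y": [3.0, 4.0]})

# Mixed int/float columns are upcast into a single float64 array,
# so .values has to build a new buffer here.
mixed_values = mixed.values
mixed_shares = np.shares_memory(mixed_values, mixed["x"].values)

# With a single dtype the answer may differ between pandas versions
# and internal layouts, so it is only printed, not assumed.
homogeneous_shares = np.shares_memory(homogeneous.values, homogeneous["x"].values)

print("mixed dtype: %s, shares memory: %s" % (mixed_values.dtype, mixed_shares))
print("single dtype shares memory: %s" % homogeneous_shares)
```

Checking with `np.shares_memory` before mutating, or simply calling `.copy()`, avoids depending on behaviour that is not guaranteed.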