Pandas Silent failure on column reassignment

Code Sample, a copy-pastable example if possible


In [1]: import pandas as pd

# Create example dataframe
In [2]: df = pd.DataFrame({'a':[1,2,3],'b':[4,5,8]})

In [3]: df
Out[3]:
   a  b
0  1  4
1  2  5
2  3  8

# Append 0 to the end of the series
In [4]: df.a.append(pd.Series([0]), ignore_index=True)
Out[4]:
0    1
1    2
2    3
3    0
dtype: int64

# Append 0 to the end of the series and assign the result back to the original dataframe
In [5]: df.a = df.a.append(pd.Series([0]), ignore_index=True)

# replacement fails without an error
In [6]: df
Out[6]:
   a  b
0  1  4
1  2  5
2  3  8

Problem description

The current failure is really annoying (as all silent failures are) because I have no idea A) why it doesn't work and B) how I can fix the issue.

What I expect: An error message explaining why the operation did not succeed.

Output of `pd.show_versions()`

In [7]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.2
pytest: 3.6.4
pip: 19.0.3
setuptools: 39.2.0
Cython: 0.28.4
numpy: 1.13.3
scipy: 1.1.0
pyarrow: None
xarray: 0.11.3
IPython: 6.5.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: 4.2.3
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

So why are you even trying to do this anyways...?

Basically I want to verify that a part of my code is working right. The easiest way to do that is to simply replace the output data that exists with a simpler form. In this case, I add an extra data point to the independent variable series and then use that as an input on a function:

df.a = df.a.append(pd.Series([0]), ignore_index=True)  # Add a zero point (as in In [5] above)
df.b = df.a * 5  # or some other really simple function like that

Comment From: WillAyd

This should raise. Using a list does:

>>> df['a'] = [1, 2, 3, 4]
ValueError: Length of values does not match length of index

But interesting that a Series does not:

>>> df['a'] = pd.Series([1, 2, 3, 4])

Comment From: jreback

Using a Series doesn’t raise because the passed Series is reindexed to the DataFrame's existing index (which is not updated).

this might be a cache updating problem
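
A minimal sketch of the reindexing described above, assuming a recent pandas release. It only illustrates label alignment on assignment and says nothing about whether a cache is involved; the variable names `longer` and `aligned` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 8]})

# A Series with one more label (0..3) than the frame's index (0..2).
longer = pd.Series([1, 2, 3, 0])

# What alignment to the frame's index produces: label 3 has no match
# in df.index, so its value is dropped.
aligned = longer.reindex(df.index)

# Assigning the Series aligns on labels in the same way, so no error
# is raised and the frame keeps its three rows.
df["a"] = longer

print(aligned.tolist())
print(df["a"].tolist())
print(len(df))
```

Because the extra label is discarded during alignment, the frame looks unchanged when the first three values are the same as before, which matches what the original report observed.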

Comment From: machenity

I'm looking into this and I have a few questions. I think that assigning a Series with a different length should raise, for consistency with other types (lists or other sequences).

Then I guess it could be implemented like this:

1. Check the length of the original Series.
2. Call reindex().
3. Check the length of the result and compare it with the length from step 1.
4. If they differ, raise ValueError.

Does this approach seem OK?
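
The steps proposed above can be sketched in user code as follows. This is only an illustration of the idea, not pandas internals and not the fix the maintainers asked for; `assign_strict` is a hypothetical helper name, and the error message is borrowed from the list case.

```python
import pandas as pd


def assign_strict(df, column, values):
    """Hypothetical helper: assign to df[column], refusing silent truncation."""
    if isinstance(values, pd.Series):
        original_len = len(values)            # step 1: length before alignment
        aligned = values.reindex(df.index)    # step 2: align to the frame's index
        if len(aligned) != original_len:      # steps 3 and 4: compare and raise
            raise ValueError("Length of values does not match length of index")
        values = aligned
    df[column] = values
    return df


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 8]})

try:
    assign_strict(df, "a", pd.Series([1, 1, 1, 1]))
    raised = False
except ValueError:
    raised = True

print(raised)
```

Note that a pure length comparison would not catch a Series of the same length whose labels differ from the frame's index; that case would still be aligned and filled with NaN.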

Comment From: jreback

You don’t need to do any of that; rather, you need to look at the _cache.

Comment From: machenity

@jreback In addition to the example above:

>>> import pandas as pd
>>> df = pd.DataFrame({'a':[1,2,3],'b':[4,5,8]})
>>> df.a = pd.Series([1,1,1,1])
>>> df
   a  b
0  1  4
1  1  5
2  1  8
>>> df.a = [0,0,0,0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/machen/Projects/pandas/pandas/core/generic.py", line 5086, in __setattr__
    self[name] = value
  File "/Users/machen/Projects/pandas/pandas/core/frame.py", line 3328, in __setitem__
    self._set_item(key, value)
  File "/Users/machen/Projects/pandas/pandas/core/frame.py", line 3403, in _set_item
    value = self._sanitize_column(key, value)
  File "/Users/machen/Projects/pandas/pandas/core/frame.py", line 3590, in _sanitize_column
    value = sanitize_index(value, self.index, copy=False)
  File "/Users/machen/Projects/pandas/pandas/core/internals/construction.py", line 537, in sanitize_index
    raise ValueError('Length of values does not match length of index')
ValueError: Length of values does not match length of index

Is the problem in the cache, though?

Comment From: machenity

I'm confused about what the correct result is in this case. I think it already works correctly, consistent with the reindex example, in which case there was no failure to begin with. If I'm wrong, what is the expected result in this case?

Raise an error
No change, no error

Comment From: phofl

This works as expected.

df = pd.DataFrame({'a':[1,2,3],'b':[4,5,8]})
df["a"] = pd.Series([10, 20, 30, 4], index=[2, 3, 4, 5])

The Series is reindexed to the DataFrame's index:

      a  b
0   NaN  4
1   NaN  5
2  10.0  8
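
Given this alignment behaviour, here is a sketch of two options for the use case in the original report, assuming a recent pandas release. `pd.concat` is used because `Series.append` was removed in pandas 2.0; the names `new_a` and `positional_error` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 8]})

# Build the longer column (values 1, 2, 3, 0 with index 0..3).
new_a = pd.concat([df["a"], pd.Series([0])], ignore_index=True)

# Option 1: strip the index so the assignment is positional, like a list.
# A length mismatch then raises instead of being silently aligned.
try:
    df["a"] = new_a.to_numpy()
    positional_error = None
except ValueError as exc:
    positional_error = exc

# Option 2: grow the frame's index first, then assign the longer column.
df = df.reindex(new_a.index)   # the new row is NaN until filled
df["a"] = new_a
df["b"] = df["a"] * 5          # recompute the dependent column

print(df)
```

The first option gives the error the report asked for; the second actually extends the frame so that the new data point is kept.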
