Code Sample, a copy-pastable example if possible


In [1]: import pandas as pd

# Create example dataframe
In [2]: df = pd.DataFrame({'a':[1,2,3],'b':[4,5,8]})

In [3]: df
Out[3]:
   a  b
0  1  4
1  2  5
2  3  8

# Add 0 to the end of the series
In [4]: df.a.append(pd.Series([0]), ignore_index=True)
Out[4]:
0    1
1    2
2    3
3    0
dtype: int64

# Add 0 to the end of the series and replace column a in the original dataframe
In [5]: df.a = df.a.append(pd.Series([0]), ignore_index=True)

# The replacement silently fails: no error is raised and column a is unchanged
In [6]: df
Out[6]:
   a  b
0  1  4
1  2  5
2  3  8
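Note that Series.append was deprecated in pandas 1.4 and removed in 2.0, so the session above no longer runs as-is on current pandas. A minimal sketch reproducing the same silent behavior with pd.concat, assuming pandas 2.x (the name `extended` is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 8]})

# pd.concat builds the same 4-element Series that Series.append used to return
extended = pd.concat([df["a"], pd.Series([0])], ignore_index=True)
print(extended.tolist())  # [1, 2, 3, 0]

# Assigning it back aligns on the frame's existing index 0..2,
# so the 4th value (label 3) is dropped without any error
df["a"] = extended
print(df["a"].tolist())  # [1, 2, 3]
```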

Problem description

This silent failure is really annoying (as all silent failures are) because I have no idea (A) why it doesn't work or (B) how to fix it.

What I expect: an error message explaining why the operation did not succeed.

Output of pd.show_versions()

In [7]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.2
pytest: 3.6.4
pip: 19.0.3
setuptools: 39.2.0
Cython: 0.28.4
numpy: 1.13.3
scipy: 1.1.0
pyarrow: None
xarray: 0.11.3
IPython: 6.5.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: 4.2.3
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

So why are you even trying to do this anyway...?

Basically, I want to verify that a part of my code is working correctly. The easiest way to do that is to replace the existing output data with a simpler form. In this case, I add an extra data point to the independent variable series and then use that as the input to a function:

df.a = df.a.append(pd.Series([0]), ignore_index=True)  # add a zero point
df.b = df.a*5  # or some other really simple function like that
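A minimal sketch of that workflow which avoids the silent alignment, assuming it is acceptable to rebuild the frame rather than assign into it: extend the independent column first, then construct a new DataFrame from it so both columns share the new four-row index. pd.concat stands in for Series.append here, since append no longer exists in pandas 2.x.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 8]})

# Extend the independent variable with a zero point
a = pd.concat([df["a"], pd.Series([0])], ignore_index=True)

# Build a new frame so both columns use the new 4-row index;
# b is recomputed from a by a really simple function
df = pd.DataFrame({"a": a, "b": a * 5})
print(df)
```

Because the frame itself is rebuilt, there is no existing index for the longer Series to be aligned against.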

Comment From: WillAyd

This should raise. Using a list does:

>>> df['a'] = [1, 2, 3, 4]
ValueError: Length of values does not match length of index

But interestingly, a Series does not:

>>> df['a'] = pd.Series([1, 2, 3, 4])
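A small script contrasting the two cases side by side (the exact ValueError text differs between pandas versions, so only the exception type is relied on here):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 8]})

# A plain list has no index to align on, so a length mismatch raises
list_raised = False
try:
    df["a"] = [1, 2, 3, 4]
except ValueError as exc:
    list_raised = True
    print("list:", exc)

# A Series is aligned on its index first, so label 3 is silently discarded
df["a"] = pd.Series([1, 2, 3, 4])
print(df["a"].tolist())  # [1, 2, 3]
```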

Comment From: jreback

Using a Series doesn't raise because the passed Series is reindexed to the existing index (which is not updated).

This might be a cache-updating problem.
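The reindexing described above can be made explicit. This sketch compares the column stored by df["a"] = s with s.reindex(df.index); `s` and `explicit` are illustrative names, and the snippet says nothing about the cache question:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 8]})
s = pd.Series([1, 2, 3, 0])

# Perform by hand the alignment that column assignment does implicitly
explicit = s.reindex(df.index)

df["a"] = s
# The stored column matches the explicitly reindexed Series (names are ignored by equals)
print(df["a"].equals(explicit))  # True
```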

Comment From: machenity

I'm looking into this and have a few questions. I think that assigning a Series of a different length should raise, for consistency with other types (lists or other sequences).

Then I guess it could be implemented like this:

  1. Check the length of the original Series.
  2. reindex() it.
  3. Check the length of the result and compare it with the length from step 1.
  4. If they differ, raise a ValueError.

Does this approach seem OK?
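The proposed steps could be sketched as a standalone helper. `set_column_strict` is hypothetical and not part of pandas; it only illustrates the check outside of pandas internals. Since reindexing always yields exactly len(df.index) elements, steps 1 to 3 effectively compare the Series length with the index length.

```python
import pandas as pd


def set_column_strict(df: pd.DataFrame, key: str, value: pd.Series) -> None:
    """Hypothetical helper (not a pandas API): assign a Series as a column,
    raising instead of silently dropping values during alignment."""
    original_len = len(value)            # step 1: length of the incoming Series
    aligned = value.reindex(df.index)    # step 2: align to the frame's index
    if len(aligned) != original_len:     # step 3: compare with step 1
        raise ValueError(                # step 4: raise on mismatch
            f"Length of values ({original_len}) does not match "
            f"length of index ({len(df.index)})"
        )
    df[key] = aligned


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 8]})
try:
    set_column_strict(df, "a", pd.Series([1, 2, 3, 0]))
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

One trade-off: a pure length check also rejects intentional partial alignment, such as a shorter Series that only covers some rows.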

Comment From: jreback

You don't need to do any of that; rather, you need to look at the _cache.

Comment From: machenity

@jreback, an addition to the example above:

>>> import pandas as pd
>>> df = pd.DataFrame({'a':[1,2,3],'b':[4,5,8]})
>>> df.a = pd.Series([1,1,1,1])
>>> df
   a  b
0  1  4
1  1  5
2  1  8
>>> df.a = [0,0,0,0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/machen/Projects/pandas/pandas/core/generic.py", line 5086, in __setattr__
    self[name] = value
  File "/Users/machen/Projects/pandas/pandas/core/frame.py", line 3328, in __setitem__
    self._set_item(key, value)
  File "/Users/machen/Projects/pandas/pandas/core/frame.py", line 3403, in _set_item
    value = self._sanitize_column(key, value)
  File "/Users/machen/Projects/pandas/pandas/core/frame.py", line 3590, in _sanitize_column
    value = sanitize_index(value, self.index, copy=False)
  File "/Users/machen/Projects/pandas/pandas/core/internals/construction.py", line 537, in sanitize_index
    raise ValueError('Length of values does not match length of index')
ValueError: Length of values does not match length of index

Is the problem in the cache, though?
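Whatever the role of the cache, the alignment itself can be inspected with Index.difference. A minimal sketch using the example above (`dropped` and `missing` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 8]})
s = pd.Series([1, 1, 1, 1])

# Labels in s but not in the frame: their values are discarded on assignment
dropped = s.index.difference(df.index)
# Labels in the frame but not in s: those rows would become NaN
missing = df.index.difference(s.index)
print(list(dropped), list(missing))  # [3] []

df["a"] = s
print(df["a"].tolist())  # [1, 1, 1]
```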

Comment From: machenity

I'm confused about what the correct result should be. I think it already works correctly, according to the reindex example, so there was no failure in the first place. If I'm wrong, what is the expected result in this case?

   a    b
0  1    4
1  2    5
2  3    8
3  0  NaN
  1. Raise error
  2. No change, no error
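If the extended frame is the wanted outcome, it can already be obtained by growing the frame's index before assigning. A sketch, assuming the four-element Series from the original report:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 8]})
s = pd.Series([1, 2, 3, 0])

# Grow the frame's index to cover every label in s (new rows start as NaN)
df = df.reindex(df.index.union(s.index))

# Alignment now keeps every value of s; b is NaN in the new row
df["a"] = s
print(df)
```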

Comment From: phofl

This works as expected.

df = pd.DataFrame({'a':[1,2,3],'b':[4,5,8]})
df["a"] = pd.Series([10, 20, 30, 4], index=[2, 3, 4, 5])

The Series is reindexed to the frame's index:

      a  b
0   NaN  4
1   NaN  5
2  10.0  8