Code Sample, a copy-pastable example if possible
In [1]: import pandas as pd
# Create example dataframe
In [2]: df = pd.DataFrame({'a':[1,2,3],'b':[4,5,8]})
In [3]: df
Out[3]:
   a  b
0  1  4
1  2  5
2  3  8
# Add 0 to end of series
In [4]: df.a.append(pd.Series([0]), ignore_index=True)
Out[4]:
0 1
1 2
2 3
3 0
dtype: int64
# Add 0 to end of series and assign the result back to the original dataframe
In [5]: df.a = df.a.append(pd.Series([0]), ignore_index=True)
# replacement fails without an error
In [6]: df
Out[6]:
   a  b
0  1  4
1  2  5
2  3  8
Problem description
The current failure is really annoying (as all silent failures are) because I have no idea A) why it doesn't work and B) how I can fix the issue.
What I expect: An error message explaining why the operation did not succeed.
Output of pd.show_versions()
In [7]: pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.24.2
pytest: 3.6.4
pip: 19.0.3
setuptools: 39.2.0
Cython: 0.28.4
numpy: 1.13.3
scipy: 1.1.0
pyarrow: None
xarray: 0.11.3
IPython: 6.5.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: 4.2.3
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
So why are you even trying to do this anyways...?
Basically I want to verify that a part of my code is working right. The easiest way to do that is to simply replace the output data that exists with a simpler form. In this case, I add an extra data point to the independent variable series and then use that as an input on a function:
df.a = df.a.append(pd.Series([0]), ignore_index=True)  # add a zero point
df.b = df.a * 5  # or some other really simple function like that
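For what it's worth, one way to reach that goal is to grow the frame's index before assigning, so the longer column has somewhere to go. This is only a rough sketch of a possible workaround, not something confirmed in this thread; it relies on `DataFrame.reindex` and on plain list assignment.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 8]})

# Grow the index by one label first; the new row is filled with NaN.
df = df.reindex(range(len(df) + 1))

# A four-element column now matches the index length.
df['a'] = [1, 2, 3, 0]  # includes the extra zero point
df['b'] = df['a'] * 5   # or some other really simple function

print(df)
```

Because the index is extended explicitly, the assignment no longer depends on how a Series would be aligned.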
Comment From: WillAyd
This should raise. Using a list does:
>>> df['a'] = [1, 2, 3, 4]
ValueError: Length of values does not match length of index
But interesting that a Series does not:
>>> df['a'] = pd.Series([1, 2, 3, 4])
Comment From: jreback
Using a Series doesn’t raise because the passed Series is reindexed to the existing index (which is not updated).
This might be a cache updating problem.
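The reindexing described here can be seen in a small sketch. It assumes that assigning a Series to a column behaves like aligning that Series with `Series.reindex(df.index)`; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 8]})
longer = pd.Series([1, 2, 3, 0])  # labels 0..3, one more than df.index

# The Series aligned to the frame's existing index (labels 0..2).
aligned = longer.reindex(df.index)

# Assigning the Series keeps only the labels that exist in df.index.
df['a'] = longer

print(aligned.tolist())   # [1, 2, 3]
print(df['a'].tolist())   # [1, 2, 3]
print(len(df))            # 3
```

Label 3 has no counterpart in the frame's index, so its value is dropped without an error and the frame looks unchanged.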
Comment From: machenity
I'm looking into this and I have a few questions. I think that using a Series with a different length should raise, for consistency with other types (lists or other sequences).
I guess it could be implemented like this:
1. Check the length of the original Series.
2. Call reindex().
3. Check the length of the result and compare it with step 1.
4. If they differ, raise ValueError.
Does this approach seem OK?
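The four steps could look roughly like the following user-level sketch. `assign_checked` is a hypothetical helper, not a pandas function, and it assumes "the original Series" means the Series being assigned; it is not how pandas implements `__setitem__`.

```python
import pandas as pd


def assign_checked(df, column, values):
    """Hypothetical helper: assign a column, rejecting a Series of the wrong length."""
    if isinstance(values, pd.Series):
        original_length = len(values)        # step 1: length before alignment
        values = values.reindex(df.index)    # step 2: align to the frame's index
        if len(values) != original_length:   # steps 3 and 4: compare and raise
            raise ValueError('Length of values does not match length of index')
    df[column] = values
    return df


df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 8]})

try:
    assign_checked(df, 'a', pd.Series([1, 2, 3, 0]))
    raised = False
except ValueError:
    raised = True

assign_checked(df, 'b', pd.Series([7, 8, 9]))
```

Note that this check would also reject a Series that is shorter than the index, which plain assignment currently fills with NaN, so it changes more than the case reported here.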
Comment From: jreback
You don’t need to do any of that; rather, you need to look at the _cache.
Comment From: machenity
@jreback In addition to the example above:
>>> import pandas as pd
>>> df = pd.DataFrame({'a':[1,2,3],'b':[4,5,8]})
>>> df.a = pd.Series([1,1,1,1])
>>> df
   a  b
0  1  4
1  1  5
2  1  8
>>> df.a = [0,0,0,0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/machen/Projects/pandas/pandas/core/generic.py", line 5086, in __setattr__
self[name] = value
File "/Users/machen/Projects/pandas/pandas/core/frame.py", line 3328, in __setitem__
self._set_item(key, value)
File "/Users/machen/Projects/pandas/pandas/core/frame.py", line 3403, in _set_item
value = self._sanitize_column(key, value)
File "/Users/machen/Projects/pandas/pandas/core/frame.py", line 3590, in _sanitize_column
value = sanitize_index(value, self.index, copy=False)
File "/Users/machen/Projects/pandas/pandas/core/internals/construction.py", line 537, in sanitize_index
raise ValueError('Length of values does not match length of index')
ValueError: Length of values does not match length of index
Is the problem in the cache, though?
Comment From: machenity
I'm confused about what the correct result is. I think it already works correctly, judging by the reindex example, so there was no failure to begin with. If I'm wrong, what is the expected result in this case?
- Append the extra row:
   a    b
0  1    4
1  2    5
2  3    8
3  0  NaN
- Raise error
- No change, no error
Comment From: phofl
This works as expected.
df = pd.DataFrame({'a':[1,2,3],'b':[4,5,8]})
df["a"] = pd.Series([10, 20, 30, 4], index=[2, 3, 4, 5])
The assigned Series is reindexed to the frame's index, giving:
      a  b
0   NaN  4
1   NaN  5
2  10.0  8