Code Sample, a copy-pastable example if possible
import pandas as pd

d = pd.DataFrame()
try:
    d.set_value(0, "c", [1, 2, 4])
except Exception:
    print("Caught")
print(d)  # c is now in d
Problem description
The set_value call above (correctly) fails and throws an exception. However, the DataFrame has been modified: it now has a new 'c' column of dtype float. Because the operation failed, the DataFrame should probably have remained unchanged.
Expected Output
The final print of d should show an empty DataFrame.
Output of pd.show_versions()
Comment From: jreback
The well-defined and documented indexers (e.g. .loc) have these types of guarantees, but these functions are being deprecated (https://github.com/pandas-dev/pandas/issues/15269), so unless someone puts forth a fix, this is a won't-fix.
Comment From: jreback
Actually, I spoke too soon about the other operators. Whether the guarantee holds depends on whether multiple operations are involved. Though honestly, you are pushing the boundaries here; these are not atomic operations. That said, if you'd like to push a fix, we would take it.
In [19]: d = pd.DataFrame()
In [20]: d.at[4,1] = [1,2,3]
ValueError: setting an array element with a sequence.
In [21]: d
Out[21]:
    1
4 NaN
The following is fine:
In [22]: d = pd.DataFrame()
...:
In [23]: d.loc[4] = [1,2,3]
ValueError: cannot set a frame with no defined columns
In [24]: d
Out[24]:
Empty DataFrame
Columns: []
Index: []
Comment From: jorisvandenbossche
@jreback the same happens with loc if you specify both dims:
In [29]: d = pd.DataFrame()
In [30]: d.loc[4, 3] = [1,2,3]
...
ValueError: setting an array element with a sequence.
In [31]: d
Out[31]:
    3
4 NaN
Comment From: jreback
Of course; this is a two-stage operation in the implementation.
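To make the "two-stage" point concrete, here is a minimal toy sketch in plain Python. It is not pandas code and says nothing about the actual pandas internals; the ToyFrame class and its set_scalar method are hypothetical names invented for illustration. It only shows how an operation that first enlarges a container and then validates the value can leave the enlargement behind when validation fails.

```python
class ToyFrame:
    """Toy stand-in for a table: maps column label -> {row label: float}.

    Hypothetical illustration only; not how pandas stores data.
    """

    def __init__(self):
        self.columns = {}

    def set_scalar(self, row, col, value):
        # Stage 1: enlarge the container so the target cell exists,
        # filling it with NaN as a placeholder.
        self.columns.setdefault(col, {}).setdefault(row, float("nan"))
        # Stage 2: validate and store the value; this step can raise.
        if isinstance(value, (list, tuple)):
            raise ValueError("setting an array element with a sequence.")
        self.columns[col][row] = float(value)


frame = ToyFrame()
try:
    frame.set_scalar(4, 1, [1, 2, 3])
except ValueError:
    pass

# Stage 1 already ran, so the enlarged cell survives the failure.
print(frame.columns)  # prints {1: {4: nan}}
```

Swapping the two stages (validate first, enlarge second) would make this toy version leave the container untouched on failure, which is the behaviour the issue asks for.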
Comment From: allComputableThings
... hmmm, so the caller should guess at the implementation to infer the behavior? Generally, any code that throws an exception is better off if it is idempotent. Could we leave this as an open bug? Clearly it shouldn't modify the DataFrame, and it is therefore a bug, even if the fix is not obvious.
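Until such a fix exists, one possible caller-side workaround is to mutate a copy and rebind the name only on success. The sketch below is a generic illustration using the standard library; the helper name mutated_copy is hypothetical and not part of pandas. With a DataFrame, DataFrame.copy() could presumably play the role of copy.deepcopy, at the cost of a full copy per operation.

```python
import copy


def mutated_copy(obj, mutate):
    """Apply mutate to a deep copy of obj and return the copy.

    If mutate raises, the exception propagates and obj is never touched.
    """
    candidate = copy.deepcopy(obj)
    mutate(candidate)
    return candidate


def bad_update(table):
    table["c"] = []  # first stage succeeds and changes the copy
    raise ValueError("second stage failed")


table = {"a": [1, 2]}
try:
    table = mutated_copy(table, bad_update)
except ValueError:
    pass

# The failed update left the original untouched: there is no "c" key.
```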
Comment From: jorisvandenbossche
Given that the behaviour is also present in non-deprecated functionality (.at, .loc), I agree we can reopen this.
It will still need someone diving into it to fix it.
Comment From: bt-
Is the result from this example related? I was surprised that the last assignment did not raise an error, given that its left-hand side raises one when evaluated on its own.
d = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
# Gives KeyError: "['C', 'D'] not in index"
# d.loc[:, ['A', 'B', 'C', 'D']]
# Gives no error and adds columns 'C' and 'D' with NaNs
d.loc[:, ['A', 'B', 'C', 'D']] = pd.DataFrame({'A': [7, 8, 9], 'B': [10, 11, 12]})
d