- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import numpy as np
import pandas as pd

# Case 1: plain float64 column
dfx = pd.DataFrame({'a': np.ones(10)})
dfx.iloc[np.array([0]), np.array([0])] = np.array([[2]])  # works

# Case 2: nullable Int64 column, same indexer and same 2D value
dfx['a_I'] = dfx['a'].astype('Int64')
dfx = dfx[["a_I"]]
dfx.iloc[np.array([0]), np.array([0])] = np.array([[2]])  # doesn't work: raises an error
Problem description
The second case should set the value to 2.
Expected Output
I have narrowed the problem down to here:
https://github.com/pandas-dev/pandas/blob/dec736f3fcc3337bea0ee180ee28f73e5bf54939/pandas/core/indexing.py#L1768
The dimension of the indexer is being increased in that call.
Output of pd.show_versions()
Comment From: jorisvandenbossche
@devin-petersohn thanks for the report!
The increase in dimensionality of indexer might be a problem as well, but the actual error comes from trying to coerce the value being assigned (np.array([[2]])) to an IntegerArray. The IntegerArray constructor expects a 1D array, not a 2D one like that.
So when indexing a DataFrame, the 2D array is fine, but when passing this value to Block.setitem we should reduce the dimension of the value.
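A minimal sketch of the dimension reduction suggested above, using NumPy only. The helper name `reduce_value_to_1d` is hypothetical and is not a pandas API; it assumes the target holds a single column, so a value of shape `(n, 1)` can be flattened to shape `(n,)` before it reaches a constructor that accepts only 1D input.

```python
import numpy as np


def reduce_value_to_1d(value):
    """Hypothetical helper (not part of pandas): flatten a single-column 2D value.

    0D and 1D inputs are returned unchanged as NumPy arrays.
    """
    arr = np.asarray(value)
    if arr.ndim > 2:
        raise ValueError(f"expected at most 2 dimensions, got {arr.ndim}")
    if arr.ndim == 2:
        if arr.shape[1] != 1:
            # Assumption: only one column is being set, as in the report.
            raise ValueError(f"expected a single-column value, got shape {arr.shape}")
        return arr[:, 0]
    return arr


value = np.array([[2]])              # the 2D value from the report, shape (1, 1)
reduced = reduce_value_to_1d(value)  # 1D array of shape (1,)
print(value.shape, reduced.shape)
```

Where exactly such a reduction would live inside pandas is not settled in this thread; the sketch only illustrates the shape handling.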
Comment From: CloseChoice
Actually this happens to more than just the IntegerArray:
dfx = pd.DataFrame({'a': np.ones(10)})
dfx['Categorical'] = pd.Categorical(dfx['a'])
dfx[['Categorical']].iloc[np.array([0]), np.array([0])] = np.array([[3]])
The BooleanArray shares the same checks in its coerce function, so we should check all other array types to be consistent here. Maybe implementing a more general coerce function (coerce2d) is the way to go. Without using masks (and with the dimensionality checks commented out), the following works:
dfx = pd.DataFrame({'a': np.ones(10)})
dfx['Integer'] = dfx['a'].astype('Int64')
dfx[['Integer']].iloc[np.array([0]), np.array([0])] = np.array([[2]])
and gives the correct result:
     a  Integer
0  1.0        2
1  1.0        1
2  1.0        1
3  1.0        1
4  1.0        1
5  1.0        1
6  1.0        1
7  1.0        1
8  1.0        1
9  1.0        1
What do you think, @jorisvandenbossche @devin-petersohn?
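One possible shape for the more general coerce function mentioned above, as a rough sketch only. Both `coerce_1d` and this `coerce2d` are hypothetical stand-ins written with NumPy and are not the pandas implementation; the sketch assumes the 1D step returns a `(data, mask)` pair and that there are no missing values.

```python
import numpy as np


def coerce_1d(values):
    """Stand-in for a 1D-only coercion step (hypothetical, not pandas code)."""
    values = np.asarray(values)
    if values.ndim != 1:
        raise TypeError("values must be 1D")
    data = values.astype("int64")
    mask = np.zeros(values.shape, dtype=bool)  # assumption: nothing is missing
    return data, mask


def coerce2d(values):
    """Hypothetical wrapper: flatten, coerce in 1D, then restore the shape."""
    values = np.asarray(values)
    data, mask = coerce_1d(values.ravel())
    return data.reshape(values.shape), mask.reshape(values.shape)


data, mask = coerce2d(np.array([[2]]))
print(data.shape, mask.shape)
```

Wrapping the existing 1D logic rather than duplicating it would keep the dimensionality check in one place, though whether that fits the actual array classes is an open question here.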
Comment From: ArnaudChanoine
take