• [x] I have checked that this issue has not already been reported.

• [x] I have confirmed this bug exists on the latest version of pandas.

• [ ] (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

import numpy as np
import pandas as pd

dfx = pd.DataFrame({'a': np.ones(10)})
dfx.iloc[np.array([0]), np.array([0])] = np.array([[2]])  # works (float64 column)
dfx['a_I'] = dfx['a'].astype('Int64')
dfx = dfx[["a_I"]]  # keep only the nullable-integer column
dfx.iloc[np.array([0]), np.array([0])] = np.array([[2]])  # raises an error (pandas 1.0.3)

Problem description

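The second assignment should set the first value of column a_I to 2, just as the first assignment does for the float64 column a. Instead, it raises an error.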

Expected Output

I have narrowed the problem down to this line:

https://github.com/pandas-dev/pandas/blob/dec736f3fcc3337bea0ee180ee28f73e5bf54939/pandas/core/indexing.py#L1768

The dimension of the indexer is being increased in that call.
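For illustration, here is how combining two 1D position indexers with np.ix_ yields 2D index arrays. That the linked pandas line performs this kind of conversion (in pandas 1.0.x, via an internal helper named maybe_convert_ix that wraps np.ix_) is an assumption based on reading the source, not something verified here.

```python
import numpy as np

rows = np.array([0])  # 1D row positions, as in the report
cols = np.array([0])  # 1D column positions

# np.ix_ builds an open mesh: each output array gains a dimension,
# so two 1D indexers become two 2D index arrays.
row_ix, col_ix = np.ix_(rows, cols)

print(rows.ndim, row_ix.ndim)      # 1 2
print(row_ix.shape, col_ix.shape)  # (1, 1) (1, 1)
```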

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.6.8.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 19.3.0
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8
pandas           : 1.0.3
numpy            : 1.18.1
pytz             : 2018.5
dateutil         : 2.7.3
pip              : 19.2.3
setuptools       : 42.0.2
Cython           : 0.29.10
pytest           : 5.4.1
hypothesis       : None
sphinx           : 2.4.4
blosc            : None
feather          : 0.4.0
xlsxwriter       : None
lxml.etree       : 4.5.0
html5lib         : None
pymysql          : None
psycopg2         : 2.8.3 (dt dec pq3 ext lo64)
jinja2           : 2.11.1
IPython          : 7.0.1
pandas_datareader: None
bs4              : 4.8.2
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.5.0
matplotlib       : 3.2.0
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.3
pandas_gbq       : 0.13.1
pyarrow          : 0.16.0
pytables         : None
pytest           : 5.4.1
pyxlsb           : None
s3fs             : 0.4.0
scipy            : 1.4.1
sqlalchemy       : 1.3.15
tables           : 3.6.1
tabulate         : None
xarray           : 0.15.0
xlrd             : 1.2.0
xlwt             : None
xlsxwriter       : None
numba            : 0.48.0

Comment From: jorisvandenbossche

@devin-petersohn thanks for the report!

The increase in the indexer's dimensionality might be a problem as well, but the actual error comes from trying to coerce the right-hand-side value (np.array([[2]])) to an IntegerArray. The IntegerArray constructor expects a 1D array, not a 2D one like that. So when indexing a DataFrame the 2D array is fine, but when passing this value to Block.setitem we should reduce its dimension.
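A minimal sketch of the dimension reduction described above: before the value reaches a 1D block, an (n_rows, 1) array is collapsed to (n_rows,), which the Int64 constructor accepts. The helper reduce_to_1d is hypothetical and not part of pandas; where exactly pandas should apply such a step (for example in Block.setitem) is left open.

```python
import numpy as np
import pandas as pd


def reduce_to_1d(value):
    """Hypothetical helper: collapse a single-column 2D value to 1D.

    A DataFrame-level setitem may carry the right-hand side as
    (n_rows, n_cols), while a 1D extension block holds exactly one column.
    Scalars (0D) and 1D values pass through unchanged.
    """
    value = np.asarray(value)
    if value.ndim > 2 or (value.ndim == 2 and value.shape[1] != 1):
        raise ValueError(f"cannot set a value of shape {value.shape} into a 1D block")
    if value.ndim == 2:
        return value[:, 0]  # (n_rows, 1) -> (n_rows,)
    return value


rhs = np.array([[2]])  # the 2D value from the report
values_1d = reduce_to_1d(rhs)
arr = pd.array(values_1d, dtype="Int64")  # 1D input is accepted
print(values_1d.shape, arr.dtype)  # (1,) Int64
```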

Comment From: CloseChoice

Actually, this affects more than just IntegerArray:

dfx = pd.DataFrame({'a': np.ones(10)})
dfx['Categorical'] = pd.Categorical(dfx['a'])
dfx[['Categorical']].iloc[np.array([0]), np.array([0])] = np.array([[3]])

BooleanArray has the same dimensionality check in its coerce function, so we should check all the other array types for consistency. Perhaps a more general coerce function (coerce2d) is the way to go. Without using masks (and with the dimensionality checks commented out), the following works:

dfx = pd.DataFrame({'a': np.ones(10)})
dfx['Integer'] = dfx['a'].astype('Int64')
dfx[['Integer']].iloc[np.array([0]), np.array([0])] = np.array([[2]])

and gives the correct result:

     a  Integer 
0  1.0        2 
1  1.0        1 
2  1.0        1 
3  1.0        1 
4  1.0        1 
5  1.0        1 
6  1.0        1 
7  1.0        1 
8  1.0        1 
9  1.0        1 

What do you think @jorisvandenbossche @devin-petersohn ?
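The coerce2d idea from this comment could look roughly like the following sketch: instead of rejecting 2D input, the coercion returns a data array and a boolean mask with the same shape as the input, leaving the caller to decide how to reduce it. The name coerce2d comes from the comment, but the signature and behavior shown here are assumptions; the sketch only handles integer-valued input and is not pandas' internal coerce function.

```python
import numpy as np
import pandas as pd


def coerce2d(values, dtype="int64"):
    """Hypothetical 2D-tolerant coercion for a masked integer array.

    Returns (data, mask) with the same shape as ``values``; missing
    entries are recorded in ``mask`` and filled with 0 in ``data``.
    """
    values = np.asarray(values, dtype=object)
    if values.ndim not in (1, 2):
        raise TypeError(f"expected 1D or 2D input, got {values.ndim}D")
    mask = pd.isna(values)  # elementwise, keeps the input's shape
    data = np.where(mask, 0, values).astype(dtype)
    return data, mask


data, mask = coerce2d(np.array([[2]]))
print(data.shape, mask.shape)  # (1, 1) (1, 1)

data, mask = coerce2d([[2, None]])
print(data.tolist(), mask.tolist())  # [[2, 0]] [[False, True]]
```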

Comment From: ArnaudChanoine

take