Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [x] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd

df = pd.DataFrame({1: [1, 2, 3], 2: [np.zeros(3), np.ones(3), np.zeros(0)]})

print(df.dtypes)

df.loc[1, 2] = np.zeros(3)

Issue Description

The loc assignment raises "ValueError: Must have equal len keys and value when setting with an iterable" even though column 2 has dtype object. This started when I updated pandas from 1.4.4 to 2.2.1; in 1.4.4 this syntax worked. The same error occurs when assigning multiple columns at once as soon as one of the new values is an iterable.

Expected Behavior

The assignment should set the cell in row 1, column 2 to an np.array of zeros.

Installed Versions

INSTALLED VERSIONS
------------------
commit : bdc79c146c2e32f2cab629be240f01658cfb6cc2
python : 3.9.19.final.0
python-bits : 64
OS : Darwin
OS-release : 23.4.0
Version : Darwin Kernel Version 23.4.0: Wed Feb 21 21:44:06 PST 2024; root:xnu-10063.101.15~2/RELEASE_ARM64_T8103
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_GB.UTF-8

pandas : 2.2.1
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0
setuptools : 69.2.0
pip : 24.0
Cython : 3.0.9
pytest : 8.1.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 5.1.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.18.1
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.2.0
gcsfs : None
matplotlib : 3.8.1
numba : 0.59.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 15.0.2
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.3
sqlalchemy : 2.0.28
tables : None
tabulate : 0.9.0
xarray : None
xlrd : 2.0.1
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

Comment From: dimitsev

@isabelladegen The exact same problem appears on the main branch of pandas 2.2.2 if you do df.loc[1, 2] = list(np.zeros(3)). Please update your title to say that the problem happens with any iterable, not just a numpy array. Please also tick the checkbox "this bug exists on the main branch of pandas".

Comment From: onejgordon

Just ran into this today with 2.2.2. Is anyone aware of a workaround?

Comment From: isabelladegen

@onejgordon This is how I worked around it. Warning: it is ugly. For the toy example above, instead of indexing the row and column(s) to update the cell like this: df.loc[1, 2] = np.zeros(3)

I update the whole row like this: df.loc[1] = {1:2, 2:np.zeros(3)}

Comment From: onejgordon

Thanks for your response @isabelladegen.

It appears one of the factors influencing my case was a non-unique index. Dropping duplicates from the index column resolved the error and I was able to set both arrays and individual cells with:

df.loc[3, 'arr'] = np.array([1, 2])

and

df.loc[[2, 4, 6], 'arr'] = np.array([1, 2])

Comment From: saeub

Using df.at instead of df.loc seems to do the job if you only want to set a single value:

df.at[1, 2] = np.zeros(3)
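
For anyone wanting to try this on the toy frame from the report, here is a minimal sketch. It assumes a unique index and an object-dtype target column, as in the original example; other setups mentioned in this thread may behave differently.

```python
import numpy as np
import pandas as pd

# Same toy frame as in the report: column 2 holds arrays, so its dtype is object.
df = pd.DataFrame({1: [1, 2, 3], 2: [np.zeros(3), np.ones(3), np.zeros(0)]})

# .at addresses exactly one cell, so the array is stored as a single object
# instead of being matched element-wise against the indexer.
df.at[1, 2] = np.zeros(3)

cell = df.at[1, 2]
print(type(cell), cell)
```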

Comment From: AlastairKelly

I'm having the same problem with df.at, unfortunately. df.at[row.Index,"pmids_old_before"] = results.get('idlist') is generating the same error as df.loc[row.Index,"pmids_old_before"] = results.get('idlist')

Oddly, this was working fine for me until today, and I haven't changed anything in my coding environment, so I'm very puzzled about the break. Also, it still works on two dataframes before suddenly generating the error on the third that uses this function. I can't figure out any salient differences. The function creates and initializes this column with None values before trying to make this assignment, so it should be identical conditions for each dataframe.

Comment From: saeub

@AlastairKelly does your column pmids_old_before have dtype=object?
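
A quick way to check this, together with the non-unique index mentioned earlier in the thread, might look like the sketch below. The helper name can_hold_array_cell and the column "n" are made up for illustration; only pmids_old_before comes from the comment above. Passing these checks is not a guarantee that the assignment succeeds. It only rules out the two factors raised so far.

```python
import numpy as np
import pandas as pd


def can_hold_array_cell(df, column):
    """Hypothetical helper: True if both axes are unique and `column` is object dtype."""
    # Check column uniqueness first: with duplicate column labels,
    # df[column] would return a DataFrame, which has no .dtype attribute.
    if not (df.columns.is_unique and df.index.is_unique):
        return False
    return bool(df[column].dtype == object)


# Column initialised with None values, as described above -> object dtype.
ok = pd.DataFrame({"n": [1, 2, 3], "pmids_old_before": [None, None, None]})

# Same layout, but the index label 0 appears twice.
dup = pd.DataFrame({"n": [1, 2], "pmids_old_before": [None, None]}, index=[0, 0])

# Column initialised with NaN instead of None -> float64, cannot store a list.
num = pd.DataFrame({"n": [1, 2], "pmids_old_before": [np.nan, np.nan]})

for name, frame in [("ok", ok), ("dup", dup), ("num", num)]:
    print(name, can_hold_array_cell(frame, "pmids_old_before"))
```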

Comment From: JakeHightower

Using df.at instead of df.loc seems to do the job if you only want to set a single value:

df.at[1, 2] = np.zeros(3)

@saeub's method worked for me; I had the identical issue described here. Ty.

Comment From: tcourat

Same issue in pandas 2.2.2, and nothing works (neither .at nor wrapping the value in a list).