Pandas REGR: assigning scalar with a length no longer works

Assigning a value to a single location in a DataFrame (using .loc with scalar indexers) started to fail when the value itself has a length (for example, a tuple).

Consider the following example:

In [1]: df = pd.DataFrame({'a': [1, 2, 3], 'b': [(1, 2), (1, 2, 3), (3, 4)]})

In [2]: df
Out[2]: 
   a          b
0  1     (1, 2)
1  2  (1, 2, 3)
2  3     (3, 4)

In [3]: df.loc[0, 'b'] = (7, 8, 9)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-a2d59e11519a> in <module>
----> 1 df.loc[0, 'b'] = (7, 8, 9)

~/miniconda3/lib/python3.7/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
    187             key = com._apply_if_callable(key, self.obj)
    188         indexer = self._get_setitem_indexer(key)
--> 189         self._setitem_with_indexer(indexer, value)
    190 
    191     def _validate_key(self, key, axis):

~/miniconda3/lib/python3.7/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
    604 
    605                     if len(labels) != len(value):
--> 606                         raise ValueError('Must have equal len keys and value '
    607                                          'when setting with an iterable')
    608 

ValueError: Must have equal len keys and value when setting with an iterable

This raises on 0.23.4 through master, but worked in 0.20.3 through 0.22.0 (the versions I tested):

In [1]: pd.__version__
Out[1]: '0.22.0'

In [2]: df = pd.DataFrame({'a': [1, 2, 3], 'b': [(1, 2), (1, 2, 3), (3, 4)]})

In [3]: df
Out[3]: 
   a          b
0  1     (1, 2)
1  2  (1, 2, 3)
2  3     (3, 4)

In [4]: df.loc[0, 'b'] = (7, 8, 9)

In [5]: df
Out[5]: 
   a          b
0  1  (7, 8, 9)
1  2  (1, 2, 3)
2  3     (3, 4)

Related to https://github.com/pandas-dev/pandas/issues/25806

We don't have very robust support in general for list-like values, but, for the specific case above of updating a single value, I don't think there is anything ambiguous about it? You are updating a single value, so the passed value should simply be put in that place?

Note that the above uses tuples, but there are also custom objects such as MultiPolygons that represent a single object yet define a __len__.
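To illustrate why an object with a __len__ trips the check shown in the traceback, here is a minimal pure-Python sketch. It is a simplified model, not the pandas source: `MultiShape` and `check_setitem` are hypothetical names, and it assumes `labels` holds the single target column.

```python
class MultiShape:
    """Hypothetical stand-in for a MultiPolygon-like object:
    one logical value that nevertheless defines __len__."""

    def __init__(self, parts):
        self.parts = list(parts)

    def __len__(self):
        return len(self.parts)


def check_setitem(labels, value):
    """Simplified model of the length comparison from the traceback.

    Any value that has a length is compared against the number of
    target labels, even when it is meant as one single cell value.
    """
    if hasattr(value, "__len__") and len(labels) != len(value):
        raise ValueError(
            "Must have equal len keys and value when setting with an iterable"
        )
    return value


labels = ["b"]  # a single target column, as in df.loc[0, 'b']
shape = MultiShape(["poly1", "poly2", "poly3"])

try:
    check_setitem(labels, shape)
    outcome = "accepted"
except ValueError:
    # len(labels) is 1 but len(shape) is 3, so the check rejects it
    outcome = "rejected"

print(outcome)
```

In this model a value without a length passes untouched, and a length-1 object happens to pass by coincidence, which is consistent with the errors appearing only for some values.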

Comment From: jorisvandenbossche

There are several potential issues with assigning list-like values; see e.g. the issues linked here: https://github.com/pandas-dev/pandas/issues/19590#issuecomment-364079529. However, most of those cases occur when assigning to multiple elements at once (should the list be unpacked?), while here both arguments to loc are scalars.

Probably caused by https://github.com/pandas-dev/pandas/pull/20732

Comment From: elfmanryan

I am getting the same issue, as the MultiPolygon is treated as equivalent to a list of polygons. My workaround (to assign a MultiPolygon to a single row) is to wrap it in a list first. I run a try/except to catch the error, something like this:

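from shapely.geometry import MultiPolygon

# multipolygon is an existing MultiPolygon instance to store in the one-row frame
try:
    geopandas_dataframe_one_row['geometry'] = multipolygon
except ValueError:
    # wrap in a list so the value is treated as one element per row
    geopandas_dataframe_one_row['geometry'] = [multipolygon]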

Hope that's helpful to someone.

Comment From: jklatt

Hi! Any solution in sight? I am also encountering the issue when assigning MultiPolygons, and neither the list workaround nor the ".values" workaround seems to help... Thankful for any hint!

Comment From: koshy1123

@jklatt I was able to resolve by roundtripping a shapely geo through geopandas:

gdf.loc[scalar_index_loc, 'geometry'] = geopandas.GeoDataFrame(geometry=[shapely_geo]).geometry.values

This seems to avoid the ValueError.

Comment From: jklatt

Works like a charm! Thank you Thomas :)

Comment From: elizabethswkim

Thanks, @koshy1123 ! Your roundtripping shapely geo via geopandas worked for me!

Comment From: bramson

I've tried various versions of the workaround for this problem, but I still can't get anything to work.

What should work:

someData.at[index,'geometry'] = thisGeom

Some workarounds attempted:

someData.loc[index, 'geometry'] = geopandas.GeoDataFrame(geometry=[thisGeom]).geometry.values
someData.loc[index, 'geometry'] = geopandas.GeoDataFrame(geometry=[thisGeom]).geometry.values[0]
someData.loc[index, 'geometry'] = [geopandas.GeoDataFrame(geometry=[thisGeom]).geometry.values[0]]

someData.loc[[index],'geometry'] = geopandas.GeoSeries([thisGeom]).values

All give

ValueError: Must have equal len keys and value when setting with an iterable

ValueError: Must have equal len keys and value when setting with an ndarray

Is there any updated solution or working workaround?

Comment From: bennlich

@bramson I think the one that worked for me is not in your list:

someData.loc[[index], 'geometry'] = geopandas.GeoDataFrame(geometry=[thisGeom]).geometry.values

:disappointed:

Comment From: hadim

I am having the same issue as well. So far the only solution for me was to build a complex .apply(lambda x: xxx) based hack as a workaround.
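For plain pandas object columns, one way to sidestep scalar .loc assignment is to rebuild the whole column. The sketch below is an illustration only, not necessarily the hack described above: `set_cell_by_rebuild` is a hypothetical helper, it assumes unique index labels, and it has not been checked against geopandas geometry columns.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [(1, 2), (1, 2, 3), (3, 4)]})


def set_cell_by_rebuild(frame, row_label, column, value):
    """Return a copy of frame with one cell replaced.

    The column is converted to a Python list, the element is swapped,
    and the column is reassigned as a whole, so no length comparison
    between the key and the value is involved.
    """
    position = frame.index.get_loc(row_label)  # assumes unique index labels
    values = frame[column].tolist()
    values[position] = value
    result = frame.copy()
    # dtype=object keeps each tuple as a single cell value
    result[column] = pd.Series(values, index=frame.index, dtype=object)
    return result


updated = set_cell_by_rebuild(df, 0, 'b', (7, 8, 9))
print(updated)
```

This copies the column on every call, so it is only reasonable for occasional single-cell updates.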

Comment From: oboklob

I found that if the geopandas DataFrame contains only a geometry column the problem goes away. As soon as an additional column has values the above issue arises.

So my solution was to split the data into two DataFrames whilst populating geometry: one with only geometry and a second with the additional data (using the same indexes), then combine the two afterwards. Not ideal, but a workaround if you are struggling.