Pandas Storing a dict in a DataFrame fails

Code Sample, a copy-pastable example if possible

Both of the examples below fail with the same error

import pandas as pd

df = pd.DataFrame(index=[0, 1, 2], columns=['a', 'b'])

df.loc[0, 'a'] = dict(x=2)   # label-based: ValueError on pandas 0.20.3
df.iloc[0, 0] = dict(x=2)    # position-based: same ValueError

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-282-62f3ee5ff885> in <module>()
      1 # file_map.loc[file_no, 'Q_step_length'] = dict(a=1)
      2 df = pd.DataFrame(index=[0, 1, 2], columns=['a', 'b'])
----> 3 df.iloc[0, 0] = dict(x=2)
      4 df['a'] = df['a'].apply(lambda x: x[0] if not pd.isnull(x) else x)
      5 df

...\lib\site-packages\pandas\core\indexing.py in __setitem__(self, key, value)
    177             key = com._apply_if_callable(key, self.obj)
    178         indexer = self._get_setitem_indexer(key)
--> 179         self._setitem_with_indexer(indexer, value)
    180 
    181     def _has_valid_type(self, k, axis):

...\lib\site-packages\pandas\core\indexing.py in _setitem_with_indexer(self, indexer, value)
    603 
    604             if isinstance(value, (ABCSeries, dict)):
--> 605                 value = self._align_series(indexer, Series(value))
    606 
    607             elif isinstance(value, ABCDataFrame):

...\lib\site-packages\pandas\core\indexing.py in _align_series(self, indexer, ser, multiindex_indexer)
    743             return ser.reindex(ax)._values
    744 
--> 745         raise ValueError('Incompatible indexer with Series')
    746 
    747     def _align_frame(self, indexer, df):

ValueError: Incompatible indexer with Series

This works, but it places a list containing the dict, not the dict itself, into the dataframe:

df.loc[0, 'a'] = [dict(x=2)]

It is then possible to unwrap the list, so that the dict sits directly in the dataframe, with a very inelegant construct like this:

df['a'] = df['a'].apply(lambda x: x[0] if not pd.isnull(x) else x)

Problem description

Since it is possible to store a dict in a dataframe, an assignment like the ones above should not fail. I am aware that df.loc[...] = dict(...) assigns the values in the dict to the corresponding columns if present (is that documented?) and that this has its own issues, but this behaviour should not apply when accessing a single location of the dataframe.

Expected Output

A dataframe with a dict inside the specified location.
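
For reference, a minimal sketch of one way to arrive at the expected output. It assumes that the scalar accessor `DataFrame.at` stores the value as-is in an object-dtype column. It has not been checked against pandas 0.20.3 specifically, so treat it as an illustration of the desired result rather than a confirmed fix.

```python
import pandas as pd

# Empty frame as in the report; with no data the columns are object dtype
df = pd.DataFrame(index=[0, 1, 2], columns=['a', 'b'])

# Assumption: .at addresses exactly one cell and does not try to
# align the dict's keys against the frame's axes
df.at[0, 'a'] = dict(x=2)

print(df)
print(type(df.at[0, 'a']))
```

The other cells are left untouched, which is the behaviour requested above for df.loc[0, 'a'].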

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 36.5.0
Cython: 0.26
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Comment From: jreback

this is pretty non-idiomatic, and you are pretty much on your own here. you could do it by just using a list/tuple around it

In [14]: df.loc[0, 'a'] = [dict(x=2)]

In [15]: df
Out[15]: 
            a    b
0  [{'x': 2}]  NaN
1         NaN  NaN
2         NaN  NaN

Comment From: aaclayton

Encountered the same issue, had two thoughts:

Storing a dict within a DataFrame is unusual, but there are valid cases where software may be using Pandas as a way to represent and manipulate arbitrary key/value style data where the data is indexed in a way that makes sense for panel representation.

The behavior that location based indexing will update columns based on the keys/values of a provided dictionary was a surprise to me. This is a cool convenience feature that makes sense when an explicit column is not referenced. For example, when providing:

df.loc[row, :] = dict(key1=value1, key2=value2)

It makes sense that the keys of the dictionary might be written as columns and that df.loc[row, key1] == value1. However, when providing an explicit column index, inferring the target columns from a provided dictionary is (to me) counter-intuitive. If I instead supply:

df.loc[row, col] = dict(key=value)

I am explicitly denoting that I want to store the entire value in the col column, and I would expect the dictionary to be inserted as-is.
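
The traceback in the original report shows pandas 0.20.3 wrapping a dict in `Series(value)` before aligning it, so the column-matching behaviour described here can be sketched with an explicit Series. This is only an illustration of label alignment on a whole-row assignment; whether a bare dict takes the same path in other pandas versions is an assumption.

```python
import pandas as pd

df = pd.DataFrame(index=[0, 1, 2], columns=['a', 'b'])

# A Series assigned to a whole row is aligned on the column labels,
# so the order of the keys does not matter
df.loc[0, :] = pd.Series({'b': 20, 'a': 10})

print(df)
```

With an explicit column in the indexer there is nothing left to align against, which is why the single-cell case reads more naturally as "store this object".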

Anyways, I agree with @jreback that this is somewhat non-idiomatic BUT I am sympathetic to the original issue raised by @andreas-thomik. I encountered a problem where trying to store a dict to an element of a dataframe using this syntax made sense for the particular problem I was facing, so he isn't entirely on his own with this request.

Comment From: jreback

@aaclayton this is related to #18955. We could/should probably support setting scalars of dicts (and other iterables) better. It's a bit tricky though.

Comment From: varadpatil

@jreback, the behaviour is not uniform here: assigning more than one value with dictionaries works, while assigning a single value doesn't.

This works:

In [18]: df = pd.DataFrame(index=[0, 1, 2], columns=['a', 'b'])

In [19]: df
Out[19]: 
     a    b
0  NaN  NaN
1  NaN  NaN
2  NaN  NaN

In [20]: df.loc[slice(None),'a']=[{'x':2}]*3

In [21]: df
Out[21]: 
          a    b
0  {'x': 2}  NaN
1  {'x': 2}  NaN
2  {'x': 2}  NaN

In [22]: df.loc[0]=[{'y':3}]*2

In [23]: df
Out[23]: 
          a         b
0  {'y': 3}  {'y': 3}
1  {'x': 2}       NaN
2  {'x': 2}       NaN

while trying to assign a single value as a dict doesn't work:

In [25]: df.loc[0,'a']={'z':4}
ValueError: Incompatible indexer with Series

This creates confusion. @jreback, could you please consider fixing it in an upcoming version?
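
As a side note on the multi-value case, here is a minimal sketch that fills a whole column with dicts through plain column assignment rather than .loc. It assumes an object-dtype column and is meant only to show that a list of dicts is stored element by element.

```python
import pandas as pd

df = pd.DataFrame(index=[0, 1, 2], columns=['a', 'b'])

# One dict per row; a comprehension creates separate objects,
# whereas [{'x': 2}] * 3 repeats a reference to the same dict
df['a'] = [{'x': 2} for _ in df.index]

print(df)
```

Using separate dict objects matters if a cell is mutated later, since a repeated reference would make the change visible in every row.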
