Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np
# Create a simple nested DataFrame
d = {"id": pd.Series([0, 1, 2]),
     "inner_df": pd.Series([pd.DataFrame(np.random.rand(4, 4)) for _ in range(3)])}
outer_df = pd.DataFrame(d)
print("Old inner df in first row:")
print(outer_df.loc[0, "inner_df"])
# Now update the inner_df in the first row
new_inner_df = pd.DataFrame(np.random.rand(4, 4))
print("\n*** Results using loc ***")
outer_df.loc[0, "inner_df"] = new_inner_df
print("Expected new inner df in first row:")
print(new_inner_df)
print("Real new inner df in first row:")
print(outer_df.loc[0, "inner_df"])
print("\n*** Results using set_value ***")
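# DataFrame.set_value was deprecated in pandas 0.21 (in favour of .at/.iat)
# and removed in 1.0, so this line only runs on older pandas versions.
outer_df.set_value(0, 'inner_df', new_inner_df)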
print("Expected new inner df in first row:")
print(new_inner_df)
print("Real new inner df in first row:")
print(outer_df.loc[0, "inner_df"])
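print("\n*** Results using chained indexing ***")
# Chained indexing (two separate [] calls): whether this reaches outer_df depends
# on view-vs-copy details, and under Copy-on-Write (the default in pandas 3.0)
# it never updates outer_df.
outer_df["inner_df"][0] = new_inner_df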
print("Expected new inner df in first row:")
print(new_inner_df)
print("Real new inner df in first row:")
print(outer_df.loc[0, "inner_df"])
Problem description
The most intuitive method, using loc, doesn't work when the new value is a DataFrame: the cell printed afterwards does not match new_inner_df. This pushes users toward the third method, chained indexing, which should be avoided. The second method, set_value, is not mentioned in the tutorials or the indexing docs either. I only found it by luck, through a (semi-)related Stack Overflow question that hinted at trying set_value.
Thus I recommend:
1. Allow assigning a DataFrame to a single cell of a DataFrame, or raise an error that hints at set_value.
2. Mention set_value here: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
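On pandas versions where set_value no longer exists, one workaround is sketched below. It assumes the target column already has object dtype. The column is copied into a 1-D NumPy object array, a single element is set there (NumPy stores an object assigned to one position as-is), and the whole column is written back, so pandas never tries to align or broadcast the inner DataFrame. `replace_cell_object` is a hypothetical helper, not a pandas API, and this is not an officially supported pattern.

```python
import numpy as np
import pandas as pd


def replace_cell_object(df: pd.DataFrame, row, col, value) -> None:
    """Hypothetical helper: store `value` (e.g. a DataFrame) in one cell of an object column.

    The column is copied to a 1-D object ndarray, one element is replaced,
    and the column is written back; `df` is modified in place.
    """
    if df[col].dtype != object:
        raise TypeError(f"column {col!r} must have object dtype")
    values = df[col].to_numpy(dtype=object, copy=True)
    values[df.index.get_loc(row)] = value  # NumPy stores the object as-is
    df[col] = values


rng = np.random.default_rng(0)
# Fill the object array element by element so each inner DataFrame
# is stored as a single object.
inner = np.empty(3, dtype=object)
for i in range(3):
    inner[i] = pd.DataFrame(rng.random((4, 4)))
outer_df = pd.DataFrame({"id": [0, 1, 2], "inner_df": inner})

new_inner_df = pd.DataFrame(rng.random((4, 4)))
replace_cell_object(outer_df, 0, "inner_df", new_inner_df)
print(outer_df["inner_df"].iloc[0] is new_inner_df)  # True
```

Each call copies the whole column, so this is reasonable for updating a few cells but not for bulk updates.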
Expected Output
(included in script)
Output of pd.show_versions()
Comment From: jreback
Putting complex objects inside cells of a DataFrame is non-idiomatic, non-performant, and not supported, so closing as 'won't fix'.
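The comment does not spell out an alternative. One common idiomatic option (an assumption here, not something stated in the thread) is to keep all inner frames in a single "long" DataFrame whose MultiIndex carries the outer `id`. Replacing one inner frame then becomes an ordinary vectorised assignment instead of storing an object in a cell. A minimal sketch, assuming all inner frames share the same columns:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
inner_frames = {i: pd.DataFrame(rng.random((4, 4))) for i in range(3)}

# One flat DataFrame: the dict keys (the outer "id") become the first
# index level, and each inner frame's row labels become the second level.
long_df = pd.concat(inner_frames, names=["id", "row"])

# "Replace the inner frame for id 0": select its rows with a boolean mask
# on the "id" level and assign a same-shaped array (no alignment involved).
new_inner_df = pd.DataFrame(rng.random((4, 4)))
mask = long_df.index.get_level_values("id") == 0
long_df.loc[mask] = new_inner_df.to_numpy()

# Read one inner frame back out as a regular DataFrame.
print(long_df.xs(0, level="id"))
```

If the inner frames do not share the same columns, a plain dict mapping `id` to DataFrame may be simpler than either layout.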