Code Sample
import pandas as pd

df = pd.DataFrame({
    'x': range(4),
    'c': list('aabb')
})

# These operations should have the same consequences.

# 1. No warning
gdfs = [gdf for _, gdf in df.groupby('c')]
for gdf in gdfs:
    gdf['x'] = 1

# 2. SettingWithCopyWarning
gdfs = []
for _, gdf in df.groupby('c'):
    gdf['x'] = 1
    gdfs.append(gdf)
Problem description
Modifying a group dataframe while iterating through the groups triggers a SettingWithCopyWarning.
The dataframe referred to is a result of an internal slicing operation and should not lead to a warning in userspace.
Output of pd.show_versions()
Comment From: jreback
slightly related to https://github.com/pandas-dev/pandas/issues/5758
why do you think these should be able to mutate this object? this is a slice; sure, it's an implementation detail, but in general you would need to copy this. Mutation like this is plain non-idiomatic as well.
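For reference, a minimal sketch of the copy-based approach mentioned above, reusing the dataframe from the code sample. The names gdfs and result are illustrative only, and this is one possible way to avoid the warning, not necessarily the recommended one:

```python
import pandas as pd

df = pd.DataFrame({
    'x': range(4),
    'c': list('aabb')
})

gdfs = []
for _, gdf in df.groupby('c'):
    # Take an explicit copy so the group is an independent dataframe
    # rather than a slice produced by the groupby machinery.
    gdf = gdf.copy()
    gdf['x'] = 1
    gdfs.append(gdf)

# Recombine the modified groups; the original df is left untouched.
result = pd.concat(gdfs)
```

The explicit copy costs one extra allocation per group, which is the trade-off under discussion in this thread.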
Comment From: has2k1
> why do you think these should be able to mutate this object?
> Mutation like this is plain non-idiomatic as well.
Okay, here is why.
I have custom split-apply-combine code, some of which nests multiple times. groupby().apply() is/was not an option, either because the functions may have side effects or simply to prevent an explosion of speculative function calls on the first group due to nesting. Where possible I use generators and may reuse the group dataframes. In such cases it is more performant to avoid copying the returned group dataframes.
So other than the oddity about implementation details leaking out, it is nicer to do groupbys if the data is handed back with no strings attached.
Comment From: jreback
@has2k1 you are using pandas non-idiomatically and are on your own