Code Sample

import pandas as pd

df = pd.DataFrame({
    'x': range(4),
    'c': list('aabb')
})

# These two loops perform the same assignment and should behave identically.

# 1. No warning
gdfs = [gdf for _, gdf in df.groupby('c')]
for gdf in gdfs:
    gdf['x'] = 1

# 2. SettingWithCopyWarning
gdfs = []
for _, gdf in df.groupby('c'):
    gdf['x'] = 1
    gdfs.append(gdf)

Problem description

Modifying a group dataframe while iterating through the groups triggers a SettingWithCopyWarning, whereas collecting the groups first and modifying them afterwards does not. The group dataframe is the result of a slicing operation internal to groupby, so assigning to it should not raise a warning in user code.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.11.1-gentoo
machine: x86_64
processor: Intel
byteorder: little
LC_ALL: en_US.utf8
LANG: en_US.utf8
LOCALE: en_US.UTF-8
pandas: 0.23.0.dev0+81.g78147e9c8.dirty
pytest: 3.3.1
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: None
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.4
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Comment From: jreback

Slightly related to https://github.com/pandas-dev/pandas/issues/5758

Why do you think these should be able to mutate this object? This is a slice; sure, it's an implementation detail, but in general you would need to copy it. Mutation like this is plain non-idiomatic as well.
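For reference, the copy-based pattern mentioned here could look like the minimal sketch below. It reuses the `df` from the code sample; the `result` name and the final `pd.concat` step are assumptions added for illustration and are not part of the original report.

```python
import pandas as pd

df = pd.DataFrame({
    'x': range(4),
    'c': list('aabb')
})

gdfs = []
for _, gdf in df.groupby('c'):
    # Take an explicit copy so the assignment targets an independent
    # frame rather than a slice handed out by groupby.
    gdf = gdf.copy()
    gdf['x'] = 1
    gdfs.append(gdf)

# Recombine the modified groups (illustrative; not in the original sample).
result = pd.concat(gdfs)
```

The original `df` is left untouched, at the cost of one copy per group.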

Comment From: has2k1

> Why do you think these should be able to mutate this object?

> Mutation like this is plain non-idiomatic as well.

Okay, here is why.

I have custom split-apply-combine code, some of which nests multiple times. groupby().apply() is/was not an option: the functions may have side effects, and with nesting the speculative call on the first group leads to an explosion of extra function calls. Where possible I use generators and may reuse the group dataframes. In such cases it is more performant to avoid copying the returned group dataframes.

So other than the oddity about implementation details leaking out, it is nicer to do groupbys if the data is handed back with no strings attached.
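To make the use case concrete, here is a rough sketch of a nested, generator-based split-apply-combine. The helper `split_nested`, the column `d`, and the sample data are hypothetical and only meant to illustrate the shape of such code; the real code is not shown in this thread. This version sidesteps the warning with `DataFrame.assign`, which returns a new frame, so it still pays the per-group copy that the approach above is trying to avoid.

```python
import pandas as pd


def split_nested(frame, keys):
    """Lazily yield leaf groups after grouping by each key in turn.

    Hypothetical helper for illustration only.
    """
    if not keys:
        yield frame
        return
    first, rest = keys[0], keys[1:]
    for _, gdf in frame.groupby(first):
        # Recurse into the remaining keys without calling apply().
        yield from split_nested(gdf, rest)


df = pd.DataFrame({
    'x': range(8),
    'c': list('aabbaabb'),
    'd': list('uuuuvvvv')
})

pieces = []
for gdf in split_nested(df, ['c', 'd']):
    # assign() builds a new frame, so the group handed out by
    # groupby is never mutated in place.
    pieces.append(gdf.assign(x=1))

# Combine step: stitch the leaf groups back together in the original order.
combined = pd.concat(pieces).sort_index()
```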

Comment From: jreback

@has2k1 you are using pandas non-idiomatically, so you are on your own.