Code Sample, a copy-pastable example if possible
import pandas as pd
df = pd.DataFrame(dict(a=[0,0,0,1,1,1,2,2,2], b=[0,2,1,3,5,4,7,6,8]))
df
def f(x):
    x['c'] = 1
    return x.sort_values('b')
df.groupby('a', as_index=False).apply(f).reset_index(drop=True)
Expected Output
pd.DataFrame(dict(a=[0, 0, 0, 1, 1, 1, 2, 2, 2],
                  b=[0, 1, 2, 3, 4, 5, 6, 7, 8],
                  c=[1, 1, 1, 1, 1, 1, 1, 1, 1]))
output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-87-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 20.7.0
Cython: None
numpy: 1.11.0
scipy: 0.16.0
statsmodels: None
xarray: None
IPython: 4.2.0
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.4
blosc: None
bottleneck: None
tables: None
numexpr: 2.4.3
matplotlib: None
openpyxl: 2.2.2
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: 2.6 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: None
pandas_datareader: None
Comment From: jreback
dupe of this: https://github.com/pydata/pandas/issues/12652
In [7]: def f(x):
   ...:     x = x.copy()
   ...:     x['c'] = 1
   ...:     return x.sort_values('b')
   ...:
In [8]: df.groupby('a', as_index=False).apply(f).reset_index(drop=True)
Out[8]:
   a  b  c
0  0  0  1
1  0  1  1
2  0  2  1
3  1  3  1
4  1  4  1
5  1  5  1
6  2  6  1
7  2  7  1
8  2  8  1
You know this is completely inefficient though, right? Doing lots of work inside a groupby apply is a no-no.
In [11]: df.assign(c=1).sort_values('b').reset_index(drop=True)
Out[11]:
   a  b  c
0  0  0  1
1  0  1  1
2  0  2  1
3  1  3  1
4  1  4  1
5  1  5  1
6  2  6  1
7  2  7  1
8  2  8  1
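A side note on the one-liner above: it sorts on 'b' alone, which matches the per-group sort here only because the 'b' values happen not to interleave across groups in this sample data. A minimal sketch of a variant that keeps the rows grouped by 'a' in general is to sort on both columns. The names `result` and `expected` are mine, introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame(dict(a=[0, 0, 0, 1, 1, 1, 2, 2, 2],
                       b=[0, 2, 1, 3, 5, 4, 7, 6, 8]))

# assign() returns a new frame, so the original df is left untouched.
# Sorting on the group key first and the value column second orders
# the rows within each group without calling groupby at all.
result = (df.assign(c=1)
            .sort_values(['a', 'b'])
            .reset_index(drop=True))

# The frame given under "Expected Output" in the report.
expected = pd.DataFrame(dict(a=[0, 0, 0, 1, 1, 1, 2, 2, 2],
                             b=[0, 1, 2, 3, 4, 5, 6, 7, 8],
                             c=[1, 1, 1, 1, 1, 1, 1, 1, 1]))

print(result.equals(expected))
```

Because nothing is mutated in place, this also sidesteps the need for the `x.copy()` workaround inside the applied function.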