Code Sample, a copy-pastable example if possible
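import pandas as pd
df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6], 'y': [6, 5, 7, 4, 3, 5], 'z': list('qqqrrs')})

def f(df):
    # Returns a new, sorted copy (each group is already sorted by 'x', so the values are unchanged)
    return df.sort_values(by='x')

def g(df):
    # Returns the group object itself, unchanged
    return df

print(df.groupby('z', group_keys=True).apply(f))  # Group keys are added
print(df.groupby('z', group_keys=True).apply(g))  # Group keys are not added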
Problem description
Because df is already sorted by x, f and g return identical frames for every group, so the two groupby results should be the same. However, they differ: only the f case (which calls df.sort_values()) adds the group keys to the result.
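The premise above can be checked directly, independent of the pandas version: calling f and g on each group by hand shows they return equal frames, so the difference in the groupby output comes from how apply handles the returned object, not from its contents. A minimal sketch (the dict comprehension and the name `same` are illustrative, not from the report):

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6],
                   'y': [6, 5, 7, 4, 3, 5],
                   'z': list('qqqrrs')})

def f(frame):
    return frame.sort_values(by='x')

def g(frame):
    return frame

# Iterating a GroupBy yields (key, sub-frame) pairs; compare f and g per group.
same = {key: f(group).equals(g(group)) for key, group in df.groupby('z')}
print(same)  # every group maps to True
```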
PS: I noticed one more thing. If g returns df[::] instead of df, the group keys are added.
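Taken together with the sort_values() case, this suggests the outcome depended on whether the function handed back the group object itself or a new one (both sort_values() and df[::] build a new DataFrame). A version-independent workaround is to attach the keys explicitly. A minimal sketch, assuming it is acceptable to assemble the result with pd.concat instead of apply:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6],
                   'y': [6, 5, 7, 4, 3, 5],
                   'z': list('qqqrrs')})

def g(frame):
    return frame  # identity: the case where the keys went missing

# Apply g to each group ourselves; concat turns the dict keys into the outer
# index level, and names=['z'] labels that level (the inner level keeps the
# original, unnamed row labels).
pieces = {key: g(group) for key, group in df.groupby('z')}
result = pd.concat(pieces, names=['z'])
print(result)
```

Because the keys come from the dict rather than from apply's handling of the returned object, the outer z level is present whether g returns its input or a copy.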
Expected Output
Since group_keys=True, the output should look like the following:
     x  y  z
z
q 0  1  6  q
  1  2  5  q
  2  3  7  q
r 3  4  4  r
  4  5  3  r
s 5  6  5  s
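The expected output can also be written down as a regression check with pandas.testing.assert_frame_equal. A sketch, assuming a pandas version that respects group_keys=True even when the function returns its input (the maintainers report the expected output on main). Only the non-grouping columns x and y are selected, because recent pandas versions warn when apply receives the grouping column:

```python
import pandas as pd
import pandas.testing as tm

df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6],
                   'y': [6, 5, 7, 4, 3, 5],
                   'z': list('qqqrrs')})

def g(frame):
    return frame  # identity: the case that lost its group keys in 0.20.3

# Expected output from the report, restricted to columns x and y:
# outer index level 'z' (the group key), inner level the original row label.
expected = pd.DataFrame(
    {'x': [1, 2, 3, 4, 5, 6], 'y': [6, 5, 7, 4, 3, 5]},
    index=pd.MultiIndex.from_tuples(
        [('q', 0), ('q', 1), ('q', 2), ('r', 3), ('r', 4), ('s', 5)],
        names=['z', None],
    ),
)

result = df.groupby('z', group_keys=True)[['x', 'y']].apply(g)
tm.assert_frame_equal(result, expected)  # raises AssertionError if the keys are missing
```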
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.3
pytest: 3.2.1
pip: 10.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.14.0
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
pandas_gbq: None
pandas_datareader: 0.7.0
Comment From: WillAyd
Most likely related to, if not a duplicate of, #22546.
Comment From: smithto1
take
Comment From: smithto1
I think this is another issue that falls under the #34998 fix.
Comment From: rhshadrach
Agreed @smithto1 - I get the expected output on main now. Closing.