Pandas: Using agg with groupby, as_index=False still returns the group variable as index

Code Sample, a copy-pastable example if possible

Code sample:

# Import packages
import pandas as pd
import numpy as np
# Set up test DataFrame
test_array = np.arange(50) + 100
test_matrix = test_array.reshape((10,5))
test_df = pd.DataFrame(test_matrix).rename(columns={0:'shouldnt be index'})
test_df.loc[0:5,'shouldnt be index'] = 3
test_df.loc[5:8,'shouldnt be index'] = 4
# groupby and agg
end_result = test_df.groupby('shouldnt be index',as_index=False).agg(["min", "max", "count"])
print(end_result)

execution:

>>> # Import packages
... import pandas as pd
>>> import numpy as np
>>> # Set up test DataFrame
... test_array = np.arange(50) + 100
>>> test_matrix = test_array.reshape((10,5))
>>> test_df = pd.DataFrame(test_matrix).rename(columns={0:'shouldnt be index'})
>>> # Make groupby data more grouped
... test_df.loc[0:5,'shouldnt be index'] = 3
>>> test_df.loc[5:8,'shouldnt be index'] = 4
>>> # groupby and agg
... end_result = test_df.groupby('shouldnt be index',as_index=False).agg(["min", "max", "count"])
>>> print(end_result)
                     1               2               3               4
                   min  max count  min  max count  min  max count  min  max count
shouldnt be index
3                  101  121     5  102  122     5  103  123     5  104  124     5
4                  126  141     4  127  142     4  128  143     4  129  144     4
145                146  146     1  147  147     1  148  148     1  149  149     1

Problem description

I'm trying to use groupby with as_index=False and then do an aggregate statement. I included an example where the groupby variable ends up as an index rather than staying as a column. My understanding is that this should result in the groupby variable being a column and not an index (as below), but perhaps I am mistaken.

This is my first time creating an issue, so my apologies if this is operator error or I didn't include important information. Please let me know if this is the case.

Maybe this is related to #22546?

Expected Output

You can see what the result should be when using "reset_index"

>>> end_result.reset_index()
  shouldnt be index    1               2               3               4
                     min  max count  min  max count  min  max count  min  max count
0                 3  101  121     5  102  122     5  103  123     5  104  124     5
1                 4  126  141     4  127  142     4  128  143     4  129  144     4
2               145  146  146     1  147  147     1  148  148     1  149  149     1

Output of `pd.show_versions()`

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.0
pytest: 3.8.0
pip: 19.0.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.7.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.6
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.0
lxml.etree: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Comment From: WillAyd

Thanks for the report. I think the problem here is a conflict between the as_index keyword and how we are piecing together the result of multiple agg function applications.

Specifically, this is fine:

end_result = test_df.groupby('shouldnt be index',as_index=False).agg(min)

but this would reproduce the error you are seeing:

end_result = test_df.groupby('shouldnt be index',as_index=False).agg([min])

Investigation and PRs would certainly be welcome
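
A minimal sketch of the contrast described above, using hypothetical column names (`key`, `a` to `d`) in place of the reporter's labels. With a single function the key stays a regular column. With a list of functions the outcome appears to depend on the pandas version, so the sketch only inspects the result instead of assuming it.

```python
import numpy as np
import pandas as pd

# Hypothetical frame shaped like the one in the report: 10 rows, 5 columns.
df = pd.DataFrame(
    np.arange(50).reshape((10, 5)) + 100,
    columns=["key", "a", "b", "c", "d"],
)
df.loc[0:5, "key"] = 3
df.loc[5:8, "key"] = 4

# Single aggregation function: as_index=False keeps "key" as a column.
single = df.groupby("key", as_index=False).agg("min")
print("key" in single.columns)  # True

# List of functions: on affected versions "key" ends up in the index instead.
multi = df.groupby("key", as_index=False).agg(["min"])
key_in_index = "key" in multi.index.names
print("key moved to the index:", key_in_index)
```

The string `"min"` is used instead of the built-in `min` only to keep the sketch independent of how a given pandas version treats built-in callables.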

Comment From: nathan-zymergen

Curious if there is any update on this. I just ran into this issue. I appreciate all the work being done on this great project!

Comment From: simonjayhawkins

closing as duplicate of #13217. ping me if I'm missing something.

Comment From: AnkithO-0

Hi guys, I too am running into the same and I just found out that doing a .reset_index() instead of as_index = False solves the issue for me. Thanks :)
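
A short sketch of that workaround, assuming a hypothetical frame with columns `key` and `a` to `d` that mirrors the original report. It groups with the default `as_index=True` and then calls `.reset_index()`, which moves the key back into the columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the one in the report.
df = pd.DataFrame(
    np.arange(50).reshape((10, 5)) + 100,
    columns=["key", "a", "b", "c", "d"],
)
df.loc[0:5, "key"] = 3
df.loc[5:8, "key"] = 4

# Group with the default as_index=True, then turn the index into a column.
result = df.groupby("key").agg(["min", "max", "count"]).reset_index()

# The columns are a MultiIndex, so the key column is addressed as ("key", "").
print(result.columns[0])
print(result[("key", "")].tolist())
```

Because the aggregated columns form a MultiIndex, the restored key column carries an empty second level, which matches the blank header under the group column in the expected output above.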

Comment From: realGhostFoxx

> Hi guys, I too am running into the same and I just found out that doing a .reset_index() instead of as_index = False solves the issue for me. Thanks :)

hero! hero! hero!

Comment From: IsmamHuda

Still would be good for this to get resolved all the same.

Comment From: vroomzel

Encountering the same with pandas=='1.3.4'

Comment From: SieSiongWong

Hello, today I ran the groupby() code below and got this error:

ValueError: Cannot set a DataFrame with multiple columns to the single column max_date

This is very strange. Running exactly the same code on another PC gives the expected result without any error. I ran this code a month ago on both PCs with no issue, and I have used the same code for about a year without any error. Was a bug introduced to pandas recently? The PC that raises the error has pandas 1.5.1; the PC that does not has pandas 1.4. On the PC with pandas 1.5.1, I need to set as_index=True to avoid the error, which is still surprising because I run the same code every week. If anyone can tell me what is happening, I would really appreciate it.

df['max_date'] = df.groupby(['Provider', 'Location', 'Address', 'Open Hours', 'Test to Treat'], as_index=False)['End Date'].transform(max)
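
A minimal sketch of the workaround mentioned in this comment, assuming a small hypothetical frame with only two of the grouping columns. Since `transform` returns a result aligned with the original rows, leaving `as_index` at its default still yields a single Series that can be assigned to one column.

```python
import pandas as pd

# Hypothetical data; only the column names come from the comment above.
df = pd.DataFrame(
    {
        "Provider": ["A", "A", "B"],
        "Location": ["x", "x", "y"],
        "End Date": pd.to_datetime(["2022-01-01", "2022-03-01", "2022-02-01"]),
    }
)

# Default as_index=True: transform returns one Series indexed like df.
df["max_date"] = df.groupby(["Provider", "Location"])["End Date"].transform("max")
print(df)
```

This does not explain the difference between pandas 1.4 and 1.5.1 reported above; it only shows the form that the commenter found to work.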
