Code Sample, a copy-pastable example if possible
Code sample:
# Import packages
import pandas as pd
import numpy as np
# Set up test DataFrame
test_array = np.arange(50) + 100
test_matrix = test_array.reshape((10,5))
test_df = pd.DataFrame(test_matrix).rename(columns={0:'shouldnt be index'})
test_df.loc[0:5,'shouldnt be index'] = 3
test_df.loc[5:8,'shouldnt be index'] = 4
# groupby and agg
end_result = test_df.groupby('shouldnt be index',as_index=False).agg(["min", "max", "count"])
print(end_result)
execution:
>>> # Import packages
... import pandas as pd
>>> import numpy as np
>>> # Set up test DataFrame
... test_array = np.arange(50) + 100
>>> test_matrix = test_array.reshape((10,5))
>>> test_df = pd.DataFrame(test_matrix).rename(columns={0:'shouldnt be index'})
>>> # Make groupby data more grouped
... test_df.loc[0:5,'shouldnt be index'] = 3
>>> test_df.loc[5:8,'shouldnt be index'] = 4
>>> # groupby and agg
... end_result = test_df.groupby('shouldnt be index',as_index=False).agg(["min", "max", "count"])
>>> print(end_result)
                     1                2                3                4
                   min  max  count  min  max  count  min  max  count  min  max  count
shouldnt be index
3                  101  121      5  102  122      5  103  123      5  104  124      5
4                  126  141      4  127  142      4  128  143      4  129  144      4
145                146  146      1  147  147      1  148  148      1  149  149      1
Problem description
I'm trying to use groupby with as_index=False and then do an aggregate statement. I included an example where the groupby variable ends up as an index rather than staying as a column. My understanding is that this should result in the groupby variable being a column and not an index (as below), but perhaps I am mistaken.
This is my first time creating an issue, so my apologies if this is operator error or I didn't include important information. Please let me know if this is the case.
Maybe this is related to #22546?
Expected Output
You can see what the result should be when using "reset_index"
>>> end_result.reset_index()
   shouldnt be index    1                2                3                4
                      min  max  count  min  max  count  min  max  count  min  max  count
0                  3  101  121      5  102  122      5  103  123      5  104  124      5
1                  4  126  141      4  127  142      4  128  143      4  129  144      4
2                145  146  146      1  147  147      1  148  148      1  149  149      1
Output of pd.show_versions()
Comment From: WillAyd
Thanks for the report. I think the problem here is a conflict between the as_index keyword and how we are piecing together the result of multiple agg function applications.
Specifically, this is fine:
end_result = test_df.groupby('shouldnt be index',as_index=False).agg(min)
but this would reproduce the error you are seeing:
end_result = test_df.groupby('shouldnt be index',as_index=False).agg([min])
Investigation and PRs would certainly be welcome
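A minimal sketch of the contrast described in this comment, rebuilt on the test frame from the report. The helper key_location is hypothetical and only reports where the group key ended up. The string name "min" is used in place of the builtin min; this is an assumption made to keep the example quiet on newer pandas releases. The list case is reported rather than asserted, since the result may depend on the installed pandas version.

```python
import numpy as np
import pandas as pd

# Same test frame as in the report.
test_df = pd.DataFrame(np.arange(50).reshape((10, 5)) + 100).rename(
    columns={0: "shouldnt be index"}
)
test_df.loc[0:5, "shouldnt be index"] = 3
test_df.loc[5:8, "shouldnt be index"] = 4

grouped = test_df.groupby("shouldnt be index", as_index=False)

# Single function: the group key stays a regular column.
single = grouped.agg("min")

# List of functions: the columns become a MultiIndex, and the key may
# end up in the index instead of the columns.
multi = grouped.agg(["min"])


def key_location(result, key="shouldnt be index"):
    """Report where the group key ended up (hypothetical helper)."""
    if key in result.columns.get_level_values(0):
        return "column"
    if key in result.index.names:
        return "index"
    return "missing"


print("agg('min')   ->", key_location(single))
print("agg(['min']) ->", key_location(multi))
```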
Comment From: nathan-zymergen
Curious if there is any update on this. I just ran into this issue. I appreciate all the work being done on this great project!
Comment From: simonjayhawkins
closing as duplicate of #13217. ping me if I'm missing something.
Comment From: AnkithO-0
Hi guys, I too am running into the same and I just found out that doing a .reset_index() instead of as_index = False solves the issue for me. Thanks :)
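A short sketch of that workaround on the test frame from the original report: group with the default as_index=True, aggregate with a list of functions, then call .reset_index() to move the group key back into the columns. With MultiIndex columns, the key appears under the label ('shouldnt be index', '').

```python
import numpy as np
import pandas as pd

# Same test frame as in the original report.
test_df = pd.DataFrame(np.arange(50).reshape((10, 5)) + 100).rename(
    columns={0: "shouldnt be index"}
)
test_df.loc[0:5, "shouldnt be index"] = 3
test_df.loc[5:8, "shouldnt be index"] = 4

# Leave as_index at its default and reset the index after aggregating.
end_result = (
    test_df.groupby("shouldnt be index")
    .agg(["min", "max", "count"])
    .reset_index()
)

# The group key is now an ordinary column with an empty second level.
print(end_result[("shouldnt be index", "")].tolist())
print(end_result.shape)
```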
Comment From: realGhostFoxx
> Hi guys, I too am running into the same and I just found out that doing a .reset_index() instead of as_index = False solves the issue for me. Thanks :)
hero! hero! hero!
Comment From: IsmamHuda
Still would be good for this to get resolved all the same.
Comment From: vroomzel
Encountering the same with pandas=='1.3.4'
Comment From: SieSiongWong
Hello, today I ran the groupby() code below and got this error: ValueError: Cannot set a DataFrame with multiple columns to the single column max_date. This is very strange: exactly the same code gives the expected result on one PC and raises this error on another. A month ago the code ran on both PCs without any issue, and I have been running it weekly for about a year without errors. Was this bug introduced into pandas recently? The PC that raises the error has pandas 1.5.1; the one that does not has pandas 1.4. On the PC with 1.5.1, I need to set as_index=True to avoid the error. If anyone can tell me what is happening, I would really appreciate it.
df['max_date'] = df.groupby(['Provider', 'Location', 'Address', 'Open Hours', 'Test to Treat'], as_index=False)['End Date'].transform(max)
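A possible way to sidestep the error, sketched on hypothetical data with only two of the grouping columns: transform returns one value per input row, aligned to the original index, so as_index=False is not needed for this assignment. The sample values below are assumptions, and the string "max" is used in place of the builtin max.

```python
import pandas as pd

# Hypothetical data; only two of the original grouping columns are used.
df = pd.DataFrame(
    {
        "Provider": ["A", "A", "B", "B"],
        "Location": ["X", "X", "Y", "Y"],
        "End Date": pd.to_datetime(
            ["2022-01-01", "2022-03-01", "2022-02-01", "2022-01-15"]
        ),
    }
)

# transform yields a Series aligned to df.index, so it can be assigned
# directly to a single column without passing as_index=False.
df["max_date"] = df.groupby(["Provider", "Location"])["End Date"].transform("max")
print(df)
```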