Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
df['max_date'] = df.groupby(['Provider', 'Location', 'Address', 'Open Hours', 'Test to Treat'], as_index=False)['End Date'].transform(max)
Issue Description
Hello, today I ran the groupby() code below and got this error: ValueError: Cannot set a DataFrame with multiple columns to the single column max_date
This is very strange: exactly the same code gives the expected result without any error on one PC but raises this error on another. I ran this code on both PCs a month ago without any issue, and I have been using the same code for about a year without any error. Is this a bug introduced in pandas recently?
The pandas version that raises the error is 1.5.1; the version that does not is 1.4. On the PC with pandas 1.5.1, I need to set as_index=True to avoid the error, which is still very strange because I run the same code every week. If anyone can tell me what is happening, I would really appreciate it.
df['max_date'] = df.groupby(['Provider', 'Location', 'Address', 'Open Hours', 'Test to Treat'], as_index=False)['End Date'].transform(max)
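The workaround mentioned above (leaving as_index at its default of True) can be sketched as follows. This is a minimal illustration on a reduced, made-up subset of the columns, not the full data set, and it passes the string 'max' instead of the builtin max; both choices are assumptions made for brevity.

```python
import pandas as pd

# Small illustrative frame (a reduced subset of the columns in the report).
df = pd.DataFrame({
    "Provider": ["RRT", "RST", "RRT"],
    "Location": ["Macy Plaza", "Marconi Park", "Macy Plaza"],
    "End Date": pd.to_datetime(["12/5/2022", "12/6/2022", "12/9/2022"]),
})

# Workaround from the report: do not pass as_index=False.
# transform returns a Series aligned to df's index, so assigning it
# to a single column works.
max_date = df.groupby(["Provider", "Location"])["End Date"].transform("max")
df["max_date"] = max_date

print(df)
```

Rows 0 and 2 share the same group, so both receive the later of their two dates.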
Expected Behavior
The code runs without error and creates the max_date column.
Installed Versions
Version 1.5.1 (raises the error) versus 1.4 (no error)
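Since the behavior differs between the two machines, it may help to confirm the installed version on each one. A minimal sketch, assuming pandas is importable in the environment that runs the weekly script:

```python
import pandas as pd

# Print the installed pandas version; run this on both machines and compare.
installed_version = pd.__version__
print(installed_version)

# For the full environment report that the issue template asks for,
# pd.show_versions() can be called instead.
```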
Comment From: rhshadrach
Thanks for the report! Can you provide a reproducible example? This is likely the same issue as #49834 and will be fixed in 1.5.3.
Comment From: SieSiongWong
Thank you Richard. Below is the reproducible example.
import pandas as pd
d = {
    'Provider': ['RRT', 'RST', 'RRT', 'RWT', 'RPU'],
    'Location': ['Macy Plaza', 'Marconi Park', 'Macy Plaza', 'Rockaway YMCA', 'Soundview Park'],
    'Address': ['123 Avenue', '111 Street', '123 Avenue', '545 Fulton Ave', '222 Howard Ave'],
    'Open Hours': ['8:00am-6.00pm', '8:00am-6.00pm', '8:00am-6.00pm', '8:00am-6.00pm', '8.;30am-7:00pm'],
    'Test to Treat': ['Y', 'Y', 'Y', 'Y', 'N'],
    'End Date': pd.to_datetime(['12/5/2022', '12/6/2022', '12/9/2022', '12/5/2022', '12/8/2022']),
}
df = pd.DataFrame(data=d)
df['max_date'] = df.groupby(['Provider', 'Location', 'Address', 'Open Hours', 'Test to Treat'], as_index=False)['End Date'].transform(max)
Comment From: rhshadrach
Thanks - running this on main I get the following for df['max_date']:
0 2022-12-09
1 2022-12-06
2 2022-12-09
3 2022-12-05
4 2022-12-08
Name: max_date, dtype: datetime64[ns]
If that's the expected output, this issue has been fixed and will be available when 1.5.3 is released.
Comment From: SieSiongWong
Thank you so much Richard. Yes, this is the expected result.