Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

df['max_date'] = df.groupby(['Provider', 'Location', 'Address', 'Open Hours', 'Test to Treat'], as_index=False)['End Date'].transform(max)

Issue Description

Hello, today I ran the groupby() code below and got this error: ValueError: Cannot set a DataFrame with multiple columns to the single column max_date. This is very strange: the same code runs on one PC and gives the expected result, but fails with this error on another. I ran this code on both PCs a month ago without any issue, and I have been using it for about a year without errors. Was this bug introduced in pandas recently? The PC that raises the error has pandas 1.5.1; the one that does not has pandas 1.4. On the PC with pandas 1.5.1, I need to set as_index=True to avoid the error, which is still surprising because I run the same code every week. If anyone can explain what is happening, I would really appreciate it.

df['max_date'] = df.groupby(['Provider', 'Location', 'Address', 'Open Hours', 'Test to Treat'], as_index=False)['End Date'].transform(max)
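A minimal sketch of the workaround mentioned above, assuming a small illustrative frame with only two grouping columns (the data here is made up for the example). Since transform() returns a result aligned with the original index, leaving as_index at its default should give the same column without hitting the error; passing the string "max" instead of the builtin max is a stylistic choice here, not part of the original code.

```python
import pandas as pd

# Illustrative data only; column names follow the report, values are placeholders.
df = pd.DataFrame({
    "Provider": ["RRT", "RST", "RRT"],
    "Location": ["Macy Plaza", "Marconi Park", "Macy Plaza"],
    "End Date": pd.to_datetime(["12/5/2022", "12/6/2022", "12/9/2022"]),
})

# Group without as_index=False: transform() already returns one value per
# original row, indexed like df, so the result can be assigned as a column.
max_date = df.groupby(["Provider", "Location"])["End Date"].transform("max")
df["max_date"] = max_date

print(df)
```

Rows 0 and 2 share the same group, so both receive the later of their two dates.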

Expected Behavior

No error is raised and the max_date column is created.

Installed Versions

pandas 1.5.1 (raises the error) versus pandas 1.4 (no error)

Comment From: rhshadrach

Thanks for the report! Can you provide a reproducible example? This is likely the same issue as #49834 and will be fixed in 1.5.3.

Comment From: SieSiongWong

Thank you Richard. Below is the reproducible example.

import pandas as pd

d = {
    'Provider': ['RRT', 'RST', 'RRT', 'RWT', 'RPU'],
    'Location': ['Macy Plaza', 'Marconi Park', 'Macy Plaza', 'Rockaway YMCA', 'Soundview Park'],
    'Address': ['123 Avenue', '111 Street', '123 Avenue', '545 Fulton Ave', '222 Howard Ave'],
    'Open Hours': ['8:00am-6.00pm', '8:00am-6.00pm', '8:00am-6.00pm', '8:00am-6.00pm', '8.;30am-7:00pm'],
    'Test to Treat': ['Y', 'Y', 'Y', 'Y', 'N'],
    'End Date': pd.to_datetime(['12/5/2022', '12/6/2022', '12/9/2022', '12/5/2022', '12/8/2022']),
}

df = pd.DataFrame(data=d)

df['max_date'] = df.groupby(['Provider', 'Location', 'Address', 'Open Hours', 'Test to Treat'], as_index=False)['End Date'].transform(max)

Comment From: rhshadrach

Thanks - running this on main I get the following for df['max_date']:

0   2022-12-09
1   2022-12-06
2   2022-12-09
3   2022-12-05
4   2022-12-08
Name: max_date, dtype: datetime64[ns]

If that's the expected output, this issue has been fixed, and the fix will be available when 1.5.3 is released.

Comment From: SieSiongWong

Thank you so much Richard. Yes, this is the expected result.