Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import numpy as np
import pandas as pd
def data(frame):
    return frame.DATA
date_range = np.arange(
    np.datetime64("2020-01-01"), np.datetime64("2020-01-03"), np.timedelta64(1, "m")
)
idx = pd.MultiIndex.from_product([['ABC', 'DDD'], date_range], names=['Symbol','Date'])
df = pd.DataFrame(1, idx, ['DATA'])
result = df.groupby("Symbol").apply(data)
'''
>> Symbol Symbol Date
ABC ABC 2020-01-01 00:00:00 1
2020-01-01 00:01:00 1
2020-01-01 00:02:00 1
2020-01-01 00:03:00 1
2020-01-01 00:04:00 1
..
DDD DDD 2020-01-02 23:55:00 1
2020-01-02 23:56:00 1
2020-01-02 23:57:00 1
2020-01-02 23:58:00 1
2020-01-02 23:59:00 1
Name: DATA, Length: 5760, dtype: int64
'''
''' This will result in TypeError: incompatible index of inserted column with frame index '''
df["New_column"] = result
Issue Description
Possibly related to #51014, but I was unable to test that because it is missing an MRE.
When calling groupby and apply, the returned MultiIndex has one level duplicated. A temporary workaround is to use droplevel on the duplicated index level.
This behaviour appears to have been introduced in a recent version; the versions from 4-5 months ago did not have it.
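The droplevel workaround described above can be sketched roughly as follows. This is a minimal sketch on a shortened version of the frame from the reproducible example (built with pd.date_range instead of np.arange). The nlevels guard is an assumption added here so the snippet also runs on pandas versions that do not prepend the group key.

```python
import pandas as pd


def data(frame):
    return frame.DATA


# Shortened version of the frame from the report (3 timestamps per symbol).
dates = pd.date_range("2020-01-01", periods=3, freq="min")
idx = pd.MultiIndex.from_product([["ABC", "DDD"], dates], names=["Symbol", "Date"])
df = pd.DataFrame(1, idx, ["DATA"])

result = df.groupby("Symbol").apply(data)

# Guard (assumption, not part of the original report): only drop the outermost
# level when apply has actually prepended the group key, i.e. when the result
# has more index levels than df itself.
if result.index.nlevels > df.index.nlevels:
    result = result.droplevel(0)

# The index now lines up with df, so the assignment no longer raises.
df["New_column"] = result
```

Dropping by position (0) rather than by name avoids ambiguity, since both duplicated levels are named Symbol.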
Expected Behavior
Successfully inserting the column back to df.
Installed Versions
Comment From: StepfenShawn
Because the as_index parameter defaults to True, the MultiIndex has one level duplicated. We can set it to False:
result = df.groupby("Symbol", as_index=False).apply(data)
Symbol Date
0 ABC 2020-01-01 00:00:00 1
2020-01-01 00:01:00 1
2020-01-01 00:02:00 1
2020-01-01 00:03:00 1
2020-01-01 00:04:00 1
..
1 DDD 2020-01-02 23:55:00 1
2020-01-02 23:56:00 1
2020-01-02 23:57:00 1
2020-01-02 23:58:00 1
2020-01-02 23:59:00 1
Name: DATA, Length: 5760, dtype: int64
Comment From: misantroop
I needed to set group_keys=False; this was changed in 1.5.0, apologies for the oversight.
as_index=False doesn't help, as it still creates an additional index level.
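For anyone landing here later, the group_keys=False fix looks roughly like this. It is a minimal sketch on a shortened version of the frame from the original report, with pd.date_range swapped in for np.arange to keep it short.

```python
import pandas as pd


def data(frame):
    return frame.DATA


# Shortened version of the frame from the report (3 timestamps per symbol).
dates = pd.date_range("2020-01-01", periods=3, freq="min")
idx = pd.MultiIndex.from_product([["ABC", "DDD"], dates], names=["Symbol", "Date"])
df = pd.DataFrame(1, idx, ["DATA"])

# group_keys=False tells apply not to prepend the group key to the result
# index, so the result keeps the original (Symbol, Date) MultiIndex.
result = df.groupby("Symbol", group_keys=False).apply(data)

# The index matches df's index, so the column can be inserted directly.
df["New_column"] = result
```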