Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd

def data(frame):
    return frame.DATA

date_range = np.arange(
    np.datetime64("2020-01-01"), np.datetime64("2020-01-03"), np.timedelta64(1, "m")
)
idx = pd.MultiIndex.from_product([['ABC', 'DDD'], date_range], names=['Symbol','Date'])
df = pd.DataFrame(1, idx, ['DATA'])
result = df.groupby("Symbol").apply(data)

'''
>> Symbol  Symbol  Date               
ABC     ABC     2020-01-01 00:00:00    1
                2020-01-01 00:01:00    1
                2020-01-01 00:02:00    1
                2020-01-01 00:03:00    1
                2020-01-01 00:04:00    1
                                      ..
DDD     DDD     2020-01-02 23:55:00    1
                2020-01-02 23:56:00    1
                2020-01-02 23:57:00    1
                2020-01-02 23:58:00    1
                2020-01-02 23:59:00    1
Name: DATA, Length: 5760, dtype: int64
'''

# The assignment below raises:
# TypeError: incompatible index of inserted column with frame index
df["New_column"] = result

Issue Description

Possibly related to #51014, but I was unable to test that because it is missing an MRE.

When calling groupby followed by apply, the returned MultiIndex has one level duplicated (Symbol appears twice). A temporary workaround is to call droplevel on the duplicated index level.
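A minimal sketch of the droplevel workaround, using a shortened version of the frame from the example above (five minutes of data instead of two days, which is my own simplification). The check on the number of index levels is an added guard so the snippet also behaves on versions that do not prepend the group key:

```python
import numpy as np
import pandas as pd

# Shortened version of the frame from the report (5 rows per symbol).
dates = np.arange(
    np.datetime64("2020-01-01T00:00"),
    np.datetime64("2020-01-01T00:05"),
    np.timedelta64(1, "m"),
)
idx = pd.MultiIndex.from_product([["ABC", "DDD"], dates], names=["Symbol", "Date"])
df = pd.DataFrame(1, idx, ["DATA"])

result = df.groupby("Symbol").apply(lambda frame: frame.DATA)

# Drop the outermost level only if the group key was actually prepended,
# i.e. the result has more index levels than the original frame.
if result.index.nlevels > df.index.nlevels:
    result = result.droplevel(0)

# The index now matches df.index, so the assignment aligns cleanly.
df["New_column"] = result
```
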

This bug seems to have been introduced in a recent version; the latest versions from 4-5 months ago did not have this behaviour.

Expected Behavior

Successfully inserting the column back to df.

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 031e9bbc3a9db4ab78be8b477b15d7173a1f625e
python           : 3.10.6.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.15.85-1-MANJARO
Version          : #1 SMP PREEMPT Wed Dec 21 21:15:06 UTC 2022
machine          : x86_64
processor        :
byteorder        : little
LC_ALL           : None
LANG             : en_GB.UTF-8
LOCALE           : en_GB.UTF-8

pandas           : 2.0.0.dev0+1523.g031e9bbc3
numpy            : 1.23.5
pytz             : 2022.7.1
dateutil         : 2.8.2
setuptools       : 66.1.1
pip              : 23.0
Cython           : 3.0.0a11
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.9.2
html5lib         : 1.1
pymysql          : None
psycopg2         : None
jinja2           : 3.1.2
IPython          : 8.7.0
pandas_datareader: None
bs4              : 4.11.2
bottleneck       : None
brotli           : 1.0.9
fastparquet      : None
fsspec           : None
gcsfs            : None
matplotlib       : 3.6.3
numba            : 0.56.4
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 11.0.0
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : None
snappy           : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : 2.0.1
zstandard        : None
tzdata           : None
qtpy             : 2.2.1
pyqt5            : None

Comment From: StepfenShawn

Because the as_index parameter defaults to True, the MultiIndex has one level duplicated. We can set it to False:

result = df.groupby("Symbol", as_index = False).apply(data)
Symbol  Date
0  ABC     2020-01-01 00:00:00    1
           2020-01-01 00:01:00    1
           2020-01-01 00:02:00    1
           2020-01-01 00:03:00    1
           2020-01-01 00:04:00    1
                                 ..
1  DDD     2020-01-02 23:55:00    1
           2020-01-02 23:56:00    1
           2020-01-02 23:57:00    1
           2020-01-02 23:58:00    1
           2020-01-02 23:59:00    1
Name: DATA, Length: 5760, dtype: int64

Comment From: misantroop

I needed to set group_keys=False. This was changed in 1.5.0; apologies for the oversight. as_index=False doesn't help, as it still creates an additional index level.
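
For anyone landing here later, a short sketch of the group_keys=False fix. It uses a reduced version of the frame from the original report (five minutes of data per symbol, my own simplification) and I have only reasoned about it against the behaviour described in this thread:

```python
import numpy as np
import pandas as pd

# Reduced version of the frame from the report (5 rows per symbol).
dates = np.arange(
    np.datetime64("2020-01-01T00:00"),
    np.datetime64("2020-01-01T00:05"),
    np.timedelta64(1, "m"),
)
idx = pd.MultiIndex.from_product([["ABC", "DDD"], dates], names=["Symbol", "Date"])
df = pd.DataFrame(1, idx, ["DATA"])

# group_keys=False stops apply from prepending the group key as an extra
# index level, so the result keeps the (Symbol, Date) index of df.
result = df.groupby("Symbol", group_keys=False).apply(lambda frame: frame.DATA)

# With matching index levels the column can be inserted back into df.
df["New_column"] = result
```
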