Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd

df = pd.DataFrame({'idx_L1': ['Level_1'] * 10 + ['Level_2'] * 10 + ['Level_3'] * 10,  
                   'idx_L2': ([1] * 5 + [2] * 5 + [3] * 5) * 2,  
                   'vals': np.arange(30)})
df = df.set_index(['idx_L1', 'idx_L2'])
df.groupby(level=['idx_L1', 'idx_L2']).rolling(3).sum()

Results in: [screenshot of the output; image not preserved]

Issue Description

When applying a rolling operation to a DataFrame grouped by the levels of its MultiIndex, the index levels are duplicated in the resulting DataFrame.

Note that this bug has been reported before: https://github.com/pandas-dev/pandas/issues/15350
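A quick way to see the duplication without relying on the printed frame is to inspect the index names of the result. This is a minimal sketch that reuses the data from the reproducible example; the variable name `result` is only illustrative, and the output noted in the comment assumes the behavior reported here for pandas 1.5.2.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'idx_L1': ['Level_1'] * 10 + ['Level_2'] * 10 + ['Level_3'] * 10,
                   'idx_L2': ([1] * 5 + [2] * 5 + [3] * 5) * 2,
                   'vals': np.arange(30)})
df = df.set_index(['idx_L1', 'idx_L2'])

# Group by both index levels, then apply the rolling sum directly.
result = df.groupby(level=['idx_L1', 'idx_L2']).rolling(3).sum()

# The input has two index levels; with the reported behavior the result has
# four, because the group keys are prepended to the original index.
print(list(df.index.names))
print(list(result.index.names))
```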

Expected Behavior

The expected behavior is that the index levels are not duplicated.

This is what happens when using apply instead of calling rolling directly on the groupby, although this approach is slower:

import numpy as np
import pandas as pd

df = pd.DataFrame({'idx_L1': ['Level_1'] * 10 + ['Level_2'] * 10 + ['Level_3'] * 10,  
                   'idx_L2': ([1] * 5 + [2] * 5 + [3] * 5) * 2,  
                   'vals': np.arange(30)})
df = df.set_index(['idx_L1', 'idx_L2'])
df.groupby(level=['idx_L1', 'idx_L2'], group_keys=False)[['vals']].apply(lambda x: x.rolling(3).sum())

Results in: [screenshot of the output; image not preserved]
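A possible workaround that keeps the direct rolling call is to drop the prepended group-key levels afterwards. The sketch below assumes the group keys occupy the first two positions of the result index, as in the reported output; the names `rolled` and `deduped` are illustrative and not part of the original report.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'idx_L1': ['Level_1'] * 10 + ['Level_2'] * 10 + ['Level_3'] * 10,
                   'idx_L2': ([1] * 5 + [2] * 5 + [3] * 5) * 2,
                   'vals': np.arange(30)})
df = df.set_index(['idx_L1', 'idx_L2'])

rolled = df.groupby(level=['idx_L1', 'idx_L2']).rolling(3).sum()

# Drop the two leading levels by position (assumed to be the prepended group
# keys). Positions are used because the level names are duplicated.
deduped = rolled.droplevel([0, 1])

print(list(deduped.index.names))
```

Rows in `deduped` follow the group order of the rolling result, which may differ from the row order of the original frame, so the result should be aligned or sorted before it is compared with the apply-based version.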

Installed Versions

INSTALLED VERSIONS
------------------
commit : 8dab54d6573f7186ff0c3b6364d5e4dd635ff3e7
python : 3.10.4.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.149-133.644.amzn2.x86_64
Version : #1 SMP Tue Oct 18 16:52:42 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.5.2
numpy : 1.23.2
pytz : 2022.2.1
dateutil : 2.8.2
setuptools : 65.6.3
pip : 22.3
Cython : 0.29.32
pytest : 7.1.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : 2.9.3
jinja2 : 3.1.2
IPython : 8.4.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.3
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.9.1
snappy : None
sqlalchemy : 1.4.40
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : 2022.7

Comment From: phofl

Hi, thanks for your report.

This is a duplicate; see #47181, for example.

Comment From: dvfariaf-bops

Hi, thanks for the quick response.

Now that I have read the linked issues, I better understand the purpose of keeping all MultiIndex levels (e.g. not dropping a date index), but getting duplicated index levels without any warning is somewhat unexpected and could lead to bugs.

IMO having a parameter that could control this (like group_keys) would be very useful.