Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
df = pd.MultiIndex.from_product([[1, 2], [3, 4]], names=['a', 'b']).to_frame()
idx1, _ = next(iter(df.groupby(level=[0])))
idx2, _ = next(iter(df.groupby(level=[0, 1])))
assert type(idx1) == type(idx2)  # Error

Issue Description

For consistency, when grouping using a single level that's specified as multiple levels (e.g. in a list) on a dataframe that has multiindex, the return types of the index should be consistent to the multiple-multiple levels. Same idea as the difference between df.iloc[0] and df.iloc[[0]].

Expected Behavior

assert idx1 == (1, )
assert idx2 == (1, 3)

Installed Versions

INSTALLED VERSIONS ------------------ commit : 2e218d10984e9919f0296931d92ea851c6a6faf5 python : 3.9.12.final.0 python-bits : 64 OS : Linux OS-release : 5.4.0-139-generic Version : #156-Ubuntu SMP Fri Jan 20 17:27:18 UTC 2023 machine : x86_64 processor : x86_64 byteorder : little LC_ALL : None LANG : en_US.UTF-8 LOCALE : en_US.UTF-8 pandas : 1.5.3 numpy : 1.22.0 pytz : 2022.1 dateutil : 2.8.2 setuptools : 61.2.0 pip : 22.3.1 Cython : 0.29.30 pytest : 7.1.2 hypothesis : None sphinx : None blosc : None feather : None xlsxwriter : None lxml.etree : None html5lib : None pymysql : None psycopg2 : None jinja2 : 3.1.2 IPython : 8.5.0 pandas_datareader: None bs4 : 4.11.1 bottleneck : 1.3.5 brotli : fastparquet : None fsspec : 2022.11.0 gcsfs : None matplotlib : 3.6.2 numba : 0.56.0 numexpr : 2.8.4 odfpy : None openpyxl : 3.0.10 pandas_gbq : None pyarrow : 6.0.1 pyreadstat : None pyxlsb : None s3fs : None scipy : 1.7.3 snappy : None sqlalchemy : 1.4.27 tables : 3.7.0 tabulate : 0.8.10 xarray : 2022.10.0 xlrd : 2.0.1 xlwt : None zstandard : None tzdata : None

Comment From: phofl

cc @rhshadrach Thoughts?

Comment From: rhshadrach

We recently did this for the by argument, +1 on doing it for level too.

Comment From: august-tengland

Take

Comment From: august-tengland

Hi! New Pandas contributor here (apologies for potentially stupid questions): Does this issue only consider the indexes that are "generated" when iterating? For iter(), we create them in group_keys_seq@/core/groupby/ops.py:

    def group_keys_seq(self):
        if len(self.groupings) == 1: # This is a problem
             return self.levels[0]
             ...

If I add the condition that self.groupings[0].level is None and omit the following in grouper.py (line 835):

         if isinstance(group_axis, MultiIndex):
          if is_list_like(level) and len(level) == 1:
              level = level[0]  # remove this 

I get the expected behavior, however, it will not affect any other potential ways to get the index which doesn't use groups_key_seq (if such exists).

Comment From: rhshadrach

Thanks for picking this up!

Does this issue only consider the indexes that are "generated" when iterating?

Yes.

I get the expected behavior, however, it will not affect any other potential ways to get the index which doesn't use groups_key_seq (if such exists).

I'm not 100% sure I follow, but if it's related to the above question, then I think this is okay. We're only looking to impact __iter__ here.

Comment From: august-tengland

Thanks for your response!

I'm not 100% sure I follow, but if it's related to the above question, then I think this is okay. We're only looking to impact iter here.

That's great, I think I can get a PR on this by next week.