Pandas BUG?: getitem with a MultiIndex returns a Series only when the lower level is ""

xref #46944

import pandas as pd

columns1 = pd.MultiIndex.from_tuples([("a", "a2"), ("b", "c")])
df1 = pd.DataFrame([[1, 2]], columns=columns1)
print(df1["a"])
#    a2
# 0   1

columns2 = pd.MultiIndex.from_tuples([("a", ""), ("b", "c")])
df2 = pd.DataFrame([[1, 2]], columns=columns2)
print(df2["a"])
# 0    1
# Name: a, dtype: int64

The first case returns a DataFrame, whereas the second returns a Series. I don't think this is intentional. It creates a difference between DataFrameGroupBy._selected_obj and DataFrameGroupBy._obj_with_exclusions, which can lead to erroneous results (#50804 is one example).
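To make the difference concrete, the return types can be checked directly. This is a minimal sketch that rebuilds the two frames from the example; the behavior shown is what this issue reports and may differ in a pandas version where the special case has been changed.

```python
import pandas as pd

# Lower level of ("a", ...) is a regular label.
columns1 = pd.MultiIndex.from_tuples([("a", "a2"), ("b", "c")])
df1 = pd.DataFrame([[1, 2]], columns=columns1)

# Lower level of ("a", ...) is the empty string.
columns2 = pd.MultiIndex.from_tuples([("a", ""), ("b", "c")])
df2 = pd.DataFrame([[1, 2]], columns=columns2)

result1 = df1["a"]
result2 = df2["a"]

# Compare the container types returned by the same top-level key.
print(type(result1).__name__)
print(type(result2).__name__)
```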

Currently, df2.groupby("a") is allowed whereas df1.groupby("a") raises, so returning a DataFrame in the second case would resolve the groupby inconsistency as well.
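The groupby discrepancy can be reproduced with a sketch like the following. The exception type for df1 is not stated here, so the code catches broadly and only records whether something was raised; that choice is an assumption made for illustration.

```python
import pandas as pd

columns1 = pd.MultiIndex.from_tuples([("a", "a2"), ("b", "c")])
df1 = pd.DataFrame([[1, 2]], columns=columns1)

columns2 = pd.MultiIndex.from_tuples([("a", ""), ("b", "c")])
df2 = pd.DataFrame([[1, 2]], columns=columns2)

# Reported to work: df2["a"] is a Series, so it can serve as a grouping key.
ngroups = df2.groupby("a").ngroups

# Reported to raise: df1["a"] is a DataFrame, not a one-dimensional key.
# The exact exception class is not given in this thread, so catch broadly.
try:
    df1.groupby("a")
    df1_raised = False
except Exception as exc:
    df1_raised = True
    print(type(exc).__name__, exc)
```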

One can do df1.groupby(("a", "a2")) successfully, so I don't think this would make any operations impossible.
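As a quick check of the full-key form, a sketch along these lines groups df1 by the complete column tuple. It uses size() only because that does not depend on which columns remain after grouping.

```python
import pandas as pd

columns1 = pd.MultiIndex.from_tuples([("a", "a2"), ("b", "c")])
df1 = pd.DataFrame([[1, 2]], columns=columns1)

# Passing the full column tuple selects a single column, so the key is 1-D.
grouped = df1.groupby(("a", "a2"))
sizes = grouped.size()
print(sizes)
```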

cc @phofl for any thoughts

Comment From: phofl

Not really any thoughts, but we had an issue about this as well. I looked into why this was done the way it is: it has been around forever and is tested explicitly.

Comment From: rhshadrach

I think this could be supported in groupby with just a few lines, but I do find it somewhat odd behavior. I'm guessing an empty string lets some columns behave as if they are MultiIndexed while other columns are not (or have fewer levels). I didn't see it mentioned anywhere in the docs, but I could have missed it.

I'd support removing this special-case behavior, but I'm not going to push for it. It seems like added complexity that doesn't offer much in the way of benefits (but maybe it does and I just don't see it).

Comment From: rhshadrach

The recommendation of using _obj_with_exclusions instead of _selected_obj in https://github.com/pandas-dev/pandas/issues/46944#issuecomment-1409619368 has the added benefit of handling this behavior without any other change to groupby.