- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd
df1 = pd.DataFrame({'x': list(range(100)), 'y': list(range(100)),
                    'z': list(range(100)), 'd': list(range(100))})
df2 = df1.astype({"x":'category', "y":'category', "z":'category'})
df3 = df2.iloc[:10, :].groupby(['z', 'x'], observed=True).agg({'d': 'sum'})
df4 = df2.iloc[90:, :].groupby(['z', 'x', 'y'], observed=True).agg({'d': 'sum'})
pd.merge(df3, df4, left_index=True, right_index=True,
         how='outer').index.to_frame().dtypes
Problem description
I have two data frames, df3 and df4. The index of df3 is (z, x) and the index of df4 is (z, x, y). All of x, y, and z are categorical. Because df3 and df4 are both created by groupby on the same data frame df2, index level x has the same category dtype in df3 as in df4, and the same holds for z. I would therefore expect x and z to remain categorical after the merge. Instead, they come back as int64:
z       int64
x       int64
y    category
dtype: object
Expected Output
z    category
x    category
y    category
dtype: object
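For anyone hitting this on an affected version, one possible workaround is to re-cast the index levels after the merge. The following is only a minimal sketch: it assumes the original category dtypes are still available from df2, and `restore_categories` is a hypothetical helper written for this example, not a pandas API.

```python
import pandas as pd


def restore_categories(index, dtypes):
    """Rebuild a MultiIndex with the named levels cast to the given dtypes.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Round-trip through a DataFrame so each level can be cast by name.
    frame = index.to_frame(index=False).astype(dtypes)
    return pd.MultiIndex.from_frame(frame)


# Same setup as in the report above.
df1 = pd.DataFrame({c: list(range(100)) for c in ["x", "y", "z", "d"]})
df2 = df1.astype({"x": "category", "y": "category", "z": "category"})
df3 = df2.iloc[:10, :].groupby(["z", "x"], observed=True).agg({"d": "sum"})
df4 = df2.iloc[90:, :].groupby(["z", "x", "y"], observed=True).agg({"d": "sum"})

merged = pd.merge(df3, df4, left_index=True, right_index=True, how="outer")

# Assumption: the category dtypes of df2 are the ones we want on the index.
cat_dtypes = {name: df2[name].dtype for name in ["x", "y", "z"]}
merged.index = restore_categories(merged.index, cat_dtypes)

print(merged.index.to_frame().dtypes)
```

This rebuilds the whole index, so it costs an extra copy of the index data. On versions where the merge already preserves the categorical levels, the cast should leave the dtypes as category.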
Output of pd.show_versions()
Comment From: arw2019
This seems like a duplicate of #25412
Would you be interested in submitting a PR? Reading through #25412, there is interest in getting this to work, but no one has gotten to it yet.
Comment From: jbrockmendel
I'm seeing category-dtype results on main. Could use a test (or determine if one already exists)
Comment From: agodinez01
take