• [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [ ] (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

In [1]: import numpy as np 
   ...: import pandas as pd                                                                                                   

In [2]: months = ['Jan', 'Feb', 'Mar', 'Apr'] 
   ...: colors = ['Red', 'Green', 'Blue'] 
   ...: directions = ['Up', 'Down']                                                                                           

In [3]: df = pd.DataFrame( 
   ...:     np.random.random(24), 
   ...:     index=pd.MultiIndex.from_product([months, pd.CategoricalIndex(colors), directions], names=['month', 'color', 'direction']), 
   ...:     columns=['value'] 
   ...: )                                                                                                                     

In [4]: df.groupby(['month', 'color'], sort=False).sum()                                                                      
Out[4]: 
                value
month color          
Apr   Blue   1.369042
      Green  1.430864
      Red    0.346581
Feb   Blue   1.068234
      Green  0.888100
      Red    0.708154
Jan   Blue   1.119714
      Green  0.731628
      Red    0.624529
Mar   Blue   0.605137
      Green  1.400212
      Red    0.871417

Problem description

When grouping two columns with categorical data and using sort=False, the groups are sorted anyway. If the index was not Categorical, then the output is not sorted as expected.

Expected Output

I was hoping to get this output

Jan   Red    0.624529
      Green  0.731628
      Blue   1.119714
Feb   Red    0.708154
      Green  0.888100
      Blue   1.068234
Mar   Red    0.871417
      Green  1.400212
      Blue   0.605137
Apr   Red    0.346581
      Green  1.430864
      Blue   1.369042

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit : f00ed8f47020034e752baf0250483053340971b0 python : 3.7.11.final.0 python-bits : 64 OS : Linux OS-release : 5.4.0-77-generic Version : #86-Ubuntu SMP Thu Jun 17 02:35:03 UTC 2021 machine : x86_64 processor : x86_64 byteorder : little LC_ALL : None LANG : en_GB.UTF-8 LOCALE : en_GB.UTF-8 pandas : 1.3.0 numpy : 1.17.5 pytz : 2020.1 dateutil : 2.8.1 pip : 21.1.2 setuptools : 41.2.0 Cython : 0.29.21 pytest : 6.1.1 hypothesis : None sphinx : 3.3.1 blosc : 1.8.3 feather : None xlsxwriter : None lxml.etree : 4.4.3 html5lib : None pymysql : None psycopg2 : 2.7.7 (dt dec pq3 ext lo64) jinja2 : 2.10.3 IPython : 7.14.0 pandas_datareader: None bs4 : None bottleneck : None fsspec : None fastparquet : 0.4.1 gcsfs : None matplotlib : 3.1.3 numexpr : None odfpy : None openpyxl : 3.0.3 pandas_gbq : None pyarrow : None pyxlsb : None s3fs : None scipy : 1.4.1 sqlalchemy : None tables : None tabulate : None xarray : None xlrd : 2.0.1 xlwt : None numba : 0.51.2

Comment From: samukweku

@dangillet Is the categorical index ordered?

Comment From: dangillet

@samukweku No, in this example the categorical index is not ordered.

But what is really problematic in my opinion is that the first level month does not maintain its original order, but gets sorted lexicographically.

If you were to take my code above, but not make the color index a CategoricalIndex, then the sort=False is respected.

In [11]: df = pd.DataFrame(  
    ...:    np.random.random(24),  
    ...:    index=pd.MultiIndex.from_product([months, colors, directions], names=['month', 'color', 'direction']),  
    ...:    columns=['value'] 
    ...: )                                                                                                                    

In [12]: df.groupby(['month', 'color'], sort=False).sum()                                                                     
Out[12]: 
                value
month color          
Jan   Red    1.502055
      Green  1.815924
      Blue   0.756490
Feb   Red    1.231979
      Green  1.481169
      Blue   1.071179
Mar   Red    0.923146
      Green  0.386611
      Blue   1.169264
Apr   Red    1.770440
      Green  0.281132
      Blue   0.495125

Comment From: dangillet

Well done to everyone involved in fixing this issue :medal_sports: