Code Sample
Do a crosstab with an ordered categorical variable whose category order differs from the alphabetical default:
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({'First': ['B', 'B', 'C', 'A', 'B', 'C'],
                   'Second': ['C', 'B', 'B', 'B', 'C', 'A']})
df.First = df.First.astype(CategoricalDtype(ordered=True))
# Default order is alphabetic
df.First.cat.categories
>> Index(['A', 'B', 'C'], dtype='object')
# Change the order
df.First = df.First.cat.reorder_categories(['C', 'A', 'B'])
df.First.cat.categories
>> Index(['C', 'A', 'B'], dtype='object')
# Do a simple crosstab with margins
pd.crosstab(df.First, df.Second, margins=True)
>> Second  A  B  C  All
>> First
>> C       1  1  0    1
>> A       0  1  0    3
>> B       0  1  2    2
>> All     1  3  2    6
Problem description
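The row margins (the All column) show the wrong values. They are the totals for the default alphabetical category order (A = 1, B = 3, C = 2), attached positionally to the reordered rows C, A, B, so C shows 1 instead of 2, A shows 3 instead of 1, and B shows 2 instead of 3. This might be related to #20496, but the output here looks like a bug.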
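One way to confirm the discrepancy described above is to recompute the row sums from the body of the table and compare them with the reported margins. The following is a minimal sketch, not part of the original report; `margin_mismatches` is a hypothetical helper name, and the category order is set directly through `CategoricalDtype` instead of `reorder_categories`, which should be equivalent for this data.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def margin_mismatches(table, margins_name="All"):
    """Return rows whose reported margin differs from the recomputed row sum.

    Hypothetical helper for illustration; it is not a pandas function.
    """
    # Body of the crosstab, without the margin row and margin column
    body = table.drop(index=margins_name, columns=margins_name)
    recomputed = body.sum(axis=1).to_numpy()
    # Margin column as reported by crosstab, without the grand total
    reported = table[margins_name].drop(margins_name).to_numpy()
    bad = reported != recomputed
    return pd.DataFrame(
        {"reported": reported[bad], "recomputed": recomputed[bad]},
        index=[str(label) for label in body.index[bad]],
    )


df = pd.DataFrame({"First": ["B", "B", "C", "A", "B", "C"],
                   "Second": ["C", "B", "B", "B", "C", "A"]})
# Ordered categorical with a non-alphabetical category order
df["First"] = df["First"].astype(CategoricalDtype(["C", "A", "B"], ordered=True))

table = pd.crosstab(df["First"], df["Second"], margins=True)
print(margin_mismatches(table))
```

On a pandas version affected by the bug, every row of First should be listed; on a version where the margins are correct, the printed frame should be empty.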
Output of pd.show_versions()
Comment From: WillAyd
Thanks for the report. Investigation and PRs are always welcome.
Comment From: hwalinga
I can confirm this is now fixed on the master.
>>> pd.crosstab(df.First, df.Second, margins=True)
Second  A  B  C  All
First
C       1  1  0    2
A       0  1  0    1
B       0  1  2    3
All     1  3  2    6
So, this can be closed.
Comment From: aidoskanapyanov
Is this resolved, or are tests still needed?
Comment From: aidoskanapyanov
take
Comment From: aidoskanapyanov
I can confirm this is now fixed on the master.
```
pd.crosstab(df.First, df.Second, margins=True)
Second  A  B  C  All
First
C       1  1  0    2
A       0  1  0    1
B       0  1  2    3
All     1  3  2    6
```
So, this can be closed.
Hi @hwalinga! I've opened PR #52645, which adds a unit test for this example. It passes on the main branch.
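For readers who want to check their own installation, a regression test for this example could look roughly like the sketch below. This is an assumption about the shape of such a test, not the code from PR #52645; the function name is hypothetical, and the expected values are taken from the corrected output posted earlier in the thread.

```python
import pandas as pd


def test_crosstab_margins_reordered_categories():
    # Hypothetical test name; data reproduces the example from the report
    df = pd.DataFrame({"First": ["B", "B", "C", "A", "B", "C"],
                       "Second": ["C", "B", "B", "B", "C", "A"]})
    df["First"] = df["First"].astype("category")
    df["First"] = df["First"].cat.reorder_categories(["C", "A", "B"], ordered=True)

    result = pd.crosstab(df["First"], df["Second"], margins=True)

    # Rows follow the reordered categories, with the margin row last
    assert result.index.tolist() == ["C", "A", "B", "All"]
    assert result.columns.tolist() == ["A", "B", "C", "All"]
    # Each margin must equal the sum of its own row or column
    assert result.to_numpy().tolist() == [
        [1, 1, 0, 2],
        [0, 1, 0, 1],
        [0, 1, 2, 3],
        [1, 3, 2, 6],
    ]


test_crosstab_margins_reordered_categories()
```

Comparing labels and values as plain lists sidesteps any assumption about the exact index type that `pd.crosstab` returns for categorical inputs.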