Summary of issue
Going by the name, I expected that passing dropna=False to crosstab would give me results that include the rows where a value is missing. Instead, missing values are still ignored.
Code Sample
import pandas as pd

df = pd.DataFrame({'geography': ['tropic', 'tropic', 'desert', None, 'tundra', 'tundra'],
                   'dragons': ['many', None, 'few', 'few', 'many', None]})
print(df)
# Default call: rows with a missing value are left out
pd.crosstab(df.geography, df.dragons)
# With dropna=False I expected the missing values to be kept
pd.crosstab(df.geography, df.dragons, dropna=False)
pd.crosstab(df.dragons, df.geography, dropna=False)
Expected Output
geography  NA  desert  tropic  tundra
dragons
NA          0       0       1       1
few         1       1       0       0
many        0       0       1       1
Actual output
In [210]: print(df)
  dragons geography
0    many    tropic
1    None    tropic
2     few    desert
3     few      None
4    many    tundra
5    None    tundra
In [211]: pd.crosstab(df.geography, df.dragons)
Out[211]:
dragons    few  many
geography
desert       1     0
tropic       0     1
tundra       0     1
In [212]: pd.crosstab(df.geography, df.dragons, dropna=False)
Out[212]:
dragons    few  many
geography
desert       1     0
tropic       0     1
tundra       0     1
In [213]: pd.crosstab(df.dragons, df.geography, dropna=False)
Out[213]:
geography  desert  tropic  tundra
dragons
few             1       0       0
many            0       1       1
This work-around gives me what I expected.
pd.crosstab(df.dragons.fillna('NA'), df.geography.fillna('NA'), dropna=False)
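For anyone who wants to try it, here is the same workaround as a self-contained sketch. The 'NA' placeholder is just a label I picked, so this assumes no real category is already called 'NA'; the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'geography': ['tropic', 'tropic', 'desert', None, 'tundra', 'tundra'],
    'dragons': ['many', None, 'few', 'few', 'many', None],
})

# Replace missing values with an explicit placeholder label so that
# crosstab treats them as an ordinary category.
# 'NA' is an arbitrary choice and must not clash with a real category.
dragons = df.dragons.fillna('NA')
geography = df.geography.fillna('NA')

table = pd.crosstab(dragons, geography, dropna=False)
print(table)
```

Because the placeholder is an ordinary string, every one of the six rows ends up in some cell of the table.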
I'd hate for others to miss that NAs are dropped from the output even when dropna=False. Perhaps changing the documentation or the behavior of this function would help.
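In the meantime, one way to notice silently dropped rows is to compare the crosstab total with the number of rows in the frame. This is only a rough sketch using the default crosstab call on the same example data; the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'geography': ['tropic', 'tropic', 'desert', None, 'tundra', 'tundra'],
    'dragons': ['many', None, 'few', 'few', 'many', None],
})

table = pd.crosstab(df.geography, df.dragons)

# Rows that actually made it into the table
counted = int(table.to_numpy().sum())
# Rows where both columns have a value
complete = len(df.dropna(subset=['geography', 'dragons']))
total = len(df)

print(f"{counted} of {total} rows counted; {total - counted} dropped")
```

If the counted total is smaller than the row count, some rows had a missing value in one of the two columns and were left out.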
Comment From: sinhrks
Thanks for the report. Dupe with #10772.