This may not be a real bug, but it can be annoying.
xref #12614. Independent of issue #12642.
Code Sample
import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'], dtype=object)
b = np.array(['one', 'one', 'two', 'one', 'two', 'two', 'two'], dtype=object)
c = np.array(['dull', 'dull', 'dull', 'dull', 'dull', 'shiny', 'shiny'], dtype=object)
res_dropnaF = pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'], dropna=False)
res_dropnaT = pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'], dropna=True)
# Print each result separately so this runs on both Python 2 and 3
print(res_dropnaF)
print('')
print(res_dropnaT)
Results:
res_dropnaF shows no columns.names (because they're removed), while res_dropnaT does.
     one        two
    dull shiny dull shiny
a
bar    1     0    1     0
foo    2     0    1     2

b    one  two
c   dull dull shiny
a
bar    1    1     0
foo    2    1     2
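Until the two code paths agree, one possible workaround is to reassign the column names by hand after calling crosstab with dropna=False. This is only a sketch. The variable `colnames` is introduced here for illustration, and on a pandas version where the names are already kept the assignment is simply a no-op.

```python
import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'], dtype=object)
b = np.array(['one', 'one', 'two', 'one', 'two', 'two', 'two'], dtype=object)
c = np.array(['dull', 'dull', 'dull', 'dull', 'dull', 'shiny', 'shiny'], dtype=object)

# Keep the names in one place so they can be reapplied afterwards.
colnames = ['b', 'c']  # illustrative helper variable, not part of the original report

res = pd.crosstab(a, [b, c], rownames=['a'], colnames=colnames, dropna=False)

# Workaround: put the column level names back in case they were dropped.
res.columns.names = colnames

print(res.columns.names)
```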
Expected Output
Either an extra option, or consistent column names between dropna=True and dropna=False.
This should be fixable in pivot_table(), in the "if not dropna" branch.
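A minimal sketch of where the missing names might come from. It assumes the dropna=False path reindexes the columns against a freshly built cartesian-product MultiIndex; that is an assumption about the pivot_table() internals, not something confirmed in this report. The frame `table` below is a hand-built stand-in for the pivoted result, not the real intermediate object.

```python
import pandas as pd

# Hand-built stand-in for the pivoted table, with named column levels.
cols = pd.MultiIndex.from_tuples(
    [('one', 'dull'), ('two', 'dull'), ('two', 'shiny')], names=['b', 'c']
)
table = pd.DataFrame(
    [[1, 1, 0], [2, 1, 2]],
    index=pd.Index(['bar', 'foo'], name='a'),
    columns=cols,
)

level_values = [['one', 'two'], ['dull', 'shiny']]

# Cartesian product built without names: reindexing onto it loses 'b' and 'c'.
full_unnamed = pd.MultiIndex.from_product(level_values)
lost = table.reindex(columns=full_unnamed, fill_value=0)

# Same product, but carrying the original names through.
full_named = pd.MultiIndex.from_product(level_values, names=table.columns.names)
kept = table.reindex(columns=full_named, fill_value=0)

print(lost.columns.names)
print(kept.columns.names)
```

If the assumption holds, passing the existing names when the new index is built would be enough to make both dropna settings consistent.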
output of pd.show_versions()
INSTALLED VERSION:
commit: None python: 2.7.10.final.0 python-bits: 64 OS: Darwin OS-release: 14.5.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8
pandas: 0.17.1 nose: 1.3.3 pip: 8.1.0 setuptools: 18.3.1 Cython: 0.23.4 numpy: 1.10.1 scipy: 0.14.0 statsmodels: 0.5.0 IPython: 2.1.0 sphinx: None patsy: 0.2.1 dateutil: 2.4.2 pytz: 2015.7 blosc: None bottleneck: None tables: None numexpr: None matplotlib: 1.3.1 openpyxl: None xlrd: None xlwt: None xlsxwriter: None lxml: None bs4: None html5lib: None httplib2: None apiclient: None sqlalchemy: None pymysql: None psycopg2: None Jinja2: None
Comment From: jreback
dupe of #12133
Comment From: jreback
fix in #12327 (though that appears stalled)
Comment From: jreback
@OXPHOS so happy to have that fixed up
Comment From: OXPHOS
@jreback Aha, good to know! I am wondering whether we still need the fix in #12643? I can test later if necessary.
Comment From: jreback
maybe not, let's see if the other PR fixes it. btw happy to have you pick up that PR