When I tried to generate a MultiIndex containing np.nan or None with MultiIndex.from_arrays, I got:
import pandas as pd
import numpy as np
from pandas import MultiIndex, Index
m = MultiIndex.from_arrays([[np.nan, np.nan, 'one', 'one', 'two', 'two', 'All'],
                            ['dull', 'shiny', 'dull', 'shiny', 'dull', 'shiny', '']],
                           names=['b', 'c'])
print m
[Output]
MultiIndex(levels=[[u'All', u'one', u'two'], [u'', u'dull', u'shiny']],
labels=[[-1, -1, 1, 1, 2, 2, 0], [1, 2, 1, 2, 1, 2, 0]],
names=[u'b', u'c'])
I expected the following instead, which can be produced by calling the MultiIndex constructor directly:
m = MultiIndex(levels=[Index(['All', np.nan, 'one', 'two']),
Index(['', 'dull', 'shiny'])],
labels=[[1, 1, 2, 2, 3, 3, 0],
[1, 2, 1, 2, 1, 2, 0]], names=['b', 'c'])
print m
[Output]
MultiIndex(levels=[[u'All', nan, u'one', u'two'], [u'', u'dull', u'shiny']],
labels=[[1, 1, 2, 2, 3, 3, 0], [1, 2, 1, 2, 1, 2, 0]],
names=[u'b', u'c'])
The problem seems to be here
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: 74e20a0c4146919a82312c1027dcb57b48b613e3
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.19.0+300.g74e20a0c4
nose: 1.3.3
pip: 9.0.1
setuptools: 32.1.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.14.0
statsmodels: 0.5.0
xarray: 0.7.2
IPython: 5.1.0
sphinx: 1.4.1
patsy: 0.2.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 1.5.3
openpyxl: 2.3.5
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: None
bs4: None
html5lib: None
httplib2: 0.9.2
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_datareader: None
Comment From: jreback
these are missing values and as such are represented with -1 labels.
In [18]: pd.factorize(['foo', 'bar', np.nan, None])
Out[18]: (array([ 0, 1, -1, -1]), array(['foo', 'bar'], dtype=object))
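A minimal sketch of the same point, for anyone who wants to check it locally. It assumes a reasonably recent pandas; wrapping the values in an object-dtype NumPy array is my own choice (not part of the session above) so the input type is unambiguous.

```python
import numpy as np
import pandas as pd

# Same values as in the session above, held in an object array.
values = np.array(['foo', 'bar', np.nan, None], dtype=object)
codes, uniques = pd.factorize(values)

# Missing values never become a unique; they only get the sentinel code -1.
missing_mask = codes == -1

print(codes)
print(uniques)
print(missing_mask)
```

Both np.nan and None end up with the same sentinel, which is why neither appears among the uniques.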
Comment From: OXPHOS
But then I assume the two ways to construct MultiIndex should give the same output?
Comment From: jreback
@OXPHOS not really sure what you are getting at. Your second construction is valid, I guess, but the two are not the same. If you ever use the MultiIndex constructor directly (not a good idea), you are in charge of the labels, so you have to account for missing values yourself (which you are not doing here).
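A rough sketch of what this means in practice, assuming pandas 0.24 or later, where the `labels` attribute shown earlier in this thread is exposed as `codes`. The variable names are only for illustration. With from_arrays the NaN entries stay out of the levels and are marked with -1, yet they still read back as missing values.

```python
import numpy as np
import pandas as pd

m = pd.MultiIndex.from_arrays(
    [[np.nan, np.nan, 'one', 'one', 'two', 'two', 'All'],
     ['dull', 'shiny', 'dull', 'shiny', 'dull', 'shiny', '']],
    names=['b', 'c'])

# Codes of the first level; -1 is the sentinel for a missing value.
# (Assumption: pandas >= 0.24; older versions call this attribute `labels`.)
first_level_codes = list(m.codes[0])

# The level itself holds only the non-missing categories.
level_has_nan = bool(m.levels[0].isna().any())

# Reading the values back still yields NaN where the code is -1.
values_missing = m.get_level_values('b').isna().tolist()

print(first_level_codes)
print(level_has_nan)
print(values_missing)
```

So the -1 representation loses no information: the missing entries are recoverable from the codes even though NaN is not stored as a level.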