In [1]:
import pandas as pd
from io import StringIO
data = """
code\tname\ttyp\tntf\n
A5411\tWD\tAF\t\n
A5411\tWD\tAF\t210194618\n
B5498\tSH\tNC\t\n
B5498\tSH\tNC\t210213014\n
"""
df = pd.read_table(StringIO(data))
In [2]: df.set_index(['name','code'])
Out[2]:
            typ          ntf
name code
WD   A5411   AF          NaN
     A5411   AF  210194618.0
SH   B5498   NC          NaN
     B5498   NC  210213014.0
Expected Output
I expected the output of In [2] to be sparsified in the same way as Out[3]:
In [3]: df.set_index(['name', 'code', 'typ'])
Out[3]:
                        ntf
name code  typ
WD   A5411 AF           NaN
           AF   210194618.0
SH   B5498 NC           NaN
           NC   210213014.0
Problem description
Not all index levels are displayed in sparse style (where a run of identical values is shown only once). In Out[2], the code level is expected to be displayed that way as well.
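A minimal sketch to check that this is a rendering difference only and not a difference in the index data. It rebuilds the same four rows by hand (the `indexed`, `sparse_text`, and `dense_text` names are illustrative) and compares `DataFrame.to_string` with `sparsify=True` and `sparsify=False`:

```python
import pandas as pd

# Same four rows as the report, built directly instead of parsed from text.
df = pd.DataFrame({
    "code": ["A5411", "A5411", "B5498", "B5498"],
    "name": ["WD", "WD", "SH", "SH"],
    "typ": ["AF", "AF", "NC", "NC"],
    "ntf": [float("nan"), 210194618.0, float("nan"), 210213014.0],
})
indexed = df.set_index(["name", "code"])

# sparsify only affects how repeated outer-level labels are printed.
sparse_text = indexed.to_string(sparsify=True)
dense_text = indexed.to_string(sparsify=False)
print(sparse_text)
print(dense_text)

# The stored index values are identical either way.
codes = indexed.index.get_level_values("code").tolist()
print(codes)
```

In the sparse rendering the outer `name` label is printed once per run, while the innermost `code` label is still printed on every row.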
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 28.8.0
Cython: None
numpy: 1.14.0
scipy: None
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Comment From: jreback
The innermost level is not sparsified because it is almost always unique. In general, a non-unique MultiIndex is not performant and not recommended (though it does work).
In [11]: df.assign(r=[0,1, 0, 1]).set_index(['name', 'code', 'r'])
Out[11]:
             typ          ntf
name code  r
WD   A5411 0  AF          NaN
           1  AF  210194618.0
SH   B5498 0  NC          NaN
           1  NC  210213014.0
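The hard-coded `r=[0, 1, 0, 1]` above could be derived from the data instead. The following is a sketch, not something proposed in the thread: it assumes a per-group running counter from `groupby(...).cumcount()` is an acceptable tie-breaker, and the `mi`, `r`, and `unique_df` names are illustrative.

```python
import pandas as pd
from io import StringIO

data = (
    "code\tname\ttyp\tntf\n"
    "A5411\tWD\tAF\t\n"
    "A5411\tWD\tAF\t210194618\n"
    "B5498\tSH\tNC\t\n"
    "B5498\tSH\tNC\t210213014\n"
)
df = pd.read_csv(StringIO(data), sep="\t")

# The two-level index repeats each (name, code) pair, so it is not unique.
mi = df.set_index(["name", "code"])
print(mi.index.is_unique)

# cumcount numbers the rows within each (name, code) group: 0, 1, 0, 1.
r = df.groupby(["name", "code"]).cumcount()

# Adding the counter as the innermost level makes the index unique,
# so name and code both become outer levels and are sparsified.
unique_df = df.assign(r=r).set_index(["name", "code", "r"])
print(unique_df.index.is_unique)
print(unique_df)
```

Checking `index.is_unique` before and after is a quick way to confirm that the added level actually removes the duplicates.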
Comment From: hastelloy
@jreback thanks, it does make sense...