Pandas BUG: MultiIndex set_levels is not symmetrical with get_level_values

Code Sample

import pandas as pd
import io

text = """
A B C
1 1 1
1 2 2
2 1 3
2 2 4
"""
df = pd.read_csv(io.StringIO(text), delimiter = ' ')
df.set_index(['A','B'], inplace = True)
# Note the output of get_level_values is the list of all values [1, 1, 2, 2]
print(df.index.get_level_values('A').tolist())
df.index.set_levels(df.index.get_level_values('A').map(lambda x: x * 2), level='A', inplace=True)
print(df.index.get_level_values('A').tolist())
# outputs [2, 2, 2, 2]

Expected Output

[2, 2, 4, 4]

# If I re-order the input as:
text = """
A B C
1 1 1
2 1 2
1 2 3
2 2 4
"""
# it works as I expected giving [2, 4, 2, 4]

Is this a bug or expected behaviour?

output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.2
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: 0.7.5.None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None

Comment From: jreback

duplicate of #13754 and PR in #14236

Comment From: danio

@jreback I don't think from the description of #13754 and the code changes in #14236 that it is an exact duplicate of those. I will try testing it with the suggested change and see if that changes the behaviour.

Comment From: danio

@jreback I have confirmed it is not a duplicate. #13754 is about set_levels still changing the index even when verification fails. This issue is not about that.

I have realised that set_levels and get_level_values should not necessarily be symmetrical. What I think is missing from pandas is a get_levels function that would be symmetrical to set_levels (the levels property is not the same, as you cannot select a level by its name).
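The difference between the two can be seen by printing them side by side. This is a minimal sketch, assuming a recent pandas in which the per-row integer positions are exposed as `codes` (in 0.18.1 the same attribute is called `labels`). The DataFrame is built directly rather than read from text, purely to keep the example short.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 2], "B": [1, 2, 1, 2], "C": [1, 2, 3, 4]})
df = df.set_index(["A", "B"])

level_index = df.index.names.index("A")
levels = df.index.levels[level_index]      # unique values stored for the level
codes = df.index.codes[level_index]        # one position into `levels` per row
values = df.index.get_level_values("A")    # `levels` expanded through `codes`

print(levels.tolist())
print(codes.tolist())
print(values.tolist())
```

set_levels replaces only the first of these, the array of unique values, and leaves the per-row positions alone. That is why passing it the full-length output of get_level_values does not round-trip.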

Failing that, I have found that the workaround for this is to replace

df.index.get_level_values(level)

with

df.index.levels[df.index.names.index(level)]

in my code, i.e.

level_index = df.index.names.index('A')
df.index.set_levels(df.index.levels[level_index].map(lambda x: x * 2), level='A', inplace=True)
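For newer pandas releases, the same workaround can be sketched without `inplace`. This assumes a version where `set_levels` returns a new MultiIndex that has to be assigned back. The DataFrame here mirrors the first input from the report.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 2], "B": [1, 2, 1, 2], "C": [1, 2, 3, 4]})
df = df.set_index(["A", "B"])

# Transform the unique level values, not the per-row values.
level_index = df.index.names.index("A")
new_levels = df.index.levels[level_index].map(lambda x: x * 2)

# set_levels returns a new index, so assign it back to the frame.
df.index = df.index.set_levels(new_levels, level="A")

result = df.index.get_level_values("A").tolist()
print(result)
```

With this input the printed list should be [2, 2, 4, 4], which is the expected output from the original report.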

Comment From: bkandel

@danio I agree that this issue is not a duplicate of https://github.com/pydata/pandas/issues/13754, but I think that it's quite close to this description: https://github.com/pydata/pandas/issues/13741#issuecomment-248009216. Basically the issue is that there's no way to directly set the equivalent of the column names in a MultiIndex. I.e. where you would be used to saying df.columns = ['A', 'B', 'C'] or df.index = ['A', 'B', 'C'], you can't say df.set_columns(['A', 'B', 'C'], level=0).

Comment From: danio

@bkandel, yes, the example there is quite convoluted but I think this is caused by the same underlying issue as #13741. I can't see any options to change the duplicate setting of this issue, probably I don't have the permissions?

It feels to me that MultiIndex.set_levels is exposing too much of the underlying representation of the index, and it should be replaced by a new function with an interface more like get_level_values. That's probably not a discussion for the issue tracker though.

Comment From: bkandel-picwell

@danio Yes, I agree that exposing set_levels and set_labels but not set_level_values (like what get_level_values does) makes it harder for users to see the functionality they want. I think that should be raised as a separate issue. I'll spend a bit of time to make sure exactly what functionality is needed and then file an issue.
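To make the missing functionality concrete, here is a rough sketch of what a set_level_values style operation could look like when built on existing calls. `map_level_values` is a hypothetical helper name, not a pandas API. It rebuilds the index from the per-row values of every level, so it works the same regardless of row order.

```python
import pandas as pd


def map_level_values(index, level, func):
    """Hypothetical helper (not part of pandas): apply func to every
    per-row value of one level and rebuild the MultiIndex."""
    pos = index.names.index(level)
    # Per-row values for each level, as returned by get_level_values.
    arrays = [index.get_level_values(i) for i in range(index.nlevels)]
    arrays[pos] = arrays[pos].map(func)
    return pd.MultiIndex.from_arrays(arrays, names=index.names)


# Second input ordering from the report.
df = pd.DataFrame({"A": [1, 2, 1, 2], "B": [1, 1, 2, 2], "C": [1, 2, 3, 4]})
df = df.set_index(["A", "B"])

df.index = map_level_values(df.index, "A", lambda x: x * 2)
result = df.index.get_level_values("A").tolist()
print(result)
```

Rebuilding the whole index is simpler than touching levels and codes separately, at the cost of recomputing the factorisation for every level.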
