Pandas Assignment of subpart of data frame with multi index yields user surprise

from collections import OrderedDict
import random
import pandas as pd


pd.__version__

'0.20.3'

df = pd.DataFrame.from_dict(OrderedDict(
    [
        ((lv1, lv2, lv3), {'KG': random.randint(0, 100)})
        for lv1 in ['A', 'B']
        for lv2 in ['M', 'N', 'O']
        for lv3 in ['X', 'Y', 'Z']
    ]
))
df

	A									B
	M			N			O			M			N			O
	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z
KG	16	71	27	8	83	0	100	72	100	42	26	21	91	5	36	98	96	52

df.loc[:, idx[:, :, 'X']]

	A			B
	M	N	O	M	N	O
	X	X	X	X	X	X
KG	44	83	-28	5	-31	44

df.xs('Y', level=2, axis=1) - df.xs('Z', level=2, axis=1)

	A			B
	M	N	O	M	N	O
KG	44	83	-28	5	-31	44

idx = pd.IndexSlice
df.loc[:, idx[:, :, 'X']] = (df.xs('Y', level=2, axis=1) - df.xs('Z', level=2, axis=1))
df

	A									B
	M			N			O			M			N			O
	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z
KG	NaN	71	27	NaN	83	0	NaN	72	100	NaN	26	21	NaN	5	36	NaN	96	52

This was unexpected. I realize that the indexes do not match, and hence this behavior, but I expect Pandas to throw an Exception, rather than silently do this

idx = pd.IndexSlice
df.loc[:, idx[:, :, 'X']] = (df.xs('Y', level=2, axis=1) - df.xs('Z', level=2, axis=1)).values
df

	A									B
	M			N			O			M			N			O
	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z
KG	44	71	27	83	83	0	-28	72	100	5	26	21	-31	5	36	44	96	52

This was, of course, the behavior I was expecting

Comment From: gfyoung

@kghosesbg : Thanks for reporting this! Looks quite odd to me indeed. This is reproducible on master, so feel free to investigate and see what's going on here.

Comment From: jreback

This is expected, you are assigning mismatched levels. It is pretty tricky to actually figure out what you are doing is invalid. Sure you can make an assumption that you are assigning based on sub-levels, or that the levels don't match on the left and right sides, but this is pretty fragile. xref https://github.com/pandas-dev/pandas/issues/12343, https://github.com/pandas-dev/pandas/issues/7475, duplicate of https://github.com/pandas-dev/pandas/issues/10440

In [13]: df.loc[:, idx[:, :, 'X']] = (df.xs('Y', level=2, axis=1) - df.xs('Z', level=2, axis=1))

In [14]: df.loc[:, idx[:, :, 'X']]
Out[14]: 
     A           B        
     M   N   O   M   N   O
     X   X   X   X   X   X
KG NaN NaN NaN NaN NaN NaN

This is the correct method, essentially no alignment and a direct assignment.

In [16]: df.loc[:, idx[:, :, 'X']] = (df.xs('Y', level=2, axis=1) - df.xs('Z', level=2, axis=1)).values

In [17]: df
Out[17]: 
     A                                  B                                 
     M          N           O           M           N           O         
     X   Y   Z  X   Y   Z   X   Y   Z   X   Y   Z   X   Y   Z   X   Y    Z
KG  79  98  19  4  34  30 -52  34  86  29  70  41 -66  14  80 -58  42  100

This is very simliar in concept to the warning at the end of this section. If you wanted to make an example for MultiIndexing would be great.

http://pandas.pydata.org/pandas-docs/stable/indexing.html#basics

Comment From: jreback

@kghosesbg we need someone to collect the cases for MultiIndex and write/re-write the logic a bit. I am not opposed to raising in a case like this, but this requires some investigation. Would be great just to have the cases listed (and we can xfail them). Then tackle later. would you be interested in this?

Comment From: ghost

Hi @jreback I'm happy to help as needed, though it might go slowly. Could you please clarify the task. From you previous comment I understand that you'd like to add this example to the multi-indexing docs, which I can do. Is the next to collect reports of these cases and put them in the test suite?

Comment From: ghost

@jreback I'll probably also do this with my mild mannered private individual role which I'm tagging @kghose

Comment From: jreback

yep those are the tasks

	A									B
	M			N			O			M			N			O
	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z
KG	16	71	27	8	83	0	100	72	100	42	26	21	91	5	36	98	96	52

	A									B
	M			N			O			M			N			O
	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z
KG	NaN	71	27	NaN	83	0	NaN	72	100	NaN	26	21	NaN	5	36	NaN	96	52

	A									B
	M			N			O			M			N			O
	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z
KG	44	71	27	83	83	0	-28	72	100	5	26	21	-31	5	36	44	96	52

	A									B
	M			N			O			M			N			O
	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z
KG	16	71	27	8	83	0	100	72	100	42	26	21	91	5	36	98	96	52

	A									B
	M			N			O			M			N			O
	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z
KG	NaN	71	27	NaN	83	0	NaN	72	100	NaN	26	21	NaN	5	36	NaN	96	52

	A									B
	M			N			O			M			N			O
	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z
KG	44	71	27	83	83	0	-28	72	100	5	26	21	-31	5	36	44	96	52

	A									B
	M			N			O			M			N			O
	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z
KG	16	71	27	8	83	0	100	72	100	42	26	21	91	5	36	98	96	52

	A									B
	M			N			O			M			N			O
	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z
KG	NaN	71	27	NaN	83	0	NaN	72	100	NaN	26	21	NaN	5	36	NaN	96	52

	A									B
	M			N			O			M			N			O
	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z	X	Y	Z
KG	44	71	27	83	83	0	-28	72	100	5	26	21	-31	5	36	44	96	52