Pandas MultiIndex Bug Copying Values Incorrectly When Adding Values To Index

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ['A', np.nan, 1.23, 4.56],
        ['A', 'G', 1.23, 4.56],
        ['A', 'D', 9.87, 10.54],
    ],
    columns=['pivot_0', 'pivot_1', 'col_1', 'col_2'],
)
df.set_index(['pivot_0', 'pivot_1'], inplace=True)
pivot_0 = 'A'
necessary_pivot_1_values = ['D', 'E', 'F' ]
for necessary_value in necessary_pivot_1_values:
    if necessary_value not in df.index.get_level_values('pivot_1').tolist():
        print("Missing", necessary_value)

        df.at[(pivot_0, necessary_value), 'col_2'] = 0.0

assert df.loc[('A', 'F')]['col_2'] == 0.0  # Pass
assert pd.isnull(df.loc[('A', 'F')]['col_1'])  # Fails: value of 1.23 from the first row in the df is copied. As of v0.22.0 this was np.nan

Problem description

When a MultiIndex contains np.nan and a new row is added to the DataFrame, the columns that were not explicitly set in the new row are not np.nan; instead, they hold values copied from the row whose index contains np.nan.

This behavior appears in all v0.23.x releases; v0.22.0 behaves correctly.

Expected Output

assert df.loc[('A', 'F')]['col_2'] == 0.0  # Pass
assert pd.isnull(df.loc[('A', 'F')]['col_1'])  # Pass, works in v0.22.0

Note that this unexpected behavior does not occur when np.nan is absent from the MultiIndex, nor with a single-level Index that contains np.nan.

MultiIndex without `np.nan`


df = pd.DataFrame(
    [
        #['A', np.nan, 1.23, 4.56],  # Comment out the np.nan
        ['A', 'G', 1.23, 4.56],
        ['A', 'D', 9.87, 10.54],
    ],
    columns=['pivot_0', 'pivot_1', 'col_1', 'col_2'],
)
df.set_index(['pivot_0', 'pivot_1'], inplace=True)

pivot_0 = 'A'
necessary_pivot_1_values = ['D', 'E', 'F' ]
for necessary_value in necessary_pivot_1_values:
    if necessary_value not in df.index.get_level_values('pivot_1').tolist():
        print("Missing", necessary_value)

        df.at[(pivot_0, necessary_value), 'col_2'] = 0.0

assert df.loc[('A', 'F')]['col_2'] == 0.0  # Pass
assert pd.isnull(df.loc[('A', 'F')]['col_1'])  # Pass

Single Index with `np.nan`


df = pd.DataFrame(
    [
        [np.nan, 1.23, 4.56],
        ['G', 1.23, 4.56],
        ['D', 9.87, 10.54],
    ],
    columns=['pivot_0', 'col_1', 'col_2'],
)
df.set_index(['pivot_0'], inplace=True)

necessary_pivot_0_values = ['D', 'E', 'F' ]
for necessary_value in necessary_pivot_0_values:
    if necessary_value not in df.index.get_level_values('pivot_0').tolist():
        print("Missing", necessary_value)

        df.at[(necessary_value), 'col_2'] = 0.0

assert df.loc[('F')]['col_2'] == 0.0
assert pd.isnull(df.loc[('F')]['col_1'])

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 0, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.26.1
numpy: 1.12.1
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 5.4.1
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Comment From: gfyoung

How odd! git bisect investigation (if needed), patch, and PR are welcome!

Comment From: hksonngan

I catch a KeyError at line 656, in code committed in 8cbee356. That exception is what causes the set_value function to fail at line 103.

Comment From: mroeschke

The asserts appear to pass on master. The example could be distilled further to see what could be used as a unit test.

In [4]: df = pd.DataFrame(
   ...:     [
   ...:         ['A', np.nan, 1.23, 4.56],
   ...:         ['A', 'G', 1.23, 4.56],
   ...:         ['A', 'D', 9.87, 10.54],
   ...:     ],
   ...:     columns=['pivot_0', 'pivot_1', 'col_1', 'col_2'],
   ...: )
   ...: df.set_index(['pivot_0', 'pivot_1'], inplace=True)
   ...: pivot_0 = 'A'
   ...: necessary_pivot_1_values = ['D', 'E', 'F' ]
   ...: for necessary_value in necessary_pivot_1_values:
   ...:     if necessary_value not in df.index.get_level_values('pivot_1').tolist():
   ...:         print("Missing", necessary_value)
   ...:
   ...:         df.at[(pivot_0, necessary_value), 'col_2'] = 0.0
   ...:
   ...: assert df.loc[('A', 'F')]['col_2'] == 0.0  # Pass
   ...: assert pd.isnull(df.loc[('A', 'F')]['col_1'])
Missing E
Missing F
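
One possible distillation, offered only as a sketch: the loop and the membership check are dropped, and a single new key is added through `df.at`. The function names `make_frame` and `test_at_enlargement_leaves_unset_columns_nan` are hypothetical and are not taken from the pandas test suite. The sketch assumes a pandas version on which the original asserts pass.

```python
import numpy as np
import pandas as pd


def make_frame():
    # Same data as the report: the first row has np.nan in the second index level.
    df = pd.DataFrame(
        [
            ["A", np.nan, 1.23, 4.56],
            ["A", "G", 1.23, 4.56],
            ["A", "D", 9.87, 10.54],
        ],
        columns=["pivot_0", "pivot_1", "col_1", "col_2"],
    )
    return df.set_index(["pivot_0", "pivot_1"])


def test_at_enlargement_leaves_unset_columns_nan():
    df = make_frame()

    # ("A", "F") is not in the index, so this assignment adds a new row.
    df.at[("A", "F"), "col_2"] = 0.0

    row = df.loc[("A", "F")]
    assert row["col_2"] == 0.0
    # col_1 was never set for the new row, so it should be missing,
    # not 1.23 copied from the row whose index contains np.nan.
    assert pd.isna(row["col_1"])
```

Running the function under pytest, or calling it directly, exercises the same two asserts as the original report.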

Comment From: shteken

take