Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ['A', np.nan, 1.23, 4.56],
        ['A', 'G', 1.23, 4.56],
        ['A', 'D', 9.87, 10.54],
    ],
    columns=['pivot_0', 'pivot_1', 'col_1', 'col_2'],
)
df.set_index(['pivot_0', 'pivot_1'], inplace=True)
pivot_0 = 'A'
necessary_pivot_1_values = ['D', 'E', 'F']
# Add a row for every required pivot_1 value that is missing from the index.
for necessary_value in necessary_pivot_1_values:
    if necessary_value not in df.index.get_level_values('pivot_1').tolist():
        print("Missing", necessary_value)
        df.at[(pivot_0, necessary_value), 'col_2'] = 0.0
assert df.loc[('A', 'F')]['col_2'] == 0.0  # Pass
assert pd.isnull(df.loc[('A', 'F')]['col_1'])  # Fails: the value 1.23 from the first row of the df is copied. As of v0.22.0 this was np.nan
Problem description
When a MultiIndex contains np.nan and new rows are added to the DataFrame, the columns that were not explicitly set in the new rows are not np.nan. Instead, their values are copied from the row whose index contains np.nan.
This behavior appears in all v0.23.x versions, but v0.22.0 behaves correctly.
Expected Output
assert df.loc[('A', 'F')]['col_2'] == 0.0 # Pass
assert pd.isnull(df.loc[('A', 'F')]['col_1']) # Pass, works in v0.22.0
Note that this unexpected behavior does not appear when np.nan is not included in the index, nor with a single Index.
MultiIndex without np.nan
df = pd.DataFrame(
    [
        # ['A', np.nan, 1.23, 4.56],  # np.nan row commented out
        ['A', 'G', 1.23, 4.56],
        ['A', 'D', 9.87, 10.54],
    ],
    columns=['pivot_0', 'pivot_1', 'col_1', 'col_2'],
)
df.set_index(['pivot_0', 'pivot_1'], inplace=True)
pivot_0 = 'A'
necessary_pivot_1_values = ['D', 'E', 'F']
for necessary_value in necessary_pivot_1_values:
    if necessary_value not in df.index.get_level_values('pivot_1').tolist():
        print("Missing", necessary_value)
        df.at[(pivot_0, necessary_value), 'col_2'] = 0.0
assert df.loc[('A', 'F')]['col_2'] == 0.0  # Pass
assert pd.isnull(df.loc[('A', 'F')]['col_1'])  # Pass
Single Index with np.nan
df = pd.DataFrame(
    [
        [np.nan, 1.23, 4.56],
        ['G', 1.23, 4.56],
        ['D', 9.87, 10.54],
    ],
    columns=['pivot_0', 'col_1', 'col_2'],
)
df.set_index(['pivot_0'], inplace=True)
necessary_pivot_0_values = ['D', 'E', 'F']
for necessary_value in necessary_pivot_0_values:
    if necessary_value not in df.index.get_level_values('pivot_0').tolist():
        print("Missing", necessary_value)
        df.at[necessary_value, 'col_2'] = 0.0
assert df.loc['F']['col_2'] == 0.0  # Pass
assert pd.isnull(df.loc['F']['col_1'])  # Pass
Comment From: gfyoung
How odd! A git bisect investigation (if needed), patch, and PR are welcome!
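For anyone attempting the bisect, a check script along these lines could be passed to `git bisect run`. This is only a sketch: the function name `new_row_is_clean` and the two-row frame are illustrative choices, not part of the original report, and it assumes the pandas checkout under test has been built and is importable.

```python
import sys

import numpy as np
import pandas as pd


def new_row_is_clean() -> bool:
    """Return True if a row added via .at leaves its unset column as NaN.

    Hypothetical helper for illustration; a reduced form of the report's example.
    """
    df = pd.DataFrame(
        [
            ['A', np.nan, 1.23, 4.56],
            ['A', 'G', 1.23, 4.56],
        ],
        columns=['pivot_0', 'pivot_1', 'col_1', 'col_2'],
    ).set_index(['pivot_0', 'pivot_1'])
    # Enlarge the frame: ('A', 'F') is not in the index yet.
    df.at[('A', 'F'), 'col_2'] = 0.0
    # col_1 was never set for the new row, so it should be NaN.
    return bool(pd.isnull(df.loc[('A', 'F'), 'col_1']))


if __name__ == '__main__':
    # git bisect run treats exit code 0 as "good" and 1 as "bad".
    sys.exit(0 if new_row_is_clean() else 1)
```

Saved under a name of your choosing, it would be run as `git bisect run python <script>` after marking v0.22.0 as good and a v0.23.x tag as bad.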
Comment From: hksonngan
I hit a KeyError at line 656, in code committed in 8cbee356. That exception makes the set_value function fail at line 103.
Comment From: mroeschke
The asserts appear to pass on master. The example could be distilled to see what could be used as a unit test.
In [4]: df = pd.DataFrame(
   ...:     [
   ...:         ['A', np.nan, 1.23, 4.56],
   ...:         ['A', 'G', 1.23, 4.56],
   ...:         ['A', 'D', 9.87, 10.54],
   ...:     ],
   ...:     columns=['pivot_0', 'pivot_1', 'col_1', 'col_2'],
   ...: )
   ...: df.set_index(['pivot_0', 'pivot_1'], inplace=True)
   ...: pivot_0 = 'A'
   ...: necessary_pivot_1_values = ['D', 'E', 'F' ]
   ...: for necessary_value in necessary_pivot_1_values:
   ...:     if necessary_value not in df.index.get_level_values('pivot_1').tolist():
   ...:         print("Missing", necessary_value)
   ...:
   ...:         df.at[(pivot_0, necessary_value), 'col_2'] = 0.0
   ...:
   ...: assert df.loc[('A', 'F')]['col_2'] == 0.0 # Pass
   ...: assert pd.isnull(df.loc[('A', 'F')]['col_1'])
Missing E
Missing F
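One possible distillation, offered as a sketch rather than the test that should necessarily land: it drops the loop and the print and keeps a single enlargement via `.at`. The test name and the extra check on the existing ('A', 'D') row are assumptions added for illustration and do not come from the pandas test suite.

```python
import numpy as np
import pandas as pd


def test_at_enlarge_multiindex_with_nan():
    # Hypothetical test name, chosen for this sketch only.
    df = pd.DataFrame(
        [
            ['A', np.nan, 1.23, 4.56],
            ['A', 'G', 1.23, 4.56],
            ['A', 'D', 9.87, 10.54],
        ],
        columns=['pivot_0', 'pivot_1', 'col_1', 'col_2'],
    ).set_index(['pivot_0', 'pivot_1'])

    # ('A', 'F') is missing from the index, so this adds a new row.
    df.at[('A', 'F'), 'col_2'] = 0.0

    assert df.loc[('A', 'F'), 'col_2'] == 0.0
    # The column that was not set must be NaN, not a copy of the np.nan row.
    assert pd.isnull(df.loc[('A', 'F'), 'col_1'])
    # Extra check (an addition of this sketch): existing rows stay unchanged.
    assert df.loc[('A', 'D'), 'col_1'] == 9.87
```

Run under pytest, this should fail on the affected v0.23.x releases described above and pass where the asserts in the session hold.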
Comment From: shteken
take