• [x] I have searched the [pandas] tag on StackOverflow for similar questions.

  • [x] I have asked my usage related question on StackOverflow.


Question about pandas

When using the | operator between two dataframes, the order of the operands can give different results if the frames contain NaN or have different indexes.

import pandas as pd

df_1 = pd.DataFrame([None], index=[1])
df_2 = pd.DataFrame([True], index=[1])

print(df_1 | df_2)
print(df_2 | df_1)

will give you different results:

       0
1  False
      0
1  True

The same thing happens with different indexes:

df_1 = pd.DataFrame([True], index=[1])
df_2 = pd.DataFrame([True], index=[2])

print(df_1 | df_2)
print(df_2 | df_1)

will return

       0
1   True
2  False
       0
1  False
2   True

Is this normal behavior or a bug? If it is normal behavior, what is the logic behind it?

Thanks!

Comment From: mzeitlin11

Thanks for the report @ThomasDelorme! These both look suspicious - I think both are bugs. I think the root cause is along the lines of weirdness with truthy operators for object arrays - see the whole host of issues around #12863 in both numpy and pandas.

The lack of commutativity in the first case appears to come from the logic for computing the binary op: it hits None or True, which raises, and falls back to returning the left operand (which then gets filled with False when the result is coerced to a bool type). So None or True -> None -> False, while True or None -> True. This logic can be seen in https://github.com/pandas-dev/pandas/blob/e5f1f9c0d3d26c09bea89f12144bfe732fedb1ea/pandas/_libs/ops.pyx#L244-L250
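To make the asymmetry concrete, here is a minimal pure-Python sketch of the fallback as described in this comment. It is not the pandas source: the function name or_with_fallback is hypothetical, and the model only assumes what is stated above (try the op, keep the left operand if it raises, then coerce to bool).

```python
import operator


def or_with_fallback(left, right):
    """Hypothetical model of the described fallback; not pandas code."""
    try:
        # `None | True` and `True | None` both raise TypeError in Python.
        result = operator.or_(left, right)
    except TypeError:
        # Assumed fallback: keep whatever the left operand was.
        result = left
    # Coercing to bool turns a leftover None into False.
    return bool(result)


print(or_with_fallback(None, True))  # False
print(or_with_fallback(True, None))  # True
```

Under these assumptions the result depends only on which operand sits on the left whenever the op raises, which is why swapping the operands changes the answer.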

I haven't looked into the second case yet, but I'd guess it's the same issue: since the indices don't match, the frames are reindexed before computing the binary op, giving the same NaN-to-bool comparison issues.
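A quick way to see what alignment would introduce is to align the two frames from the report by hand. This is only a sketch: DataFrame.align is used here for illustration, and whether the | operator goes through exactly this path internally is an assumption.

```python
import pandas as pd

df_1 = pd.DataFrame([True], index=[1])
df_2 = pd.DataFrame([True], index=[2])

# Outer-align both frames on the union of their indexes.
left, right = df_1.align(df_2)

# Each frame now has a missing value in the row it did not originally have.
print(left)
print(right)
```

After alignment each frame holds a missing value where the other frame had its only row, so the second example reduces to the same missing-versus-True comparison as the first.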

Further investigations into fixing this are welcome!

Comment From: corneliusroemer

It would be great if this could be fixed. It contradicts normal logic for NaN | True to evaluate to False. If either operand is True, then or should yield True, irrespective of what NaN is.

NaN | False is trickier: this could be either False or True, depending on what NaN is, so NaN seems appropriate. Another option is to throw an error, like numpy does.
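For comparison, pandas' nullable "boolean" dtype already follows this three-valued (Kleene) logic. The sketch below uses Series rather than DataFrames to keep it short; the variable names are illustrative only.

```python
import pandas as pd

missing = pd.Series([None], index=[1], dtype="boolean")
true = pd.Series([True], index=[1], dtype="boolean")
false = pd.Series([False], index=[1], dtype="boolean")

# With the nullable dtype, the operand order no longer matters.
na_or_true = missing | true
true_or_na = true | missing

# NA | False stays NA, because the answer depends on the unknown value.
na_or_false = missing | false

print(na_or_true)
print(true_or_na)
print(na_or_false)
```

Casting to the nullable dtype before combining is one possible workaround for the asymmetry reported here, though it changes the dtype of the result.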

This is probably a duplicate: #51267

Also, it would be good to rename the issue from QST: to BUG: as this clearly is a bug ;)