Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
edit: formatting problem
Reproducible Example
import pandas as pd
not pd.NA
import pandas as pd
pd.NA and False
import pandas as pd
pd.NA or True
Issue Description
The `or`, `and`, and `not` operators are not correctly implemented for `pd.NA`. Changing the order of the operands even raises an error:
>>> True and pd.NA
<NA>
>>> False and pd.NA
False
>>> pd.NA and pd.NA
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/_libs/missing.pyx", line 413, in pandas._libs.missing.NAType.__bool__
    raise TypeError("boolean value of NA is ambiguous")
TypeError: boolean value of NA is ambiguous
>>> pd.NA and True
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/_libs/missing.pyx", line 413, in pandas._libs.missing.NAType.__bool__
    raise TypeError("boolean value of NA is ambiguous")
TypeError: boolean value of NA is ambiguous
>>> pd.NA and False
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/_libs/missing.pyx", line 413, in pandas._libs.missing.NAType.__bool__
    raise TypeError("boolean value of NA is ambiguous")
TypeError: boolean value of NA is ambiguous
>>> True or pd.NA
True
>>> not pd.NA
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/_libs/missing.pyx", line 413, in pandas._libs.missing.NAType.__bool__
    raise TypeError("boolean value of NA is ambiguous")
TypeError: boolean value of NA is ambiguous
Expected Behavior
The same as in the R language:
> !NA
[1] NA
> NA | TRUE
[1] TRUE
> NA | FALSE
[1] NA
> NA & TRUE
[1] NA
> NA & FALSE
[1] FALSE
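For comparison, here is a minimal sketch of the same truth table in pandas, written with the overloadable operators `~`, `|` and `&` in place of the `not`, `or` and `and` keywords. It assumes a pandas version that provides `pd.NA` (1.0 or later); the `results` dictionary is only an illustrative name.

```python
import pandas as pd

# Bitwise operators stand in for R's logical operators:
#   R `!x`    -> Python `~x`
#   R `x | y` -> Python `x | y`
#   R `x & y` -> Python `x & y`
results = {
    "~NA": ~pd.NA,
    "NA | True": pd.NA | True,
    "NA | False": pd.NA | False,
    "NA & True": pd.NA & True,
    "NA & False": pd.NA & False,
}

for expression, value in results.items():
    print(f"{expression:<12} -> {value}")
```

With these operators the results follow the same three-valued logic as the R output above, regardless of which side `pd.NA` is on.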
Comment From: kostyafarber
Might take a look at this one
Comment From: kostyafarber
Is this the expected behaviour of `pd.NA`, since it's not meant to be used in boolean statements?
https://pandas.pydata.org/pandas-docs/version/1.0.0/user_guide/missing_data.html#logical-operations:~:text=This%20also%20means,missing%20values%20beforehand.
Comment From: vsbits
Not sure, but I don't think so. Logical operations should follow the rules of three-valued logic.
It raises an error only if `pd.NA` comes first in the expression. And using `|` instead of `or` works. Seems inconsistent.
>>> True or pd.NA
True
>>> True | pd.NA
True
>>> pd.NA | True
True
>>> pd.NA or True
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas\_libs\missing.pyx", line 382, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous
Along the same lines, I guess `not NA` should return `NA`. I know R works that way, but it has a `logical` class (TRUE/FALSE/NA), which is different from Python's `bool` (True, False).
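The difference for `not` can be illustrated without pandas. In Python, `not x` evaluates the truth value of `x` and always produces a plain `bool`, so there is no hook that would let it return a third value, whereas `~x` dispatches to `__invert__` and can return anything. The `Unknown` class below is a hypothetical stand-in written for this sketch, not pandas code.

```python
class Unknown:
    """Hypothetical stand-in for a missing value (not the pandas implementation)."""

    def __bool__(self):
        # Mirror the behaviour reported in the tracebacks above.
        raise TypeError("boolean value of Unknown is ambiguous")

    def __invert__(self):
        # `~x` is overridable, so it can propagate the missing value.
        return self


unknown = Unknown()

# `not` goes through __bool__, so the TypeError surfaces.
try:
    _ = not unknown
    not_raised = False
except TypeError:
    not_raised = True

# `~` goes through __invert__ and can hand back the object itself.
inverted = ~unknown

print("not raised TypeError:", not_raised)
print("~ returned the same object:", inverted is unknown)
```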
Comment From: jorisvandenbossche
The core of the issue here is related to multiple things:
1) We currently decided to raise an error when `pd.NA` is evaluated in a boolean context (`bool(pd.NA)`) -> https://github.com/pandas-dev/pandas/issues/38224
2) The `and` and `or` keywords will call `bool(..)` on the left side value, which can thus cause this error to bubble up, and there is no way to override this behaviour. And so you get inconsistent behaviour depending on whether `pd.NA` is the left or right side argument.
3) On the other hand, the `|` and `&` operators can be overridden for custom behaviour on your object (using `__or__` and `__and__`, and that is what we do for `pd.NA`), so we can provide consistent behaviour for those operations.
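The three points above can be sketched in plain Python. `KleeneNA` and `try_eval` are hypothetical names made up for this illustration; the class imitates the described behaviour and is not pandas' actual `NAType`.

```python
class KleeneNA:
    """Hypothetical three-valued 'unknown' (a sketch, not pandas' NAType)."""

    def __bool__(self):
        # Point 1: truthiness is refused outright.
        raise TypeError("boolean value of KleeneNA is ambiguous")

    def __or__(self, other):
        if other is True:
            return True
        if other is False or isinstance(other, KleeneNA):
            return self
        return NotImplemented

    __ror__ = __or__  # `|` is symmetric in this logic

    def __and__(self, other):
        if other is False:
            return False
        if other is True or isinstance(other, KleeneNA):
            return self
        return NotImplemented

    __rand__ = __and__  # `&` is symmetric as well

    def __repr__(self):
        return "<KleeneNA>"


na = KleeneNA()


def try_eval(thunk):
    """Return thunk()'s value, or the exception type name if it raises TypeError."""
    try:
        return thunk()
    except TypeError as exc:
        return type(exc).__name__


# Point 2: the keywords only test the truth value of the left operand.
left_true = try_eval(lambda: True or na)   # short-circuits, never touches na
left_na = try_eval(lambda: na or True)     # bool(na) raises

# Point 3: the operators dispatch to the dunder methods on either side.
op_left = na | True
op_right = True | na
and_right = False & na

print(left_true, left_na, op_left, op_right, and_right)
```

The keyword forms depend on operand order because the truth test happens before any method on the right operand could be consulted, while the operator forms give the same answer on both sides.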