  • [x] I have checked that this issue has not already been reported. (sorry if I missed it)
  • [x] I have confirmed this bug exists on the latest version of pandas. (present on 1.2.3)

Code Sample, a copy-pastable example

# Set up
import pandas as pd
str_serie = pd.Series(["hello", None, "hi"]) # Note the None
true_serie = pd.Series([True] * 3)

print("1st case")
print(str_serie.str.contains('h') | true_serie) # prints True, False, True

print("2nd case")
print(true_serie | str_serie.str.contains('h')) # prints True, True, True

Problem description

  • The | (or) operator is not commutative here: swapping the two operands changes the result.
  • In the 1st case, pandas seems to stop evaluating the boolean expression for a row when the string comparison encounters a null value in that row.

Expected Output

In both cases, the output should be True, True, True.
If pandas is going to keep ignoring the rest of a boolean expression after a str operation on a null value, it should at least emit a warning about it.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : f2c8480af2f25efdbd803218b9d87980f416563e
python           : 3.9.2.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
Version          : 10.0.18362
machine          : AMD64
processor        : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : French_Switzerland.1252
pandas           : 1.2.3
numpy            : 1.20.1
pytz             : 2021.1
dateutil         : 2.8.1
pip              : 21.0.1
setuptools       : 49.2.1
Cython           : None
pytest           : 6.2.2
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.3
IPython          : 7.21.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : 0.8.7
fastparquet      : None
gcsfs            : None
matplotlib       : 3.3.4
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.6.1
sqlalchemy       : 1.4.0
tables           : None
tabulate         : 0.8.9
xarray           : None
xlrd             : None
xlwt             : None
numba            : None

Comment From: VodopaDev

In fact, pandas seems to stop evaluating the boolean expression only for the rows where the string is null.

str_serie = pd.Series(["1", None, "2"])
true_serie = pd.Series([True] * 3)
(str_serie.str.contains("1")) | true_serie # prints True, False, True

Comment From: mzeitlin11

Thanks for the report @VodopaDev! I can confirm this on master, but interestingly it works as expected with the string dtype:

str_serie = pd.Series(["1", None, "2"], dtype='string')
true_serie = pd.Series([True] * 3)
print((str_serie.str.contains("1")) | true_serie)  # prints True, True, True

Investigations and a fix are welcome!
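
A possible explanation for the difference, offered as a sketch rather than a confirmed diagnosis: with dtype='string', str.contains returns the nullable "boolean" dtype, whose missing value is pd.NA and whose | follows Kleene (three-valued) logic, so True | NA is True whichever side the NA is on. The names left_first and right_first are only illustrative.

```python
import pandas as pd

str_serie = pd.Series(["1", None, "2"], dtype="string")
true_serie = pd.Series([True] * 3)

mask = str_serie.str.contains("1")
print(mask.dtype)            # nullable boolean dtype rather than object
print(mask.isna().tolist())  # the None row is a missing value in the mask

# Kleene logic on scalars: a known True decides the result of `or`,
# while NA | False stays unknown.
print(pd.NA | True)
print(pd.NA | False)

# Combine in both orders and compare.
left_first = mask | true_serie
right_first = true_serie | mask
print(left_first.tolist(), right_first.tolist())
```

Because the result of an "or" with True is already determined, the missing row does not affect either ordering in this sketch.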

Comment From: asishm

I guess this can be summarized into

In [119]: pd.Series([True, False, np.nan, None, pd.NA]) | pd.Series([True] * 5)
Out[119]:
0     True
1     True
2    False
3    False
4     True
dtype: bool

In [120]: pd.Series([True] * 5) | pd.Series([True, False, np.nan, None, pd.NA])
Out[120]:
0    True
1    True
2    True
3    True
4    True
dtype: bool
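
To see which operands are involved in the summary above, it can help to inspect the dtypes before combining anything. This is a minimal diagnostic sketch; the names mixed and all_true are illustrative, and it deliberately avoids running the | operation itself, since that result is what the issue is about.

```python
import numpy as np
import pandas as pd

mixed = pd.Series([True, False, np.nan, None, pd.NA])
all_true = pd.Series([True] * 5)

# The mixed Series cannot be stored as plain bool, so it is object dtype.
print(mixed.dtype)
print(all_true.dtype)

# Rows 2, 3 and 4 hold missing values (np.nan, None, pd.NA).
print(mixed.isna().tolist())
```

So the two expressions in the summary are an object | bool operation and a bool | object operation, and the rows that differ are among the ones flagged by isna().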

Comment From: simonjayhawkins

> I guess this can be summarized into

Agreed. There appears to be an issue with the commutativity of the boolean | operator between a Series with bool dtype and a Series with object dtype containing None or np.nan.

This issue is not really related to the contains method on the string accessor.

Indeed, str.contains accepts an na parameter that specifies the fill value for missing entries, so that a Series with bool dtype can be returned:

>>> import pandas as pd
>>> 
>>> str_serie = pd.Series(["hello", None, "hi"])  # Note the None
>>> true_serie = pd.Series([True] * 3)
>>> 
>>> print("1st case")
1st case
>>> print(str_serie.str.contains("h", na=False) | true_serie)  # prints True, True, True
0    True
1    True
2    True
dtype: bool
>>> 
>>> print("2nd case")
2nd case
>>> print(true_serie | str_serie.str.contains("h", na=False))  # prints True, True, True
0    True
1    True
2    True
dtype: bool
>>>
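
As an alternative to na=False, and only as a hedged sketch, the mask can be converted to the nullable "boolean" dtype before it is combined. This keeps the missing row as pd.NA instead of forcing it to False, and relies on the Kleene logic of that dtype for the | operation. The names mask, left_first and right_first are illustrative.

```python
import pandas as pd

str_serie = pd.Series(["hello", None, "hi"])  # Note the None
true_serie = pd.Series([True] * 3)

# Convert the mask to the nullable boolean dtype; the None row becomes pd.NA.
mask = str_serie.str.contains("h").astype("boolean")

# Combine in both orders and compare.
left_first = mask | true_serie
right_first = true_serie | mask
print(left_first.tolist())
print(right_first.tolist())
print(left_first.equals(right_first))
```

Which option fits better depends on whether a missing string should count as "no match" (na=False) or as "unknown" (nullable boolean).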