Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import numpy as np
# On trunk
pd.Series([1], dtype="bool") // np.bool_(0) # => NotImplementedError
pd.Series([1], dtype="bool") // pd.Series([0], dtype="bool") # => NotImplementedError
pd.Series([1, 0], dtype="bool") // pd.Series([0], dtype="bool") # => ZeroDivisionError
pd.Series([1, 0], dtype="bool") // pd.Series([1], dtype="bool") # => pd.Series([1, np.nan], dtype="object")
pd.Series([1, 0], dtype="bool") // pd.Series([0, 0], dtype="bool") # => NotImplementedError
pd.Series([1, 0], dtype="bool") // pd.Series([1, 1], dtype="bool") # => NotImplementedError
Issue Description
If only one operand is a Series, we consistently get a NotImplementedError for pow, truediv, and floordiv if both operands have boolean dtypes.
If both operands are Series (with bool dtype), the behaviour depends on whether the Series have matching lengths. When the lengths mismatch, for some reason the NotImplementedError check is bypassed and we just evaluate as normal.
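A minimal sketch for checking this on a given installation: it runs the six floordiv cases from the reproducible example and records either the exception name or the result dtype. The helper names `floordiv_outcome` and `bool_series` are made up for illustration, and the outcomes are deliberately not hard-coded because they may differ between pandas versions.

```python
import numpy as np
import pandas as pd


def floordiv_outcome(left, right):
    """Return the name of the exception raised by left // right, or the result dtype."""
    try:
        result = left // right
    except Exception as exc:  # record any failure, whatever its type
        return type(exc).__name__
    return f"dtype={getattr(result, 'dtype', type(result).__name__)}"


def bool_series(values):
    # Hypothetical convenience wrapper, not part of pandas.
    return pd.Series(values, dtype="bool")


cases = {
    "[1] // np.bool_(0)": (bool_series([1]), np.bool_(0)),
    "[1] // [0]": (bool_series([1]), bool_series([0])),
    "[1, 0] // [0]": (bool_series([1, 0]), bool_series([0])),
    "[1, 0] // [1]": (bool_series([1, 0]), bool_series([1])),
    "[1, 0] // [0, 0]": (bool_series([1, 0]), bool_series([0, 0])),
    "[1, 0] // [1, 1]": (bool_series([1, 0]), bool_series([1, 1])),
}

outcomes = {label: floordiv_outcome(l, r) for label, (l, r) in cases.items()}
for label, outcome in outcomes.items():
    print(f"{label:<20} {outcome}")
```

If the behaviour were consistent, every row would report the same outcome regardless of operand lengths.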
Expected Behavior
If division or exponentiation of boolean dtypes is going to be disallowed, that should happen consistently.
Aside: FWIW, I don't really like this choice. Why are booleans special in this way? (I see some discussion in https://github.com/pandas-dev/pandas/issues/41165 but with no conclusion.)
Installed Versions
Comment From: mroeschke
Similar, though not identical, behavior occurs for Index
cc @jbrockmendel
pd.Index([1], dtype="bool") // np.bool_(0) # => NotImplementedError
pd.Index([1], dtype="bool") // pd.Index([0], dtype="bool") # => NotImplementedError
pd.Index([1, 0], dtype="bool") // pd.Index([0], dtype="bool") # => NotImplementedError *different from Series
pd.Index([1, 0], dtype="bool") // pd.Index([1], dtype="bool") # => NotImplementedError *different from Series
pd.Index([1, 0], dtype="bool") // pd.Index([0, 0], dtype="bool") # => NotImplementedError
pd.Index([1, 0], dtype="bool") // pd.Index([1, 1], dtype="bool") # => NotImplementedError
Agreed, consistency is an issue here, but discussion of whether this should be implemented should probably continue in https://github.com/pandas-dev/pandas/issues/41165
Comment From: jbrockmendel
IIUC the differences vs Series probably occur bc the Series has to reindex with mismatched lengths, so ends up with object dtype.
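A small sketch of that alignment step, assuming the public `Series.align` is a fair stand-in for what Series arithmetic does internally before operating on mismatched indexes:

```python
import pandas as pd

left = pd.Series([True, False], dtype="bool")
right = pd.Series([True], dtype="bool")

# Series.align reindexes both operands onto the union of their indexes
# (an outer join by default).
left_aligned, right_aligned = left.align(right)

print(left_aligned.dtype)      # left already covers the union index
print(right_aligned.dtype)     # right gains a missing row at label 1
print(right_aligned.tolist())
```

The left operand keeps its bool dtype, while the right one is upcast to object to hold the missing value, which would be consistent with a bool-dtype check no longer applying to it.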
Comment From: wence-
> IIUC the differences vs Series probably occur bc the Series has to reindex with mismatched lengths, so ends up with object dtype.
That seems likely, since
l = pd.Series([1, 0], dtype="bool")
r = pd.Series([1], dtype="bool")
r.reindex_like(l)
produces an object-dtype Series containing True and NaN, I suppose because there is no nullable bool Series that one can put a pd.NA in.
Comment From: mroeschke
> Produces an object-dtype series with True, NaN, I suppose because there is no nullable bool series that one can put a pd.NA in.
There's a nullable bool dtype, selected with the string "boolean":
>>> l = pd.Series([1, 0], dtype="boolean")
>>> r = pd.Series([1], dtype="boolean")
>>> r.reindex_like(l)
0    True
1    <NA>
dtype: boolean