Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
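import pandas as pd
import numpy as np

# "pd_dtype" uses the nullable (masked) Int64 extension dtype;
# "np_dtype" uses a plain NumPy integer dtype.
df = pd.DataFrame(
    [[1, 1], [2, 2], [3, 3]],
    columns=["pd_dtype", "np_dtype"],
).astype({"pd_dtype": pd.Int64Dtype(), "np_dtype": int})
ref = {2, 3}

print(pd.__version__, np.__version__)
# Works: returns rows 1 and 2.
print(df.query("np_dtype in @ref"))
# Fails (numexpr engine): ValueError: unknown type object
print(df.query("pd_dtype in @ref"))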
Issue Description
Based on debugging, this appears to be the execution path:
column in @env_variable -> pd.Series of True/False -> numexpr necompiler -> np.array of True/False -> df.loc[nparray]
It appears that, at some point during query evaluation, the membership check on the Int64Dtype column produces a Series with pandas.core.arrays.boolean.BooleanDtype. In the working example, the intermediate pd.Series has dtype numpy.dtype[bool_], which NumPy handles without complaint.
NumPy doesn't know how to handle BooleanDtype, so np.asarray returns an np.array with dtype 'O' (object), and numexpr's type check raises an exception:
.../python3.10/site-packages/numexpr/necompiler.py:692, in getType(a)
    690 if kind == 'U':
    691     raise ValueError('NumExpr 2 does not support Unicode as a dtype.')
--> 692 raise ValueError("unknown type %s" % a.dtype.name)
ValueError: unknown type object
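To see the dtype difference described above without going through numexpr, the membership masks can be built by hand. This is a minimal sketch, assuming that query's in operator reduces to Series.isin on the column (pandas' expression code does delegate in to isin). The dtype np.asarray produces for the BooleanDtype mask varies across pandas versions, so the sketch only prints it. The explicit to_numpy(dtype=bool) conversion is shown as one way to obtain a plain NumPy bool array; it would raise if the mask contained NA.

```python
import numpy as np
import pandas as pd

# Same frame as the reproducible example: one nullable Int64 column,
# one plain NumPy integer column.
df = pd.DataFrame(
    [[1, 1], [2, 2], [3, 3]],
    columns=["pd_dtype", "np_dtype"],
).astype({"pd_dtype": pd.Int64Dtype(), "np_dtype": int})
ref = {2, 3}

# Membership masks, roughly what "col in @ref" computes before numexpr runs.
np_mask = df["np_dtype"].isin(ref)  # NumPy bool dtype
pd_mask = df["pd_dtype"].isin(ref)  # pandas nullable BooleanDtype

print(np_mask.dtype, pd_mask.dtype)

# What numexpr would receive after np.asarray(); for the BooleanDtype
# mask the resulting dtype depends on the pandas version.
print(np.asarray(np_mask).dtype, np.asarray(pd_mask).dtype)

# An explicit conversion yields a plain NumPy bool array (no NA present here).
pd_mask_np = pd_mask.to_numpy(dtype=bool)
print(pd_mask_np.dtype)
```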
Looking back, I ran the "bad" scenario in a previous version of pandas, and the intermediate pd.Series had the normal NumPy bool dtype. According to git blame, none of the relevant query/eval/numexpr code has changed in roughly the past four years, so presumably something in pandas 1.5 is converting results to BooleanDtype more aggressively.
Expected Behavior
I would expect both the Int64Dtype and the int column to work in a query, and both queries to return:
   pd_dtype  np_dtype
1         2         2
2         3         3
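As a cross-check of the expected output, the same rows can be selected without numexpr. This is a sketch, assuming that plain boolean indexing with isin and query with engine="python" both bypass the numexpr type check that fails here; it is not a claim about how the bug itself should be fixed.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1], [2, 2], [3, 3]],
    columns=["pd_dtype", "np_dtype"],
).astype({"pd_dtype": pd.Int64Dtype(), "np_dtype": int})
ref = {2, 3}

# Boolean indexing with isin() never calls numexpr, so it shows the rows
# each query is expected to return.
expected_np = df[df["np_dtype"].isin(ref)]
expected_pd = df[df["pd_dtype"].isin(ref)]

# The pure-Python eval engine also skips numexpr.
via_python_engine = df.query("pd_dtype in @ref", engine="python")

print(expected_pd)
print(via_python_engine)
```

Both selections should match the expected output above (rows 1 and 2).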