Code Sample
import pandas as pd
pd.eval('exp(2)')
Problem description
Result is 7.0. Result should be 7.3890560989306504.
Note: Using a float value instead of an int (e.g. pd.eval('exp(2.0)')) evaluates correctly.
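For reference, a minimal side-by-side of the two literals (the truncated 7.0 output assumes numexpr is installed, since pd.eval defaults to the numexpr engine when it is available):

```python
import pandas as pd

# Integer literal: truncated to 7.0 when the numexpr engine is used.
print(pd.eval('exp(2)'))
# Float literal: evaluates correctly.
print(pd.eval('exp(2.0)'))  # 7.3890560989306504
```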
Expected Output
7.3890560989306504
Output of pd.show_versions()
Comment From: jlwaldeck
cc: @vmuriart
Comment From: WillAyd
I ran this on v0.22 and master and both were fine. Can you try upgrading to see if it resolves the issue?
Comment From: jschendel
I can reproduce this on both master and 0.22.0.
In [2]: pd.__version__
Out[2]: '0.23.0.dev0+815.gf799916'
In [3]: pd.eval('exp(2)')
Out[3]: 7.0
In [4]: pd.eval('exp(2.0)')
Out[4]: 7.3890560989306504
In [5]: pd.eval('sqrt(6)')
Out[5]: 2.0
In [6]: pd.eval('sqrt(6.0)')
Out[6]: 2.4494897427831779
In [7]: pd.eval('log(6)')
Out[7]: 1.0
In [8]: pd.eval('log(6.0)')
Out[8]: 1.791759469228055
Seems to be limited to engine='numexpr', so it might not be reproducible without numexpr:
In [9]: pd.eval('exp(2)', parser='pandas', engine='numexpr')
Out[9]: 7.0
In [10]: pd.eval('exp(2)', parser='python', engine='numexpr')
Out[10]: 7.0
In [11]: pd.eval('exp(2)', parser='pandas', engine='python')
Out[11]: 7.3890560989306504
In [12]: pd.eval('exp(2)', parser='python', engine='python')
Out[12]: 7.3890560989306504
Directly using numexpr seems fine though:
In [16]: ne.evaluate('exp(2)')
Out[16]: array(7.38905609893065)
Comment From: vmuriart
The call in _evaluate returns the correct answer as well. The issue shows up after the call into _reconstruct_object(...) from within the evaluate function. The python engine doesn't have this issue since it overrides evaluate in its definition.
It looks like the float is being forced back into an integer on that call.
Comment From: zmeves
This issue is still relevant in version 1.5.0. pandas.eval should be able to return equivalent results to numexpr.evaluate when given arrays to operate on, but it doesn't. Take this usage of log10 for example:
# Setup
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]}).astype({'a': float, 'b': int})
>>> a, b = df['a'], df['b']
>>> df
a b
0 1.0 1
1 2.0 2
2 3.0 3
When passing in the variables as Series objects, eval() works and returns the same values as numexpr.evaluate:
>>> pd.eval('log10(abs(a))', local_dict={'a': a, 'b': b}, engine='numexpr')
0 0.000000
1 0.301030
2 0.477121
dtype: float64
>>> ne.evaluate('log10(abs(a))', local_dict={'a': a, 'b': b})
array([0. , 0.30103 , 0.47712125])
>>> pd.eval('log10(abs(b))', local_dict={'a': a, 'b': b}, engine='numexpr')
0 0.000000
1 0.301030
2 0.477121
dtype: float64
>>> ne.evaluate('log10(abs(b))', local_dict={'a': a, 'b': b})
array([0. , 0.30103 , 0.47712125])
When passing in the variables as NumPy arrays, pandas.eval forces the result back to b's integer dtype; numexpr doesn't do this:
>>> pd.eval('log10(abs(a))', local_dict={'a': a.to_numpy(), 'b': b.to_numpy()}, engine='numexpr')
array([0. , 0.30103 , 0.47712125]) # Matches numexpr
>>> ne.evaluate('log10(abs(a))', local_dict={'a': a.to_numpy(), 'b': b.to_numpy()})
array([0. , 0.30103 , 0.47712125])
>>> pd.eval('log10(abs(b))', local_dict={'a': a.to_numpy(), 'b': b.to_numpy()}, engine='numexpr')
array([0., 0., 0.]) # Erroneously cast to integer
>>> ne.evaluate('log10(abs(b))', local_dict={'a': a.to_numpy(), 'b': b.to_numpy()})
array([0. , 0.30103 , 0.47712125])
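Until this is fixed, two workarounds follow from the behavior above (both are sketches, not endorsed fixes): promote integer operands to float before calling pd.eval, or fall back to the python engine, which returns the float result without reconstructing the input dtype.

```python
import numpy as np
import pandas as pd

b = np.array([1, 2, 3])  # integer dtype triggers the erroneous cast

# Workaround 1: promote integer operands to float before evaluating.
res_float = pd.eval('log10(abs(b))', local_dict={'b': b.astype(float)})

# Workaround 2: use the python engine, which skips the dtype reconstruction.
res_python = pd.eval('log10(abs(b))', local_dict={'b': b}, engine='python')

print(res_float)
print(res_python)
```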