A small, complete example of the issue
import numpy as np
import pandas as pd
df = pd.DataFrame(data=[(1,2),(3,4)], columns=['a', 'b'])
pd.eval("sum(df) / 7", local_dict={'sum': np.sum, 'df': df}, engine='numexpr')
*** TypeError: unhashable type: 'numpy.ndarray'
Expected Output
a 0.571429 b 0.857143 dtype: float64
This works fine when doing the same sum in plain Python, and it also works fine in pd.eval() using the 'python' engine.
The following from the docs does not seem to apply in this case.
Series and DataFrame objects are supported and behave as they would with plain ol’ Python evaluation.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.5-300.fc23.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: None.None

pandas: 0.19.0
nose: None
pip: 8.1.2
setuptools: 27.2.0
Cython: None
numpy: 1.11.2
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.1
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 0.8.3
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.42.0
pandas_datareader: None
Comment From: jreback
What you are trying to do is simply unsupported. numexpr provides a fast evaluation engine, but is limited in its semantics.
In [6]: df = pd.DataFrame(data=[(1,2),(3,4)], columns=['a', 'b'])
...: pd.eval("sum(df) / 7")
ValueError: "sum" is not a supported function
is a quite reasonable error message. You are swapping out a function definition, which is not supported.
Comment From: rmancy-smg
Ahh, the code you pasted above is not the original that I put in the issue. Notice that sum() in the eval is actually np.sum, because it was added to the local_dict of the pd.eval() call. In your snippet above you are using the Python sum(), which is indeed unsupported. That's a different error from the one my snippet generates, though.
The crux of the original error, as I understand it, is that numexpr (?) is trying to call frozenset() on an ndarray, which is indeed unhashable.
Comment From: jreback
@rmancy-smg I did that on purpose. You are trying to substitute for a named function, which is not supported. This kind of evaluation is very specific in what it does; your usage is very odd.
Comment From: rmancy-smg
Hmm, perhaps I'm not picking up what you're putting down, but even if you do the following:
pd.eval("df.sum() / 7", engine='numexpr', local_dict={'df': df})
It gives the same error. There is no substitution of a named function in this example.
Comment From: jreback
The pandas semantics are geared towards columnwise operations, so the following is valid
In [5]: df = pd.DataFrame(data=[(1,2),(3,4)], columns=['a', 'b'])
...: pd.eval("a.sum() / 7", engine='numexpr', local_dict={'a':df.a})
Out[5]: 0.5714285714285714
Passing a 2-d object directly is not supported; not really sure why you would want to go down this path. df.sum() / 7 is much more intuitive / expected.
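The column-wise call above can be repeated for every column of a frame. A minimal sketch, assuming a hypothetical expression string `expr` and an illustrative variable name `col` bound through local_dict; engine='python' is used here only so the sketch runs without numexpr installed (the example above used engine='numexpr'):

```python
import pandas as pd

df = pd.DataFrame(data=[(1, 2), (3, 4)], columns=['a', 'b'])

# Hypothetical expression; `col` is an illustrative name bound via local_dict.
expr = "col.sum() / 7"

# Evaluate the expression once per column (each column is a 1-d Series)
# and collect the scalar results, keyed by column name.
per_column = pd.Series({
    name: float(pd.eval(expr, engine='python', local_dict={'col': df[name]}))
    for name in df.columns
})
print(per_column)
```

Each call only ever sees a single Series, so the expression stays within the column-wise model.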
Comment From: rmancy-smg
Mostly because I'm trying to use pd.eval() for evaluating user-defined functions (i.e. sum(x) / 7) over pandas DataFrames. Specifically, I needed it for the whole DataFrame, as it was constructed such that each column represented a user's time series data (and the intent was to apply the function over all users' data).
Anyway, it's clear that it will not work like this, and I've since found a workaround. The only minor gripe I have is that the documentation states quite unequivocally that it should.
Thanks again for your insights @jreback
Comment From: jorisvandenbossche
@rmancy-smg Passing dataframes works fine, eg:
In [15]: pd.eval("df / 7", local_dict={'df': df}, engine='numexpr')
Out[15]:
a b
0 0.142857 0.285714
1 0.428571 0.571429
It was the function call df.sum() or sum(df) that is not supported.
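One way to stay within the supported syntax is to keep only the elementwise arithmetic inside pd.eval() and apply the reduction in regular Python afterwards. A minimal sketch, assuming the reduction is a plain column sum (dividing each element by 7 and then summing gives the same result as summing and then dividing, up to floating-point rounding):

```python
import pandas as pd

df = pd.DataFrame(data=[(1, 2), (3, 4)], columns=['a', 'b'])

# Elementwise arithmetic on a whole DataFrame is supported syntax.
scaled = pd.eval("df / 7", local_dict={'df': df})

# The function call stays outside the evaluated string.
result = scaled.sum()
print(result)
```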
Can you point to the docs where you see this incorrectly stated? (Unless you meant the DataFrame part.) Here: http://pandas.pydata.org/pandas-docs/stable/enhancingperf.html#supported-syntax is a detailed overview of the supported syntax, and function calls are listed in the 'not' section.
When trying to evaluate a user-defined function, you can simply do sum(df) instead of pd.eval('sum(df)').
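As a sketch of that suggestion, a user-supplied function can be kept as a regular Python callable and applied to the frame directly. The name user_func is hypothetical, and DataFrame.sum() is used for the column-wise sum rather than the builtin sum(), which would iterate over the column labels:

```python
import pandas as pd

df = pd.DataFrame(data=[(1, 2), (3, 4)], columns=['a', 'b'])

def user_func(frame):
    # Hypothetical user-defined function: column sums divided by 7.
    return frame.sum() / 7

# No pd.eval() involved; the callable is applied to the whole DataFrame.
result = user_func(df)
print(result)
```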
Comment From: rmancy-smg
@jorisvandenbossche
Hmm, yeah the original quote from here[1] says this:
Series and DataFrame objects are supported and behave as they would with plain ol’ Python evaluation.
Which I just took to mean that using Series and Dataframe objects would work just as they do in "plain ol' Python", but maybe there are some semantics in there that I'm not interpreting correctly?
[1] http://pandas.pydata.org/pandas-docs/stable/generated/pandas.eval.html
Comment From: jorisvandenbossche
@rmancy-smg What is meant is that the supported operations work with Series/DataFrames as they would in Python, so that "df / 7" does what you expect.
It does not mean that every operation on DataFrames will work, since not all syntax is supported; e.g. function calls are not supported.
If you want to clarify the wording in the docstring, feel free to open a PR!