A small, complete example of the issue

import numpy as np
import pandas as pd

df = pd.DataFrame(data=[(1,2),(3,4)], columns=['a', 'b'])
pd.eval("sum(df) / 7", local_dict={'sum': np.sum, 'df': df}, engine='numexpr')
*** TypeError: unhashable type: 'numpy.ndarray'

Expected Output

a    0.571429
b    0.857143
dtype: float64

This works fine when doing the same sum in plain Python, and it also works fine in pd.eval() using the 'python' engine.

The following from the docs does not seem to apply in this case.

Series and DataFrame objects are supported and behave as they would with plain ol’ Python evaluation.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.5-300.fc23.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: None.None

pandas: 0.19.0
nose: None
pip: 8.1.2
setuptools: 27.2.0
Cython: None
numpy: 1.11.2
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.1
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 0.8.3
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.42.0
pandas_datareader: None

Comment From: jreback

What you are trying to do is simply unsupported. numexpr provides a fast evaluation engine, but it is limited in its semantics.

In [6]: df = pd.DataFrame(data=[(1,2),(3,4)], columns=['a', 'b'])
   ...: pd.eval("sum(df) / 7")
ValueError: "sum" is not a supported function

is a quite reasonable error message. You are swapping out a function definition, which is not supported.

Comment From: rmancy-smg

Ahh, the above code that you pasted is not the original that I put in the Issue.

Notice that sum() in the eval is actually the np.sum because it was added to the 'local_dict' of the pd.eval() function. In your snippet above you are using the python sum() which is indeed unsupported. That's a different error to the one my snippet is generating though.

The crux of the original error as I understand it is that numexpr (?) is trying to call frozenset() on an ndarray which is indeed unhashable.
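For reference, the same TypeError can be triggered without pandas or numexpr at all. The snippet below is only a minimal sketch of why an ndarray cannot go into a frozenset; it does not reproduce whatever numexpr or pandas does internally, and the helper name try_hash_array is made up for illustration.

```python
import numpy as np


def try_hash_array():
    """Hypothetical helper: put an ndarray into a frozenset and report the error."""
    arr = np.array([1, 2, 3])
    try:
        # frozenset has to hash every element, and ndarrays define no hash.
        frozenset([arr])
    except TypeError as exc:
        return str(exc)
    return None


message = try_hash_array()
print(message)
```

The printed message should match the one in the original report, which suggests (but does not prove) that an array ends up somewhere a hashable value is expected.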

Comment From: jreback

@rmancy-smg I did that on purpose. You are trying to substitute for a named function, which is not supported. This kind of evaluation is very specific in what it does; your usage is very odd.

Comment From: rmancy-smg

Hmm, perhaps I'm not picking up what you're putting down, but even if you do the following:

pd.eval("df.sum() / 7", engine='numexpr', local_dict={'df': df})

It gives the same error. There is no substitution of a named function in this example.

Comment From: jreback

The pandas semantics are geared towards columnwise operations, so the following is valid:

In [5]: df = pd.DataFrame(data=[(1,2),(3,4)], columns=['a', 'b'])
   ...: pd.eval("a.sum() / 7", engine='numexpr', local_dict={'a':df.a})
Out[5]: 0.5714285714285714

Passing a 2-d object directly is not supported; I'm not really sure why you would want to go down this path. df.sum() / 7 is much more intuitive / expected.

Comment From: rmancy-smg

Mostly because I'm trying to use pd.eval() for evaluating user-defined functions (e.g. sum(x) / 7) over pandas DataFrames. Specifically, I needed it for the whole DataFrame, as it was constructed such that each column represented one user's timeseries data (and the intent was to apply the function over all users' data).

Anyway, it's clear that it will not work like this, and I've since found a workaround. The only minor gripe I have is that the documentation states quite unequivocally that it should.

Thanks again for your insights @jreback

Comment From: jorisvandenbossche

@rmancy-smg Passing dataframes works fine, e.g.:

In [15]: pd.eval("df / 7", local_dict={'df': df}, engine='numexpr')
Out[15]: 
          a         b
0  0.142857  0.285714
1  0.428571  0.571429

It is the function call ('df.sum()' or 'sum(df)') that is not supported.

Can you point to the docs where you see this incorrectly stated (unless you meant the dataframe part)? A detailed overview of the supported syntax is here: http://pandas.pydata.org/pandas-docs/stable/enhancingperf.html#supported-syntax, and function calls are listed in the 'not' section.

When trying to evaluate a user-defined function, you can simply do sum(df) instead of pd.eval('sum(df)').
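As a rough sketch of that suggestion, assuming the user-defined function is available as an ordinary Python callable rather than as a string: the name user_func below is hypothetical, and df.sum() is used instead of a bare sum(df) so the result does not depend on which sum is in scope.

```python
import pandas as pd

df = pd.DataFrame(data=[(1, 2), (3, 4)], columns=["a", "b"])


def user_func(column):
    # Hypothetical user-defined function: column total divided by 7.
    return column.sum() / 7


# Plain pandas, no pd.eval involved.
direct = df.sum() / 7

# Same computation, applied column by column through the callable.
per_column = df.apply(user_func)

print(direct)
print(per_column)
```

Both results are Series indexed by column name, which matches the expected output given in the original report.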

Comment From: rmancy-smg

@jorisvandenbossche

Hmm, yeah, the original quote from here [1] says this:

Series and DataFrame objects are supported and behave as they would with plain ol’ Python evaluation.

Which I just took to mean that using Series and DataFrame objects would work just as they do in "plain ol' Python", but maybe there are some semantics in there that I'm not interpreting correctly?

[1] http://pandas.pydata.org/pandas-docs/stable/generated/pandas.eval.html

Comment From: jorisvandenbossche

@rmancy-smg What is meant is that the supported operations work with series/dataframes as they would in Python, so "df / 7" does what you expect. It does not mean that every operation on dataframes will work, since not all syntax is supported; e.g. function calls are not supported. If you want to clarify the wording in the docstring, feel free to open a PR!
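One way to read that distinction, sketched under the assumption that the arithmetic and the reduction can be separated: keep the element-wise part inside pd.eval and do the function call in plain pandas afterwards. No engine is passed here, so pandas chooses one itself (numexpr when it is installed); this split is only an illustration, not something the docs prescribe.

```python
import pandas as pd

df = pd.DataFrame(data=[(1, 2), (3, 4)], columns=["a", "b"])

# Element-wise arithmetic is in the supported syntax, so pd.eval handles it.
scaled = pd.eval("df / 7", local_dict={"df": df})

# The reduction is a method call, so it is done outside pd.eval.
result = scaled.sum()

print(result)
```

Summing after dividing gives the same per-column values as df.sum() / 7, up to floating-point rounding.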