In [11]: df = pd.DataFrame({"A": np.arange(1000000), "B": np.arange(1000000, 0, -1), "C": np.random.randn(1000000)})
In [13]: %timeit pd.eval('df*df')
1 loops, best of 3: 635 ms per loop
In [14]: df = df.astype(float)
In [15]: %timeit pd.eval('df*df')
100 loops, best of 3: 5.87 ms per loop
The time is all spent in _interleave, since the mixed-dtype blocks have to be combined into a single array before the values can be passed to numexpr.
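A minimal sketch of the effect described above, assuming a recent pandas and NumPy. It does not call the internal _interleave directly; it only uses the public to_numpy() to show that a frame with both integer and float columns has to be copied into one common-dtype array. The names mixed and values are just for illustration.

```python
import numpy as np
import pandas as pd

n = 1000
mixed = pd.DataFrame(
    {"A": np.arange(n), "B": np.arange(n, 0, -1), "C": np.random.randn(n)}
)

# Two integer columns and one float column: more than one dtype in the frame.
kinds = sorted({dt.kind for dt in mixed.dtypes})
print(kinds)

# Asking for a single 2D array forces the columns into one common dtype,
# which means a new array has to be allocated and filled.
values = mixed.to_numpy()
print(values.dtype)

# The combined array does not share memory with the original integer column.
print(np.shares_memory(values, mixed["A"].to_numpy()))
```

On the full 1,000,000-row frame this copy is the part that showed up in the first timing.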
Comment From: cpcloud
How could this be fixed? I don't see how we can get around a copy being made to unify the type to pass to numexpr.
Comment From: jreback
Not sure. Maybe if you detect that everything is numeric, you could either just use python or astype first. In this case the expression is quite simple.
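A rough sketch of the "astype first" idea, done on the caller's side rather than inside pd.eval. The helper eval_with_upcast is hypothetical and not part of pandas; it assumes plain NumPy integer/float columns and that the expression refers to the frame as df. Note the trade-off: integer columns come back as floats, and very large integers can lose precision.

```python
import numpy as np
import pandas as pd


def eval_with_upcast(expr, df):
    """Hypothetical helper: unify numeric dtypes before calling pd.eval."""
    dtypes = list(df.dtypes)
    # Only handle plain NumPy integer / unsigned / float columns here.
    only_numeric = all(
        isinstance(dt, np.dtype) and dt.kind in "iuf" for dt in dtypes
    )
    if only_numeric and len(set(dtypes)) > 1:
        # One explicit cast up front, so the frame holds a single dtype.
        df = df.astype(np.result_type(*dtypes))
    # Expose the (possibly cast) frame to the expression under the name "df".
    return pd.eval(expr, local_dict={"df": df})


small = pd.DataFrame(
    {"A": np.arange(5), "B": np.arange(5, 0, -1), "C": np.random.randn(5)}
)
result = eval_with_upcast("df*df", small)
print(result.dtypes)
```

Whether this is actually faster than letting pd.eval handle the mixed frame would need to be timed on the pandas version in use.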
Comment From: mroeschke
Looks like the perf is pretty close now, so closing.
In [6]: df = pd.DataFrame({"A": np.arange(1000000), "B": np.arange(1000000, 0, -1), "C": np.random.randn(1000000)})
In [7]: %timeit pd.eval('df*df')
39.1 ms ± 255 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [8]: df = df.astype(float)
In [9]: %timeit pd.eval('df*df')
34.9 ms ± 259 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)