Code Sample, a copy-pastable example if possible
```python
import pandas as pd
import numpy as np

np_df = np.random.randn(10000, 4000)
df = pd.DataFrame(np_df)

%timeit np.round(np_df, 2)
# 416 ms ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df.round(2)
# 1.69 s ± 27.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit np.round(df, 2)
# 1.74 s ± 112 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
Problem description
Completely unexpectedly, DataFrame.round() showed up as a major hotspot during profiling. Looking at the code, we see that even when rounding the complete DataFrame to a given number of decimals, it is split into Series objects which are then rounded one by one.
I am wondering if there is a reason not to pass the underlying data to NumPy and do the rounding there in this case.
A quick test showed that something like this would give us the NumPy performance:

```python
def faster_round(df, decimals):
    rounded = np.round(df.values, decimals)
    return pd.DataFrame(rounded, columns=df.columns, index=df.index)

%timeit faster_round(df, 2)
# 417 ms ± 14.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
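As a side note (my own, not part of the original report): `faster_round` assumes an all-numeric frame, since `df.values` on a mixed-dtype frame upcasts to `object`. A hedged sketch of a variant that only touches the numeric columns (the name `faster_round_mixed` is hypothetical):

```python
import numpy as np
import pandas as pd

def faster_round_mixed(df, decimals):
    # Hypothetical variant: round only the numeric columns with one NumPy call,
    # leaving non-numeric columns untouched.
    out = df.copy()
    num_cols = df.select_dtypes(include="number").columns
    out[num_cols] = np.round(df[num_cols].values, decimals)
    return out
```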
Comment From: gfyoung
@aberres : Good question! Ultimately, we do end up touching the `get_values` method for many pandas objects, which returns an `ndarray`. Perhaps we could re-implement this to avoid all of the indirection, though we'd have to be careful to ensure nothing breaks.
@jreback @jorisvandenbossche
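To make the indirection concrete, here is a rough sketch (my own approximation, not the actual `DataFrame.round` internals): the column-wise path rounds one Series at a time, while going through the underlying ndarray needs only a single vectorised call.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 50))

# Roughly what the per-column path amounts to: one Series.round per column.
column_wise = pd.concat([df[col].round(2) for col in df.columns], axis=1)

# One call over the underlying ndarray, as proposed in the issue.
block_wise = pd.DataFrame(np.round(df.values, 2), index=df.index, columns=df.columns)

# Both produce the same values.
assert column_wise.equals(block_wise)
```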
Comment From: jreback
This is done column by column; it could instead be done per-dtype as a block. It would require some amount of work. Pull requests are welcome.
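A hedged sketch (my own, using the public API rather than pandas' internal block manager) of what per-dtype rounding could look like, grouping columns by dtype and issuing one `np.round` call per group; the helper name `round_per_dtype` is hypothetical:

```python
import numpy as np
import pandas as pd

def round_per_dtype(df, decimals):
    # Hypothetical illustration: one np.round call per dtype group of columns,
    # instead of one call per column.
    out = df.copy()
    for dtype, cols in df.dtypes.groupby(df.dtypes).groups.items():
        if pd.api.types.is_numeric_dtype(dtype):
            out[list(cols)] = np.round(df[list(cols)].values, decimals)
    return out
```

Compared with the column-by-column path, this issues at most one rounding call per numeric dtype present in the frame.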