With the introduction of some of the v7.0 features we've measured a drop of up to 7% in achievable ops/sec on a simple standalone deployment, using a GET benchmark with 1KiB values.
I've used fae5b1a19d0972c2f4274004f15be3d2f90c856c as the "unstable" reference.
This is more evident in the 10-15 pipeline results, as outlined in the following chart and table (given that pipelining reduces the syscall and RTT overhead, more of the cost shows up in command processing itself):
| pipeline on GET | 5.0.13 (ops/sec) | 6.2.6 (ops/sec) | unstable (ops/sec) | % change unstable vs 6.2 | % change unstable vs 5.0 |
|---|---|---|---|---|---|
| 1 | 147925 | 145662 | 141062 | ---% | ---% |
| 5 | 333895 | 329112 | 316108 | ---% | -5.3% |
| 10 | 445349 | 436320 | 414768 | -5% | -6.9% |
| 15 | 502064 | 491278 | 467111 | -5% | -7.0% |
| 20 | 505349 | 493715 | 487269 | -1.3% | -3.6% |
| 25 | 517902 | 506944 | 502212 | -0.9% | -3.0% |
| 30 | 534072 | 524354 | 510230 | -2.7% | -4.5% |
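For reference, the `% change` columns are just the relative difference of the unstable ops/sec against each baseline. The post-processing used to fill the table isn't shown in this issue; a minimal sketch of how one cell is derived (pipeline 10, unstable vs 6.2.6, taken from the rows above) would be:

```
# % change = (unstable - baseline) / baseline * 100
# example: pipeline 10, unstable (414768 ops/sec) vs 6.2.6 (436320 ops/sec)
awk 'BEGIN { printf "%.1f%%\n", (414768 - 436320) / 436320 * 100 }'
# prints: -4.9%
```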
We can attribute around 5-6% of the CPU cycles to the following functions and their inner code:
| Function | % CPU time | Note |
|---|---|---|
| updateCachedTime | 1.80% | #9194 |
| updateClientMemUsage | 1.50% | (after the improvement of https://github.com/redis/redis/pull/10401 ) |
| ACLCheckAllUserCommandPerm | 1.20% | #9974 |
| updateCommandLatencyHistogram | 0.80% | #9462 |
For each function I've added the last PR that touched the code / introduced the feature. I would suggest we analyze each of these features further to squeeze out as much performance as possible. IMHO the introduced features are valid requirements, so we need to try to reduce this overhead as much as we can.
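The profiling commands used to obtain the per-function breakdown above aren't included in this issue; a rough sketch of how such numbers can be collected on Linux (assuming `perf` is available and redis-server was built with symbols) would be:

```
# sample on-CPU stacks of the running redis-server while the benchmark runs
perf record -F 99 -g -p "$(pgrep -x redis-server)" -- sleep 60
# show self time per symbol; look for updateCachedTime, updateClientMemUsage,
# ACLCheckAllUserCommandPerm, updateCommandLatencyHistogram
perf report --no-children --sort symbol
```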
Comment From: oranagra
@filipecosta90 how did you conclude the regression in updateCachedTime is from #9194? AFAICT it didn't change anything in that respect.
Comment From: filipecosta90
> @filipecosta90 how did you conclude the regression in updateCachedTime is from #9194? AFAICT it didn't change anything in that respect.
You're absolutely right @oranagra. I just pointed to it via git blame. Let's see which PR actually introduced the regression.
Comment From: filipecosta90
Reminder that after https://github.com/redis/redis/pull/10502 we need to refresh this data for unstable. I will produce a new chart and follow up on this issue.
Comment From: filipecosta90
Updated numbers using the current unstable code from Wed Apr 20 ( 3cd8baf61610416aab45e0bcedcaab9beae80184 ). With the work of the last month (between March 20 and April 20) the regression was reduced: it's now ~5% at worst vs v5.0 and around 2-3% at worst vs v6.2. Furthermore, at high pipeline numbers v7.0 (unstable) outperforms v6.2 and matches v5.0 even with the newly added logic.
To reproduce:
run redis
taskset -c 0 `pwd`/src/redis-server --logfile redis-opt.log --save "" --daemonize yes
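(Not part of the original steps, but before populating it can help to confirm the server is up and actually pinned to core 0, e.g.:)

```
redis-cli ping                           # expect: PONG
taskset -cp "$(pgrep -x redis-server)"   # expect: current affinity list: 0
```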
run the following script for each of these redis versions
D=60
DATASIZE=1000
P=performance
rm results.csv
CORES="1,2"
# populate
taskset -c $CORES memtier_benchmark -d $DATASIZE --ratio 1:0 --key-pattern=P:P -t 2 --hide-histogram --key-maximum=1000000 --key-minimum 1
# benchmark
for pipeline in 1 5 10 15 20 25 30; do
taskset -c $CORES memtier_benchmark -d $DATASIZE --ratio 0:1 --test-time $D --pipeline $pipeline --key-pattern=P:P -t 2 -o $pipeline.txt --hide-histogram --key-maximum=1000000 --key-minimum 1
cat $pipeline.txt | grep Totals | awk -v r=$pipeline '{print r " , " $2}' >>results.csv
done
At the end of each run check the results.csv file.
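A side note on post-processing (the filenames below are illustrative, not from the original script): since results.csv is removed at the start of every run, each version's results need to be saved under their own name before re-running; the per-version CSVs can then be merged and compared, for example:

```
# each CSV holds "pipeline , ops_sec" lines produced by the loop above;
# assume runs were saved as results-5.0.13.csv, results-6.2.6.csv,
# results-unstable.csv
paste -d, results-5.0.13.csv results-6.2.6.csv results-unstable.csv |
  awk -F, '{ printf "pipeline %d: 5.0=%d 6.2=%d unstable=%d (%.1f%% vs 6.2)\n",
             $1, $2, $4, $6, ($6 - $4) / $4 * 100 }'
```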
Comment From: oranagra
I think we may have finished dealing with the regression, but there are still some ideas for improvement, two of which are mentioned in https://github.com/redis/redis/pull/10697#issuecomment-1137334208, which we should find time to evaluate.