Redis cluster performance scaling and benchmark

I performed a scalability test of Redis Cluster. I ran the three tests listed below. In each test, I pushed the redis-server process CPU to 85% and recorded the total OPS (operations per second) the client could send.

- 1 redis-server instance: 350,000 OPS
- 3-shard cluster: 540,000 OPS
- 3 redis-server instances: 1,000,000 OPS

The first test uses a single redis-server process. The OPS is pretty impressive. I have a high-performance server with 40 CPU cores and 100 GB RAM. Bandwidth is also not a problem.

The second test uses a 3-shard cluster (3 redis-server processes on the same server). I expected the OPS to triple when running 3 redis-server instances, but it is far less than that. I turned off AOF and disk writes. There are no slave shards in the cluster. I could not figure out what else prevents linear scaling. As I increase the number of shards, the throughput does not increase linearly. Please see the graph:

https://18384333892640854137.googlegroups.com/attach/10a92c23220b6/Auto%20Generated%20Inline%20Image%202?part=0.2&view=1&vt=ANaJVrHQoO_CyIwXRw-u96AMl8q-Ox8Fj7n4h2GcDPUeatU59ooTq0GcYogjD0CBp4GHW2LfmX_sbERiB8WZQfuGbzMsbn-DjLRMNrM6X_ZYp75Rxs7i6Pk

The third test starts 3 independent redis-server instances without joining them into a cluster. The client hashes the key and dispatches each command to one of the redis-server instances. The overall OPS is close to triple the single redis-server OPS, which is much better than in a cluster.
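The client-side dispatch used in the third test can be sketched roughly as follows. This is a minimal illustration, not the actual test client: the hash function (CRC32), the ports 6380 and 6381, and the name `pick_instance` are assumptions made for the example.

```python
import zlib

# Hypothetical addresses for three standalone redis-server instances.
# Only the host and port 6379 appear in this thread; the other ports are made up.
INSTANCES = [("10.93.2.27", 6379), ("10.93.2.27", 6380), ("10.93.2.27", 6381)]


def pick_instance(key, instances=INSTANCES):
    """Map a key to one instance using a stable hash (CRC32 is an assumption)."""
    digest = zlib.crc32(key.encode("utf-8"))
    return instances[digest % len(instances)]


# The same key always goes to the same instance, so reads find earlier writes.
for sample in ("user:1", "user:2", "session:abc"):
    print(sample, "->", pick_instance(sample))
```

Because the instances never talk to each other, there is no cluster bus traffic and no slot bookkeeping, which is one plausible reason this setup comes close to 3x in the numbers above.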

According to the online documentation, it should scale linearly. That means that if you have N redis-server processes in a cluster, you should get N times the capacity. Does this statement apply only to storage and RAM? I understand there can be overhead as Redis nodes talk to each other in the cluster. However, that should be minimal if resharding does not happen.

Is my test result expected? Or are there any settings that can tune the cluster throughput? Or should I stay away from the cluster and do my own sharding? Redis Cluster provides a lot of cool features and value that I don't want to give up.

Comment From: liuwenru

How did you run this benchmark: with your own code or with redis-benchmark?

Comment From: YuheChen

redis-benchmark does not support Redis Cluster. I tested with my own code, which is based on hiredis asynchronous sockets. The code reads the slot configuration (via the CLUSTER SLOTS command) and dispatches keys to shards accordingly.

I could not find many engineers talking about Redis Cluster performance scalability anywhere. That is the main reason why I posted this as an issue. I'd really appreciate it if somebody who has done this test before could shed some light.
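For readers unfamiliar with slot-based dispatch, here is a rough Python sketch of the idea (the actual test client was written against hiredis and is not shown in this thread). Redis Cluster maps a key to one of 16384 slots using CRC16 of the key, or of its hash tag if one is present. The `SLOT_TABLE` below is a hypothetical, already-parsed CLUSTER SLOTS result that assumes an even three-way split across ports 30001 to 30003; `key_slot` and `node_for_key` are names chosen for this example.

```python
import binascii
import bisect

NUM_SLOTS = 16384


def key_slot(key):
    """Return the cluster slot for a key given as bytes.

    If the key contains a non-empty {hash tag}, only the tag is hashed.
    binascii.crc_hqx with an initial value of 0 is the CRC16 variant used here.
    """
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:  # an empty "{}" is not a tag
            key = key[start + 1:end]
    return binascii.crc_hqx(key, 0) % NUM_SLOTS


# Hypothetical slot table (first slot, last slot, node address).
# A real client would build this from the CLUSTER SLOTS reply.
SLOT_TABLE = [
    (0, 5460, ("10.93.2.27", 30001)),
    (5461, 10922, ("10.93.2.27", 30002)),
    (10923, 16383, ("10.93.2.27", 30003)),
]
_RANGE_STARTS = [first for first, _, _ in SLOT_TABLE]


def node_for_key(key):
    """Find the node that owns the slot of the given key."""
    slot = key_slot(key)
    index = bisect.bisect_right(_RANGE_STARTS, slot) - 1
    first, last, node = SLOT_TABLE[index]
    if not first <= slot <= last:
        raise LookupError("slot %d is not covered by the slot table" % slot)
    return node


print(key_slot(b"foo"), node_for_key(b"foo"))
```

A client that dispatches this way sends each command straight to the owning node, so no MOVED redirections should occur as long as the slot table is current.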

Comment From: antirez

Hello. Redis Cluster must scale linearly, because there is no logic that can prevent it, so it is basically impossible for it to behave differently: you are talking with N different clients to N different instances that are not trying to agree about values. However, you may be observing the following problem:

Redis 3.2 has a very high cost for new key creation/deletion when Cluster is enabled. This was for a big part solved in Redis 4.0. What this means is that Redis 3.2 in cluster mode is slower than a normal Redis 3.2 instance. 4.0 still pays some overhead, but a fraction compared to the past.

Could you please retest with latest commit in the 4.0 branch? Thanks.

Comment From: YuheChen

Hi Salvatore,

Thanks a lot for your input. I will try the 4.0 branch when I have a chance.

I did some more testing against the 3.2 release using the redis-benchmark utility. First, I will jump to the conclusions of my test. I have also attached the numbers in case you are interested.

1. A redis-server running in cluster mode has a very large overhead. The OPS for simple SET commands is less than 50% of that in non-cluster mode.

2. N redis-server processes running on the same server do not provide N times the OPS. The more process instances there are, the fewer OPS each process can handle. This might be because a multi-core server does not guarantee full parallelism. However, I believe it would scale linearly if the processes were placed on different servers.

About the test

Use the redis-benchmark utility to run the test against each shard (redis-server process instance). The following example executes 10,000,000 SET operations using random keys. 100 simultaneous clients are created. The pipeline size is set to 50.

./src/redis-benchmark -h 10.93.2.27 -p 6379 -q -n 10000000 -c 100 -P 50 -r 100000000000 set key:__rand_int__ __rand_int__

Run multiple redis-benchmark processes to test the Redis cluster. Pick a hash tag that forces the keys to land on the shard being tested, which avoids key redirection. The following sample commands are used to test a 3-shard cluster.

./src/redis-benchmark -h 10.93.2.27 -p 30001 -q -n 10000000 -c 100 -P 50 -r 100000000000 set key{3}:__rand_int__ __rand_int__ &
./src/redis-benchmark -h 10.93.2.27 -p 30002 -q -n 10000000 -c 100 -P 50 -r 100000000000 set key{2}:__rand_int__ __rand_int__ &
./src/redis-benchmark -h 10.93.2.27 -p 30003 -q -n 10000000 -c 100 -P 50 -r 100000000000 set key{4}:__rand_int__ __rand_int__ &
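Picking a hash tag for each shard can be automated. The sketch below is one possible way to do it, assuming the slots are split evenly and contiguously across the shards (the real layout should be read from CLUSTER SLOTS or CLUSTER NODES). The helper names `shard_for_tag` and `tags_for_shards` are made up for this example. Under that assumed layout, the tags 3, 2 and 4 used in the 3-shard commands fall into the first, second and third slot range respectively.

```python
import binascii
import bisect

NUM_SLOTS = 16384


def tag_slot(tag):
    """Slot of a key whose hash tag is `tag` (only the tag is hashed)."""
    return binascii.crc_hqx(tag.encode("utf-8"), 0) % NUM_SLOTS


def range_bounds(shards):
    """Start slot of each shard plus an end sentinel.

    ASSUMPTION: slots are divided evenly and contiguously among the shards.
    """
    return [round(i * NUM_SLOTS / shards) for i in range(shards + 1)]


def shard_for_tag(tag, shards):
    """Index (0-based) of the shard whose assumed slot range contains the tag."""
    return bisect.bisect_right(range_bounds(shards), tag_slot(tag)) - 1


def tags_for_shards(shards):
    """For each shard, find the smallest non-negative integer usable as its hash tag."""
    tags = [None] * shards
    candidate = 0
    while None in tags:
        shard = shard_for_tag(str(candidate), shards)
        if tags[shard] is None:
            tags[shard] = str(candidate)
        candidate += 1
    return tags


for shards in (3, 6, 12):
    print(shards, "shards ->", tags_for_shards(shards))
```

Each resulting tag can then be placed in the key pattern of the redis-benchmark process that targets the corresponding shard, in the same way as key{3}, key{2} and key{4} above.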

OPS for different shard counts when run on the same server

*** 1 shard, non-cluster mode
set key{3}:__rand_int__ __rand_int__: 676041.12 requests per second

*** 1 shard, cluster mode
set key{4}:__rand_int__ __rand_int__: 172947.55 requests per second

*** 3 shard cluster
set key{4}:__rand_int__ __rand_int__: 170598.97 requests per second
set key{3}:__rand_int__ __rand_int__: 151416.50 requests per second
set key{2}:__rand_int__ __rand_int__: 140892.70 requests per second

*** 6 shard cluster
set key{1}:__rand_int__ __rand_int__: 142492.77 requests per second
set key{13}:__rand_int__ __rand_int__: 135768.11 requests per second
set key{3}:__rand_int__ __rand_int__: 135277.73 requests per second
set key{8}:__rand_int__ __rand_int__: 133661.25 requests per second
set key{2}:__rand_int__ __rand_int__: 132471.39 requests per second
set key{11}:__rand_int__ __rand_int__: 119746.13 requests per second

*** 12 shard cluster
set key{4}:__rand_int__ __rand_int__: 135343.64 requests per second
set key{1}:__rand_int__ __rand_int__: 120477.58 requests per second
set key{306}:__rand_int__ __rand_int__: 110263.30 requests per second
set key{3}:__rand_int__ __rand_int__: 109621.48 requests per second
set key{303}:__rand_int__ __rand_int__: 109170.30 requests per second
set key{2}:__rand_int__ __rand_int__: 108418.71 requests per second
set key{12}:__rand_int__ __rand_int__: 103838.92 requests per second
set key{301}:__rand_int__ __rand_int__: 103506.81 requests per second
set key{300}:__rand_int__ __rand_int__: 103342.09 requests per second
set key{103}:__rand_int__ __rand_int__: 100849.16 requests per second
set key{10}:__rand_int__ __rand_int__: 97086.44 requests per second
set key{11}:__rand_int__ __rand_int__: 96357.68 requests per second
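To make the scaling easier to read, the per-shard figures above can be summed per configuration and compared with the single non-cluster instance. The short script below only does that arithmetic on the posted numbers; the names `RESULTS`, `BASELINE` and `summarize` are chosen for this example.

```python
# Requests per second as posted above (redis-benchmark, SET, pipeline of 50).
BASELINE = 676041.12  # 1 shard, non-cluster mode

RESULTS = {
    1: [172947.55],
    3: [170598.97, 151416.50, 140892.70],
    6: [142492.77, 135768.11, 135277.73, 133661.25, 132471.39, 119746.13],
    12: [135343.64, 120477.58, 110263.30, 109621.48, 109170.30, 108418.71,
         103838.92, 103506.81, 103342.09, 100849.16, 97086.44, 96357.68],
}


def summarize(results=RESULTS, baseline=BASELINE):
    """Return (shards, total, mean per shard, total relative to baseline) rows."""
    rows = []
    for shards, rates in sorted(results.items()):
        total = sum(rates)
        rows.append((shards, total, total / shards, total / baseline))
    return rows


for shards, total, per_shard, ratio in summarize():
    print("%2d shard(s): total %12.2f req/s, mean per shard %10.2f, "
          "%.2fx the non-cluster instance" % (shards, total, per_shard, ratio))
```

The totals grow with the shard count while the mean per shard drops, which is the pattern described in the second conclusion.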

Comment From: YuheChen

Escaping the underscores in the redis-benchmark command so that the formatting does not swallow them:

./src/redis-benchmark -h 10.93.2.27 -p 6379 -q -n 10000000 -c 100 -P 50 -r 100000000000 set key:__rand_int__ __rand_int__

Comment From: YuheChen

I started my testing with cluster mode and got about 170K OPS per shard.

When I tested non-cluster mode, I got 650K SET commands per second. It's blazing fast and I could not believe my eyes. I checked the dbsize and the keys inside, and they look correct. The SET command might be an extremely simple case. The performance difference between cluster mode and non-cluster mode may not be this big in the real world with more diverse commands.

Comment From: knoguchi

@antirez 1) You said "Redis Cluster must scale linearly". Would you tell us how to run a benchmark that can measure the linearity?

2) I did run redis-benchmark too. It turned out only one master (and its slave) got data. @YuheChen said "redis-benchmark does not support redis cluster", and I think so too. Can you confirm it's true? I'm using version 4.0.1

3) Do you have any comments on @YuheChen's last benchmark? When there are 12 nodes in a cluster, one node processes approximately 100K requests per second, which is only 15% of the non-cluster 676K requests per second. Is the result reasonable for release 3.2, or is he missing something?

EDIT: The 1-shard cluster got 172K/sec, which is only 25% of the non-cluster 676K/sec. It doesn't make sense. Maybe the benchmark code itself has a bug? Does it keep the TCP connection alive? The 3-shard cluster is still slower than non-cluster: 170K + 151K + 140K = 461K/sec.

Comment From: adolfoherrera1417

Does anyone have any comments on how to run my own benchmark for a cluster setup? I currently have a 3-master cluster and I just want to run a benchmark for GET and SET, but I'm not too sure where to begin.

Comment From: nealYangVic

https://github.com/antirez/redis/pull/5889

Comment From: knoguchi

Thanks @nealYangVic. For people reading this thread the fix is still in the unstable branch. It wasn't included in the Redis 5.0.7 that was released on Nov 19, 2019.

Comment From: xtianus79

@adolfoherrera1417 Via this website, https://geekflare.com/redis-benchmark-tools/, I found this repo, https://github.com/RedisLabs/memtier_benchmark, which supports clusters. I am also wondering if this is the reason why, when I run tests on a cluster, they don't write properly AT ALL lol. There's no documentation saying it's not compatible, other than knowing that a -c --cluster option was not specified on the benchmark.

With that said, I'd rather wait for the official one because it has far fewer dependencies.

Comment From: antirez

Hello, the cluster benchmark is also part of Redis 6 that is now in RC1 state.
