Hello,

I've been thinking about what it would take to make Redis Cluster fully auto-scalable on its own, and I've come up with a list of features that are still missing to make that happen.

I'd be glad to hear everyone's thoughts about it and update the list accordingly.

Discovery

On startup, a node should be able to automatically discover a live cluster from a list of static hosts in its config. A short list should suffice, since it is only needed for the initial connection. For example, in the config:

    cluster-hosts: hostname1:port1 hostname2:port2 hostname3:port3

On startup, the node would then try to join the cluster automatically if it has not already joined.
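To make the startup flow concrete, here is a minimal sketch in Python. It assumes the `cluster-hosts:` line format proposed above (not an existing Redis directive) and a hypothetical `probe` callable standing in for whatever connectivity check a real node would do; `parse_cluster_hosts` and `discover` are illustrative names, not Redis functions. Once a reachable seed is found, a real implementation could join through it, for example with the existing `CLUSTER MEET ip port` command.

```python
from typing import Callable, List, Optional, Tuple

Seed = Tuple[str, int]


def parse_cluster_hosts(line: str) -> List[Seed]:
    """Parse the proposed 'cluster-hosts:' config line into (host, port) pairs."""
    key, _, value = line.partition(":")
    if key.strip() != "cluster-hosts":
        raise ValueError("not a cluster-hosts line")
    seeds = []
    for token in value.split():
        host, _, port = token.rpartition(":")  # split on the last ':' only
        seeds.append((host, int(port)))
    return seeds


def discover(seeds: List[Seed], probe: Callable[[Seed], bool],
             already_joined: bool) -> Optional[Seed]:
    """Return the first reachable seed to join through, or None.

    `probe` is a hypothetical placeholder for a real connectivity check.
    """
    if already_joined:
        return None  # the node is already part of a cluster; nothing to do
    for seed in seeds:
        if probe(seed):
            return seed
    return None  # no seed answered; a real node would retry later


seeds = parse_cluster_hosts("cluster-hosts: 10.0.0.1:7000 10.0.0.2:7000 10.0.0.3:7000")
reachable = {("10.0.0.2", 7000)}  # pretend only the second seed is up
print(discover(seeds, lambda s: s in reachable, already_joined=False))  # ('10.0.0.2', 7000)
```

Only a couple of live seeds are needed, because once the node is in the cluster it learns the rest of the topology from the cluster itself.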

Auto-Rebalance

When a node joins, leaves, or fails, the cluster should automatically redistribute the affected slots among the masters in the cluster.

Specifically:
- On join, the new master should receive an even share of the slots.
- On leave, the departing master should redistribute its own slots evenly among all other masters.
- On failure, there are two cases to consider:
  - If the master has any slaves, a slave takes over its slots, and the master/slave roles across the cluster may be reshuffled (see next section).
  - If the master has no slaves, the now-missing slots should be automatically reallocated evenly among the remaining masters.
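The even-share rule for join, leave, and slave-less failure can be sketched as a single planning function. This is a minimal Python sketch under my own assumptions: `rebalance` is a hypothetical helper, and the greedy strategy (strip each master's surplus above its target, collect slots of departed nodes and any unassigned slots, then hand them to masters below target) is chosen for illustration, not taken from Redis. It relies only on the fact that Redis Cluster has 16384 hash slots.

```python
from typing import Dict, List

NUM_SLOTS = 16384  # Redis Cluster has 16384 hash slots


def rebalance(assignment: Dict[str, List[int]], masters: List[str]) -> Dict[str, List[int]]:
    """Plan an even slot split across `masters` (hypothetical helper).

    Slots owned by nodes not in `masters` (a node that left, or a failed
    slave-less master) and slots owned by nobody are treated as orphaned.
    """
    masters = sorted(masters)
    base, extra = divmod(NUM_SLOTS, len(masters))
    # The first `extra` masters get one additional slot so the total is exact.
    target = {m: base + (1 if i < extra else 0) for i, m in enumerate(masters)}

    new = {m: sorted(assignment.get(m, [])) for m in masters}
    pool: List[int] = []
    # Orphaned slots: owned by nodes that are no longer masters.
    for node, slots in assignment.items():
        if node not in target:
            pool.extend(slots)
    # Unassigned slots (e.g. a brand-new cluster).
    assigned = {s for slots in assignment.values() for s in slots}
    pool.extend(s for s in range(NUM_SLOTS) if s not in assigned)
    # Surplus: each master gives up the slots above its target.
    for m in masters:
        surplus = len(new[m]) - target[m]
        if surplus > 0:
            pool.extend(new[m][-surplus:])
            new[m] = new[m][:-surplus]
    pool.sort()
    # Deficit: masters below target take slots from the pool.
    for m in masters:
        deficit = target[m] - len(new[m])
        if deficit > 0:
            new[m].extend(pool[:deficit])
            del pool[:deficit]
            new[m].sort()
    return new


start = {"A": list(range(0, 8192)), "B": list(range(8192, 16384))}
after_join = rebalance(start, ["A", "B", "C"])  # C joins
print({m: len(s) for m, s in after_join.items()})  # {'A': 5462, 'B': 5461, 'C': 5461}
after_leave = rebalance(after_join, ["A", "C"])  # B leaves
print({m: len(s) for m, s in after_leave.items()})  # {'A': 8192, 'C': 8192}
```

The function only computes the target layout. Applying it to a live cluster would also mean migrating the keys in every moved slot, and for a failed slave-less master the data in its slots is already gone; only slot ownership is restored.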

Master-Slave Mode

In order to be fully resilient to failures while auto-scaling, each master in the cluster should also have a slave attached to it.

Auto-scaling several separate groups of servers based on the stats of only one of them is extremely hard (I'm not sure existing tools can do it without modification), so automatically adding or removing a slave whenever a master is scaled up or down is hard as well.

The currently suggested solution is to run two separate Redis instances on each server, one master and one slave (1M & 1S), but this has a few issues:
- After some changes in the cluster, one server may end up running 2 masters instead of 1M & 1S, while other servers run 2 slaves, which wastes resources.
- A slave replicates only one master, so when that master goes down all of its slots are served by a single slave, instead of being split evenly across the entire cluster.
- Correctly configuring each master/slave pair while auto-scaling is hard.

Since a slave uses almost no resources compared to a master, in theory we could merge the two roles into a single node.

We should consider letting a single Redis Cluster node act as both a master and a slave on a per-slot basis (something like how Elasticsearch handles shards). In other words, within the same instance, the node would be the master for some slots and a slave for others. The cluster could then automatically spread one master's slots across multiple slaves, each slave handling only a share of that master's slots, so the load is spread evenly across the cluster.
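A small model helps show why per-slot roles spread failover load. Redis Cluster does not support per-slot master/slave roles today, so this Python sketch is purely hypothetical: `place` and `fail_over` are illustrative names, each slot gets exactly one replica, and the replicas of a node's slots are assigned round-robin over the other nodes. When one node fails, its slots are promoted on several nodes instead of on a single dedicated slave.

```python
from collections import Counter
from itertools import cycle
from typing import Dict, List, Tuple

Placement = Dict[int, Tuple[str, str]]  # slot -> (master node, slave node)


def place(nodes: List[str], num_slots: int) -> Placement:
    """Hypothetical per-slot placement: contiguous master ranges, with each
    node's slave copies spread round-robin over all the other nodes."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to keep a slave off its master")
    per_node = -(-num_slots // len(nodes))  # ceiling division
    placement: Placement = {}
    for i, master in enumerate(nodes):
        others = cycle([n for n in nodes if n != master])
        for slot in range(i * per_node, min((i + 1) * per_node, num_slots)):
            placement[slot] = (master, next(others))
    return placement


def fail_over(placement: Placement, failed: str) -> Dict[int, str]:
    """Return slot -> serving node after `failed` goes down: every slot it
    mastered is promoted on that slot's slave."""
    return {slot: (slave if master == failed else master)
            for slot, (master, slave) in placement.items()}


nodes = ["n1", "n2", "n3", "n4"]
placement = place(nodes, 16384)
after = fail_over(placement, "n1")
print(dict(sorted(Counter(after.values()).items())))  # {'n2': 5462, 'n3': 5461, 'n4': 5461}
```

With one dedicated slave per master, the failed node's 4096 slots would all land on one server; here they are split roughly three ways. The sketch does not re-create slaves for the promoted slots, which a real design would also need to do.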

Summary

I believe these are the critical pieces needed for Redis Cluster to auto-scale on its own. Today it's possible to achieve some of these features by manually running scripts that execute commands against the cluster, but it's really messy and error-prone.

@antirez I would love to hear your thoughts on this.

Thanks

Comment From: madolson

Since this seems like the most complete description of the problem, I'm going to keep this as the authoritative reference for the "auto-scaling" feature. Other references: https://github.com/redis/redis/issues/2460 https://github.com/redis/redis/issues/4052