Redis [Cluster] Auto rebalance

Copied from https://github.com/antirez/redis/issues/3009

On node join, leave, or failure, the cluster should automatically reallocate the affected hash slots among the masters in the cluster.

Specifically:

- On join, the new master should get an even share of the available slots.
- On leave, the old master should rebalance its own slots evenly among all other masters before actually leaving the cluster.
- On failure, there are two cases to consider:
  - The master has slaves: one of its slaves takes over its slots, and a reshuffle of master/slave roles across the cluster might happen (see next section).
  - The master is slave-less: the now-missing slots should be reallocated automatically and evenly among the remaining masters in the cluster.
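All three rules come down to the same planning step: compute an even per-master share of the 16384 hash slots, then move slots from masters above their share (or from a leaving or failed node) to masters below it. Below is a minimal Python sketch of that step, assuming the cluster state is available as a plain slot -> node-id map. The helper names (target_counts, plan_rebalance, apply_moves, and so on) are hypothetical and not part of Redis or redis-cli, and the sketch only plans moves; actually migrating the keys of each slot is out of scope.

```python
from collections import Counter

NUM_SLOTS = 16384  # Redis Cluster always has 16384 hash slots


def target_counts(masters):
    """Even share: every master gets NUM_SLOTS // n slots; the first
    NUM_SLOTS % n masters (in sorted order) get one extra slot."""
    if not masters:
        raise ValueError("need at least one master")
    base, extra = divmod(NUM_SLOTS, len(masters))
    return {m: base + (1 if i < extra else 0) for i, m in enumerate(sorted(masters))}


def initial_assignment(masters):
    """Hypothetical helper: assign contiguous slot ranges per target_counts."""
    owner, slot = {}, 0
    for m, count in target_counts(masters).items():
        for _ in range(count):
            owner[slot] = m
            slot += 1
    return owner


def plan_rebalance(slot_owner, masters):
    """Return (slot, source, destination) moves that give every master in
    `masters` its even share. Slots that are unassigned, or owned by a node
    outside `masters` (a leaving or slave-less failed master), always move."""
    targets = target_counts(masters)
    held = {m: [] for m in targets}
    pool = []  # (slot, current owner) pairs waiting for a new owner
    for slot in range(NUM_SLOTS):
        owner = slot_owner.get(slot)
        if owner in held:
            held[owner].append(slot)
        else:
            pool.append((slot, owner))  # orphaned: unassigned, leaving or failed
    # Masters above their target donate their surplus slots to the pool.
    for m, slots in held.items():
        surplus = len(slots) - targets[m]
        if surplus > 0:
            pool.extend((s, m) for s in slots[-surplus:])
            del slots[-surplus:]
    # Masters below their target take slots from the pool until it is empty.
    moves = []
    for m, slots in held.items():
        for _ in range(targets[m] - len(slots)):
            slot, src = pool.pop()
            moves.append((slot, src, m))
    return moves


def apply_moves(slot_owner, moves):
    """Return a new slot -> owner map with the planned moves applied."""
    new_owner = dict(slot_owner)
    for slot, _src, dst in moves:
        new_owner[slot] = dst
    return new_owner


def slot_counts(slot_owner):
    """Number of slots held by each node."""
    return Counter(slot_owner.values())


if __name__ == "__main__":
    # Join: D joins a cluster of three masters.
    owner = initial_assignment({"A", "B", "C"})
    owner = apply_moves(owner, plan_rebalance(owner, {"A", "B", "C", "D"}))
    print(sorted(slot_counts(owner).items()))  # [('A', 4096), ('B', 4096), ('C', 4096), ('D', 4096)]
    # Leave (or slave-less failure): D's slots are spread over A, B and C.
    owner = apply_moves(owner, plan_rebalance(owner, {"A", "B", "C"}))
    print(sorted(slot_counts(owner).items()))  # [('A', 5462), ('B', 5461), ('C', 5461)]
```

Treating any slot whose owner is outside the target master set as orphaned lets the leave case and the slave-less failure case share one code path with the join case. The sketch makes no attempt to minimize the number of migrated slots; a real implementation would also have to weigh migration cost.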

Comment From: jcstanaway

On leave, the old master should rebalance its own slots evenly among all other masters before actually leaving the cluster.

This seems to imply a controlled exit. What about unplanned scenarios, where the old master won't be able to initiate a rebalance? A scenario of concern is one where the server fails and the master is simply gone. For a slave-less master, the remaining masters should trigger a rebalance after a configurable time period. The time period matters because, depending on the deployment environment (e.g., Kubernetes), the master could recover quickly enough that a rebalance shouldn't be performed (and "quickly" is subjective, hence configurable).
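The configurable delay proposed above could be tracked roughly as in the following Python sketch. It assumes some watcher periodically observes, for each master, whether it is considered failed (for example, flagged FAIL by the other nodes) and whether it has replicas. The class FailureGraceTracker and the grace_period_s setting are hypothetical and are not existing Redis options.

```python
import time
from typing import Dict, Optional


class FailureGraceTracker:
    """Hypothetical helper: decides when a failed, slave-less master's slots
    should be rebalanced, after a configurable grace period."""

    def __init__(self, grace_period_s: float):
        self.grace_period_s = grace_period_s
        self._failed_since: Dict[str, float] = {}  # node id -> first failure time

    def observe(self, node_id: str, failed: bool, has_replicas: bool,
                now: Optional[float] = None) -> bool:
        """Record one health observation; return True once a rebalance
        should be triggered for this master."""
        now = time.monotonic() if now is None else now
        if not failed:
            # The master came back (e.g. a pod restarted quickly): cancel the timer.
            self._failed_since.pop(node_id, None)
            return False
        if has_replicas:
            # Regular failover promotes a replica, so no rebalance is needed.
            return False
        start = self._failed_since.setdefault(node_id, now)
        return now - start >= self.grace_period_s
```

A master that recovers before the grace period ends resets its timer, which covers the quick-restart case described for environments such as Kubernetes.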

Comment From: BarthV

+1 for this feature! Regarding @ccs018's post, a leave operation should be announced by the leaving node itself and handled explicitly by the rest of the cluster (like Cassandra's nodetool decommission CLI command).

Another option is to simply refuse to implement this feature and let only the cluster manager handle slot rebalancing and data reshuffling. Currently the cluster manager (redis-trib.rb or any third-party cluster manager) is a "one-shot" CLI command, but in the future it could become a long-running stateless application exposing a REST API to handle operations. This is a more Kubernetes-compliant vision, as such a cluster manager could be integrated into a Kubernetes Operator driven by a CRD.

Comment From: shaharmor

I think this can now be implemented as a Redis Module, using the new timer and cluster support in the modules API.

Comment From: madolson

Closing as duplicate in favor of https://github.com/redis/redis/issues/3009.