Today in Redis Cluster, if the cluster goes down it stops serving all traffic until cluster health has been restored. In some topologies, though, it might make sense to continue to allow reads for slots that are still assigned to reachable nodes. Take the case of two shards, each with a replica: when a master goes down, the other 3 nodes stop serving traffic until the replica is manually promoted to be the new master (with only two masters there is no quorum left for an automatic failover). If a user is only using Redis as a distributed cache, it doesn't make much sense to block reads, or even potentially writes, on the other shards.
In this case it would be nice to be able to specify a parameter like "allows-reads-in-cluster-minority" that would allow reads to be processed in the minority partition. Any thoughts on this?
Comment From: antirez
Hey @madolson, check this:
# cluster-require-full-coverage yes
Comment From: madolson
Full coverage is a similar problem, but what I'm referring to is related to this piece of code:
/* If we are in a minority partition, change the cluster state
* to FAIL. */
{
int needed_quorum = (server.cluster->size / 2) + 1;
if (reachable_masters < needed_quorum) {
new_state = CLUSTER_FAIL;
among_minority_time = mstime();
}
}
As soon as the cluster is moved to CLUSTER_FAIL, it stops serving any traffic that touches keys. My concern only applies to smaller clusters, 2 shards being the critical case, where a single master failure disables all nodes. Some people use these smaller clusters with cluster mode enabled since there are availability benefits to using the cluster protocol. So I was proposing some sort of flag that allows these smaller clusters to continue serving read traffic if a master dies.
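For concreteness, here is a minimal standalone sketch of that arithmetic (size stands in for server.cluster->size, the number of masters serving at least one slot):

#include <stdio.h>

/* Illustrative only: why a 2-shard cluster cannot tolerate a single
 * master failure while a 3-shard cluster can. */
int main(void) {
    for (int size = 2; size <= 3; size++) {
        int needed_quorum = (size / 2) + 1;
        int reachable_masters = size - 1; /* one master just died */
        printf("masters=%d quorum=%d reachable=%d -> %s\n",
               size, needed_quorum, reachable_masters,
               reachable_masters < needed_quorum ? "CLUSTER_FAIL"
                                                 : "CLUSTER_OK");
    }
    return 0;
}

With two masters the quorum is 2, so losing one leaves the survivors below quorum and every node flips to CLUSTER_FAIL; with three masters the quorum is still 2 and the cluster stays up.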
The alternative is to just tell these users to add a third shard.
Comment From: antirez
Oh ok, now I understand the scenario. There are two things here: we could cure the symptom, that is, prevent the cluster from going down in this scenario via a given option, or we could cure the cause. About the latter, the idea was to add a new feature to Redis Cluster: "caching mode". In this mode there are no replicas (there is more to think about here, but for now let's just ignore replicas), only masters. The key quality however is that there is no failover: once a slot cannot be served because the instance it belongs to is down, it just gets assigned to another random node via a simple agreement protocol, and so forth. However when instances become available again and are empty, we slowly move slots back to them so that they take part in the cluster again. Here the important thing is that keys are not moved: when a hash slot is moved from one place to another, it simply starts empty, because this mode is only useful as a distributed cache.
This mode could be improved by also adding the concept of replicas, in order to avoid losing parts of the key space when possible: in such a mixed mode the slot reassignment to other masters would be done only when there is no viable failover for the slave. Such a cluster would always be available, at different costs: if possible at the cost of a failover, otherwise at the cost of losing part of the key space and serving the hash slot via a different master.
In that second incarnation it is no longer true that there is no failover; there are just two types of failover: the usual failover if possible, otherwise a hard slot failover that loses data.
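To make this concrete, here is a rough sketch of the decision I have in mind; none of these names exist in Redis today, it is only to illustrate the two failover kinds:

/* Hypothetical sketch of the mixed "caching mode" behavior; the names
 * are made up and not part of Redis. */
typedef enum {
    FAILOVER_NORMAL,    /* promote a replica, no keys lost for the shard */
    FAILOVER_HARD_SLOT  /* reassign orphaned slots to another master */
} failoverKind;

failoverKind handleMasterFailure(int has_viable_replica) {
    if (has_viable_replica) {
        /* Usual failover: a replica takes over the hash slots. */
        return FAILOVER_NORMAL;
    }
    /* Hard slot failover: the orphaned hash slots are assigned to another
     * master via a simple agreement protocol. They start empty on the new
     * owner, so part of the key space is lost, which is acceptable when
     * Redis is used purely as a distributed cache. */
    return FAILOVER_HARD_SLOT;
}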
Comment From: madolson
That second option is intriguing, but not trivial to implement. I also think having the first option solves part of the problem, but it does throw consistency out the window if there is a legitimate network partition, which may not be ideal for other use cases.
That's why I was thinking about having an option to just serve reads in the cluster_fail state. ElasticSearch does something similar with their no_master_block parameter: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-zen.html
It at least keeps read availability while any recovery operation is taking place, without violating data consistency within the rest of the cluster. "allows-reads-while-cluster-is-down" is maybe a clearer name for the option I am proposing.
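As a rough sketch (the option name and the helper are made up, just to show where the check would sit relative to the CLUSTER_FAIL state), the gate could look something like this:

/* Hypothetical gate, roughly where Redis already rejects key-touching
 * commands when the cluster state is FAIL. Nothing here is real code. */
#define CLUSTER_OK   0
#define CLUSTER_FAIL 1

int clusterAllowCommand(int cluster_state, int is_write_command,
                        int allow_reads_while_down) {
    if (cluster_state == CLUSTER_OK) return 1;  /* healthy: allow everything */
    if (is_write_command) return 0;             /* never allow writes when down */
    return allow_reads_while_down;              /* reads only if opted in */
}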
Comment From: antirez
Yes @madolson, the second option, while not a huge amount of work, is clearly non-trivial, especially because there is already a long list of cluster changes allocated for Redis 6, so it would be postponed even more. Instead, what you proposed could be a great "ad interim" solution while we try to figure out a better fix, and it should be very simple to implement.
Comment From: madolson
This was fixed by the PR.