I have a Redis cluster with only 3 masters (no slaves). Data is sharded into slots, which are distributed among those masters. Is the following interleaving possible?
- add a new slave to a single master
- before the master starts sending data to its slave, or before the transfer finishes, the master crashes
- the crash is noticed by the remaining masters, which flag that master as FAIL
- the (almost empty) slave starts an election and since it's the only slave, it's successfully elected
- the old master recovers and is attached as a slave to the new master
If the following conditions are met, a slave starts an election (quoted from the cluster spec):
1. The slave's master is in FAIL state.
2. The master was serving a non-zero number of slots.
3. The slave replication link was disconnected from the master for no longer than a given amount of time, in order to ensure the promoted slave's data is reasonably fresh. This time is user configurable.
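To make sure I'm reading those three conditions correctly, here is a minimal sketch of the check as I understand it. The function and parameter names are hypothetical and not taken from the Redis source; `max_link_down_seconds` stands in for the user-configurable limit mentioned in condition 3.

```python
def should_start_election(
    master_is_fail: bool,
    master_slot_count: int,
    link_down_seconds: float,
    max_link_down_seconds: float,
) -> bool:
    """Return True if a slave may start an election (hypothetical helper)."""
    # Condition 1: the slave's master must be flagged as FAIL.
    if not master_is_fail:
        return False
    # Condition 2: the master must have been serving at least one slot.
    if master_slot_count <= 0:
        return False
    # Condition 3: the replication link must not have been down for too long.
    return link_down_seconds <= max_link_down_seconds
```

Note that nothing in this check looks at how much data the slave actually holds, which is what prompted my question.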
If the following conditions are met, a master grants its vote (quoted from the cluster spec):
1. A master only votes a single time for a given epoch, and refuses to vote for older epochs: every master has a lastVoteEpoch field and will refuse to vote again as long as the currentEpoch in the auth request packet is not greater than the lastVoteEpoch. When a master replies positively to a vote request, the lastVoteEpoch is updated accordingly, and safely stored on disk.
2. A master votes for a slave only if the slave's master is flagged as FAIL.
3. Auth requests with a currentEpoch that is less than the master currentEpoch are ignored. Because of this the master reply will always have the same currentEpoch as the auth request. If the same slave asks again to be voted, incrementing the currentEpoch, it is guaranteed that an old delayed reply from the master can not be accepted for the new vote.
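The voting rules above can be sketched roughly as follows. This is only my reading of the quoted rules, not the actual implementation: the `Master` class and the `handle_auth_request` method are hypothetical names, and persistence to disk is reduced to a comment.

```python
from dataclasses import dataclass


@dataclass
class Master:
    current_epoch: int = 0
    last_vote_epoch: int = 0

    def handle_auth_request(self, request_epoch: int, requester_master_is_fail: bool) -> bool:
        """Return True if this master grants its vote (hypothetical sketch)."""
        # Rule 3: requests carrying an epoch older than ours are ignored.
        if request_epoch < self.current_epoch:
            return False
        # Rule 1: at most one vote per epoch, and never for older epochs.
        if request_epoch <= self.last_vote_epoch:
            return False
        # Rule 2: only vote if the requesting slave's master is flagged as FAIL.
        if not requester_master_is_fail:
            return False
        # Vote granted: remember the epoch (a real node also stores it on disk).
        self.last_vote_epoch = request_epoch
        return True
```

Again, none of these rules compares the replication state of competing slaves.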
Also quoted from the cluster spec:
"Masters make no effort to select the best slave in any way."
It looks like an (almost empty or totally empty) slave could be elected. Correct me if I'm wrong.
Comment From: antirez
Hi @yyklll ,
this is no longer true:
"Masters make no effort to select the best slave in any way."
This is what happens: among the available slaves, the best one is picked with a best-effort algorithm.
However the scenario you describe is also unlikely to happen for another reason. Check the cluster-replica-validity-factor configuration option. If the replica never synchronized with the master, its "data age" should be zero, so it should not try to be elected at all.
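For reference, the redis.conf comments describe the limit set by this option as (node-timeout * cluster-replica-validity-factor) + repl-ping-replica-period, with a factor of 0 meaning that replicas always try to fail over. A small sketch of that arithmetic, using a hypothetical helper name and all values in seconds:

```python
from typing import Optional


def max_disconnect_seconds(
    node_timeout_s: float,
    validity_factor: int,
    repl_ping_period_s: float,
) -> Optional[float]:
    """Longest master-link downtime after which a replica still tries to fail over.

    Hypothetical helper mirroring the formula documented in redis.conf.
    Returns None when the factor is 0, i.e. no limit is applied.
    """
    if validity_factor == 0:
        return None
    return node_timeout_s * validity_factor + repl_ping_period_s


# Example from the redis.conf comments: 30 s timeout, factor 10, 10 s ping period.
print(max_disconnect_seconds(30, 10, 10))  # 310
```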
Comment From: yyklll
Thank you. I will take a look at the option.