Here is the manual failover log from my Redis instance:
1:S 22 Oct 11:27:14.414 # Manual failover user request accepted.
1:S 22 Oct 11:27:14.509 # Received replication offset for paused master manual failover: 104949999380
1:S 22 Oct 11:27:14.534 # All master replication stream processed, manual failover can start.
1:S 22 Oct 11:27:14.534 # Start of election delayed for 0 milliseconds (rank #0, offset 104949999380).
1:S 22 Oct 11:27:14.634 # Starting a failover election for epoch 2364.
1:S 22 Oct 11:27:14.638 # Failover election won: I'm the new master.
1:S 22 Oct 11:27:14.638 # configEpoch set to 2364 after successful failover
1:M 22 Oct 11:27:14.638 # Setting secondary replication ID to 36a912df79b74f67db30bb24d7223848fbf6cf32, valid up to offset: 104949999381. New replication ID is b41553e508ad008b8d2fbfedfcf1cb18d21204e7
As the log shows, there is a 100 ms delay between "Start of election delayed for 0 milliseconds" (11:27:14.534) and "Starting a failover election" (11:27:14.634). This happens because, after logging "Start of election", Redis does not set the CLUSTER_TODO_HANDLE_FAILOVER flag in todo_before_sleep, so the election only starts the next time clusterCron is called.
So, could we set the CLUSTER_TODO_HANDLE_FAILOVER flag at that point to speed up the failover, the same way we already react quickly when we receive a quorum of votes from the masters?
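To make the timing argument concrete, here is a small standalone model of the idea. It is only a sketch and not Redis source code: `sim_node`, `handle_failover`, `before_sleep`, `simulate_election_delay_ms` and `TODO_HANDLE_FAILOVER` are hypothetical names made up for illustration, and the 100 ms cron period is an assumption taken from the gap seen in the log above.

```c
#include <stdbool.h>

#define CRON_PERIOD_MS 100            /* assumed cron interval, from the log gap */
#define TODO_HANDLE_FAILOVER (1 << 0) /* hypothetical stand-in for the real flag */

/* Hypothetical, simplified replica state for the simulation. */
typedef struct {
    int todo_before_sleep;    /* work requested for the end of this loop iteration */
    bool election_scheduled;  /* "Start of election delayed for 0 ms" was reached */
    bool election_started;    /* "Starting a failover election" was reached */
    long long started_at_ms;  /* simulated time at which the election started */
} sim_node;

/* Stand-in for the failover handler: the first call only schedules the
 * election, the second call actually starts it. */
static void handle_failover(sim_node *n, long long now_ms, bool set_flag) {
    if (n->election_started) return;
    if (!n->election_scheduled) {
        n->election_scheduled = true;
        /* The proposal: ask to be called again before sleeping. */
        if (set_flag) n->todo_before_sleep |= TODO_HANDLE_FAILOVER;
        return;
    }
    n->election_started = true;
    n->started_at_ms = now_ms;
}

/* Stand-in for the before-sleep hook: runs the handler again if requested. */
static void before_sleep(sim_node *n, long long now_ms) {
    if (n->todo_before_sleep & TODO_HANDLE_FAILOVER) {
        n->todo_before_sleep &= ~TODO_HANDLE_FAILOVER;
        handle_failover(n, now_ms, false);
    }
}

/* Returns the simulated delay (ms) between scheduling and starting the
 * election, or -1 if the election never starts within the simulated window. */
long long simulate_election_delay_ms(bool set_flag) {
    sim_node n = {0};
    for (long long now = 0; now <= 10 * CRON_PERIOD_MS; now += CRON_PERIOD_MS) {
        handle_failover(&n, now, set_flag); /* periodic cron tick */
        before_sleep(&n, now);              /* end of the same loop iteration */
        if (n.election_started) return n.started_at_ms;
    }
    return -1;
}
```

In this model, `simulate_election_delay_ms(false)` returns 100 (the election waits for the next cron tick), while `simulate_election_delay_ms(true)` returns 0 (the election starts in the same event loop iteration). Whether the real code path can safely do the same is exactly the question above.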
Comment From: gripleaf
cool!