I have a very simple setup: one master and one slave. On both instances I also run a Sentinel, with the same configuration file on both:

sentinel monitor mymaster 10.99.13.107 6379 1
sentinel down-after-milliseconds mymaster 10000
sentinel failover-timeout mymaster 10000
loglevel verbose

When I kill the master Redis instance, the failover procedure kicks in correctly and the slave gets promoted to master. However, when I kill both the master and the Sentinel on the same instance (I want to simulate what happens when a machine crashes or goes down completely), the failover procedure does not happen: the Sentinel that lives on the slave instance keeps trying to fail over the original master, but never succeeds. The way I simulate the crash is sketched below.
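For completeness, this is roughly how I take the whole master instance down (a sketch; REDIS_PID and SENTINEL_PID are placeholders for the actual process IDs, which I look up first):

# on the master host (10.99.13.107): kill the Redis server and its
# Sentinel in one shot, simulating a full machine crash
kill -9 "$REDIS_PID" "$SENTINEL_PID"

The log of the Sentinel on the slave instance is this: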

[28069] 17 Jun 22:57:20.302 # Sentinel runid is 7d08ab54ddce7931c745459996aa0cf1e33f98c1
[28069] 17 Jun 22:57:20.302 # +monitor master mymaster 10.99.13.107 6379 quorum 1
[28069] 17 Jun 22:57:20.900 * +sentinel sentinel 10.99.13.107:26379 10.99.13.107 26379 @ mymaster 10.99.13.107 6379
[28069] 17 Jun 22:57:20.914 # +new-epoch 283
[28069] 17 Jun 22:57:22.395 - Accepted 10.99.13.107:49615
[28069] 17 Jun 22:57:40.395 * +slave slave 10.194.250.140:6379 10.194.250.140 6379 @ mymaster 10.99.13.107 6379
[28069] 17 Jun 22:58:50.896 - Client closed connection
[28069] 17 Jun 22:59:00.940 # +sdown sentinel 10.99.13.107:26379 10.99.13.107 26379 @ mymaster 10.99.13.107 6379
[28069] 17 Jun 22:59:01.196 # +sdown master mymaster 10.99.13.107 6379
[28069] 17 Jun 22:59:01.196 # +odown master mymaster 10.99.13.107 6379 #quorum 1/1
[28069] 17 Jun 22:59:01.196 # +new-epoch 284
[28069] 17 Jun 22:59:01.196 # +try-failover master mymaster 10.99.13.107 6379
[28069] 17 Jun 22:59:01.203 # +vote-for-leader 7d08ab54ddce7931c745459996aa0cf1e33f98c1 284
[28069] 17 Jun 22:59:11.538 # -failover-abort-not-elected master mymaster 10.99.13.107 6379
[28069] 17 Jun 22:59:11.628 # Next failover delay: I will not start a failover before Tue Jun 17 22:59:21 2014
[28069] 17 Jun 22:59:21.475 # +new-epoch 285
[28069] 17 Jun 22:59:21.475 # +try-failover master mymaster 10.99.13.107 6379
[28069] 17 Jun 22:59:21.482 # +vote-for-leader 7d08ab54ddce7931c745459996aa0cf1e33f98c1 285
[28069] 17 Jun 22:59:32.449 # -failover-abort-not-elected master mymaster 10.99.13.107 6379
[28069] 17 Jun 22:59:32.525 # Next failover delay: I will not start a failover before Tue Jun 17 22:59:42 2014

And so on: it keeps retrying. As you can see from this log, it knows about the slave (+slave). It also knows about the master going down (+sdown master mymaster and +odown master mymaster). So why does it keep doing +try-failover master mymaster 10.99.13.107 6379 on the master it knows is down, instead of promoting the slave?

redis-server --version: Redis server v=2.8.10 sha=00000000:0 malloc=tcmalloc-2.0 bits=64 build=176d015270bbec54

Comment From: icyice80

Check the Redis Sentinel doc: based on your master/slave config, you need a minimum of 3 Sentinels with the quorum set to 2. That way, when 1 Sentinel goes down, the other 2 can still elect a leader and kick in the failover process.
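As a sketch, the same sentinel.conf could be deployed on three separate hosts, changing only the quorum from 1 to 2 (the address and the 10000 ms timeouts are carried over from the config above):

sentinel monitor mymaster 10.99.13.107 6379 2
sentinel down-after-milliseconds mymaster 10000
sentinel failover-timeout mymaster 10000

With three Sentinels, losing any one of them still leaves two alive: enough to reach the quorum of 2 for +odown, and enough to form the majority needed to win the leader election.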

Comment From: antirez

Or... a single one, if you don't care about Sentinel being a single point of failure (a discouraged approach). However, note that if you go for three, you need to set up Sentinel on three different computers (or virtual machines) that are likely to fail independently. Otherwise you have a setup that is only valid under the assumption of single processes failing (like the Redis server crashing), but that does not survive netsplits, since two or more Sentinels running on the same physical host will always get partitioned together.
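As a sanity check (a sketch; it assumes the default Sentinel port 26379, as in the log above), you can ask any Sentinel which peer Sentinels and slaves it has discovered for a given master:

redis-cli -p 26379 SENTINEL sentinels mymaster
redis-cli -p 26379 SENTINEL slaves mymaster

This also explains the log above: that Sentinel knows of two Sentinels in total, so once the other one dies it can only collect its own vote, one out of two, which is short of a majority; hence the repeated -failover-abort-not-elected.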

Comment From: jeuniii

@DutchMark So did you figure out what the issue was? I'm having the exact same issue as you. My quorum is set to 1 since I'm just testing it out. Eventually I'll have 3 separate nodes with the quorum set to 1.

Comment From: feigyfroilich

Why is it closed? I am facing the same issue. @DutchMark Have you found a solution?