We are running Redis version 3.2 in a 3-server, 3-sentinel configuration. During deployment we encountered a problem: the sentinels are not able to fail over when the master is determined to be down.

1:X 10 Nov 03:52:40.754 # +sdown master master123 10.0.15.10 6379
1:X 10 Nov 03:52:40.810 # +odown master master123 10.0.15.10 6379 #quorum 2/2
1:X 10 Nov 03:52:40.811 # +new-epoch 1303
1:X 10 Nov 03:52:40.811 # +try-failover master master123 10.0.15.10 6379
1:X 10 Nov 03:52:40.814 # +vote-for-leader 065c8fe4878a3984ab4c7964370e7d94807c62e7 1303
1:X 10 Nov 03:52:40.820 # 6740f56579e079bdb8012286fa6635a3a161b787 voted for 065c8fe4878a3984ab4c7964370e7d94807c62e7 1303
1:X 10 Nov 03:52:40.824 # 19d2c1636e481d14a29c055f5cd8c0baf64363e4 voted for 065c8fe4878a3984ab4c7964370e7d94807c62e7 1303
1:X 10 Nov 03:52:44.699 # -failover-abort-not-elected master master123 10.0.15.10 6379
1:X 10 Nov 03:52:44.782 # Next failover delay: I will not start a failover before Wed Nov 10 03:52:47 2021

Comment From: hwware

It means the failover process was aborted because a new leader was not elected successfully. From the log you can see that the next failover will not start before Wed Nov 10 03:52:47 2021.

You can check the Redis 3.2 source code and find the comment: /* Abort the failover if I'm not the leader after some time. */
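
For context, that check lives in sentinelFailoverWaitStart(): after starting a failover, a sentinel waits up to an election timeout (the minimum of SENTINEL_ELECTION_TIMEOUT, 10 seconds by default, and the configured failover-timeout) to see itself win the election; if it does not, it aborts and logs -failover-abort-not-elected. A condensed sketch, paraphrased from the 3.2 sources rather than quoted verbatim:

```c
/* Condensed sketch of sentinelFailoverWaitStart() in sentinel.c
 * (Redis 3.2); paraphrased, not a verbatim excerpt. */
void sentinelFailoverWaitStart(sentinelRedisInstance *ri) {
    char *leader = sentinelGetLeader(ri, ri->failover_epoch);
    int isleader = leader && strcasecmp(leader, sentinel.myid) == 0;
    sdsfree(leader);

    /* If I'm not the leader, and this is not a manually forced
     * failover, I can't continue. */
    if (!isleader && !(ri->flags & SRI_FORCE_FAILOVER)) {
        /* Election timeout: the minimum of SENTINEL_ELECTION_TIMEOUT
         * (10 seconds by default) and the configured failover-timeout. */
        mstime_t election_timeout = SENTINEL_ELECTION_TIMEOUT;
        if (election_timeout > ri->failover_timeout)
            election_timeout = ri->failover_timeout;

        /* Abort the failover if I'm not the leader after some time. */
        if (mstime() - ri->failover_start_time > election_timeout) {
            sentinelEvent(LL_WARNING, "-failover-abort-not-elected", ri, "%@");
            sentinelAbortFailover(ri);
        }
        return;
    }
    /* ... otherwise proceed to the next failover state ... */
}
```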

Comment From: benimohit

Thanks. I will have a look at the source code. From the logs it's evident that all 3 sentinels selected 065c8fe4878a3984ab4c7964370e7d94807c62e7 as the leader, so why would it fail? Also, it's not clear to me whether it failed to elect a leader at all, or whether leader selection succeeded but that leader then failed to perform the actual failover.

Comment From: androidkh

Same for me. I've deployed 2 Redis instances and 6 sentinels on 2 hosts: 3 sentinels on each host. When the master host goes down, the other 3 sentinels cannot elect a new master. I've tried quorum 1 and 2, no luck. I first started with 2 nodes (1 Redis + 1 sentinel); it wasn't able to elect a new master, and I've read that could be because there is no quorum, as 1 sentinel cannot win an election by itself. I added a second sentinel; the two failed to decide which of them should lead, perhaps because both rely on epoch timing (I may be wrong) while running on the same host. I added a third sentinel on the same host; no luck, the error is still the same:

1:X 04 Dec 2022 05:07:55.455 # +new-epoch 4
1:X 04 Dec 2022 05:07:55.455 # +try-failover master mymaster 172.31.48.79 6379
1:X 04 Dec 2022 05:07:55.459 * Sentinel new configuration saved on disk
1:X 04 Dec 2022 05:07:55.459 # +vote-for-leader 69d22e0bdbf9ab09125b264cff04229797566167 4
1:X 04 Dec 2022 05:07:55.467 # abcb8b1b5049487907541799826808f53a279414 voted for 69d22e0bdbf9ab09125b264cff04229797566167 4
1:X 04 Dec 2022 05:07:55.467 # 5e794351bee058d087c3787d97e8f9d63c855d94 voted for 69d22e0bdbf9ab09125b264cff04229797566167 4
1:X 04 Dec 2022 05:08:05.953 # -failover-abort-not-elected master mymaster 172.31.48.79 6379
1:X 04 Dec 2022 05:08:06.036 # Next failover delay: I will not start a failover before Sun Dec 4 05:08:55 2022

Failover works perfectly fine when I kill only the Redis master, leaving the sentinels on that host running. So my issue happens only when the host with both Redis and the sentinels goes down.
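
If I read sentinelGetLeader() in sentinel.c correctly, this seems expected for my topology: a leader needs a strict majority of all known sentinels, not just of the ones still alive, plus at least the configured quorum. With 6 sentinels the majority is 4, so after losing the host with 3 of them, the surviving 3 can gather at most 3 votes and no leader can ever be elected, which is exactly what the log shows. A condensed, paraphrased sketch of that check:

```c
/* Condensed sketch of the vote counting in sentinelGetLeader()
 * (sentinel.c); paraphrased, not a verbatim excerpt. */
voters = dictSize(master->sentinels) + 1;  /* all known sentinels + myself */

/* ... tally the votes cast for this epoch into winner / max_votes ... */

/* The winner needs a strict majority of ALL known sentinels for this
 * master, and at least the configured quorum. With 6 known sentinels,
 * voters_quorum is 4, so 3 surviving sentinels can never elect a leader. */
voters_quorum = voters / 2 + 1;
if (winner && (max_votes < voters_quorum || max_votes < master->quorum))
    winner = NULL;  /* no leader -> -failover-abort-not-elected later */
```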

Comment From: zerogtw

Is there any new progress on this?

Comment From: hwware

Redis 3 is too old. Can you try the latest version and check if it happens again? Thanks

Comment From: Funkydream

> Redis 3 is too old. Can you try the latest version and check if it happens again? Thanks

Hi, @hwware

I got the same problem with Redis version 6.0.9 in a 2-server (1 master + 1 slave) and 5-sentinel configuration. This sentinel group managed at least 100 Redis masters. It happened when a lot of masters were down at the same time.

log for 7c99908273ed894d926d86bf4fe998378e1a288d:
29026:X 04 Dec 2023 13:21:25.883 # +odown master cluster_282 10.15.145.1 20564 #quorum 4/3
29026:X 04 Dec 2023 13:21:25.883 # +new-epoch 45
29026:X 04 Dec 2023 13:21:25.883 # +try-failover master cluster_282 10.15.145.1 20564
29026:X 04 Dec 2023 13:21:25.891 # +vote-for-leader 7c99908273ed894d926d86bf4fe998378e1a288d 45
29026:X 04 Dec 2023 13:21:25.990 # e83af7e16cb690c95d62ede6abe6515ecb5113de voted for 7c99908273ed894d926d86bf4fe998378e1a288d 45
29026:X 04 Dec 2023 13:21:26.265 # fdf9c299ac37b84d15c7be55d4d983c839905002 voted for 7c99908273ed894d926d86bf4fe998378e1a288d 45
29026:X 04 Dec 2023 13:21:37.594 # -failover-abort-not-elected master cluster_282 10.15.145.1 20564
29026:X 04 Dec 2023 13:21:37.594 # Next failover delay: I will not start a failover before Mon Dec 4 13:27:27 2023

This situation did not last long, and the next failover was executed successfully.
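
For what it's worth, the roughly six-minute "Next failover delay" in the log matches the retry logic in sentinelStartFailoverIfNeeded(): a sentinel refuses to start a new failover attempt until twice the configured failover-timeout (180000 ms by default, so 360 s) has passed since the previous attempt, which lines up with the 13:21:25 attempt and the 13:27:27 retry time. A condensed, paraphrased sketch (assuming 6.0.9 still looks roughly like this):

```c
/* Condensed sketch of sentinelStartFailoverIfNeeded() in sentinel.c;
 * paraphrased, not a verbatim excerpt. */
int sentinelStartFailoverIfNeeded(sentinelRedisInstance *master) {
    /* Only fail over masters that are objectively down, and only one
     * failover at a time per master. */
    if (!(master->flags & SRI_O_DOWN)) return 0;
    if (master->flags & SRI_FAILOVER_IN_PROGRESS) return 0;

    /* Too soon after the last attempt? With the default
     * failover-timeout of 180000 ms this is a 6-minute wait,
     * matching the 13:21 -> 13:27 gap in the log above. */
    if (mstime() - master->failover_start_time <
        master->failover_timeout * 2)
    {
        /* This is where "Next failover delay: I will not start a
         * failover before <time>" is logged. */
        return 0;
    }

    sentinelStartFailover(master);
    return 1;
}
```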