Setup: a Redis v3.2.9 cluster with three nodes (node1, node2, node3) and Sentinel enabled.
I ran "service network stop" on the master, node1, to simulate a network failure. Sometimes it takes about 16 minutes for node2 and node3 to start the failover.
Logs on node2:
16621:X 18 Apr 10:31:39.286 # +sdown sentinel df750df382c4ed320a8b8e403e2d26244a93c103 10.84.23.21 26379 @ redis-cluster 192.168.0.21 6379
# stalls here for about 16 minutes
16621:X 18 Apr 10:47:51.562 # +sdown master redis-cluster 192.168.0.21 6379
16621:X 18 Apr 10:47:51.652 # +odown master redis-cluster 192.168.0.21 6379 #quorum 2/2
...
Logs on node3:
1251:X 18 Apr 10:31:39.046 # +sdown master redis-cluster 192.168.0.21 6379
1251:X 18 Apr 10:31:39.046 # +sdown sentinel df750df382c4ed320a8b8e403e2d26244a93c103 192.168.0.21 26379 @ redis-cluster 192.168.0.21 6379
# stalls here for about 16 minutes
1251:X 18 Apr 10:47:51.686 # +new-epoch 19
1251:X 18 Apr 10:47:51.718 # +vote-for-leader 8d71b4963c1d49a62f10867c6c87dfa673abdfae 19
1251:X 18 Apr 10:47:57.721 # +odown master redis-cluster 192.168.0.21 6379 #quorum 2/2
1251:X 18 Apr 10:47:57.721 # Next failover delay: I will not start a failover before Wed Apr 18 10:48:12 2018
...
It appears that some network-related call in sentinelHandleRedisInstance(sentinelRedisInstance *ri) may get stuck for some reason.
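To confirm a stall like this from the Sentinel logs rather than by eye, a small script can parse the timestamps and report any large gap between consecutive events. The following is a minimal sketch, not part of Redis or Sentinel: the helper names parse_events and find_stalls, the 60-second threshold, and the regular expression are all assumptions made for illustration. The log lines carry no year, so 2018 is taken from the "Next failover delay" message on node3.

```python
import re
from datetime import datetime

# Assumed line layout, inferred from the logs in this report:
# "<pid>:<role> <day> <month> <HH:MM:SS.mmm> <level char> <message>"
LINE_RE = re.compile(r"^\d+:[A-Z] (\d{1,2} \w{3} \d{2}:\d{2}:\d{2}\.\d{3}) . (.*)$")


def parse_events(lines, year=2018):
    """Return (timestamp, message) pairs for lines that look like Sentinel events.

    Lines that do not match (annotations, "...", blanks) are skipped.
    """
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %d %b %H:%M:%S.%f")
        events.append((stamp, match.group(2)))
    return events


def find_stalls(events, threshold_seconds=60.0):
    """Return (gap_seconds, message_before, message_after) for each large gap."""
    stalls = []
    for (t0, msg0), (t1, msg1) in zip(events, events[1:]):
        gap = (t1 - t0).total_seconds()
        if gap >= threshold_seconds:
            stalls.append((gap, msg0, msg1))
    return stalls


# Excerpt of the node2 log from this report.
NODE2_LOG = """
16621:X 18 Apr 10:31:39.286 # +sdown sentinel df750df382c4ed320a8b8e403e2d26244a93c103 10.84.23.21 26379 @ redis-cluster 192.168.0.21 6379
# hangs up here for 16 minutes
16621:X 18 Apr 10:47:51.562 # +sdown master redis-cluster 192.168.0.21 6379
16621:X 18 Apr 10:47:51.652 # +odown master redis-cluster 192.168.0.21 6379 #quorum 2/2
"""

if __name__ == "__main__":
    for gap, before, after in find_stalls(parse_events(NODE2_LOG.splitlines())):
        print(f"{gap:.3f}s between '{before[:20]}' and '{after[:20]}'")
```

On the node2 excerpt this reports a single gap of 972.276 seconds (16 minutes 12 seconds) between the "+sdown sentinel" event and the "+sdown master" event, which matches the delay described above.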
Comment From: patpatbear
Got the same issue
Comment From: yossigo
@patpatbear With the same version or newer? v3.2.9 is really too old, but if this is a reproducible issue with newer versions it deserves attention of course.
Comment From: patpatbear
@yossigo Actually, I got it with an older version. Sorry for the bother; it's the same issue as #2819, fixed in commit-643110525. I suggest closing this issue (and the duplicate issue #7104).
Comment From: yossigo
Closing, thank you @patpatbear