Description
Sentinel publishes the +sdown/+odown message only once.
To reproduce
1 master with 2 replicas and 3 sentinels, all running on Kubernetes. The master and one sentinel crashed; meanwhile, the remaining 2 sentinels lost their network connection for a minute. The remaining 2 sentinels marked the master as +sdown, but because of the network problem they could not reach a consensus that the master was unreachable. Once a sentinel cannot reach the master, it publishes a "master +odown" message to the other sentinels, sets the SRI_O_DOWN flag, and does not publish that message again.
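To make the scenario concrete, here is a minimal Python model of the quorum check, assuming a sentinel counts its own +sdown judgement plus the judgements of the peers it can currently reach. The names `SentinelView` and `is_odown` are hypothetical and this is not Redis's actual implementation; it only illustrates why two partitioned sentinels with quorum 2 each stay at +sdown.

```python
from dataclasses import dataclass, field


@dataclass
class SentinelView:
    """Hypothetical, simplified view of one sentinel (not Redis's real data model)."""
    name: str
    sees_master_down: bool  # this sentinel's own +sdown judgement
    reachable_peers: set = field(default_factory=set)  # peers it can talk to right now


def is_odown(me, views, quorum):
    """Return True if `me` can collect at least `quorum` 'master is down' votes.

    Assumption: a vote only counts if the peer is currently reachable.
    """
    if not me.sees_master_down:
        return False
    votes = 1  # the sentinel's own judgement
    for peer_name in me.reachable_peers:
        peer = views.get(peer_name)
        if peer is not None and peer.sees_master_down:
            votes += 1
    return votes >= quorum


# Scenario from the report: sentinel-1 crashed, sentinel-0 and sentinel-2
# both see the master as down but cannot reach each other.
s0 = SentinelView("sentinel-0", sees_master_down=True)
s2 = SentinelView("sentinel-2", sees_master_down=True)
views = {s0.name: s0, s2.name: s2}

partitioned = is_odown(s0, views, quorum=2)

# Once the network recovers, the peer's judgement becomes visible again.
s0.reachable_peers.add("sentinel-2")
recovered = is_odown(s0, views, quorum=2)
```

In this model the outcome depends only on whether the peer's current judgement can be observed after the partition heals, which is the point in question in this report.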
Expected behavior
It would be better for the sentinel to keep sending the +sdown/+odown message to the other sentinels, so that sentinels that were temporarily offline also receive it once they recover.
Comment From: sunxiao2010n
Erroneous behavior caused by the issue
sentinel-0:
1:X 21 Aug 2024 16:19:15.621 # Failed to resolve hostname 'redis-persistent-master-0.redis-persistent-master-svc.service-software'
1:X 21 Aug 2024 16:19:15.621 # +sdown master mymaster redis-persistent-master-0.redis-persistent-master-svc.service-software 6379
1:X 21 Aug 2024 16:19:25.633 # Failed to resolve hostname 'redis-persistent-sentinel-1.redis-persistent-sentinel-svc.service-software'
1:X 21 Aug 2024 16:19:25.634 # +sdown sentinel 89f6ad0dbb453989e9071edd6ada4ec15ac66c09 redis-persistent-sentinel-1.redis-persistent-sentinel-svc.service-software 26379 @ mymaster redis-persistent-master-0.redis-persistent-master-svc.service-software 6379
1:X 21 Aug 2024 16:19:25.634 # +sdown sentinel e0e60aafa5fad733a7176e2d4c28d0f6d9106cf5 redis-persistent-sentinel-2.redis-persistent-sentinel-svc.service-software 26379 @ mymaster redis-persistent-master-0.redis-persistent-master-svc.service-software 6379
1:X 21 Aug 2024 16:19:45.655 # +tilt #tilt mode entered
1:X 21 Aug 2024 16:20:05.673 # Failed to resolve hostname 'redis-persistent-sentinel-2.redis-persistent-sentinel-svc.service-software'
1:X 21 Aug 2024 16:20:05.763 # +tilt #tilt mode entered
1:X 21 Aug 2024 16:20:35.821 # -tilt #tilt mode exited
1:X 21 Aug 2024 16:20:59.379 # +sdown slave redis-persistent-master-2.redis-persistent-master-svc.service-software:6379 redis-persistent-master-2.redis-persistent-master-svc.service-software 6379 @ mymaster redis-persistent-master-0.redis-persistent-master-svc.service-software 6379
1:X 21 Aug 2024 16:21:16.333 # -sdown slave redis-persistent-master-2.redis-persistent-master-svc.service-software:6379 redis-persistent-master-2.redis-persistent-master-svc.service-software 6379 @ mymaster redis-persistent-master-0.redis-persistent-master-svc.service-software 6379
1:X 21 Aug 2024 16:21:20.529 # +sdown slave redis-persistent-master-2.redis-persistent-master-svc.service-software:6379 redis-persistent-master-2.redis-persistent-master-svc.service-software 6379 @ mymaster redis-persistent-master-0.redis-persistent-master-svc.service-software 6379
sentinel-0 is stuck in the '+sdown master' state.
sentinel-2 (a little more complex):
1:X 21 Aug 2024 19:38:16.572 # +sdown master mymaster redis-persistent-master-0.redis-persistent-master-svc.service-software 6379
1:X 21 Aug 2024 19:38:16.573 # +sdown slave redis-persistent-master-2.redis-persistent-master-svc.service-software:6379 redis-persistent-master-2.redis-persistent-master-svc.service-software 6379 @ mymaster redis-persistent-master-0.redis-persistent-master-svc.service-software 6379
1:X 21 Aug 2024 19:38:16.573 # +sdown sentinel 89f6ad0dbb453989e9071edd6ada4ec15ac66c09 redis-persistent-sentinel-1.redis-persistent-sentinel-svc.service-software 26379 @ mymaster redis-persistent-master-0.redis-persistent-master-svc.service-software 6379
1:X 21 Aug 2024 19:38:26.642 # Failed to resolve hostname 'redis-persistent-master-0.redis-persistent-master-svc.service-software'
1:X 21 Aug 2024 19:38:26.645 # +sdown slave redis-persistent-master-1.redis-persistent-master-svc.service-software:6379 redis-persistent-master-1.redis-persistent-master-svc.service-software 6379 @ mymaster redis-persistent-master-0.redis-persistent-master-svc.service-software 6379
1:X 21 Aug 2024 19:38:36.658 # Failed to resolve hostname 'redis-persistent-sentinel-1.redis-persistent-sentinel-svc.service-software'
1:X 21 Aug 2024 19:38:36.741 # +tilt #tilt mode entered
1:X 21 Aug 2024 19:39:06.803 # -tilt #tilt mode exited
1:X 21 Aug 2024 19:39:16.880 # Failed to resolve hostname 'redis-persistent-master-0.redis-persistent-master-svc.service-software'
1:X 21 Aug 2024 19:39:26.895 # Failed to resolve hostname 'redis-persistent-sentinel-1.redis-persistent-sentinel-svc.service-software'
1:X 21 Aug 2024 19:39:26.951 # +tilt #tilt mode entered
1:X 21 Aug 2024 19:39:57.299 # Failed to resolve hostname 'redis-persistent-sentinel-0.redis-persistent-sentinel-svc.service-software'
1:X 21 Aug 2024 19:39:57.299 # +tilt #tilt mode entered
1:X 21 Aug 2024 19:40:27.461 # Failed to resolve hostname 'redis-persistent-master-0.redis-persistent-master-svc.service-software'
1:X 21 Aug 2024 19:40:27.461 # +tilt #tilt mode entered
......
1:X 21 Aug 2024 19:48:50.622 # -sdown sentinel 8bb49690237e784036635ce1e9c533a66e3a54ed redis-persistent-sentinel-0.redis-persistent-sentinel-svc.service-software 26379 @ mymaster redis-persistent-master-0.redis-persistent-master-svc.service-software 6379
1:X 21 Aug 2024 19:49:00.699 # Failed to resolve hostname 'redis-persistent-master-0.redis-persistent-master-svc.service-software'
1:X 21 Aug 2024 19:49:00.701 # +sdown slave redis-persistent-master-1.redis-persistent-master-svc.service-software:6379 redis-persistent-master-1.redis-persistent-master-svc.service-software 6379 @ mymaster redis-persistent-master-0.redis-persistent-master-svc.service-software 6379
1:X 21 Aug 2024 19:49:10.708 # Failed to resolve hostname 'redis-persistent-sentinel-1.redis-persistent-sentinel-svc.service-software'
1:X 21 Aug 2024 19:49:10.779 # +tilt #tilt mode entered
1:X 21 Aug 2024 19:49:40.831 # -tilt #tilt mode exited
Initially, when sentinel-2 observed master sdown, it temporarily could not connect to sentinel-0. After sentinel-2 reconnected to sentinel-0, sentinel-2 did not re-send the sdown message it had observed about the master.
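For anyone comparing logs like these across sentinels, the following is a rough Python sketch that extracts the last master-level state event (+sdown, -sdown, +odown, -odown) from a sentinel log. It assumes each log entry sits on its own line in the `pid:role date time # event details` shape seen in this thread; `parse_events` and `last_master_state` are hypothetical helper names, not part of any Redis tooling.

```python
import re

# Matches entries such as:
#   1:X 21 Aug 2024 16:19:15.621 # +sdown master mymaster <host> 6379
# Entries without a +event/-event token (e.g. "Failed to resolve hostname")
# intentionally do not match and are skipped.
LINE_RE = re.compile(
    r"^\d+:[A-Z] "
    r"(?P<ts>\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\S "
    r"(?P<event>[+-][a-z-]+)"
    r"(?: (?P<detail>.*))?$"
)

MASTER_EVENTS = {"+sdown", "-sdown", "+odown", "-odown"}


def parse_events(lines):
    """Return (timestamp, event, detail) tuples for lines carrying an event."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            events.append((m["ts"], m["event"], m["detail"] or ""))
    return events


def last_master_state(lines):
    """Return the most recent sdown/odown event that refers to a master, or None."""
    state = None
    for _ts, event, detail in parse_events(lines):
        if event in MASTER_EVENTS and detail.startswith("master "):
            state = event
    return state
```

Run against the sentinel-0 log above, a helper like this should report `+sdown`, since no `+odown master` or `-sdown master` entry follows the initial `+sdown master` entry.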
Comment From: sunxiao2010n
The expected behavior
sentinel-0 reaches a consensus with sentinel-2 to perform a failover operation.
1:X 27 Aug 2024 16:15:07.613 # +monitor master mymaster redis-persistent-master-0.redis-persistent-master-svc.service-software 6379 quorum 2
1:X 27 Aug 2024 16:15:25.914 # +sdown master mymaster redis-persistent-master-0.redis-persistent-master-svc.service-software 6379
1:X 27 Aug 2024 16:15:25.978 # +odown master mymaster redis-persistent-master-0.redis-persistent-master-svc.service-software 6379 #quorum 2/2
1:X 27 Aug 2024 16:15:25.978 # +new-epoch 1
1:X 27 Aug 2024 16:15:25.978 # +try-failover master mymaster redis-persistent-master-0.redis-persistent-master-svc.service-software 6379
1:X 27 Aug 2024 16:15:25.981 # +vote-for-leader 760b82fcd780cdf51c92d25cd37d590fea76eeaf 1
Comment From: sunxiao2010n
1:X 22 Aug 2024 14:49:13.391 # +new-epoch 9
1:X 22 Aug 2024 14:49:13.391 # +try-failover master mymaster redis-persistent-master-0.redis-persistent-master-svc.service-software 6379
1:X 22 Aug 2024 14:49:13.393 # +vote-for-leader 8bb49690237e784036635ce1e9c533a66e3a54ed 9
1:X 22 Aug 2024 14:49:13.393 # -sdown slave redis-persistent-master-2.redis-persistent-master-svc.service-software:6379 redis-persistent-master-2.redis-persistent-master-svc.service-software 6379 @ mymaster redis-persistent-master-0.redis-persistent-master-svc.service-software 6379
1:X 22 Aug 2024 14:49:13.393 # -sdown slave redis-persistent-master-1.redis-persistent-master-svc.service-software:6379 redis-persistent-master-1.redis-persistent-master-svc.service-software 6379 @ mymaster redis-persistent-master-0.redis-persistent-master-svc.service-software 6379
1:X 22 Aug 2024 14:49:23.461 # Failed to resolve hostname 'redis-persistent-master-0.redis-persistent-master-svc.service-software'
1:X 22 Aug 2024 14:49:23.461 # -odown master mymaster redis-persistent-master-0.redis-persistent-master-svc.service-software 6379
1:X 22 Aug 2024 14:49:23.461 # -failover-abort-not-elected master mymaster redis-persistent-master-0.redis-persistent-master-svc.service-software 6379
I'm sorry, maybe that was just a physical connection failure.