redis version: Redis server v=6.2.1 sha=00000000:0 malloc=jemalloc-5.1.0 bits=64 build=e7c5360d7cd8d0d9

start server:

master: 127.0.0.1:10001 salve0: 127.0.0.1:10002 salve1: 127.0.0.1:10003

sentinel0: 127.0.0.1:20001 mymaster sentinel1: 127.0.0.1:20002 mymaster sentinel2: 127.0.0.1:20003 mymaster

example: 1. connect sentinel0/1/2 2. send cmd: sentinel get-master-addr-by-name mymaster (return 127.0.0.1:10001) 3. send cmd: subscribe +switch-master 4. kill master(127.0.0.1:10001) 5. wait... 6. subscribe +switch-master return new master: 127.0.0.1:10002 7.

redis-cli -h 127.0.0.1 -p 10002
127.0.0.1:10002> info Replication
# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:5a30976dbb09844b27176438e3f8f4ad400e3c80
master_replid2:3cd2769d958354db33c6c60c49118fcaf216d4e3
master_repl_offset:2111
second_repl_offset:1920
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:2111
  1. start master(127.0.0.1:10001)
  2. about 1 second later, subscribe +switch-master return new master: 127.0.0.1:10003

explain: the probability of occurrence of 7(connected_slaves: 0) and 9 is less than 1%. the client i use is https://github.com/go-redis/redis. i don't understand why this happens...why did the sentinel repeatedly send me different masters?

Comment From: yossigo

@monkey92t Can you provide the logs from the sentinel instances?

Comment From: monkey92t

@monkey92t Can you provide the logs from the sentinel instances?

thanks for your reply, I need some time to provide data, I try to make a tool that can reproduce the error

Comment From: monkey92t

@monkey92t Can you provide the logs from the sentinel instances?

@yossigo hi, I have obtained the sentinel log, the sentinel I am currently connected to is 127.0.0.1:20002(sub monitor host), I will provide all (3) sentinel logs:

NO1: 127.0.0.1:20001

10718:X 08 Mar 2021 08:21:54.170 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
10718:X 08 Mar 2021 08:21:54.170 # Redis version=6.2.1, bits=64, commit=00000000, modified=0, pid=10718, just started
10718:X 08 Mar 2021 08:21:54.170 # Configuration loaded
10718:X 08 Mar 2021 08:21:54.171 * monotonic clock: POSIX clock_gettime
10718:X 08 Mar 2021 08:21:54.173 * Running mode=sentinel, port=20001.
10718:X 08 Mar 2021 08:21:54.173 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
10718:X 08 Mar 2021 08:21:54.177 # Sentinel ID is fe1df7a441c3a14e07d090121d440e409bc0d585
10718:X 08 Mar 2021 08:21:54.474 # +monitor master mymaster 127.0.0.1 10001 quorum 2
10718:X 08 Mar 2021 08:21:54.474 # +set master mymaster 127.0.0.1 10001 down-after-milliseconds 500
10718:X 08 Mar 2021 08:21:54.477 # +set master mymaster 127.0.0.1 10001 failover-timeout 1000
10718:X 08 Mar 2021 08:21:54.480 # +set master mymaster 127.0.0.1 10001 parallel-syncs 1
10718:X 08 Mar 2021 08:21:54.522 * +slave slave 127.0.0.1:10002 127.0.0.1 10002 @ mymaster 127.0.0.1 10001
10718:X 08 Mar 2021 08:21:54.527 * +slave slave 127.0.0.1:10003 127.0.0.1 10003 @ mymaster 127.0.0.1 10001
10718:X 08 Mar 2021 08:21:56.802 * +sentinel sentinel fd2b2b428d387814773d52391b6ca4699d433e9d 127.0.0.1 20002 @ mymaster 127.0.0.1 10001
10718:X 08 Mar 2021 08:21:57.136 * +sentinel sentinel 82a034665363b8a363b956bbd910d4912c15a96a 127.0.0.1 20003 @ mymaster 127.0.0.1 10001
10718:X 08 Mar 2021 08:21:57.749 # +sdown master mymaster 127.0.0.1 10001
10718:X 08 Mar 2021 08:21:57.831 # +new-epoch 1
10718:X 08 Mar 2021 08:21:57.833 # +vote-for-leader fd2b2b428d387814773d52391b6ca4699d433e9d 1
10718:X 08 Mar 2021 08:21:57.892 # +odown master mymaster 127.0.0.1 10001 #quorum 2/2
10718:X 08 Mar 2021 08:21:57.892 # Next failover delay: I will not start a failover before Mon Mar  8 08:22:00 2021
10718:X 08 Mar 2021 08:21:58.888 # +config-update-from sentinel fd2b2b428d387814773d52391b6ca4699d433e9d 127.0.0.1 20002 @ mymaster 127.0.0.1 10001
10718:X 08 Mar 2021 08:21:58.888 # +switch-master mymaster 127.0.0.1 10001 127.0.0.1 10002
10718:X 08 Mar 2021 08:21:58.888 * +slave slave 127.0.0.1:10003 127.0.0.1 10003 @ mymaster 127.0.0.1 10002
10718:X 08 Mar 2021 08:21:58.888 * +slave slave 127.0.0.1:10001 127.0.0.1 10001 @ mymaster 127.0.0.1 10002
10718:X 08 Mar 2021 08:21:59.407 # +new-epoch 2
10718:X 08 Mar 2021 08:21:59.409 # +vote-for-leader 82a034665363b8a363b956bbd910d4912c15a96a 2
10718:X 08 Mar 2021 08:21:59.450 # +sdown slave 127.0.0.1:10001 127.0.0.1 10001 @ mymaster 127.0.0.1 10002
10718:X 08 Mar 2021 08:22:00.014 # +config-update-from sentinel 82a034665363b8a363b956bbd910d4912c15a96a 127.0.0.1 20003 @ mymaster 127.0.0.1 10002
10718:X 08 Mar 2021 08:22:00.014 # +switch-master mymaster 127.0.0.1 10002 127.0.0.1 10003
10718:X 08 Mar 2021 08:22:00.014 * +slave slave 127.0.0.1:10001 127.0.0.1 10001 @ mymaster 127.0.0.1 10003
10718:X 08 Mar 2021 08:22:00.014 * +slave slave 127.0.0.1:10002 127.0.0.1 10002 @ mymaster 127.0.0.1 10003
10718:X 08 Mar 2021 08:22:00.584 # +sdown slave 127.0.0.1:10001 127.0.0.1 10001 @ mymaster 127.0.0.1 10003
10718:X 08 Mar 2021 08:22:10.060 * +convert-to-slave slave 127.0.0.1:10002 127.0.0.1 10002 @ mymaster 127.0.0.1 10003

NO2: 127.0.0.1:20002 (sub monitor host)

10723:X 08 Mar 2021 08:21:54.486 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
10723:X 08 Mar 2021 08:21:54.486 # Redis version=6.2.1, bits=64, commit=00000000, modified=0, pid=10723, just started
10723:X 08 Mar 2021 08:21:54.486 # Configuration loaded
10723:X 08 Mar 2021 08:21:54.487 * monotonic clock: POSIX clock_gettime
10723:X 08 Mar 2021 08:21:54.487 * Running mode=sentinel, port=20002.
10723:X 08 Mar 2021 08:21:54.487 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
10723:X 08 Mar 2021 08:21:54.490 # Sentinel ID is fd2b2b428d387814773d52391b6ca4699d433e9d
10723:X 08 Mar 2021 08:21:54.789 # +monitor master mymaster 127.0.0.1 10001 quorum 2
10723:X 08 Mar 2021 08:21:54.790 # +set master mymaster 127.0.0.1 10001 down-after-milliseconds 500
10723:X 08 Mar 2021 08:21:54.792 # +set master mymaster 127.0.0.1 10001 failover-timeout 1000
10723:X 08 Mar 2021 08:21:54.794 # +set master mymaster 127.0.0.1 10001 parallel-syncs 1
10723:X 08 Mar 2021 08:21:54.801 * +slave slave 127.0.0.1:10002 127.0.0.1 10002 @ mymaster 127.0.0.1 10001
10723:X 08 Mar 2021 08:21:54.803 * +slave slave 127.0.0.1:10003 127.0.0.1 10003 @ mymaster 127.0.0.1 10001
10723:X 08 Mar 2021 08:21:56.513 * +sentinel sentinel fe1df7a441c3a14e07d090121d440e409bc0d585 127.0.0.1 20001 @ mymaster 127.0.0.1 10001
10723:X 08 Mar 2021 08:21:57.136 * +sentinel sentinel 82a034665363b8a363b956bbd910d4912c15a96a 127.0.0.1 20003 @ mymaster 127.0.0.1 10001
10723:X 08 Mar 2021 08:21:57.771 # +sdown master mymaster 127.0.0.1 10001
10723:X 08 Mar 2021 08:21:57.823 # +odown master mymaster 127.0.0.1 10001 #quorum 2/2
10723:X 08 Mar 2021 08:21:57.824 # +new-epoch 1
10723:X 08 Mar 2021 08:21:57.824 # +try-failover master mymaster 127.0.0.1 10001
10723:X 08 Mar 2021 08:21:57.826 # +vote-for-leader fd2b2b428d387814773d52391b6ca4699d433e9d 1
10723:X 08 Mar 2021 08:21:57.834 # 82a034665363b8a363b956bbd910d4912c15a96a voted for fd2b2b428d387814773d52391b6ca4699d433e9d 1
10723:X 08 Mar 2021 08:21:57.834 # fe1df7a441c3a14e07d090121d440e409bc0d585 voted for fd2b2b428d387814773d52391b6ca4699d433e9d 1
10723:X 08 Mar 2021 08:21:57.893 # +elected-leader master mymaster 127.0.0.1 10001
10723:X 08 Mar 2021 08:21:57.893 # +failover-state-select-slave master mymaster 127.0.0.1 10001
10723:X 08 Mar 2021 08:21:57.993 # +selected-slave slave 127.0.0.1:10002 127.0.0.1 10002 @ mymaster 127.0.0.1 10001
10723:X 08 Mar 2021 08:21:57.994 * +failover-state-send-slaveof-noone slave 127.0.0.1:10002 127.0.0.1 10002 @ mymaster 127.0.0.1 10001
10723:X 08 Mar 2021 08:21:58.053 * +failover-state-wait-promotion slave 127.0.0.1:10002 127.0.0.1 10002 @ mymaster 127.0.0.1 10001
10723:X 08 Mar 2021 08:21:58.840 # +promoted-slave slave 127.0.0.1:10002 127.0.0.1 10002 @ mymaster 127.0.0.1 10001
10723:X 08 Mar 2021 08:21:58.840 # +failover-state-reconf-slaves master mymaster 127.0.0.1 10001
10723:X 08 Mar 2021 08:21:58.886 * +slave-reconf-sent slave 127.0.0.1:10003 127.0.0.1 10003 @ mymaster 127.0.0.1 10001
10723:X 08 Mar 2021 08:21:59.498 # +new-epoch 2
10723:X 08 Mar 2021 08:21:59.638 # +sdown slave 127.0.0.1:10003 127.0.0.1 10003 @ mymaster 127.0.0.1 10001
10723:X 08 Mar 2021 08:21:59.728 # +failover-end master mymaster 127.0.0.1 10001
10723:X 08 Mar 2021 08:21:59.728 # +switch-master mymaster 127.0.0.1 10001 127.0.0.1 10002
10723:X 08 Mar 2021 08:21:59.729 * +slave slave 127.0.0.1:10003 127.0.0.1 10003 @ mymaster 127.0.0.1 10002
10723:X 08 Mar 2021 08:21:59.729 * +slave slave 127.0.0.1:10001 127.0.0.1 10001 @ mymaster 127.0.0.1 10002
10723:X 08 Mar 2021 08:22:00.015 # +config-update-from sentinel 82a034665363b8a363b956bbd910d4912c15a96a 127.0.0.1 20003 @ mymaster 127.0.0.1 10002
10723:X 08 Mar 2021 08:22:00.015 # +switch-master mymaster 127.0.0.1 10002 127.0.0.1 10003
10723:X 08 Mar 2021 08:22:00.016 * +slave slave 127.0.0.1:10001 127.0.0.1 10001 @ mymaster 127.0.0.1 10003
10723:X 08 Mar 2021 08:22:00.016 * +slave slave 127.0.0.1:10002 127.0.0.1 10002 @ mymaster 127.0.0.1 10003
10723:X 08 Mar 2021 08:22:00.595 # +sdown slave 127.0.0.1:10001 127.0.0.1 10001 @ mymaster 127.0.0.1 10003

NO3: 127.0.0.1:20003

10728:X 08 Mar 2021 08:21:54.801 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
10728:X 08 Mar 2021 08:21:54.801 # Redis version=6.2.1, bits=64, commit=00000000, modified=0, pid=10728, just started
10728:X 08 Mar 2021 08:21:54.801 # Configuration loaded
10728:X 08 Mar 2021 08:21:54.802 * monotonic clock: POSIX clock_gettime
10728:X 08 Mar 2021 08:21:54.803 * Running mode=sentinel, port=20003.
10728:X 08 Mar 2021 08:21:54.803 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
10728:X 08 Mar 2021 08:21:54.806 # Sentinel ID is 82a034665363b8a363b956bbd910d4912c15a96a
10728:X 08 Mar 2021 08:21:55.104 # +monitor master mymaster 127.0.0.1 10001 quorum 2
10728:X 08 Mar 2021 08:21:55.104 # +set master mymaster 127.0.0.1 10001 down-after-milliseconds 500
10728:X 08 Mar 2021 08:21:55.107 # +set master mymaster 127.0.0.1 10001 failover-timeout 1000
10728:X 08 Mar 2021 08:21:55.110 # +set master mymaster 127.0.0.1 10001 parallel-syncs 1
10728:X 08 Mar 2021 08:21:55.117 * +slave slave 127.0.0.1:10002 127.0.0.1 10002 @ mymaster 127.0.0.1 10001
10728:X 08 Mar 2021 08:21:55.120 * +slave slave 127.0.0.1:10003 127.0.0.1 10003 @ mymaster 127.0.0.1 10001
10728:X 08 Mar 2021 08:21:56.513 * +sentinel sentinel fe1df7a441c3a14e07d090121d440e409bc0d585 127.0.0.1 20001 @ mymaster 127.0.0.1 10001
10728:X 08 Mar 2021 08:21:56.802 * +sentinel sentinel fd2b2b428d387814773d52391b6ca4699d433e9d 127.0.0.1 20002 @ mymaster 127.0.0.1 10001
10728:X 08 Mar 2021 08:21:57.830 # +new-epoch 1
10728:X 08 Mar 2021 08:21:57.833 # +vote-for-leader fd2b2b428d387814773d52391b6ca4699d433e9d 1
10728:X 08 Mar 2021 08:21:57.844 # +sdown master mymaster 127.0.0.1 10001
10728:X 08 Mar 2021 08:21:57.911 # +odown master mymaster 127.0.0.1 10001 #quorum 3/2
10728:X 08 Mar 2021 08:21:57.911 # Next failover delay: I will not start a failover before Mon Mar  8 08:22:00 2021
10728:X 08 Mar 2021 08:21:58.888 # +config-update-from sentinel fd2b2b428d387814773d52391b6ca4699d433e9d 127.0.0.1 20002 @ mymaster 127.0.0.1 10001
10728:X 08 Mar 2021 08:21:58.888 # +switch-master mymaster 127.0.0.1 10001 127.0.0.1 10002
10728:X 08 Mar 2021 08:21:58.888 * +slave slave 127.0.0.1:10003 127.0.0.1 10003 @ mymaster 127.0.0.1 10002
10728:X 08 Mar 2021 08:21:58.888 * +slave slave 127.0.0.1:10001 127.0.0.1 10001 @ mymaster 127.0.0.1 10002
10728:X 08 Mar 2021 08:21:59.400 # +sdown master mymaster 127.0.0.1 10002
10728:X 08 Mar 2021 08:21:59.400 # +odown master mymaster 127.0.0.1 10002 #quorum 3/2
10728:X 08 Mar 2021 08:21:59.400 # +new-epoch 2
10728:X 08 Mar 2021 08:21:59.400 # +try-failover master mymaster 127.0.0.1 10002
10728:X 08 Mar 2021 08:21:59.403 # +vote-for-leader 82a034665363b8a363b956bbd910d4912c15a96a 2
10728:X 08 Mar 2021 08:21:59.404 # +sdown slave 127.0.0.1:10001 127.0.0.1 10001 @ mymaster 127.0.0.1 10002
10728:X 08 Mar 2021 08:21:59.409 # fe1df7a441c3a14e07d090121d440e409bc0d585 voted for 82a034665363b8a363b956bbd910d4912c15a96a 2
10728:X 08 Mar 2021 08:21:59.494 # -odown master mymaster 127.0.0.1 10002
10728:X 08 Mar 2021 08:21:59.494 # +elected-leader master mymaster 127.0.0.1 10002
10728:X 08 Mar 2021 08:21:59.494 # +failover-state-select-slave master mymaster 127.0.0.1 10002
10728:X 08 Mar 2021 08:21:59.561 # -sdown master mymaster 127.0.0.1 10002
10728:X 08 Mar 2021 08:21:59.561 # +selected-slave slave 127.0.0.1:10003 127.0.0.1 10003 @ mymaster 127.0.0.1 10002
10728:X 08 Mar 2021 08:21:59.561 * +failover-state-send-slaveof-noone slave 127.0.0.1:10003 127.0.0.1 10003 @ mymaster 127.0.0.1 10002
10728:X 08 Mar 2021 08:21:59.633 * +failover-state-wait-promotion slave 127.0.0.1:10003 127.0.0.1 10003 @ mymaster 127.0.0.1 10002
10728:X 08 Mar 2021 08:21:59.951 # +promoted-slave slave 127.0.0.1:10003 127.0.0.1 10003 @ mymaster 127.0.0.1 10002
10728:X 08 Mar 2021 08:21:59.951 # +failover-state-reconf-slaves master mymaster 127.0.0.1 10002
10728:X 08 Mar 2021 08:22:00.010 # +failover-end master mymaster 127.0.0.1 10002
10728:X 08 Mar 2021 08:22:00.010 # +switch-master mymaster 127.0.0.1 10002 127.0.0.1 10003
10728:X 08 Mar 2021 08:22:00.010 * +slave slave 127.0.0.1:10001 127.0.0.1 10001 @ mymaster 127.0.0.1 10003
10728:X 08 Mar 2021 08:22:00.010 * +slave slave 127.0.0.1:10002 127.0.0.1 10002 @ mymaster 127.0.0.1 10003
10728:X 08 Mar 2021 08:22:00.562 # +sdown slave 127.0.0.1:10001 127.0.0.1 10001 @ mymaster 127.0.0.1 10003

Last active host:

[root@e0662fb6906b opt]# ps aux | grep redis
root     10701  0.5  0.2 139204  4664 pts/0    Sl+  08:21   0:00 /opt/redis/src/redis-server *:10002
root     10710  0.5  0.2 139204  4660 pts/0    Sl+  08:21   0:00 /opt/redis/src/redis-server *:10003
root     10718  0.9  0.2  58816  4700 pts/0    Sl+  08:21   0:00 /opt/redis/src/redis-server *:20001 [sentinel]
root     10723  1.0  0.2  58816  4756 pts/0    Sl+  08:21   0:00 /opt/redis/src/redis-server *:20002 [sentinel]
root     10728  0.9  0.2  58816  4804 pts/0    Sl+  08:21   0:00 /opt/redis/src/redis-server *:20003 [sentinel]
root     10753  0.0  0.0   9104   880 pts/1    S+   08:22   0:00 grep --color=auto redis

I made a docker image, this error can appear, if you need, I can provide.

Comment From: monkey92t

This is the log provided by my tool

sentinel_test.go:43: =======RUN
    sentinel_test.go:238: [startRedis] start redis server: port-10001
    sentinel_test.go:238: [startRedis] start redis server: port-10002
    sentinel_test.go:238: [startRedis] start redis server: port-10003
    sentinel_test.go:265: [startSentinel] start redis sentinel: port-20001, name-mymaster, masterPort-10001
    sentinel_test.go:265: [startSentinel] start redis sentinel: port-20002, name-mymaster, masterPort-10001
    sentinel_test.go:265: [startSentinel] start redis sentinel: port-20003, name-mymaster, masterPort-10001
    sentinel_test.go:112: [sentinel] getMasterAddr, initial master addr: "127.0.0.1:10001"
    sentinel_test.go:114: [sentinel] send cmd: sub +switch-master
    sentinel_test.go:150: [sentinel] Kill Master
    sentinel_test.go:157: [sentinel] Slave0 Active
    sentinel_test.go:162: [sentinel] Slave1 Active
    sentinel_test.go:140: [sentinel] sub +switch-master, return: "127.0.0.1:10002"
    sentinel_test.go:136: [sentinel] ERROR repeat sub +switch-master, return: "127.0.0.1:10003"
    sentinel_test.go:165: [sentinel] end...
    sentinel_test.go:50: HIT.................
--- PASS: TestSentinel (234.39s)

Comment From: monkey92t

redis-server log:

NO1: 127.0.0.1:10001(Master)

10653:C 08 Mar 2021 08:21:44.461 # Redis version=6.2.1, bits=64, commit=00000000, modified=0, pid=10653, just started
10653:C 08 Mar 2021 08:21:44.461 # Configuration loaded
10653:M 08 Mar 2021 08:21:44.462 * monotonic clock: POSIX clock_gettime
10653:M 08 Mar 2021 08:21:44.462 * Running mode=standalone, port=10001.
10653:M 08 Mar 2021 08:21:44.462 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
10653:M 08 Mar 2021 08:21:44.462 # Server initialized
10653:M 08 Mar 2021 08:21:44.462 * Ready to accept connections
10653:M 08 Mar 2021 08:21:44.768 * Replica 127.0.0.1:10002 asks for synchronization
10653:M 08 Mar 2021 08:21:44.768 * Full resync requested by replica 127.0.0.1:10002
10653:M 08 Mar 2021 08:21:44.768 * Replication backlog created, my new replication IDs are 'bfb2f5bbdad85f959f789146993685851d5a852b' and '0000000000000000000000000000000000000000'
10653:M 08 Mar 2021 08:21:44.768 * Starting BGSAVE for SYNC with target: disk
10653:M 08 Mar 2021 08:21:44.769 * Background saving started by pid 10664
10664:C 08 Mar 2021 08:21:44.772 * DB saved on disk
10664:C 08 Mar 2021 08:21:44.772 * RDB: 0 MB of memory used by copy-on-write
10653:M 08 Mar 2021 08:21:44.864 * Background saving terminated with success
10653:M 08 Mar 2021 08:21:44.865 * Synchronization with replica 127.0.0.1:10002 succeeded
10653:M 08 Mar 2021 08:21:45.076 * Replica 127.0.0.1:10003 asks for synchronization
10653:M 08 Mar 2021 08:21:45.076 * Full resync requested by replica 127.0.0.1:10003
10653:M 08 Mar 2021 08:21:45.076 * Starting BGSAVE for SYNC with target: disk
10653:M 08 Mar 2021 08:21:45.076 * Background saving started by pid 10673
10673:C 08 Mar 2021 08:21:45.079 * DB saved on disk
10673:C 08 Mar 2021 08:21:45.079 * RDB: 0 MB of memory used by copy-on-write
10653:M 08 Mar 2021 08:21:45.165 * Background saving terminated with success
10653:M 08 Mar 2021 08:21:45.165 * Synchronization with replica 127.0.0.1:10003 succeeded
10695:C 08 Mar 2021 08:21:53.256 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
10695:C 08 Mar 2021 08:21:53.256 # Redis version=6.2.1, bits=64, commit=00000000, modified=0, pid=10695, just started
10695:C 08 Mar 2021 08:21:53.256 # Configuration loaded
10695:M 08 Mar 2021 08:21:53.256 * monotonic clock: POSIX clock_gettime
10695:M 08 Mar 2021 08:21:53.257 * Running mode=standalone, port=10001.
10695:M 08 Mar 2021 08:21:53.257 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
10695:M 08 Mar 2021 08:21:53.257 # Server initialized
10695:M 08 Mar 2021 08:21:53.257 * Ready to accept connections
10695:M 08 Mar 2021 08:21:53.563 * Replica 127.0.0.1:10002 asks for synchronization
10695:M 08 Mar 2021 08:21:53.563 * Full resync requested by replica 127.0.0.1:10002
10695:M 08 Mar 2021 08:21:53.563 * Replication backlog created, my new replication IDs are '77a55ec4e95ace44d1d5396e43d2e3c5e396a116' and '0000000000000000000000000000000000000000'
10695:M 08 Mar 2021 08:21:53.563 * Starting BGSAVE for SYNC with target: disk
10695:M 08 Mar 2021 08:21:53.563 * Background saving started by pid 10706
10706:C 08 Mar 2021 08:21:53.567 * DB saved on disk
10706:C 08 Mar 2021 08:21:53.567 * RDB: 0 MB of memory used by copy-on-write
10695:M 08 Mar 2021 08:21:53.660 * Background saving terminated with success
10695:M 08 Mar 2021 08:21:53.660 * Synchronization with replica 127.0.0.1:10002 succeeded
10695:M 08 Mar 2021 08:21:53.872 * Replica 127.0.0.1:10003 asks for synchronization
10695:M 08 Mar 2021 08:21:53.872 * Full resync requested by replica 127.0.0.1:10003
10695:M 08 Mar 2021 08:21:53.872 * Starting BGSAVE for SYNC with target: disk
10695:M 08 Mar 2021 08:21:53.873 * Background saving started by pid 10715
10715:C 08 Mar 2021 08:21:53.875 * DB saved on disk
10715:C 08 Mar 2021 08:21:53.875 * RDB: 0 MB of memory used by copy-on-write
10695:M 08 Mar 2021 08:21:53.961 * Background saving terminated with success
10695:M 08 Mar 2021 08:21:53.961 * Synchronization with replica 127.0.0.1:10003 succeeded

NO2: 127.0.0.1:10002(Slave0)

10701:C 08 Mar 2021 08:21:53.559 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
10701:C 08 Mar 2021 08:21:53.559 # Redis version=6.2.1, bits=64, commit=00000000, modified=0, pid=10701, just started
10701:C 08 Mar 2021 08:21:53.559 # Configuration loaded
10701:S 08 Mar 2021 08:21:53.560 * monotonic clock: POSIX clock_gettime
10701:S 08 Mar 2021 08:21:53.560 * Running mode=standalone, port=10002.
10701:S 08 Mar 2021 08:21:53.560 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
10701:S 08 Mar 2021 08:21:53.561 # Server initialized
10701:S 08 Mar 2021 08:21:53.561 * Ready to accept connections
10701:S 08 Mar 2021 08:21:53.561 * Connecting to MASTER 127.0.0.1:10001
10701:S 08 Mar 2021 08:21:53.561 * MASTER <-> REPLICA sync started
10701:S 08 Mar 2021 08:21:53.561 * Non blocking connect for SYNC fired the event.
10701:S 08 Mar 2021 08:21:53.562 * Master replied to PING, replication can continue...
10701:S 08 Mar 2021 08:21:53.562 * Partial resynchronization not possible (no cached master)
10701:S 08 Mar 2021 08:21:53.564 * Full resync from master: 77a55ec4e95ace44d1d5396e43d2e3c5e396a116:0
10701:S 08 Mar 2021 08:21:53.660 * MASTER <-> REPLICA sync: receiving 175 bytes from master to disk
10701:S 08 Mar 2021 08:21:53.660 * MASTER <-> REPLICA sync: Flushing old data
10701:S 08 Mar 2021 08:21:53.661 * MASTER <-> REPLICA sync: Loading DB in memory
10701:S 08 Mar 2021 08:21:53.662 * Loading RDB produced by version 6.2.1
10701:S 08 Mar 2021 08:21:53.662 * RDB age 0 seconds
10701:S 08 Mar 2021 08:21:53.663 * RDB memory usage when created 1.40 Mb
10701:S 08 Mar 2021 08:21:53.663 * MASTER <-> REPLICA sync: Finished with success
10701:S 08 Mar 2021 08:21:53.663 * Background append only file rewriting started by pid 10707
10701:S 08 Mar 2021 08:21:53.691 * AOF rewrite child asks to stop sending diffs.
10707:C 08 Mar 2021 08:21:53.691 * Parent agreed to stop sending diffs. Finalizing AOF...
10707:C 08 Mar 2021 08:21:53.691 * Concatenating 0.00 MB of AOF diff received from parent.
10707:C 08 Mar 2021 08:21:53.692 * SYNC append only file rewrite performed
10707:C 08 Mar 2021 08:21:53.692 * AOF rewrite: 0 MB of memory used by copy-on-write
10701:S 08 Mar 2021 08:21:53.764 * Background AOF rewrite terminated with success
10701:S 08 Mar 2021 08:21:53.764 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
10701:S 08 Mar 2021 08:21:53.764 * Background AOF rewrite finished successfully
10701:S 08 Mar 2021 08:21:57.234 # Connection with master lost.
10701:S 08 Mar 2021 08:21:57.234 * Caching the disconnected master state.
10701:S 08 Mar 2021 08:21:57.234 * Reconnecting to MASTER 127.0.0.1:10001
10701:S 08 Mar 2021 08:21:57.234 * MASTER <-> REPLICA sync started
10701:S 08 Mar 2021 08:21:57.235 # Error condition on socket for SYNC: Connection refused
10701:S 08 Mar 2021 08:21:57.575 * Connecting to MASTER 127.0.0.1:10001
10701:S 08 Mar 2021 08:21:57.575 * MASTER <-> REPLICA sync started
10701:S 08 Mar 2021 08:21:57.575 # Error condition on socket for SYNC: Connection refused
10701:M 08 Mar 2021 08:21:58.053 * Discarding previously cached master state.
10701:M 08 Mar 2021 08:21:58.053 # Setting secondary replication ID to 77a55ec4e95ace44d1d5396e43d2e3c5e396a116, valid up to offset: 426. New replication ID is 661bb3ee9eb04fc4a108a3089e7f8c1b7e8a2ffa
10701:M 08 Mar 2021 08:21:58.053 * MASTER MODE enabled (user request from 'id=7 addr=127.0.0.1:44316 laddr=127.0.0.1:10002 fd=12 name=sentinel-fd2b2b42-cmd age=4 idle=1 flags=x db=0 sub=0 psub=0 multi=4 qbuf=188 qbuf-free=40766 argv-mem=4 obl=45 oll=0 omem=0 tot-mem=61468 events=r cmd=exec user=default redir=-1')
10701:M 08 Mar 2021 08:21:58.056 # CONFIG REWRITE executed with success.
10701:M 08 Mar 2021 08:21:58.891 * Replica 127.0.0.1:10003 asks for synchronization
10701:M 08 Mar 2021 08:21:58.891 * Partial resynchronization request from 127.0.0.1:10003 accepted. Sending 291 bytes of backlog starting from offset 426.
10701:M 08 Mar 2021 08:21:59.633 # Connection with replica 127.0.0.1:10003 lost.
10701:S 08 Mar 2021 08:22:10.060 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
10701:S 08 Mar 2021 08:22:10.060 * Connecting to MASTER 127.0.0.1:10003
10701:S 08 Mar 2021 08:22:10.060 * MASTER <-> REPLICA sync started
10701:S 08 Mar 2021 08:22:10.060 * REPLICAOF 127.0.0.1:10003 enabled (user request from 'id=21 addr=127.0.0.1:44564 laddr=127.0.0.1:10002 fd=8 name=sentinel-fe1df7a4-cmd age=11 idle=0 flags=x db=0 sub=0 psub=0 multi=4 qbuf=197 qbuf-free=40757 argv-mem=4 obl=45 oll=0 omem=0 tot-mem=61468 events=r cmd=exec user=default redir=-1')
10701:S 08 Mar 2021 08:22:10.062 # CONFIG REWRITE executed with success.
10701:S 08 Mar 2021 08:22:10.062 * Non blocking connect for SYNC fired the event.
10701:S 08 Mar 2021 08:22:10.062 * Master replied to PING, replication can continue...
10701:S 08 Mar 2021 08:22:10.063 * Trying a partial resynchronization (request 661bb3ee9eb04fc4a108a3089e7f8c1b7e8a2ffa:2593).
10701:S 08 Mar 2021 08:22:10.064 * Full resync from master: 217c983e6f43e2fe66b15672bf232a9ab760d433:3017
10701:S 08 Mar 2021 08:22:10.064 * Discarding previously cached master state.
10701:S 08 Mar 2021 08:22:10.111 * MASTER <-> REPLICA sync: receiving 176 bytes from master to disk
10701:S 08 Mar 2021 08:22:10.111 * MASTER <-> REPLICA sync: Flushing old data
10701:S 08 Mar 2021 08:22:10.111 * MASTER <-> REPLICA sync: Loading DB in memory
10701:S 08 Mar 2021 08:22:10.112 * Loading RDB produced by version 6.2.1
10701:S 08 Mar 2021 08:22:10.112 * RDB age 0 seconds
10701:S 08 Mar 2021 08:22:10.112 * RDB memory usage when created 1.50 Mb
10701:S 08 Mar 2021 08:22:10.112 * MASTER <-> REPLICA sync: Finished with success
10701:S 08 Mar 2021 08:22:10.113 * Background append only file rewriting started by pid 10734
10701:S 08 Mar 2021 08:22:10.143 * AOF rewrite child asks to stop sending diffs.
10734:C 08 Mar 2021 08:22:10.143 * Parent agreed to stop sending diffs. Finalizing AOF...
10734:C 08 Mar 2021 08:22:10.143 * Concatenating 0.00 MB of AOF diff received from parent.
10734:C 08 Mar 2021 08:22:10.144 * SYNC append only file rewrite performed
10734:C 08 Mar 2021 08:22:10.144 * AOF rewrite: 0 MB of memory used by copy-on-write
10701:S 08 Mar 2021 08:22:10.211 * Background AOF rewrite terminated with success
10701:S 08 Mar 2021 08:22:10.211 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
10701:S 08 Mar 2021 08:22:10.211 * Background AOF rewrite finished successfully

NO3: 127.0.0.1:10003(Slave1)

10710:C 08 Mar 2021 08:21:53.869 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
10710:C 08 Mar 2021 08:21:53.869 # Redis version=6.2.1, bits=64, commit=00000000, modified=0, pid=10710, just started
10710:C 08 Mar 2021 08:21:53.869 # Configuration loaded
10710:S 08 Mar 2021 08:21:53.869 * monotonic clock: POSIX clock_gettime
10710:S 08 Mar 2021 08:21:53.870 * Running mode=standalone, port=10003.
10710:S 08 Mar 2021 08:21:53.870 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
10710:S 08 Mar 2021 08:21:53.870 # Server initialized
10710:S 08 Mar 2021 08:21:53.871 * Ready to accept connections
10710:S 08 Mar 2021 08:21:53.871 * Connecting to MASTER 127.0.0.1:10001
10710:S 08 Mar 2021 08:21:53.872 * MASTER <-> REPLICA sync started
10710:S 08 Mar 2021 08:21:53.872 * Non blocking connect for SYNC fired the event.
10710:S 08 Mar 2021 08:21:53.872 * Master replied to PING, replication can continue...
10710:S 08 Mar 2021 08:21:53.872 * Partial resynchronization not possible (no cached master)
10710:S 08 Mar 2021 08:21:53.873 * Full resync from master: 77a55ec4e95ace44d1d5396e43d2e3c5e396a116:0
10710:S 08 Mar 2021 08:21:53.961 * MASTER <-> REPLICA sync: receiving 175 bytes from master to disk
10710:S 08 Mar 2021 08:21:53.961 * MASTER <-> REPLICA sync: Flushing old data
10710:S 08 Mar 2021 08:21:53.962 * MASTER <-> REPLICA sync: Loading DB in memory
10710:S 08 Mar 2021 08:21:53.963 * Loading RDB produced by version 6.2.1
10710:S 08 Mar 2021 08:21:53.963 * RDB age 0 seconds
10710:S 08 Mar 2021 08:21:53.963 * RDB memory usage when created 1.42 Mb
10710:S 08 Mar 2021 08:21:53.963 * MASTER <-> REPLICA sync: Finished with success
10710:S 08 Mar 2021 08:21:53.964 * Background append only file rewriting started by pid 10716
10710:S 08 Mar 2021 08:21:53.994 * AOF rewrite child asks to stop sending diffs.
10716:C 08 Mar 2021 08:21:53.994 * Parent agreed to stop sending diffs. Finalizing AOF...
10716:C 08 Mar 2021 08:21:53.994 * Concatenating 0.00 MB of AOF diff received from parent.
10716:C 08 Mar 2021 08:21:53.994 * SYNC append only file rewrite performed
10716:C 08 Mar 2021 08:21:53.995 * AOF rewrite: 0 MB of memory used by copy-on-write
10710:S 08 Mar 2021 08:21:54.072 * Background AOF rewrite terminated with success
10710:S 08 Mar 2021 08:21:54.072 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
10710:S 08 Mar 2021 08:21:54.072 * Background AOF rewrite finished successfully
10710:S 08 Mar 2021 08:21:57.234 # Connection with master lost.
10710:S 08 Mar 2021 08:21:57.234 * Caching the disconnected master state.
10710:S 08 Mar 2021 08:21:57.234 * Reconnecting to MASTER 127.0.0.1:10001
10710:S 08 Mar 2021 08:21:57.234 * MASTER <-> REPLICA sync started
10710:S 08 Mar 2021 08:21:57.235 # Error condition on socket for SYNC: Connection refused
10710:S 08 Mar 2021 08:21:57.879 * Connecting to MASTER 127.0.0.1:10001
10710:S 08 Mar 2021 08:21:57.879 * MASTER <-> REPLICA sync started
10710:S 08 Mar 2021 08:21:57.879 # Error condition on socket for SYNC: Connection refused
10710:S 08 Mar 2021 08:21:58.881 * Connecting to MASTER 127.0.0.1:10001
10710:S 08 Mar 2021 08:21:58.881 * MASTER <-> REPLICA sync started
10710:S 08 Mar 2021 08:21:58.881 # Error condition on socket for SYNC: Connection refused
10710:S 08 Mar 2021 08:21:58.887 * Connecting to MASTER 127.0.0.1:10002
10710:S 08 Mar 2021 08:21:58.887 * MASTER <-> REPLICA sync started
10710:S 08 Mar 2021 08:21:58.887 * REPLICAOF 127.0.0.1:10002 enabled (user request from 'id=7 addr=127.0.0.1:60272 laddr=127.0.0.1:10003 fd=12 name=sentinel-fd2b2b42-cmd age=4 idle=0 flags=x db=0 sub=0 psub=0 multi=4 qbuf=331 qbuf-free=40623 argv-mem=4 obl=45 oll=0 omem=0 tot-mem=61468 events=r cmd=exec user=default redir=-1')
10710:S 08 Mar 2021 08:21:58.888 # CONFIG REWRITE executed with success.
10710:S 08 Mar 2021 08:21:58.889 * Non blocking connect for SYNC fired the event.
10710:S 08 Mar 2021 08:21:58.889 * Master replied to PING, replication can continue...
10710:S 08 Mar 2021 08:21:58.890 * Trying a partial resynchronization (request 77a55ec4e95ace44d1d5396e43d2e3c5e396a116:426).
10710:S 08 Mar 2021 08:21:58.892 * Successful partial resynchronization with master.
10710:S 08 Mar 2021 08:21:58.892 # Master replication ID changed to 661bb3ee9eb04fc4a108a3089e7f8c1b7e8a2ffa
10710:S 08 Mar 2021 08:21:58.892 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
10710:M 08 Mar 2021 08:21:59.633 # Connection with master lost.
10710:M 08 Mar 2021 08:21:59.633 * Caching the disconnected master state.
10710:M 08 Mar 2021 08:21:59.633 * Discarding previously cached master state.
10710:M 08 Mar 2021 08:21:59.633 # Setting secondary replication ID to 661bb3ee9eb04fc4a108a3089e7f8c1b7e8a2ffa, valid up to offset: 985. New replication ID is 217c983e6f43e2fe66b15672bf232a9ab760d433
10710:M 08 Mar 2021 08:21:59.633 * MASTER MODE enabled (user request from 'id=13 addr=127.0.0.1:60460 laddr=127.0.0.1:10003 fd=11 name=sentinel-82a03466-cmd age=1 idle=0 flags=x db=0 sub=0 psub=0 multi=4 qbuf=188 qbuf-free=40766 argv-mem=4 obl=45 oll=0 omem=0 tot-mem=61468 events=r cmd=exec user=default redir=-1')
10710:M 08 Mar 2021 08:21:59.635 # CONFIG REWRITE executed with success.
10710:M 08 Mar 2021 08:22:10.063 * Replica 127.0.0.1:10002 asks for synchronization
10710:M 08 Mar 2021 08:22:10.063 * Partial resynchronization not accepted: Requested offset for second ID was 2593, but I can reply up to 985
10710:M 08 Mar 2021 08:22:10.063 * Starting BGSAVE for SYNC with target: disk
10710:M 08 Mar 2021 08:22:10.063 * Background saving started by pid 10733
10733:C 08 Mar 2021 08:22:10.065 * DB saved on disk
10733:C 08 Mar 2021 08:22:10.066 * RDB: 0 MB of memory used by copy-on-write
10710:M 08 Mar 2021 08:22:10.110 * Background saving terminated with success
10710:M 08 Mar 2021 08:22:10.111 * Synchronization with replica 127.0.0.1:10002 succeeded

Comment From: yossigo

@monkey92t Looks like your down-after-milliseconds settings is very low (500ms) which triggers a false positive when the new master gets promoted. You may want to experiment with a larger value and see if you still experience this issue.

Comment From: monkey92t

After I set it to 2000 milliseconds, after repeated runs, the error did not reappear. I don’t know why there is an error when set to 500 milliseconds, but adding down-after-milliseconds can indeed eliminate the error. Can we close this issue?

Comment From: yossigo

500ms is just too short and results with a false positive. It could be a result of a short period of unresponsiveness from the instance side, or some delay that creeps in from Sentinel's end. If anyone has the bandwidth to look into that I guess this can be improved, although 500ms may be too short and result with other false positives even in other scenarios.

Comment From: monkey92t

OK, thanks for your help