After three consecutive failovers, all Redis nodes have become slaves. We observed this in production; the detailed sequence is as follows:

  1. epoch 38
     - 1.1 Master Redis_1 fails: +odown Redis_1.
     - 1.2 Sentinel_1 gets the majority (Sentinel_2 and Sentinel_3 vote for Sentinel_1) and performs the failover (epoch 38).
     - 1.3 Slave Redis_2 is successfully promoted by Sentinel_1.
     - 1.4 Sentinel_2 and Sentinel_3 get the new epoch and the master switch event.
     - 1.5 Redis_3 connects to Redis_2, and MASTER <-> SLAVE sync starts.

  2. epoch 39
     - 2.1 Redis_3 MASTER <-> SLAVE sync has not finished.
     - 2.2 +odown Redis_2.
     - 2.3 Sentinel_3 gets the majority (Sentinel_2 votes for Sentinel_3, but Sentinel_1 does not) and performs the failover for Redis_2 (epoch 39).
     - 2.4 Redis_1 is successfully promoted by Sentinel_3.
     - 2.5 Sentinel_2 gets the new epoch and the master switch event. Sentinel_1 only gets the new epoch.
     - 2.6 Redis_3 connects to Redis_1, and MASTER <-> SLAVE sync starts.

  3. epoch 40
     - 3.1 Redis_3 MASTER <-> SLAVE sync has not finished.
     - 3.2 +odown Redis_1.
     - 3.3 Sentinel_2 gets the majority (Sentinel_1 votes for Sentinel_2, but Sentinel_3 does not) and performs the failover for Redis_1 (epoch 40).
     - 3.4 Redis_2 is successfully promoted by Sentinel_2.
     - 3.5 Sentinel_1 gets the new epoch and the master switch event. Sentinel_3 only gets the new epoch.
     - 3.6 Redis_3 connects to Redis_2 and requests a sync.

     Then something interesting happened:
     - 3.7 Sentinel_3 shows failover-end for Redis_2, performs the switch-master event, and sends the slaveof command to Redis_2.
     - 3.8 Sentinel_2 shows failover-end for Redis_1, performs the switch-master event, and sends the slaveof command to Redis_1.

Now all Redis nodes have become slaves. Is this a bug? How can we avoid it?
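The end state described above (every node reporting the slave role) can be detected from the `role:` field that `INFO replication` returns on each node. The following is only a minimal sketch: the function names `parse_role` and `all_slaves` and the `SAMPLE` texts are made up for illustration, and the sample values simply mirror the final `SLAVE OF` lines in the detailed log rather than real captured output.

```python
def parse_role(info_replication):
    """Return the value of the `role:` field from INFO replication text."""
    for raw in info_replication.splitlines():
        line = raw.strip()
        if line.startswith("role:"):
            return line.split(":", 1)[1]
    raise ValueError("no role: field in INFO replication output")


def all_slaves(info_by_node):
    """True when at least one node is given and every node reports role:slave."""
    return bool(info_by_node) and all(
        parse_role(text) == "slave" for text in info_by_node.values()
    )


# Hypothetical, trimmed INFO replication outputs (assumption: only the
# fields relevant here are shown; hosts follow the masked log addresses).
SAMPLE = {
    "Redis_1": "# Replication\r\nrole:slave\r\nmaster_host:__.__.__.89\r\nmaster_port:6379\r\n",
    "Redis_2": "# Replication\r\nrole:slave\r\nmaster_host:__.__.__.88\r\nmaster_port:6379\r\n",
    "Redis_3": "# Replication\r\nrole:slave\r\nmaster_host:__.__.__.89\r\nmaster_port:6379\r\n",
}

if all_slaves(SAMPLE):
    print("no node reports role:master")
```

A check like this only reports the symptom; it does not say which Sentinel sent the conflicting commands.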

Detailed log:

Sentinel_2_log 12297:X 10 Apr 10:00:10.203 # +sdown master mymaster __.__.__.88 6379
Sentinel_1_log 2289:X 10 Apr 10:00:10.258 # +sdown master mymaster __.__.__.88 6379
Sentinel_1_log 2289:X 10 Apr 10:00:10.349 # +new-epoch 38
Sentinel_1_log 2289:X 10 Apr 10:00:10.349 # +odown master mymaster __.__.__.88 6379 #quorum 2/2
Sentinel_1_log 2289:X 10 Apr 10:00:10.349 # +try-failover master mymaster __.__.__.88 6379
Sentinel_1_log 2289:X 10 Apr 10:00:10.357 # +vote-for-leader 72519055b95c9c69a56f2681e053bcd28b21df44 38
Sentinel_2_log 12297:X 10 Apr 10:00:10.362 # +new-epoch 38
Sentinel_1_log 2289:X 10 Apr 10:00:10.367 # __.__.__.89:26379 voted for 72519055b95c9c69a56f2681e053bcd28b21df44 38
Sentinel_2_log 12297:X 10 Apr 10:00:10.367 # +vote-for-leader 72519055b95c9c69a56f2681e053bcd28b21df44 38
Sentinel_3_log 12299:X 10 Apr 10:00:10.372 # +new-epoch 38
Sentinel_3_log 12299:X 10 Apr 10:00:10.374 # +vote-for-leader 72519055b95c9c69a56f2681e053bcd28b21df44 38
Sentinel_1_log 2289:X 10 Apr 10:00:10.375 # __.__.__.90:26379 voted for 72519055b95c9c69a56f2681e053bcd28b21df44 38
Sentinel_1_log 2289:X 10 Apr 10:00:10.415 # +elected-leader master mymaster __.__.__.88 6379
Sentinel_1_log 2289:X 10 Apr 10:00:10.415 # +failover-state-select-slave master mymaster __.__.__.88 6379
Sentinel_1_log 2289:X 10 Apr 10:00:10.487 # +selected-slave slave __.__.__.89:6379 __.__.__.89 6379 @ mymaster __.__.__.88 6379
Sentinel_1_log 2289:X 10 Apr 10:00:10.487 * +failover-state-send-slaveof-noone slave __.__.__.89:6379 __.__.__.89 6379 @ mymaster __.__.__.88 6379
Sentinel_1_log 2289:X 10 Apr 10:00:10.549 * +failover-state-wait-promotion slave __.__.__.89:6379 __.__.__.89 6379 @ mymaster __.__.__.88 6379
Redis_2_log 12298:M 10 Apr 10:00:10.550 # Connection with master lost.
Redis_2_log 12298:M 10 Apr 10:00:10.550 * Caching the disconnected master state.
Redis_2_log 12298:M 10 Apr 10:00:10.550 * Discarding previously cached master state.
Redis_2_log 12298:M 10 Apr 10:00:10.550 * MASTER MODE enabled (user request from 'id=76574452 addr=__.__.__.88:51104 fd=237 name= age=271621 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=rw cmd=exec')
Redis_2_log 12298:M 10 Apr 10:00:10.551 # CONFIG REWRITE executed with success.
Redis_1_log 2290:M 10 Apr 10:00:11.175 # Connection with slave __.__.__.89:6379 lost.
Sentinel_1_log 2289:X 10 Apr 10:00:11.194 # -odown master mymaster __.__.__.88 6379
Sentinel_1_log 2289:X 10 Apr 10:00:11.194 # -sdown master mymaster __.__.__.88 6379
Sentinel_2_log 12297:X 10 Apr 10:00:11.235 # -sdown master mymaster __.__.__.88 6379
Sentinel_1_log 2289:X 10 Apr 10:00:11.371 # +failover-state-reconf-slaves master mymaster __.__.__.88 6379
Sentinel_1_log 2289:X 10 Apr 10:00:11.371 # +promoted-slave slave __.__.__.89:6379 __.__.__.89 6379 @ mymaster __.__.__.88 6379
Sentinel_1_log 2289:X 10 Apr 10:00:11.426 * +slave-reconf-sent slave __.__.__.90:6379 __.__.__.90 6379 @ mymaster __.__.__.88 6379
Redis_3_log 12300:S 10 Apr 10:00:11.427 # Connection with master lost.
Redis_3_log 12300:S 10 Apr 10:00:11.427 * Caching the disconnected master state.
Sentinel_2_log 12297:X 10 Apr 10:00:11.427 # +config-update-from sentinel __.__.__.88:26379 __.__.__.88 26379 @ mymaster __.__.__.88 6379
Sentinel_2_log 12297:X 10 Apr 10:00:11.427 # +switch-master mymaster __.__.__.88 6379 __.__.__.89 6379
Sentinel_3_log 12299:X 10 Apr 10:00:11.427 # +config-update-from sentinel __.__.__.88:26379 __.__.__.88 26379 @ mymaster __.__.__.88 6379
Sentinel_3_log 12299:X 10 Apr 10:00:11.427 # +switch-master mymaster __.__.__.88 6379 __.__.__.89 6379
Redis_1_log 2290:M 10 Apr 10:00:11.428 # Connection with slave client id #36643089 lost.
Redis_3_log 12300:S 10 Apr 10:00:11.428 * Discarding previously cached master state.
Redis_3_log 12300:S 10 Apr 10:00:11.428 * SLAVE OF __.__.__.89:6379 enabled (user request from 'id=1545686 addr=__.__.__.88:54346 fd=232 name= age=271643 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=139 qbuf-free=32629 obl=36 oll=0 omem=0 events=rw cmd=exec')
Sentinel_2_log 12297:X 10 Apr 10:00:11.428 * +slave slave __.__.__.88:6379 __.__.__.88 6379 @ mymaster __.__.__.89 6379
Sentinel_2_log 12297:X 10 Apr 10:00:11.428 * +slave slave __.__.__.90:6379 __.__.__.90 6379 @ mymaster __.__.__.89 6379
Redis_3_log 12300:S 10 Apr 10:00:11.429 # CONFIG REWRITE executed with success.
Sentinel_3_log 12299:X 10 Apr 10:00:11.429 * +slave slave __.__.__.88:6379 __.__.__.88 6379 @ mymaster __.__.__.89 6379
Sentinel_3_log 12299:X 10 Apr 10:00:11.429 * +slave slave __.__.__.90:6379 __.__.__.90 6379 @ mymaster __.__.__.89 6379
Redis_3_log 12300:S 10 Apr 10:00:11.784 * Connecting to MASTER __.__.__.89:6379
Redis_3_log 12300:S 10 Apr 10:00:11.785 * MASTER <-> SLAVE sync started
Redis_3_log 12300:S 10 Apr 10:00:11.785 * Non blocking connect for SYNC fired the event.
Redis_3_log 12300:S 10 Apr 10:00:11.786 * Master replied to PING, replication can continue...
Redis_2_log 12298:M 10 Apr 10:00:11.787 * Slave __.__.__.90:6379 asks for synchronization
Redis_3_log 12300:S 10 Apr 10:00:11.787 * Partial resynchronization not possible (no cached master)
Redis_2_log 12298:M 10 Apr 10:00:11.788 * Full resync requested by slave __.__.__.90:6379
Redis_2_log 12298:M 10 Apr 10:00:11.788 * Starting BGSAVE for SYNC with target: disk
Redis_2_log 12298:M 10 Apr 10:00:12.378 * Background saving started by pid 13998
Redis_3_log 12300:S 10 Apr 10:00:12.378 * Full resync from master: cf08385d9366918204a0d711c3249008d6050a94:6113519317747
Sentinel_1_log 2289:X 10 Apr 10:00:12.398 * +slave-reconf-inprog slave __.__.__.90:6379 __.__.__.90 6379 @ mymaster __.__.__.88 6379
Sentinel_2_log 12297:X 10 Apr 10:00:15.600 # +sdown master mymaster __.__.__.89 6379
Sentinel_3_log 12299:X 10 Apr 10:00:15.648 # +sdown master mymaster __.__.__.89 6379
Sentinel_3_log 12299:X 10 Apr 10:00:15.732 # +new-epoch 39
Sentinel_3_log 12299:X 10 Apr 10:00:15.732 # +odown master mymaster __.__.__.89 6379 #quorum 2/2
Sentinel_3_log 12299:X 10 Apr 10:00:15.732 # +try-failover master mymaster __.__.__.89 6379
Sentinel_3_log 12299:X 10 Apr 10:00:15.737 # +vote-for-leader 8f689180a77c7826d46bef6aa501fbd9a415c9f8 39
Sentinel_2_log 12297:X 10 Apr 10:00:15.742 # +new-epoch 39
Sentinel_2_log 12297:X 10 Apr 10:00:15.745 # +vote-for-leader 8f689180a77c7826d46bef6aa501fbd9a415c9f8 39
Sentinel_3_log 12299:X 10 Apr 10:00:15.746 # __.__.__.89:26379 voted for 8f689180a77c7826d46bef6aa501fbd9a415c9f8 39
Sentinel_3_log 12299:X 10 Apr 10:00:15.792 # +elected-leader master mymaster __.__.__.89 6379
Sentinel_3_log 12299:X 10 Apr 10:00:15.793 # +failover-state-select-slave master mymaster __.__.__.89 6379
Sentinel_3_log 12299:X 10 Apr 10:00:15.855 # +selected-slave slave __.__.__.88:6379 __.__.__.88 6379 @ mymaster __.__.__.89 6379
Sentinel_3_log 12299:X 10 Apr 10:00:15.855 * +failover-state-send-slaveof-noone slave __.__.__.88:6379 __.__.__.88 6379 @ mymaster __.__.__.89 6379
Sentinel_3_log 12299:X 10 Apr 10:00:15.946 * +failover-state-wait-promotion slave __.__.__.88:6379 __.__.__.88 6379 @ mymaster __.__.__.89 6379
Redis_1_log 2290:M 10 Apr 10:00:15.947 # CONFIG REWRITE executed with success.
Sentinel_1_log 2289:X 10 Apr 10:00:16.249 # +new-epoch 39
Sentinel_2_log 12297:X 10 Apr 10:00:16.329 # -sdown master mymaster __.__.__.89 6379
Sentinel_3_log 12299:X 10 Apr 10:00:16.366 # -odown master mymaster __.__.__.89 6379
Sentinel_3_log 12299:X 10 Apr 10:00:16.366 # -sdown master mymaster __.__.__.89 6379
Sentinel_3_log 12299:X 10 Apr 10:00:16.747 # +promoted-slave slave __.__.__.88:6379 __.__.__.88 6379 @ mymaster __.__.__.89 6379
Sentinel_3_log 12299:X 10 Apr 10:00:16.748 # +failover-state-reconf-slaves master mymaster __.__.__.89 6379
Sentinel_3_log 12299:X 10 Apr 10:00:16.814 * +slave-reconf-sent slave __.__.__.90:6379 __.__.__.90 6379 @ mymaster __.__.__.89 6379
Redis_2_log 12298:M 10 Apr 10:00:16.815 # Connection with slave __.__.__.90:6379 lost.
Sentinel_2_log 12297:X 10 Apr 10:00:16.815 # +config-update-from sentinel __.__.__.90:26379 __.__.__.90 26379 @ mymaster __.__.__.89 6379
Sentinel_2_log 12297:X 10 Apr 10:00:16.815 # +switch-master mymaster __.__.__.89 6379 __.__.__.88 6379
Redis_3_log 12300:S 10 Apr 10:00:16.816 # CONFIG REWRITE executed with success.
Redis_3_log 12300:S 10 Apr 10:00:16.816 * SLAVE OF __.__.__.88:6379 enabled (user request from 'id=1550626 addr=__.__.__.90:43288 fd=202 name= age=5 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=230 qbuf-free=32538 obl=81 oll=0 omem=0 events=rw cmd=slaveof')
Sentinel_2_log 12297:X 10 Apr 10:00:16.817 * +slave slave __.__.__.89:6379 __.__.__.89 6379 @ mymaster __.__.__.88 6379
Sentinel_2_log 12297:X 10 Apr 10:00:16.817 * +slave slave __.__.__.90:6379 __.__.__.90 6379 @ mymaster __.__.__.88 6379
Sentinel_3_log 12299:X 10 Apr 10:00:17.757 * +slave-reconf-inprog slave __.__.__.90:6379 __.__.__.90 6379 @ mymaster __.__.__.89 6379
Redis_3_log 12300:S 10 Apr 10:00:17.814 * Connecting to MASTER __.__.__.88:6379
Redis_3_log 12300:S 10 Apr 10:00:17.815 * MASTER <-> SLAVE sync started
Redis_3_log 12300:S 10 Apr 10:00:17.815 * Non blocking connect for SYNC fired the event.
Redis_3_log 12300:S 10 Apr 10:00:17.816 * Master replied to PING, replication can continue...
Redis_3_log 12300:S 10 Apr 10:00:17.817 * Partial resynchronization not possible (no cached master)
Redis_1_log 2290:M 10 Apr 10:00:17.818 * Full resync requested by slave __.__.__.90:6379
Redis_1_log 2290:M 10 Apr 10:00:17.818 * Slave __.__.__.90:6379 asks for synchronization
Redis_1_log 2290:M 10 Apr 10:00:17.818 * Starting BGSAVE for SYNC with target: disk
Redis_1_log 2290:M 10 Apr 10:00:17.831 * Background saving started by pid 15870
Redis_3_log 12300:S 10 Apr 10:00:17.832 * Full resync from master: fe49d64269db100eae17d43470e088c085585e6e:6113520939227
Sentinel_2_log 12297:X 10 Apr 10:00:20.993 # +sdown master mymaster __.__.__.88 6379
Sentinel_1_log 2289:X 10 Apr 10:00:21.219 # +sdown master mymaster __.__.__.88 6379
Sentinel_1_log 2289:X 10 Apr 10:00:21.277 # +odown master mymaster __.__.__.88 6379 #quorum 2/2
Sentinel_3_log 12299:X 10 Apr 10:00:21.869 # +sdown slave __.__.__.88:6379 __.__.__.88 6379 @ mymaster __.__.__.89 6379
Sentinel_2_log 12297:X 10 Apr 10:00:22.093 # +new-epoch 40
Sentinel_2_log 12297:X 10 Apr 10:00:22.093 # +odown master mymaster __.__.__.88 6379 #quorum 2/2
Sentinel_2_log 12297:X 10 Apr 10:00:22.094 # +try-failover master mymaster __.__.__.88 6379
Sentinel_2_log 12297:X 10 Apr 10:00:22.098 # +vote-for-leader c72f1d6c16783505969cabeb14a9a2b870a6451a 40
Sentinel_1_log 2289:X 10 Apr 10:00:22.110 # +new-epoch 40
Sentinel_1_log 2289:X 10 Apr 10:00:22.123 # +vote-for-leader c72f1d6c16783505969cabeb14a9a2b870a6451a 40
Sentinel_2_log 12297:X 10 Apr 10:00:22.124 # __.__.__.88:26379 voted for c72f1d6c16783505969cabeb14a9a2b870a6451a 40
Sentinel_2_log 12297:X 10 Apr 10:00:22.156 # +elected-leader master mymaster __.__.__.88 6379
Sentinel_2_log 12297:X 10 Apr 10:00:22.156 # +failover-state-select-slave master mymaster __.__.__.88 6379
Sentinel_2_log 12297:X 10 Apr 10:00:22.227 # +selected-slave slave __.__.__.89:6379 __.__.__.89 6379 @ mymaster __.__.__.88 6379
Sentinel_2_log 12297:X 10 Apr 10:00:22.228 * +failover-state-send-slaveof-noone slave __.__.__.89:6379 __.__.__.89 6379 @ mymaster __.__.__.88 6379
Sentinel_1_log 2289:X 10 Apr 10:00:22.273 # __.__.__.89:26379 voted for c72f1d6c16783505969cabeb14a9a2b870a6451a 40
Sentinel_2_log 12297:X 10 Apr 10:00:22.318 * +failover-state-wait-promotion slave __.__.__.89:6379 __.__.__.89 6379 @ mymaster __.__.__.88 6379
Redis_2_log 12298:M 10 Apr 10:00:22.320 # CONFIG REWRITE executed with success.
Sentinel_3_log 12299:X 10 Apr 10:00:22.714 # +new-epoch 40
Sentinel_2_log 12297:X 10 Apr 10:00:23.126 # +failover-state-reconf-slaves master mymaster __.__.__.88 6379
Sentinel_2_log 12297:X 10 Apr 10:00:23.126 # +promoted-slave slave __.__.__.89:6379 __.__.__.89 6379 @ mymaster __.__.__.88 6379
Sentinel_2_log 12297:X 10 Apr 10:00:23.194 * +slave-reconf-sent slave __.__.__.90:6379 __.__.__.90 6379 @ mymaster __.__.__.88 6379
Sentinel_1_log 2289:X 10 Apr 10:00:23.195 # +config-update-from sentinel __.__.__.89:26379 __.__.__.89 26379 @ mymaster __.__.__.88 6379
Sentinel_1_log 2289:X 10 Apr 10:00:23.195 # +switch-master mymaster __.__.__.88 6379 __.__.__.89 6379
Redis_3_log 12300:S 10 Apr 10:00:23.196 * SLAVE OF __.__.__.89:6379 enabled (user request from 'id=1550631 addr=__.__.__.89:49520 fd=104 name= age=7 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=230 qbuf-free=32538 obl=81 oll=0 omem=0 events=rw cmd=slaveof')
Redis_3_log 12300:S 10 Apr 10:00:23.197 # CONFIG REWRITE executed with success.
Sentinel_1_log 2289:X 10 Apr 10:00:23.262 * +slave slave __.__.__.88:6379 __.__.__.88 6379 @ mymaster __.__.__.89 6379
Sentinel_1_log 2289:X 10 Apr 10:00:23.262 * +slave slave __.__.__.90:6379 __.__.__.90 6379 @ mymaster __.__.__.89 6379
Sentinel_2_log 12297:X 10 Apr 10:00:23.334 # -odown master mymaster __.__.__.88 6379
Redis_3_log 12300:S 10 Apr 10:00:23.839 * Connecting to MASTER __.__.__.89:6379
Redis_3_log 12300:S 10 Apr 10:00:23.840 * MASTER <-> SLAVE sync started
Redis_3_log 12300:S 10 Apr 10:00:23.840 * Master replied to PING, replication can continue...
Redis_3_log 12300:S 10 Apr 10:00:23.840 * Non blocking connect for SYNC fired the event.
Redis_3_log 12300:S 10 Apr 10:00:23.841 * Partial resynchronization not possible (no cached master)
Redis_2_log 12298:M 10 Apr 10:00:23.842 * Full resync requested by slave __.__.__.90:6379
Redis_2_log 12298:M 10 Apr 10:00:23.842 * Slave __.__.__.90:6379 asks for synchronization
Redis_2_log 12298:M 10 Apr 10:00:23.842 * Waiting for next BGSAVE for SYNC
Sentinel_2_log 12297:X 10 Apr 10:00:24.165 * +slave-reconf-inprog slave __.__.__.90:6379 __.__.__.90 6379 @ mymaster __.__.__.88 6379
Sentinel_3_log 12299:X 10 Apr 10:00:25.688 # +sdown master mymaster __.__.__.89 6379
Sentinel_1_log 2289:X 10 Apr 10:00:26.414 # +sdown slave __.__.__.88:6379 __.__.__.88 6379 @ mymaster __.__.__.89 6379
Sentinel_3_log 12299:X 10 Apr 10:00:26.696 # -sdown master mymaster __.__.__.89 6379
Redis_1_log 2290:M 10 Apr 10:00:31.195 # Connection with slave __.__.__.90:6379 lost.
Sentinel_2_log 12297:X 10 Apr 10:00:31.428 # -sdown master mymaster __.__.__.88 6379
Sentinel_1_log 2289:X 10 Apr 10:00:31.550 # -sdown slave __.__.__.88:6379 __.__.__.88 6379 @ mymaster __.__.__.89 6379
Sentinel_3_log 12299:X 10 Apr 10:00:31.590 # -sdown slave __.__.__.88:6379 __.__.__.88 6379 @ mymaster __.__.__.89 6379
Sentinel_3_log 12299:X 10 Apr 10:00:31.814 # +failover-end master mymaster __.__.__.89 6379
Sentinel_3_log 12299:X 10 Apr 10:00:31.814 # +failover-end-for-timeout master mymaster __.__.__.89 6379
Sentinel_3_log 12299:X 10 Apr 10:00:31.814 # +switch-master mymaster __.__.__.89 6379 __.__.__.88 6379
Sentinel_3_log 12299:X 10 Apr 10:00:31.814 * +slave-reconf-sent-be slave __.__.__.88:6379 __.__.__.88 6379 @ mymaster __.__.__.89 6379
Sentinel_3_log 12299:X 10 Apr 10:00:31.814 * +slave-reconf-sent-be slave __.__.__.90:6379 __.__.__.90 6379 @ mymaster __.__.__.89 6379
Sentinel_3_log 12299:X 10 Apr 10:00:31.816 * +slave slave __.__.__.89:6379 __.__.__.89 6379 @ mymaster __.__.__.88 6379
Sentinel_3_log 12299:X 10 Apr 10:00:31.816 * +slave slave __.__.__.90:6379 __.__.__.90 6379 @ mymaster __.__.__.88 6379
Sentinel_2_log 12297:X 10 Apr 10:00:38.164 # +failover-end-for-timeout master mymaster __.__.__.88 6379
Sentinel_2_log 12297:X 10 Apr 10:00:38.165 # +failover-end master mymaster __.__.__.88 6379
Sentinel_2_log 12297:X 10 Apr 10:00:38.165 # +switch-master mymaster __.__.__.88 6379 __.__.__.89 6379
Sentinel_2_log 12297:X 10 Apr 10:00:38.165 * +slave-reconf-sent-be slave __.__.__.89:6379 __.__.__.89 6379 @ mymaster __.__.__.88 6379
Sentinel_2_log 12297:X 10 Apr 10:00:38.165 * +slave-reconf-sent-be slave __.__.__.90:6379 __.__.__.90 6379 @ mymaster __.__.__.88 6379
Sentinel_2_log 12297:X 10 Apr 10:00:38.166 * +slave slave __.__.__.88:6379 __.__.__.88 6379 @ mymaster __.__.__.89 6379
Sentinel_2_log 12297:X 10 Apr 10:00:38.166 * +slave slave __.__.__.90:6379 __.__.__.90 6379 @ mymaster __.__.__.89 6379
Sentinel_1_log 2289:X 10 Apr 10:00:41.617 * +convert-to-slave slave __.__.__.88:6379 __.__.__.88 6379 @ mymaster __.__.__.89 6379
Redis_1_log 2290:S 10 Apr 10:00:41.618 * SLAVE OF __.__.__.89:6379 enabled (user request from 'id=38329939 addr=__.__.__.88:59940 fd=737 name=sentinel-72519055-cmd age=24 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=rw cmd=exec')
Redis_1_log 2290:S 10 Apr 10:00:41.621 # CONFIG REWRITE executed with success.
Sentinel_3_log 12299:X 10 Apr 10:00:41.950 * +convert-to-slave slave __.__.__.89:6379 __.__.__.89 6379 @ mymaster __.__.__.88 6379
Redis_2_log 12298:S 10 Apr 10:00:41.951 # Connection with slave __.__.__.90:6379 lost.
Redis_2_log 12298:S 10 Apr 10:00:41.951 * SLAVE OF __.__.__.88:6379 enabled (user request from 'id=76582244 addr=__.__.__.90:41834 fd=8 name=sentinel-8f689180-cmd age=10 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=rw cmd=exec')
Redis_2_log 12298:S 10 Apr 10:00:41.952 # CONFIG REWRITE executed with success.
Redis_3_log 12300:S 10 Apr 10:00:41.952 # Unexpected reply to PSYNC from master: -Reading from master: Resource temporarily unavailable
Redis_3_log 12300:S 10 Apr 10:00:41.952 * Retrying with SYNC...
Redis_3_log 12300:S 10 Apr 10:00:41.953 # I/O error reading bulk count from MASTER: Resource temporarily unavailable
Redis_1_log 2290:S 10 Apr 10:00:42.066 * Connecting to MASTER __.__.__.89:6379
Redis_1_log 2290:S 10 Apr 10:00:42.066 * MASTER <-> SLAVE sync started
Redis_1_log 2290:S 10 Apr 10:00:42.067 * Master replied to PING, replication can continue...
Redis_1_log 2290:S 10 Apr 10:00:42.067 * Non blocking connect for SYNC fired the event.
Redis_1_log 2290:S 10 Apr 10:00:42.070 # Unexpected reply to PSYNC from master: -MASTERDOWN Link with MASTER is down and slave-serve-stale-data is set to 'no'.
Redis_1_log 2290:S 10 Apr 10:00:42.070 * Partial resynchronization not possible (no cached master)
Redis_1_log 2290:S 10 Apr 10:00:42.070 * Retrying with SYNC...
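
To make the merged log above easier to follow, one option is to replay only the `+switch-master` events and track which master each Sentinel last switched to. This is a rough sketch, not a Sentinel tool: the regular expression assumes the exact `Sentinel_N_log` prefix used in this report, and `master_views` is a name chosen here for illustration. The six input lines are copied from the log above.

```python
import re

# Matches lines such as:
# Sentinel_2_log 12297:X 10 Apr 10:00:11.427 # +switch-master mymaster A 6379 B 6379
SWITCH = re.compile(
    r"^(Sentinel_\d+)_log \S+ \d+ \w+ (\d\d:\d\d:\d\d\.\d+) # \+switch-master "
    r"(\S+) (\S+) (\d+) (\S+) (\d+)$"
)


def master_views(lines):
    """Return [(timestamp, sentinel, snapshot)] after each +switch-master.

    Each snapshot maps a Sentinel name to the (ip, port) it last switched to.
    """
    view = {}
    history = []
    for line in lines:
        m = SWITCH.match(line.strip())
        if not m:
            continue  # ignore every other event type
        sentinel, ts, _name, _old_ip, _old_port, new_ip, new_port = m.groups()
        view[sentinel] = (new_ip, int(new_port))
        history.append((ts, sentinel, dict(view)))
    return history


LOG = [
    "Sentinel_2_log 12297:X 10 Apr 10:00:11.427 # +switch-master mymaster __.__.__.88 6379 __.__.__.89 6379",
    "Sentinel_3_log 12299:X 10 Apr 10:00:11.427 # +switch-master mymaster __.__.__.88 6379 __.__.__.89 6379",
    "Sentinel_2_log 12297:X 10 Apr 10:00:16.815 # +switch-master mymaster __.__.__.89 6379 __.__.__.88 6379",
    "Sentinel_1_log 2289:X 10 Apr 10:00:23.195 # +switch-master mymaster __.__.__.88 6379 __.__.__.89 6379",
    "Sentinel_3_log 12299:X 10 Apr 10:00:31.814 # +switch-master mymaster __.__.__.89 6379 __.__.__.88 6379",
    "Sentinel_2_log 12297:X 10 Apr 10:00:38.165 # +switch-master mymaster __.__.__.88 6379 __.__.__.89 6379",
]

for ts, sentinel, snapshot in master_views(LOG):
    print(ts, sentinel, snapshot)
```

In the last snapshot Sentinel_1 and Sentinel_2 point at `__.__.__.89` while Sentinel_3 points at `__.__.__.88`, which matches the two conflicting `+convert-to-slave` events at 10:00:41.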

Comment From: tinawenqiao

A similar scenario was mentioned in issue 2370, but no conclusion was reached. I hope @antirez can explain more about this extreme case.

Comment From: tinawenqiao

This problem occurs in version 3.0.6. Has it been fixed in newer versions? @antirez

Comment From: antirez

Hello, Redis 3.0.6 was released more than 5 years ago. I can confirm that we fixed many issues that could lead to severe problems. Please upgrade.

Comment From: tinawenqiao

@antirez But we don't think upgrading will solve this problem. The root cause is that, at the same point in time, the Sentinels do not agree on which node is the master.
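
One way to observe this kind of disagreement is to ask every Sentinel for its current master with `SENTINEL get-master-addr-by-name mymaster` and compare the replies. The sketch below only covers the comparison step and assumes the replies have already been collected as (ip, port) pairs; `group_by_master`, `sentinels_agree`, and the `REPLIES` values are illustrative, with the values mirroring the final `+switch-master` state in the log rather than real command output.

```python
from collections import defaultdict


def group_by_master(replies):
    """Group Sentinel names by the (ip, port) each one reports as master."""
    groups = defaultdict(list)
    for sentinel, addr in replies.items():
        groups[tuple(addr)].append(sentinel)
    return dict(groups)


def sentinels_agree(replies):
    """True when all Sentinels report the same master address."""
    return len(group_by_master(replies)) <= 1


# Hypothetical replies (assumption): ip and port as strings, one pair
# per Sentinel, taken from the last +switch-master line of each Sentinel.
REPLIES = {
    "Sentinel_1": ("__.__.__.89", "6379"),
    "Sentinel_2": ("__.__.__.89", "6379"),
    "Sentinel_3": ("__.__.__.88", "6379"),
}

if not sentinels_agree(REPLIES):
    for addr, names in group_by_master(REPLIES).items():
        print(addr, "<-", names)
```

A brief disagreement during a failover is expected; the case of interest is one that persists after every Sentinel has logged failover-end.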

Comment From: myl1024

We observed that during a failover, at some moment, the master that the follower Sentinels point to is the leader Sentinel's promoted_slave. Our failover failed at that moment, which caused this result.

Comment From: antirez

@tinawenqiao I don't mean you should upgrade only Redis itself, but Redis Sentinel as well.

Comment From: antirez

I remember that we fixed this problem a long time ago. Just download the latest version of Redis and re-test the scenario above. You should not see any of the problems described above. If you can still see them, I will investigate in order to understand what's happening.

Comment From: tinawenqiao

@antirez Do you remember the issue, or a keyword for the fix? We want to know how it was fixed.

Comment From: antirez

@tinawenqiao I don't remember, but a saner way to handle that is the following:

  1. Simulate your environment with Redis 6.
  2. Try to see the bug happening again.
  3. Report it with Redis 6.
  4. I'll follow up with a serious investigation.

Redis 3 is no longer supported; we can't investigate bugs against such old versions.

Comment From: tinawenqiao

@antirez Thank you for your suggestion. We plan to reproduce this problem on Redis 6, but there is a problem deploying Redis 6 on k8s. To solve it, my colleague opened a pull request: https://github.com/antirez/redis/pull/7393. Please review this PR for us.