Redis 6.2, cluster mode, 7 nodes (1 master, 1 slave). When three masters shut down, a slave does not get promoted to master and instead becomes a replica of another master. Master log:
1023:M 24 Apr 2023 17:18:16.004 # User requested shutdown...
1023:M 24 Apr 2023 17:18:16.004 * Saving the final RDB snapshot before exiting.
1022:C 24 Apr 2023 17:19:19.809 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1022:C 24 Apr 2023 17:19:19.809 # Redis version=6.2.7, bits=64, commit=44e6c570, modified=1, pid=1022, just started
1022:C 24 Apr 2023 17:19:19.809 # Configuration loaded
1022:C 24 Apr 2023 17:19:19.809 # This is opensource mode!
1022:C 24 Apr 2023 17:19:19.818 * Cipher root:/opt/redis/cipher
1022:M 24 Apr 2023 17:19:19.827 * monotonic clock: POSIX clock_gettime
1022:M 24 Apr 2023 17:19:19.828 * Node configuration loaded, I'm 70cee37c3e1e79c08365c110d479833150697d1b
1022:M 24 Apr 2023 17:19:19.829 # A key '__redis__compare_helper' was added to Lua globals which is not on the globals allow list nor listed on the deny list.
1022:M 24 Apr 2023 17:19:19.829 * Running mode=cluster, port=28800.
1022:M 24 Apr 2023 17:19:19.829 # Server initialized
1022:M 24 Apr 2023 17:19:19.830 * Loading RDB produced by version 6.2.7
1022:M 24 Apr 2023 17:19:19.830 * RDB age 307111 seconds
1022:M 24 Apr 2023 17:19:19.830 * RDB memory usage when created 16204.96 Mb
1022:M 24 Apr 2023 17:22:05.266 # Done loading RDB, keys loaded: 27433710, keys expired: 0.
1022:M 24 Apr 2023 17:22:05.267 * DB loaded from disk: 165.436 seconds
Slave log:
1015:M 24 Apr 2023 17:19:14.069 # Done loading RDB, keys loaded: 27433710, keys expired: 0.
1015:M 24 Apr 2023 17:19:14.069 * DB loaded from disk: 142.032 seconds
1015:M 24 Apr 2023 17:19:14.069 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1015:M 24 Apr 2023 17:19:14.069 * Ready to accept connections
1015:S 24 Apr 2023 17:19:14.071 * Discarding previously cached master state.
1015:S 24 Apr 2023 17:19:14.071 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1015:S 24 Apr 2023 17:19:14.072 * Connecting to MASTER 172.200.4.74:28800
1015:S 24 Apr 2023 17:19:14.072 * MASTER <-> REPLICA sync started
1015:S 24 Apr 2023 17:19:14.072 # Cluster state changed: ok
1015:S 24 Apr 2023 17:19:14.075 # Error condition on socket for SYNC: (null)
1015:S 24 Apr 2023 17:19:14.091 # Cluster state changed: fail
1015:S 24 Apr 2023 17:19:14.264 * Connecting to MASTER 172.200.7.201:28800
1015:S 24 Apr 2023 17:19:14.264 * MASTER <-> REPLICA sync started
1015:S 24 Apr 2023 17:19:14.518 * Non blocking connect for SYNC fired the event.
1015:S 24 Apr 2023 17:19:14.518 * Master replied to PING, replication can continue...
Comment From: hwware
Could you provide more details about your network? How many master nodes and replica nodes do you have, and how are these 7 nodes allocated? Thanks
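One way to capture the allocation hwware is asking about is to run `redis-cli -p 28800 CLUSTER NODES` on any node and summarize its output. The sketch below parses that format (node-id, address, flags, master-id, ...); the sample output is hypothetical, not the reporter's actual cluster.

```python
# Hypothetical CLUSTER NODES output (addresses/IDs invented for illustration).
SAMPLE = """\
70cee37c3e1e79c08365c110d479833150697d1b 172.200.4.74:28800@38800 myself,master - 0 0 1 connected 0-5460
2f5c1e8a9b0d4c6e8f1a3b5d7e9f0a2c4e6b8d0f 172.200.7.201:28800@38800 master - 0 1682327000001 2 connected 5461-10922
9a1b3c5d7e9f0a2c4e6b8d0f2f5c1e8a9b0d4c6e 172.200.5.10:28800@38800 slave 70cee37c3e1e79c08365c110d479833150697d1b 0 1682327000002 1 connected
"""

def summarize(nodes_output: str) -> dict:
    """Group nodes into masters and the replicas attached to each master."""
    masters, replicas = {}, {}
    for line in nodes_output.strip().splitlines():
        fields = line.split()
        node_id = fields[0]
        addr = fields[1].split('@')[0]      # strip the cluster-bus port
        flags = fields[2]
        if 'master' in flags:
            masters[node_id] = addr
        else:
            # fields[3] is the ID of the master this replica follows
            replicas.setdefault(fields[3], []).append(addr)
    return {'masters': masters, 'replicas': replicas}

topology = summarize(SAMPLE)
print(len(topology['masters']), 'masters')
```

Running this against the real cluster's output would show directly which masters have zero, one, or two replicas.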
Comment From: polaris-alioth
Sorry, I didn't describe it clearly enough.
In the beginning, the distribution was like this.
The network is normal. After three masters and three replicas were shut down together (manually executed), one master had 2 slaves and one master had no slave. Scanning the slave logs, I found that the slave changed its master and was not promoted to master when its own master shut down.
1015:S 24 Apr 2023 17:19:14.071 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1015:S 24 Apr 2023 17:19:14.072 * Connecting to MASTER 172.200.4.74:28800
1015:S 24 Apr 2023 17:19:14.072 * MASTER <-> REPLICA sync started
1015:S 24 Apr 2023 17:19:14.072 # Cluster state changed: ok
1015:S 24 Apr 2023 17:19:14.075 # Error condition on socket for SYNC: (null)
1015:S 24 Apr 2023 17:19:14.091 # Cluster state changed: fail
1015:S 24 Apr 2023 17:19:14.264 * Connecting to MASTER 172.200.7.201:28800
1015:S 24 Apr 2023 17:19:14.264 * MASTER <-> REPLICA sync started
I'd like to know why this might happen. Thanks
Comment From: hwware
Sorry, @polaris-alioth, I think we still need more information from your side, such as: 1. Which 3 of the master nodes were shut down manually? 2. Which 3 of the replica nodes were shut down manually?
If I understand correctly, you manually shut down 3 master nodes and 3 replica nodes, so the number of remaining nodes is 8. Ideally, because 3 master nodes are shut down, 3 replica nodes should be promoted to masters, so the correct result should be 7 master nodes and 1 replica node remaining, right?
Please confirm, thanks
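hwware's arithmetic can be sketched as a toy model. It assumes a topology of 7 shards, each with 1 master and 1 replica (14 nodes), and that the 3 replicas that were shut down belong to shards other than the 3 downed masters, so each orphaned shard still has a live replica to promote.

```python
# Toy model of the expected role counts after the shutdowns, under the
# assumptions stated above (7 shards, downed replicas in other shards).
def expected_roles(shards=7, masters_down=3, replicas_down=3):
    promoted = masters_down                            # one replica promoted per dead master
    masters_left = (shards - masters_down) + promoted  # 4 surviving + 3 promoted = 7
    replicas_left = shards - replicas_down - promoted  # 7 - 3 shut down - 3 promoted = 1
    return masters_left, replicas_left

print(expected_roles())  # -> (7, 1): 7 masters and 1 replica remaining
```

If the downed replicas instead belonged to the same shards as the downed masters, those shards would have no replica to promote, which would change the outcome entirely; that is exactly why hwware is asking which specific nodes were shut down.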