Describe the bug

In cluster mode, when a slave node in a shard is pinged by its master node and runs nodeUpdateAddressIfNeeded, the getpeername system call may fail, causing the server.masterhost variable to be incorrectly set to "?". The slave node then reports an error every second: "Connecting to MASTER ?:6379". If the master node and the slave node experience a network partition at that moment, the shard's master is marked as PFAIL. Other nodes then send gossip messages to the slave node that correct the IP information of the shard's master node, but server.masterhost is not updated. As a result, the synchronization relationship between the master and slave nodes does not recover even after the getpeername system call works again, and the slave keeps logging the following Redis error:

[Screenshot: Redis error log on the slave node]

To reproduce

Remarks:

  - Redis version: 6.2.14
  - The Redis parameter 'cluster-announce-ip' is not configured

  1. Create a Redis cluster with 3 primaries and 3 replicas.

  2. Simulate a 'getpeername' system call error on the 'slave0' node. To reproduce the error quickly, the source code is modified so that a parameter changed through hot (runtime) configuration forces the obtained IP address to '?'.

  3. Simulate a network failure between the 'master0' and 'slave0' nodes.

# Add iptables rules on the slave0 node
iptables -A INPUT -s {master0-ip} -j DROP
iptables -A OUTPUT -d {master0-ip} -j DROP
  4. Wait for the 'server.cluster-node-timeout' period to elapse, then restore the 'getpeername' system call on the 'slave0' node.

  5. Restore the network between the 'master0' and 'slave0' nodes.

iptables -D INPUT -s {master0-ip} -j DROP
iptables -D OUTPUT -d {master0-ip} -j DROP
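
The partition and recovery steps can be wrapped in a small helper so that the same master IP is used when adding and removing the rules. This is only a sketch: the function name `partition_rules` is hypothetical, and it prints the iptables commands instead of running them, so the output can be reviewed before piping it to a root shell on the slave0 node.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Redis): prints the iptables commands
# used in the reproduction steps.
#   partition_rules add MASTER_IP   -> rules that drop traffic to/from master0
#   partition_rules del MASTER_IP   -> commands that delete those rules again
partition_rules() {
    local flag
    case "$1" in
        add) flag="-A" ;;   # append DROP rules (simulate the partition)
        del) flag="-D" ;;   # delete DROP rules (heal the partition)
        *)   echo "usage: partition_rules add|del MASTER_IP" >&2; return 1 ;;
    esac
    if [ -z "$2" ]; then
        echo "missing MASTER_IP" >&2
        return 1
    fi
    printf 'iptables %s INPUT -s %s -j DROP\n' "$flag" "$2"
    printf 'iptables %s OUTPUT -d %s -j DROP\n' "$flag" "$2"
}
```

For example, with 192.0.2.10 standing in for the master0 address, `partition_rules add 192.0.2.10 | sudo sh` would apply the rules on slave0, and `partition_rules del 192.0.2.10 | sudo sh` would remove them after the timeout has passed.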

Expected behavior

The 'master' and 'slave' synchronization relationship should be restored once the 'getpeername' system call and the network have both recovered.
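
One way to check whether the relationship has recovered is to look at the `master_host` and `master_link_status` fields of `INFO replication` on the slave node. The sketch below assumes the output is piped in from `redis-cli`; the function name `replica_sync_state` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `INFO replication` text on stdin and prints
# "<master_host> <master_link_status>" on one line.
replica_sync_state() {
    # INFO output uses CRLF line endings, so strip the carriage returns first.
    tr -d '\r' | awk '
        /^master_host:/        { host = substr($0, length("master_host:") + 1) }
        /^master_link_status:/ { status = substr($0, length("master_link_status:") + 1) }
        END { print host, status }'
}
```

Running `redis-cli -h {slave0-ip} -p 6379 info replication | replica_sync_state` on a healthy shard should print the master's real IP followed by `up`. Given the behavior described in this report, a slave stuck in the broken state would be expected to print `? down` instead.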

Additional information

  1. There is no direct way to trigger a 'getpeername' system call failure, so it was simulated by modifying the source code.

[Screenshot: source code modification used to simulate the failure]

  2. This problem has occurred in our production environment.

  3. Adding the code in the red box below should fix the problem.

[Screenshot: proposed fix, with the added code marked by a red box]

Comment From: sundb

@wstar05 thanks, can you make a PR to fix it?

Comment From: wstar05

PR: https://github.com/redis/redis/pull/13514