Describe the bug
Redis 5.0.14 is running in Kubernetes, with one slave node (pod0), one master node (pod1), and three Sentinels monitoring the master.
When I check `info replication` and `role` on both pod0 and pod1, each reports itself as a slave, but both are replicating from an invalid master node, with `master_link_status: up`.
But when I check `sentinel master <master_name>` and `sentinel slaves <master_name>` on a Sentinel, the Sentinel says pod1 is the master, while pod0 and the invalid node are both listed as slaves with status up.
Here is the log:
[root@k8s-master1]# kubectl get pod -owide
NAME        READY   STATUS    RESTARTS   AGE    IP
pod0        2/2     RUNNING   0          142m   172.29.24.160
pod1        2/2     RUNNING   0          5d3h   172.29.54.166
sentinel0   2/2     RUNNING   0          5d3h   172.29.76.170
sentinel1   2/2     RUNNING   0          5d3h   172.29.49.217
sentinel2   2/2     RUNNING   0          5d3h   172.29.232.91
# check `info replication` and `role` in pod0
[root@k8s-master1]# redis-cli -h 172.29.24.160
>info replication
# Replication
role: slave
master_host: 172.29.76.175 (the invalid node)
master_link_status: up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:103228379
slave_read_only:1
connected_slaves:0
> role
1) "slave"
2)"172.29.76.175" (the invalide node)
3)(integer) 6379
4)"connected"
5)(integer) 103231897
# check `info replication` and `role` in pod1
[root@k8s-master1]# redis-cli -h 172.29.54.166
>info replication
# Replication
role: slave
master_host: 172.29.76.175 (the invalid node)
master_link_status: up
master_last_io_seconds_ago:2
master_sync_in_progress:0
slave_repl_offset:103246243
slave_read_only:1
connected_slaves:0
> role
1) "slave"
2)"172.29.76.175" (the invalide node)
3)(integer) 6379
4)"connected"
5)(integer) 103231897
# check `sentinel master <master_name>` and `sentinel slaves <master_name>` on sentinel2 (the same output on sentinel1)
[root@k8s-master1]# redis-cli -h 172.29.232.91 -p 26379
>sentinel master mymaster
1) "name"
2)"mymaster"
3)"ip"
4)"172.29.54.166" (pod1)
5)"port"
6)"6379"
9)"flags"
10)"master"
29) "config-epoch"
30) "16"
31) "num-slaves"
32) "1"
33) "num-other-sentinels"
34) "2"
35) "quorum"
36) "2"
> sentinel slaves mymaster
1) 1) "name"
2) "172.29.76.175:6379" (the invalide node)
3) "ip"
4) "172.29.76.175"
5) "port"
6) "6379"
9) "flags"
10) "slave"
11) "link-pending-commands"
12) "-2"
13) "link-refcount"
14) "1"
15) "last-ping-sent"
16) "0"
17) "last-ok-ping-reply"
18) "537"
19) "last-ping-reply"
20) "537"
21) "down-after-milliseconds"
22) "15000"
23) "info-refresh"
24) "3367"
25) "role-reported"
26) "slave"
27) "role-reported-time"
28) "1124513"
29) "master-link-down-time"
30) "0"
31) "master-link-status"
32) "ok"
33) "master-host"
34) "172.29.54.166" (pod1)
35) "master-port"
36) "6379"
37) "slave-priority"
38) "100"
39) "slave-repl-offset"
40) "118674508"
2) 1) "name"
2) "172.29.24.160:6379" (pod0)
3) "ip"
4) "172.29.24.160"
5) "port"
6) "6379"
9) "flags"
10) "slave"
11) "link-pending-commands"
12) "0"
13) "link-refcount"
14) "1"
15) "last-ping-sent"
16) "0"
17) "last-ok-ping-reply"
18) "39"
19) "last-ping-reply"
20) "39"
21) "down-after-milliseconds"
22) "15000"
23) "info-refresh"
24) "2293"
25) "role-reported"
26) "slave"
27) "role-reported-time"
28) "1207740"
29) "master-link-down-time"
30) "0"
31) "master-link-status"
32) "ok"
33) "master-host"
34) "172.29.54.166" (pod1)
35) "master-port"
36) "6379"
37) "slave-priority"
38) "100"
39) "slave-repl-offset"
40) "118674824"
To reproduce
I don't know how to reproduce it or why it occurred.
Comment From: yongman
It's related to the CNI of Kubernetes. With some CNI plugins, getpeername() does not work as expected; this is the call Redis uses to get a peer's IP address. The result of getpeername() may be the address of the host node that the peer pod is located on, rather than the pod's own IP.
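For reference, Redis resolves a connected peer's address with getpeername() (roughly the anetPeerToString path in anet.c). Below is a minimal standalone sketch, not Redis code, that accepts one TCP connection and prints whatever address the kernel reports through getpeername(); on a CNI that SNATs pod-to-pod traffic, this can be the node's address instead of the peer pod's. The port 6379 is only illustrative.

```c
/* Minimal sketch (not Redis source): accept one TCP connection and print the
 * peer address the kernel reports via getpeername(), i.e. the same syscall
 * Redis relies on to learn a peer's IP. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0) { perror("socket"); return 1; }

    int yes = 1;
    setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(6379);   /* same port as Redis, purely for illustration */

    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) { perror("bind"); return 1; }
    if (listen(lfd, 1) < 0) { perror("listen"); return 1; }

    int cfd = accept(lfd, NULL, NULL);
    if (cfd < 0) { perror("accept"); return 1; }

    /* Ask the kernel for the remote endpoint of this connection.
     * Depending on the CNI's NAT behavior, this may not be the pod IP. */
    struct sockaddr_in peer;
    socklen_t len = sizeof(peer);
    if (getpeername(cfd, (struct sockaddr *)&peer, &len) < 0) { perror("getpeername"); return 1; }

    char ip[INET_ADDRSTRLEN];
    inet_ntop(AF_INET, &peer.sin_addr, ip, sizeof(ip));
    printf("peer address: %s:%d\n", ip, ntohs(peer.sin_port));

    close(cfd);
    close(lfd);
    return 0;
}
```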
Comment From: fengyinqiao
After careful investigation, the cause has been found. After pod0 was deleted, its container remained on the k8s node and the Redis process inside it was still alive, and that leftover container had a different IP address from the recreated pod.