I'm using a Redis cluster inside a Kubernetes cluster, and I've noticed that Redis returns incorrect IP addresses (they basically do not exist).
redis-cli CLUSTER SLOTS
0 16383
  10.244.3.10 6379 8eaec21af0af82c56a85422666278d2dabf320a8
  10.244.11.11 6379 bd5b63ec39d7b4eef98b820227e96b5d331612d6
  10.244.37.11 6379 bc8bd7a7c9f87ef616f1471cfa1721762faba5ad
redis-cli CLUSTER NODES
bc8bd7a7c9f87ef616f1471cfa1721762faba5ad 10.244.37.11:6379@16379 slave 8eaec21af0af82c56a85422666278d2dabf320a8 0 1504621288802 25 connected
8eaec21af0af82c56a85422666278d2dabf320a8 10.244.3.10:6379@16379 myself,master - 0 1504621289000 25 connected 0-16383
bd5b63ec39d7b4eef98b820227e96b5d331612d6 10.244.11.11:6379@16379 slave 8eaec21af0af82c56a85422666278d2dabf320a8 0 1504621289805 25 connected
Both commands return 10.244.3.10 as the master's IP, but that address does not exist in our environment. The address that should be used instead is 10.244.31.4 (this address is also visible in the ifconfig output on the master machine).
When I ping every host in my cluster with redis-cli, I get the following responses:
bash-4.3# redis-cli -h 10.244.3.10 -p 6379 ping
Could not connect to Redis at 10.244.3.10:6379: Host is unreachable
bash-4.3# redis-cli -h 10.244.11.11 -p 6379 ping
PONG
bash-4.3# redis-cli -h 10.244.37.11 -p 6379 ping
PONG
But 10.244.31.4 does respond:
bash-4.3# redis-cli -h 10.244.31.4 -p 6379 ping
PONG
Why does Redis return an incorrect IP for the master node? Regards, Piotr
Comment From: me115
Are you running redis-server in a Docker container? Check whether 10.244.3.10 is the gateway address.
Comment From: chmielas
No, I'm running Redis in a Kubernetes cluster, and 10.244.3.10 is not the address of any of the endpoints in the cluster.
Comment From: leonth
We experienced similar issues as well. We are running a Redis cluster in Kubernetes. Our hypothesis is that the IP returned was actually the IP address of the previous pod (analogous to an instance/container). In Kubernetes, when a pod dies and restarts, it is given a new internal IP address. A CLUSTER MEET should make the pod join the cluster again with the new IP, but we suspect that the old IP address might get stuck somewhere.
Comment From: gagabu
I'm also running a Redis cluster in Kubernetes and have hit the same problem. As I understand it, a Redis node returns the IP address that the pod had at the time of cluster creation, which is stored in the nodes.conf file.
My workaround is to add an init container to the Redis StatefulSet YAML that replaces the old IP with the new one.
initContainers:
- name: update-pod-ip
  image: busybox
  env:
  - name: MY_POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
  command: ['sh', '-c', 'if [ -s /data/nodes.conf ]; then sed -ri "/myself/s/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/$MY_POD_IP/" /data/nodes.conf; fi']
  volumeMounts:
  - name: data
    mountPath: /data
    readOnly: false
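The sed in the init container's command can be exercised outside the pod. A minimal sketch, using an illustrative nodes.conf line and a hard-coded MY_POD_IP (in the real pod this value is injected via the Downward API):

```shell
# Demo of the init container's fix-up, run against a throwaway copy.
# The node ID and IP addresses below are illustrative only.
mkdir -p /tmp/redis-demo
cat > /tmp/redis-demo/nodes.conf <<'EOF'
8eaec21af0af82c56a85422666278d2dabf320a8 10.244.3.10:6379@16379 myself,master - 0 0 25 connected 0-16383
EOF
MY_POD_IP=10.244.31.4   # in the pod this comes from fieldRef: status.podIP
# Replace the stale IP on the "myself" line with the pod's current IP.
if [ -s /tmp/redis-demo/nodes.conf ]; then
  sed -ri "/myself/s/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/$MY_POD_IP/" /tmp/redis-demo/nodes.conf
fi
cat /tmp/redis-demo/nodes.conf
```

Without the `g` flag, sed replaces only the first IP on the matching line, which is exactly the node's own address field.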
Comment From: leonth
We've successfully run a Redis cluster in Kubernetes production since Jan 2018 using Redis 4 and --cluster-announce-ip. The k8s YAML looks roughly like this:
containers:
- name: redis
  image: redis:4.0.6
  command: ["redis-server"]
  args:
  - /etc/redis/redis.conf
  - --cluster-announce-ip
  - "$(MY_POD_IP)"
  env:
  - name: MY_POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
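One way to verify the flag took effect is to check that the IP in the "myself" row of CLUSTER NODES matches the pod's own IP. A sketch that parses a captured CLUSTER NODES line (the sample line and IPs here are illustrative, not from a live cluster):

```shell
# Illustrative "myself" line as CLUSTER NODES would print it inside a pod.
NODES_OUTPUT='8eaec21af0af82c56a85422666278d2dabf320a8 10.244.31.4:6379@16379 myself,master - 0 0 25 connected 0-16383'
MY_POD_IP=10.244.31.4   # in the pod: injected via fieldRef: status.podIP
# Field 2 is "ip:port@cport"; take everything before the first colon.
ANNOUNCED_IP=$(printf '%s\n' "$NODES_OUTPUT" | awk '/myself/ {split($2, a, ":"); print a[1]}')
if [ "$ANNOUNCED_IP" = "$MY_POD_IP" ]; then
  echo "announced IP matches pod IP: $ANNOUNCED_IP"
else
  echo "mismatch: announced=$ANNOUNCED_IP pod=$MY_POD_IP"
fi
```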
Comment From: MQPearth
However, when the entire k8s cluster is restarted, the IP addresses of all Redis pods change, and the Redis cluster cannot be restored.
Comment From: MQPearth
This is my solution: specify cluster-announce-ip, but without using a specific IP.
After the entire k8s cluster restarts, all pods are assigned new IPs, but the Redis cluster remains valid.
containers:
- name: redis
  image: redis:6.0.19
  imagePullPolicy: IfNotPresent
  command:
  - "redis-server"
  args:
  - "/etc/redis/redis.conf"
  - "--cluster-announce-ip"
  - "$(POD_NAME).$(POD_SERVICE_NAME).$(POD_NAMESPACE).svc.cluster.local"
  env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
  - name: POD_SERVICE_NAME
    value: "redis"
My Redis image version is redis:6.0.19 and my Kubernetes version is 1.23.1. Note that this configuration method has not been rigorously tested.
Comment From: kjoe
This is my solution: specify cluster-announce-ip, but without using a specific IP.
After the entire k8s cluster restarts, all pods are assigned new IPs, but the Redis cluster remains valid.
Thanks! It's a somewhat tricky configuration, but it definitely works if we take care of its edge cases. I'm only able to use hostnames if I follow these steps and rules:
- Before Redis 7.0, CLUSTER MEET only supports IPs (see https://github.com/redis/redis/pull/10436), so the first time we need to initialize the cluster with IP addresses.
- After that, we add the hostnames with the --cluster-announce-ip parameter; the redeploy restarts the nodes one by one, during which each node replaces its IP address with its hostname in the CLUSTER NODES list, and the cluster finally reaches a stable state.
- If we use longer pod/service/namespace names, we can easily hit the maximum name length: the node address field is limited to 46 chars, so the hostname gets truncated and the cluster gets stuck in a failed state. https://github.com/redis/redis/blob/de0d9632b52849d9b7ea52408b1d681d771c5b46/src/server.h#L119
- If we are unable to shorten the names, we can use them without the domain suffix, or even omit the namespace part too, because the k8s DNS resolver will add the missing parts through DNS search domains.
- If we need to connect to the Redis Cluster from another namespace with shortened hostnames, we must add the cluster's namespace to the client's local search domains with spec.dnsConfig.searches, because a MOVED redirect will return hostnames rather than IP addresses to the client.
- Note that the client will be unable to resolve a pod's hostname while that pod is restarting, so it may sometimes get a DNS resolution error (depending on the implementation).
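The 46-character limit can be checked up front before deploying. A minimal sketch with hypothetical pod/service/namespace names:

```shell
# Hypothetical names; substitute your own StatefulSet values.
POD_NAME="redis-cluster-0"
POD_SERVICE_NAME="redis"
POD_NAMESPACE="prod"
FQDN="${POD_NAME}.${POD_SERVICE_NAME}.${POD_NAMESPACE}.svc.cluster.local"
# Redis truncates node addresses beyond 46 chars (NET_IP_STR_LEN in server.h),
# which leaves the cluster stuck in a failed state.
if [ "${#FQDN}" -le 46 ]; then
  echo "ok: $FQDN (${#FQDN} chars)"
else
  echo "too long: $FQDN (${#FQDN} chars)"
fi
```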
Comment From: kjoe
We experienced similar issues as well. We are running a Redis cluster in Kubernetes. Our hypothesis is that the IP returned was actually the IP address of the previous pod (analogous to an instance/container). In Kubernetes, when a pod dies and restarts, it is given a new internal IP address. A CLUSTER MEET should make the pod join the cluster again with the new IP, but we suspect that the old IP address might get stuck somewhere.
I can confirm this seems to be a bug with Redis Cluster under Kubernetes, and --cluster-announce-ip magically fixes the phenomenon; however, the cluster will also survive single-node restarts without this parameter (and without CLUSTER MEET). My observation is that the wrong (previous) IP address appears only from the node's own point of view (in the "myself" row, regardless of whether it's a master or a slave) in the CLUSTER NODES and CLUSTER SLOTS lists, while the other nodes know its correct (new) IP address:
redis-service-1 (10.12.1.205):
8325f28d789cdc187e3f7c0b7c7d7c285f65764c 10.12.1.125:6379@16379 myself,master - 0 1710753141000 10 connected 0-5460
74e6d24c9adbf356256e0a77f94ef600e9a4f29f 10.12.1.208:6379@16379 slave 7b05d07f483a7d227ba84fc65608c7066364a3da 0 1710753143088 7 connected
294e4eeac0de01d9816177334599330aba448679 10.12.2.170:6379@16379 master - 0 1710753142000 8 connected 5461-10922
fcca7847570ae5b36958105475a84e3240aac3da 10.12.2.173:6379@16379 slave 294e4eeac0de01d9816177334599330aba448679 0 1710753142085 8 connected
0ad7a643c62e678ee314407dc1c0abb80b59fdbf 10.12.7.87:6379@16379 slave 8325f28d789cdc187e3f7c0b7c7d7c285f65764c 0 1710753141000 10 connected
7b05d07f483a7d227ba84fc65608c7066364a3da 10.12.7.95:6379@16379 master - 0 1710753144092 7 connected 10923-16383
redis-service-2 (10.12.2.173):
8325f28d789cdc187e3f7c0b7c7d7c285f65764c 10.12.1.205:6379@16379 master - 0 1710753117095 10 connected 0-5460
74e6d24c9adbf356256e0a77f94ef600e9a4f29f 10.12.1.208:6379@16379 slave 7b05d07f483a7d227ba84fc65608c7066364a3da 0 1710753119100 7 connected
294e4eeac0de01d9816177334599330aba448679 10.12.2.170:6379@16379 master - 0 1710753116092 8 connected 5461-10922
fcca7847570ae5b36958105475a84e3240aac3da 10.12.2.40:6379@16379 myself,slave 294e4eeac0de01d9816177334599330aba448679 0 1710753117000 2 connected
0ad7a643c62e678ee314407dc1c0abb80b59fdbf 10.12.7.87:6379@16379 slave 8325f28d789cdc187e3f7c0b7c7d7c285f65764c 0 1710753118098 10 connected
7b05d07f483a7d227ba84fc65608c7066364a3da 10.12.7.95:6379@16379 master - 0 1710753117000 7 connected 10923-16383
Comment From: zioproto
It seems --cluster-announce-ip is also used in the Bitnami redis-cluster Helm chart to solve this issue:
https://github.com/bitnami/charts/blob/c6d6b1735a0d364655e11cc669a95f862f24627f/bitnami/redis-cluster/templates/redis-statefulset.yaml#L115