In a K8s deployment, when the master Redis pod goes down, failover happens normally: Sentinel selects a slave as the new master. When the old master comes back online (as a new pod) with a different pod IP but the same DNS name, Sentinel does not refresh the IP based on the old master's DNS name. The old master therefore stays isolated, because the master has no info about the Sentinel cluster. Is there a way to make Sentinel work on K8s?

Comment From: mabrarov

Hi Redis Team!

It looks like Redis is designed to communicate using IP addresses rather than domain names (DNS). For example, most Redis configuration options accept domain names, but these names are replaced with IP addresses (resolved when Redis starts, I guess), even in the configuration files that Redis rewrites.

I understand that TCP and UDP work with IP addresses and that an IP address is more "precise" than a domain name, which can resolve to multiple IP addresses whose set and order may differ every time the name is resolved. I understand that resolving a domain name costs resources (CPU, network traffic and time), but if Redis configuration options already accept domain names, can't Redis just keep using them instead of replacing them with IPs? IPs work well in a static environment (I understand that a static environment is friendlier to availability and performance), but with the trend toward containerization, many environments where Redis runs are managed by container orchestrators such as K8s and OpenShift. These environments use DNS for service discovery.

I understand the downsides of DNS related to DNS caching (which is often missing), and I know that K8s recommends using its "native" API for service discovery that is time/latency sensitive. But the question is: is it really hard to stop Redis from replacing domain names with IPs? I believe there is a chance that such changes are small and well isolated, i.e. easy to implement and introduce.

Just a side note: the fact that Redis doesn't really use domain names, but replaces them with IP addresses resolved at startup, prevents my team from using Redis, because we have to deploy Redis on OpenShift (which is very close to K8s). That's sad, because Redis feels like a nice solution covering almost all of the team's needs.

Thank you.

Comment From: mabrarov

This issue seems to be close to issue #2186 and issue #2075. Especially to that note.

Comment From: mabrarov

Hmm... it looks like the --replicaof command line option, the replicaof configuration option and the REPLICAOF CLI command all support domain names. Refer to the anetTcpGenericConnect function (called by the connectWithMaster function), which resolves the name given in the addr parameter using the getaddrinfo function. The only thing still missing for my case (single master and multiple replicas) is support for a domain name in the sentinel monitor configuration option (without the given domain name being replaced with an IP address).

Comment From: yossigo

Hey @jhuang13 and @mabrarov, can you please take a look at #8282 and see if this is going to help with that?

This will of course require some configuration:

1. Use SENTINEL MONITOR with hostnames and not IPs.
2. Enable the resolve-hostnames and announce-hostnames Sentinel config parameters.
3. Configure all Redis instances with replica-announce-ip and their hostname (including masters, as they may fail over).
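The three configuration steps above can be sketched as a small script that writes the relevant config lines. This is only an illustration under assumptions: the hostnames under example.svc.cluster.local, the master name mymaster, the quorum of 2 and the temporary output directory are all hypothetical and not taken from anyone's deployment in this thread.

```shell
#!/bin/sh
# Hypothetical values for illustration only; adjust for your deployment.
MASTER_HOST="redis-0.redis.example.svc.cluster.local"
POD_HOST="redis-1.redis.example.svc.cluster.local"
MASTER_NAME="mymaster"
QUORUM=2
OUT_DIR=$(mktemp -d)

# Steps 1 and 2: monitor the master by hostname, with hostname support enabled.
# The two hostname options are written before the "sentinel monitor" line.
cat > "$OUT_DIR/sentinel.conf" <<EOF
port 26379
sentinel resolve-hostnames yes
sentinel announce-hostnames yes
sentinel monitor $MASTER_NAME $MASTER_HOST 6379 $QUORUM
EOF

# Step 3: every Redis instance (masters included) announces its own hostname.
cat >> "$OUT_DIR/redis.conf" <<EOF
replica-announce-ip $POD_HOST
replica-announce-port 6379
EOF

echo "wrote $OUT_DIR/sentinel.conf and $OUT_DIR/redis.conf"
```

In a real pod, POD_HOST would normally come from the pod itself (for example from hostname -f) rather than being hard-coded.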

This PR is still in progress but I'd be interested to get some confirmation it's in the right direction.

Comment From: jhuang13

Hi @yossigo, thanks. Yes, I look forward to the new release.

Comment From: mabrarov

@yossigo,

It looks like #8282 is outdated (compared to Redis 6.0.9) and has a merge conflict (which is why I can't update #8282 myself). I built Redis from the source branch of #8282, but that build doesn't work with my OpenShift deployment even without configuration changes. Sentinel fails during startup with:

*** FATAL CONFIG FILE ERROR (Redis 255.255.255) ***
Reading the configuration file, at line 3
>>> 'sentinel monitor "master" "sss-redis-master-0.sss-redis-headless.v4-local-int.svc.cluster.local" "6379" "2"'
Can't resolve master instance hostname.

It looks like the sentinel monitor configuration option in the source branch of #8282 doesn't accept a domain name the way Redis 6.0.9 does (i.e. this configuration works with Redis 6.0.9). Could you please update #8282 by merging the changes from the 6.0.9 release?

Thank you.

Comment From: yossigo

@mabrarov Thanks for taking a look, I've updated this PR with latest unstable now. Note that you'll need to configure resolve-hostnames and announce-hostnames to get it to handle host names as expected.

I look forward to getting your feedback.

Comment From: mabrarov

@yossigo,

I still get the same error at the start of Sentinel.

Here is my sentinel.conf (part of the file):

dir "/data"
bind 0.0.0.0
sentinel monitor "master" "sss-redis-master-0.sss-redis.v4-local-int.svc.cluster.local" "6379" "2"
sentinel parallel-syncs "master" "2"
sentinel down-after-milliseconds "master" 5000
sentinel failover-timeout "master" 60000
sentinel resolve-hostnames yes
sentinel announce-hostnames yes

Comment From: yossigo

@mabrarov You need to have the sentinel resolve-hostnames and sentinel announce-hostnames lines before the sentinel monitor line (this is a known issue which is already being worked on in #8271).
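A quick way to catch this ordering problem before starting Sentinel is to check the config file itself. The following is a rough sketch, not part of Redis: the function name hostname_opts_before_monitor and the two sample files are made up for illustration, and the check only looks at unquoted option names and a literal yes value.

```shell
#!/bin/sh
# Succeeds (exit status 0) only if both hostname options are set to "yes"
# before the first "sentinel monitor" line of the given file.
hostname_opts_before_monitor() {
    awk '
        $1 == "sentinel" && $2 == "resolve-hostnames"  && $3 == "yes" && !seen_monitor { resolve = 1 }
        $1 == "sentinel" && $2 == "announce-hostnames" && $3 == "yes" && !seen_monitor { announce = 1 }
        $1 == "sentinel" && $2 == "monitor" { seen_monitor = 1 }
        END { if (resolve && announce) exit 0; exit 1 }
    ' "$1"
}

# Two hypothetical sample files: one with the failing order, one with the fix.
BAD=$(mktemp)
GOOD=$(mktemp)
printf '%s\n' \
    'sentinel monitor master redis-0.example.local 6379 2' \
    'sentinel resolve-hostnames yes' \
    'sentinel announce-hostnames yes' > "$BAD"
printf '%s\n' \
    'sentinel resolve-hostnames yes' \
    'sentinel announce-hostnames yes' \
    'sentinel monitor master redis-0.example.local 6379 2' > "$GOOD"

for f in "$BAD" "$GOOD"; do
    if hostname_opts_before_monitor "$f"; then
        echo "$f: hostname options precede sentinel monitor"
    else
        echo "$f: hostname options missing or placed after sentinel monitor"
    fi
done
```

The check could run in an init container right after the config is generated, so a misordered file is reported before Sentinel hits the fatal config error.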

Comment From: mabrarov

@yossigo,

I adjusted the configuration as you suggested and Sentinel started successfully. I need some time for testing, but the first naive tests I ran manually show that #8282 works and fixes #8300.

Thank you for your help and for #8282! I'll provide more details once I test more cases next week.

Comment From: satheeshaGowda

Hello @yossigo, thank you for adding https://github.com/redis/redis/pull/8282, this is a much-awaited PR.

We have spent some time testing that feature on redis:6.2-rc3 from Docker Hub with the configuration suggested here: https://github.com/redis/redis/pull/8282#issuecomment-776032896

At the outset, Sentinel returns the host name of the shard master when we query SENTINEL get-master-addr-by-name mymaster, but after we trigger a failover it starts returning IPs again. Isn't the expectation that it returns the host name?
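To make this check repeatable before and after a failover, the returned address can be classified by a small helper. This is a minimal sketch: classify_addr is a hypothetical name, it only tests for the dotted-quad shape of an IPv4 address (it does not validate octet ranges and does not handle IPv6), and the sample inputs are invented.

```shell
#!/bin/sh
# Prints "ip" if the argument looks like a dotted-quad IPv4 address,
# otherwise prints "hostname". Shape check only; IPv6 is not handled.
classify_addr() {
    if printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
        echo "ip"
    else
        echo "hostname"
    fi
}

# Invented sample inputs, one per case.
classify_addr "10.42.0.17"
classify_addr "redis-0.redis.example.svc.cluster.local"
```

Against a live deployment, the argument would be the first line of the reply, e.g. classify_addr "$(redis-cli -h "$SENTINEL_HOST" -p 26379 SENTINEL get-master-addr-by-name mymaster | head -n 1)", where SENTINEL_HOST is an assumed variable holding the address of one Sentinel.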

Here is the YAML spec we used, in case it helps you reproduce the problem.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis
  replicas: 5
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      initContainers:
      - name: config
        image: redis:6.2-rc3
        command: [ "bash", "-c" ]
        args:
          - |
            cp /tmp/redis/redis.conf /etc/redis/redis.conf

            MASTER_FQDN=redis-0.redis.REDACTED
            POD_FQDN=$(hostname -f)
            echo "replica-announce-ip  $POD_FQDN" >> /etc/redis/redis.conf
            echo "replica-announce-port 6379" >> /etc/redis/redis.conf
            if [ "$POD_FQDN" =  "$MASTER_FQDN" ]; then
                echo "this is master, not updating config..."
            else
                echo "updating replica redis.conf..."
                echo "replicaof $MASTER_FQDN 6379" >> /etc/redis/redis.conf
            fi
            cat /etc/redis/redis.conf
        volumeMounts:
        - name: redis-config
          mountPath: /etc/redis/
        - name: config
          mountPath: /tmp/redis/
      containers:
      - name: redis
        image: redis:6.2-rc3
        command: ["redis-server"]
        args: ["/etc/redis/redis.conf"]
        ports:
        - containerPort: 6379
          name: redis
        volumeMounts:
        - name: data
          mountPath: /data
        - name: redis-config
          mountPath: /etc/redis/
      volumes:
      - name: redis-config
        emptyDir: {}
      - name: config
        configMap:
          name: redis-config
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: scaleio
      resources:
        requests:
          storage: 100Mi

---
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  clusterIP: None
  ports:
  - port: 6379
    targetPort: 6379
    name: redis
  selector:
    app: redis

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sentinel
spec:
  serviceName: sentinel
  replicas: 3
  selector:
    matchLabels:
      app: sentinel
  template:
    metadata:
      labels:
        app: sentinel
    spec:
      initContainers:
      - name: config
        image: redis:6.2-rc3
        command: [ "sh", "-c" ]
        args:
          - |
            REDIS_PASSWORD=testpassword
            MASTER_FQDN=redis-0.redis.REDACTED
            POD_FQDN=$(hostname -f)
            echo "port 26379
            protected-mode no
            sentinel resolve-hostnames yes
            sentinel announce-hostnames yes
            sentinel announce-ip $POD_FQDN
            sentinel announce-port 26379
            sentinel monitor mymaster $MASTER_FQDN 6379 2
            sentinel down-after-milliseconds mymaster 5000
            sentinel failover-timeout mymaster 60000
            sentinel parallel-syncs mymaster 1
            sentinel auth-pass mymaster $REDIS_PASSWORD
            " > /etc/redis/sentinel.conf
            cat /etc/redis/sentinel.conf
        volumeMounts:
        - name: redis-config
          mountPath: /etc/redis/
      containers:
      - name: sentinel
        image: redis:6.2-rc3
        command: ["redis-server", "/etc/redis/sentinel.conf", "--sentinel"]
        ports:
        - containerPort: 26379
          name: sentinel
        volumeMounts:
        - name: redis-config
          mountPath: /etc/redis/
        - name: data
          mountPath: /data
      volumes:
      - name: redis-config
        emptyDir: {}
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: scaleio
      resources:
        requests:
          storage: 100Mi

---
apiVersion: v1
kind: Service
metadata:
  name: sentinel
spec:
  clusterIP: None
  ports:
  - port: 26379
    targetPort: 26379
    name: sentinel
  selector:
    app: sentinel

Comment From: yossigo

The issue reported by @satheeshaGowda is resolved by #8517 and the original issue of this ticket has been resolved by #8481. Closing this, please feel free to re-open with new information if necessary.