Need help in understanding what is going wrong

I have deployed Redis in a Kubernetes environment with 1 master, 2 slaves, and 3 sentinels. I am using the Redis 6.2.3 Alpine image. Every Redis and Sentinel instance runs in its own pod.

    NAME         READY   STATUS    RESTARTS   AGE   IP              NODE   NOMINATED NODE   READINESS GATES
    redis-0      1/1     Running   0          31m   10.233.64.143   vm1    <none>           <none>
    redis-1      1/1     Running   0          34m   10.233.64.90    vm1    <none>           <none>
    redis-2      1/1     Running   0          34m   10.233.64.40    vm1    <none>           <none>
    sentinel-0   1/1     Running   0          34m   10.233.64.93    vm1    <none>           <none>
    sentinel-1   1/1     Running   0          34m   10.233.64.35    vm1    <none>           <none>
    sentinel-2   1/1     Running   0          34m   10.233.64.34    vm1    <none>           <none>

I have also created one headless service for the Redis pods and one for the Sentinel pods, so that I can reach a specific pod behind each service.

    [root@master-1 ~]# kubectl describe svc sentinel -n ankit
    Name:              sentinel
    Namespace:         ankit
    Labels:            <none>
    Annotations:       <none>
    Selector:          app=sentinel
    Type:              ClusterIP
    IP:                None
    Port:              sentinel  5000/TCP
    TargetPort:        5000/TCP
    Endpoints:         10.233.64.34:5000,10.233.64.35:5000,10.233.64.93:5000
    Session Affinity:  None

    [root@master-1 ~]# kubectl describe svc redis -n ankit
    Name:              redis
    Namespace:         ankit
    Labels:            <none>
    Annotations:       <none>
    Selector:          app=redis
    Type:              ClusterIP
    IP:                None
    Port:              redis  6379/TCP
    TargetPort:        6379/TCP
    Endpoints:         10.233.64.143:6379,10.233.64.40:6379,10.233.64.90:6379
    Session Affinity:  None
    Events:            <none>
    [root@master-1 ~]#

When the Redis StatefulSet pods are deployed, logic in the init container of the Redis YAML makes the redis-0 pod the master by default. All pods come up and run normally, and all three sentinels connect to the master and to each other.

However, when I delete the Redis master pod, all three sentinels log an SDOWN event, but it never escalates to ODOWN. As a result, no failover happens and Sentinel does not elect a new master. When redis-0 then comes back up as a slave, the cluster ends up in a bad state because there is no master.
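For context, the escalation depends on the quorum: a sentinel that flags the master as subjectively down (SDOWN) asks the other sentinels, and marks it objectively down (ODOWN) only once at least `quorum` sentinels, itself included, agree. A minimal Python sketch of that rule follows. The function name `is_objectively_down` and its arguments are hypothetical; this models the decision only and is not Sentinel's actual implementation.

```python
def is_objectively_down(local_sdown, peer_agreements, quorum):
    """Model the SDOWN -> ODOWN decision made by a single sentinel.

    local_sdown:     True if this sentinel itself considers the master SDOWN.
    peer_agreements: one boolean per other sentinel, True if that peer
                     replied that it also sees the master as down.
    quorum:          last argument of `sentinel monitor` (2 in this setup).
    """
    if not local_sdown:
        # A sentinel only counts peer opinions after flagging SDOWN itself.
        return False
    votes = 1 + sum(1 for agreed in peer_agreements if agreed)
    return votes >= quorum


# With quorum 2, a single agreeing peer is enough to reach ODOWN.
print(is_objectively_down(True, [True, False], quorum=2))   # True
# If no peer agreement is counted, the sentinel stays at SDOWN.
print(is_objectively_down(True, [False, False], quorum=2))  # False
```

Under this model, three sentinels that each log SDOWN but never ODOWN would mean that none of them is counting a matching opinion from its peers.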

Sentinel-0 logs after deletion of the master Redis pod:

    1:X 15 Oct 2021 02:13:52.155 * +fix-slave-config slave 10.233.64.40:6379 10.233.64.40 6379 @ mymaster redis-0.redis.ankit.svc.cluster.local 6379
    1:X 15 Oct 2021 02:13:52.322 * +fix-slave-config slave 10.233.64.90:6379 10.233.64.90 6379 @ mymaster redis-0.redis.ankit.svc.cluster.local 6379
    1:X 15 Oct 2021 02:13:53.194 * +fix-slave-config slave 10.233.64.40:6379 10.233.64.40 6379 @ mymaster redis-0.redis.ankit.svc.cluster.local 6379
    1:X 15 Oct 2021 02:13:53.338 * +fix-slave-config slave 10.233.64.90:6379 10.233.64.90 6379 @ mymaster redis-0.redis.ankit.svc.cluster.local 6379
    1:X 15 Oct 2021 02:13:54.203 * +fix-slave-config slave 10.233.64.40:6379 10.233.64.40 6379 @ mymaster redis-0.redis.ankit.svc.cluster.local 6379
    1:X 15 Oct 2021 02:13:54.399 * +fix-slave-config slave 10.233.64.90:6379 10.233.64.90 6379 @ mymaster redis-0.redis.ankit.svc.cluster.local 6379
    1:X 15 Oct 2021 02:13:54.635 # +sdown master mymaster redis-0.redis.ankit.svc.cluster.local 6379
    1:X 15 Oct 2021 02:14:00.040 - Accepted 10.233.64.143:33288
    1:X 15 Oct 2021 02:14:00.047 - Client closed connection

Sentinel-1 logs after deletion of the master Redis pod:

    1:X 15 Oct 2021 02:11:10.200 . Rewritten config file (/etc/redis/sentinel.conf) successfully
    1:X 15 Oct 2021 02:13:54.600 # +sdown master mymaster redis-0.redis.ankit.svc.cluster.local 6379
    1:X 15 Oct 2021 02:14:00.054 - Accepted 10.233.64.143:48550
    1:X 15 Oct 2021 02:14:00.055 - Client closed connection

Sentinel-2 logs after deletion of the master Redis pod:

    1:X 15 Oct 2021 02:11:09.858 . Rewritten config file (/etc/redis/sentinel.conf) successfully
    1:X 15 Oct 2021 02:11:10.244 - Accepted 10.233.64.93:35181
    1:X 15 Oct 2021 02:11:10.264 - Accepted 10.233.64.35:56403
    1:X 15 Oct 2021 02:13:54.636 # +sdown master mymaster redis-0.redis.ankit.svc.cluster.local 6379

As the logs show, the SDOWN event never escalates to ODOWN, so no failover is triggered.
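To check this across all three sentinels without reading each log by hand, the event lines can be filtered programmatically. The following is a rough Python sketch that assumes the log line layout shown above; the helper names `extract_events` and `escalated_to_odown` are made up for illustration.

```python
import re

# Matches event lines such as:
# 1:X 15 Oct 2021 02:13:54.635 # +sdown master mymaster <host> 6379
EVENT_RE = re.compile(
    r"^\d+:X \d{1,2} \w{3} \d{4} (?P<time>[\d:.]+) \S "
    r"(?P<event>[+-][a-z-]+) (?P<details>.*)$"
)


def extract_events(log_text):
    """Return (time, event, details) tuples for Sentinel event lines."""
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match:
            events.append((match["time"], match["event"], match["details"]))
    return events


def escalated_to_odown(log_text):
    """True only if the log contains both a +sdown and a +odown event."""
    names = [event for _, event, _ in extract_events(log_text)]
    return "+sdown" in names and "+odown" in names


# Sentinel-1 log lines copied from the output above.
sample = """\
1:X 15 Oct 2021 02:11:10.200 . Rewritten config file (/etc/redis/sentinel.conf) successfully
1:X 15 Oct 2021 02:13:54.600 # +sdown master mymaster redis-0.redis.ankit.svc.cluster.local 6379
1:X 15 Oct 2021 02:14:00.054 - Accepted 10.233.64.143:48550
1:X 15 Oct 2021 02:14:00.055 - Client closed connection
"""

print([event for _, event, _ in extract_events(sample)])  # ['+sdown']
print(escalated_to_odown(sample))                         # False
```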

The Redis and Sentinel conf files are attached below.

Redis conf file:

    masterauth password
    requirepass password
    bind 0.0.0.0
    protected-mode no
    port 6379
    tcp-backlog 511
    # Close the connection after a client is idle for N seconds (0 to disable)
    timeout 0
    tcp-keepalive 300
    daemonize no
    supervised no
    pidfile "/var/run/redis_6379.pid"
    loglevel debug
    logfile ""
    databases 16
    always-show-logo yes
    save 900 1
    save 300 10
    save 60 10000
    stop-writes-on-bgsave-error yes
    rdbcompression yes
    rdbchecksum yes
    dbfilename "dump.rdb"
    rdb-del-sync-files no
    dir "/data"
    replica-serve-stale-data yes
    replica-read-only yes
    repl-diskless-sync no
    repl-diskless-sync-delay 5
    repl-diskless-load disabled
    repl-disable-tcp-nodelay no
    replica-priority 100
    acllog-max-len 128
    maxclients 9000
    lazyfree-lazy-eviction no
    lazyfree-lazy-expire no
    lazyfree-lazy-server-del no
    replica-lazy-flush no
    lazyfree-lazy-user-del no
    appendonly yes
    appendfilename "appendonly.aof"
    appendfsync everysec
    no-appendfsync-on-rewrite no
    auto-aof-rewrite-percentage 100
    auto-aof-rewrite-min-size 64mb
    aof-load-truncated yes
    aof-use-rdb-preamble yes
    lua-time-limit 5000
    latency-monitor-threshold 0
    notify-keyspace-events ""
    hash-max-ziplist-entries 512
    hash-max-ziplist-value 64
    list-max-ziplist-size -2
    list-compress-depth 0
    set-max-intset-entries 512
    zset-max-ziplist-entries 128
    zset-max-ziplist-value 64
    hll-sparse-max-bytes 3000
    stream-node-max-bytes 4kb
    stream-node-max-entries 100
    activerehashing yes
    client-output-buffer-limit normal 0 0 0
    client-output-buffer-limit replica 256mb 64mb 60
    client-output-buffer-limit pubsub 32mb 8mb 60
    hz 10
    dynamic-hz yes
    aof-rewrite-incremental-fsync yes
    rdb-save-incremental-fsync yes
    # Jemalloc background thread for purging will be enabled by default
    jemalloc-bg-thread yes
    slaveof redis-0.redis.ankit.svc.cluster.local 6379

Sentinel conf file:

    port 5000
    daemonize no
    protected-mode no
    bind 0.0.0.0
    acllog-max-len 128
    sentinel deny-scripts-reconfig yes
    sentinel resolve-hostnames yes
    sentinel announce-hostnames yes
    sentinel monitor mymaster redis-0.redis.ankit.svc.cluster.local 6379 2
    sentinel down-after-milliseconds mymaster 4000
    sentinel failover-timeout mymaster 2000
    sentinel auth-pass mymaster password
    maxclients 9000
    loglevel debug
    # Generated by CONFIG REWRITE
    user default on nopass ~* &* +@all
    dir "/data"
    sentinel myid c7d1f666d94b7ab0a05701c83ccd1246d2628ca1
    sentinel config-epoch mymaster 0
    sentinel leader-epoch mymaster 0
    sentinel current-epoch 0
    sentinel known-replica mymaster 10.233.64.90 6379
    sentinel known-replica mymaster 10.233.64.40 6379
    sentinel known-sentinel mymaster 10.233.64.34 5000 6e5e0ecf8551c21b543815c966a19a54809677c4
    sentinel known-sentinel mymaster 10.233.64.35 5000 3f4493c38c5514d76f2eb698aed9c0b6ba550be9
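As a sanity check on the numbers in this file, the quorum from `sentinel monitor` can be compared with the sentinel count implied by the `known-sentinel` entries. The Python sketch below is illustrative only: `summarize_sentinel_conf` is a hypothetical helper, and it assumes the one-directive-per-line layout of sentinel.conf.

```python
def summarize_sentinel_conf(conf_text):
    """Extract the monitored master, quorum, and sentinel count."""
    monitor = None
    known_sentinels = 0
    for raw in conf_text.splitlines():
        parts = raw.split()
        if len(parts) < 2 or parts[0] != "sentinel":
            continue  # skip comments, blank lines, non-sentinel directives
        if parts[1] == "monitor" and len(parts) == 6:
            _, _, name, host, port, quorum = parts
            monitor = {"name": name, "host": host,
                       "port": int(port), "quorum": int(quorum)}
        elif parts[1] == "known-sentinel":
            known_sentinels += 1
    if monitor is None:
        raise ValueError("no 'sentinel monitor' directive found")
    total = known_sentinels + 1  # the peers plus this sentinel itself
    monitor["total_sentinels"] = total
    # A failover additionally needs authorization from a majority of sentinels.
    monitor["majority"] = total // 2 + 1
    return monitor


conf = """\
sentinel monitor mymaster redis-0.redis.ankit.svc.cluster.local 6379 2
sentinel down-after-milliseconds mymaster 4000
# Generated by CONFIG REWRITE
sentinel known-sentinel mymaster 10.233.64.34 5000 6e5e0ecf8551c21b543815c966a19a54809677c4
sentinel known-sentinel mymaster 10.233.64.35 5000 3f4493c38c5514d76f2eb698aed9c0b6ba550be9
"""

summary = summarize_sentinel_conf(conf)
print(summary["quorum"], summary["total_sentinels"], summary["majority"])  # 2 3 2
```

With a quorum of 2 and three sentinels that know about each other, the configured numbers alone do not appear to rule out reaching ODOWN.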

Redis Sentinel not escalating from SDOWN to ODOWN event
