Need help in understanding what is going wrong
I have deployed Redis in a Kubernetes environment with 1 master, 2 slaves, and 3 sentinels. I am using the Redis 6.2.3 Alpine image. Every Redis and Sentinel instance runs in its own pod.
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
redis-0 1/1 Running 0 31m 10.233.64.143 vm1
I have also created one headless service for the Redis pods and one for the Sentinel pods, so that I can reach a specific pod behind each service.
[root@master-1 ~]# kubectl describe svc sentinel -n ankit
Name: sentinel
Namespace: ankit
Labels:
[root@master-1 ~]# kubectl describe svc redis -n ankit
Name: redis
Namespace: ankit
Labels:
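For reference, this is how I address an individual pod through the headless service. The sketch below only builds the DNS name; the helper name `pod_fqdn` is my own, and it assumes the default cluster domain `cluster.local`.

```python
def pod_fqdn(statefulset, ordinal, service, namespace, cluster_domain="cluster.local"):
    """Build the stable DNS name of one StatefulSet pod behind a headless service.

    Pattern: <statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster domain>
    """
    return f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"


# The master address used in my redis.conf and sentinel.conf
print(pod_fqdn("redis", 0, "redis", "ankit"))
```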
When the Redis StatefulSet is deployed, logic in the init container of the Redis YAML makes the redis-0 pod the master by default. All pods are up and running, and all three sentinels can connect to the master and to each other. However, when I delete the Redis master pod, all three sentinels log an SDOWN event, but it never escalates to an ODOWN event. As a result, failover does not happen and Sentinel does not choose a new master. When redis-0 then comes back up as a slave, the cluster ends up in a bad state because there is no master.
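My understanding of the escalation rule, as a rough sketch: a sentinel that already sees the master as SDOWN counts itself plus every peer sentinel that also reports the master down, and flags ODOWN once that count reaches the configured quorum. The function and argument names below are mine, not Redis's, and the sketch ignores the timing window in which peer reports stay valid.

```python
def is_odown(local_sdown, peer_reports, quorum):
    """Return True if this sentinel would flag the master as ODOWN.

    local_sdown  -- whether this sentinel itself considers the master SDOWN
    peer_reports -- one boolean per known peer sentinel (True = peer says down)
    quorum       -- the quorum value from `sentinel monitor`
    """
    if not local_sdown:
        # ODOWN is only evaluated by a sentinel that already sees SDOWN.
        return False
    votes = 1 + sum(1 for report in peer_reports if report)
    return votes >= quorum


# Quorum 2, both peers agreeing: I would expect ODOWN here.
print(is_odown(True, [True, True], 2))
# Quorum 2, no peer agreement reaching this sentinel: stays at SDOWN.
print(is_odown(True, [False, False], 2))
```

With a quorum of 2 and all three sentinels logging +sdown, I would expect this condition to be met, which is why the missing +odown confuses me.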
Sentinel-0 logs after redis master deletion:
1:X 15 Oct 2021 02:13:52.155 * +fix-slave-config slave 10.233.64.40:6379 10.233.64.40 6379 @ mymaster redis-0.redis.ankit.svc.cluster.local 6379
1:X 15 Oct 2021 02:13:52.322 * +fix-slave-config slave 10.233.64.90:6379 10.233.64.90 6379 @ mymaster redis-0.redis.ankit.svc.cluster.local 6379
1:X 15 Oct 2021 02:13:53.194 * +fix-slave-config slave 10.233.64.40:6379 10.233.64.40 6379 @ mymaster redis-0.redis.ankit.svc.cluster.local 6379
1:X 15 Oct 2021 02:13:53.338 * +fix-slave-config slave 10.233.64.90:6379 10.233.64.90 6379 @ mymaster redis-0.redis.ankit.svc.cluster.local 6379
1:X 15 Oct 2021 02:13:54.203 * +fix-slave-config slave 10.233.64.40:6379 10.233.64.40 6379 @ mymaster redis-0.redis.ankit.svc.cluster.local 6379
1:X 15 Oct 2021 02:13:54.399 * +fix-slave-config slave 10.233.64.90:6379 10.233.64.90 6379 @ mymaster redis-0.redis.ankit.svc.cluster.local 6379
1:X 15 Oct 2021 02:13:54.635 # +sdown master mymaster redis-0.redis.ankit.svc.cluster.local 6379
1:X 15 Oct 2021 02:14:00.040 - Accepted 10.233.64.143:33288
1:X 15 Oct 2021 02:14:00.047 - Client closed connection
Sentinel-1 logs after deletion of master redis pod:
1:X 15 Oct 2021 02:11:10.200 . Rewritten config file (/etc/redis/sentinel.conf) successfully
1:X 15 Oct 2021 02:13:54.600 # +sdown master mymaster redis-0.redis.ankit.svc.cluster.local 6379
1:X 15 Oct 2021 02:14:00.054 - Accepted 10.233.64.143:48550
1:X 15 Oct 2021 02:14:00.055 - Client closed connection
Sentinel-2 logs after deletion of master redis pod:
1:X 15 Oct 2021 02:11:09.858 . Rewritten config file (/etc/redis/sentinel.conf) successfully
1:X 15 Oct 2021 02:11:10.244 - Accepted 10.233.64.93:35181
1:X 15 Oct 2021 02:11:10.264 - Accepted 10.233.64.35:56403
1:X 15 Oct 2021 02:13:54.636 # +sdown master mymaster redis-0.redis.ankit.svc.cluster.local 6379
As the logs show, SDOWN never escalates to ODOWN on any sentinel, so failover does not happen either.
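To compare the three sentinels without reading the raw logs by eye, I used a small script along these lines. It is only a sketch: it assumes one log entry per line in the format shown in my logs, and the names `sentinel_events` and `escalated` are my own.

```python
import re

# Matches lines such as:
# 1:X 15 Oct 2021 02:13:54.635 # +sdown master mymaster <host> 6379
EVENT_RE = re.compile(
    r"^\d+:X \d{2} \w{3} \d{4} (\d{2}:\d{2}:\d{2}\.\d{3}) [#*.\-] (\+[a-z\-]+)"
)


def sentinel_events(log_text):
    """Return (time, event) pairs for every '+event' line in a sentinel log."""
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match:
            events.append((match.group(1), match.group(2)))
    return events


def escalated(log_text):
    """True only if the log contains both a +sdown and a +odown event."""
    names = {name for _, name in sentinel_events(log_text)}
    return "+sdown" in names and "+odown" in names


sample = "\n".join([
    "1:X 15 Oct 2021 02:11:10.200 . Rewritten config file (/etc/redis/sentinel.conf) successfully",
    "1:X 15 Oct 2021 02:13:54.600 # +sdown master mymaster redis-0.redis.ankit.svc.cluster.local 6379",
    "1:X 15 Oct 2021 02:14:00.054 - Accepted 10.233.64.143:48550",
])
print(sentinel_events(sample))
print(escalated(sample))
```

Run against each of the three logs, this reports a +sdown event and no +odown event.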
The Redis and Sentinel conf files are below.
Redis conf file:
masterauth password
requirepass password
bind 0.0.0.0
protected-mode no
port 6379
tcp-backlog 511
# Close the connection after a client is idle for N seconds (0 to disable)
timeout 0
tcp-keepalive 300
daemonize no
supervised no
pidfile "/var/run/redis_6379.pid"
loglevel debug
logfile ""
databases 16
always-show-logo yes
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename "dump.rdb"
rdb-del-sync-files no
dir "/data"
replica-serve-stale-data yes
replica-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-diskless-load disabled
repl-disable-tcp-nodelay no
replica-priority 100
acllog-max-len 128
maxclients 9000
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no
lazyfree-lazy-user-del no
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
lua-time-limit 5000
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
stream-node-max-bytes 4kb
stream-node-max-entries 100
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
dynamic-hz yes
aof-rewrite-incremental-fsync yes
rdb-save-incremental-fsync yes
# Jemalloc background thread for purging will be enabled by default
jemalloc-bg-thread yes
slaveof redis-0.redis.ankit.svc.cluster.local 6379
Sentinel conf file:
port 5000
daemonize no
protected-mode no
bind 0.0.0.0
acllog-max-len 128
sentinel deny-scripts-reconfig yes
sentinel resolve-hostnames yes
sentinel announce-hostnames yes
sentinel monitor mymaster redis-0.redis.ankit.svc.cluster.local 6379 2
sentinel down-after-milliseconds mymaster 4000
sentinel failover-timeout mymaster 2000
sentinel auth-pass mymaster password
maxclients 9000
loglevel debug
# Generated by CONFIG REWRITE
user default on nopass ~* &* +@all
dir "/data"
sentinel myid c7d1f666d94b7ab0a05701c83ccd1246d2628ca1
sentinel config-epoch mymaster 0
sentinel leader-epoch mymaster 0
sentinel current-epoch 0
sentinel known-replica mymaster 10.233.64.90 6379
sentinel known-replica mymaster 10.233.64.40 6379
sentinel known-sentinel mymaster 10.233.64.34 5000 6e5e0ecf8551c21b543815c966a19a54809677c4
sentinel known-sentinel mymaster 10.233.64.35 5000 3f4493c38c5514d76f2eb698aed9c0b6ba550be9
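To double-check which values each sentinel is actually running with after CONFIG REWRITE, a small sketch like the following pulls the failover-related directives out of a sentinel.conf body. It assumes one directive per line, as in the file on disk; the function name and the returned keys are my own.

```python
def parse_sentinel_conf(text):
    """Collect the failover-related directives from a sentinel.conf body."""
    info = {"known_sentinels": 0}
    for raw in text.splitlines():
        parts = raw.split()
        if len(parts) < 3 or parts[0] != "sentinel":
            continue
        directive = parts[1]
        if directive == "monitor" and len(parts) == 6:
            # sentinel monitor <name> <host> <port> <quorum>
            info["master"] = parts[2]
            info["host"] = parts[3]
            info["port"] = int(parts[4])
            info["quorum"] = int(parts[5])
        elif directive in ("down-after-milliseconds", "failover-timeout") and len(parts) == 4:
            # sentinel <directive> <name> <milliseconds>
            info[directive] = int(parts[3])
        elif directive == "known-sentinel":
            # One line per peer sentinel this instance has discovered.
            info["known_sentinels"] += 1
    return info


conf = """\
sentinel monitor mymaster redis-0.redis.ankit.svc.cluster.local 6379 2
sentinel down-after-milliseconds mymaster 4000
sentinel failover-timeout mymaster 2000
sentinel known-sentinel mymaster 10.233.64.34 5000 6e5e0ecf8551c21b543815c966a19a54809677c4
sentinel known-sentinel mymaster 10.233.64.35 5000 3f4493c38c5514d76f2eb698aed9c0b6ba550be9
"""
print(parse_sentinel_conf(conf))
```

For the file above this gives a quorum of 2, down-after-milliseconds 4000, failover-timeout 2000, and 2 known peer sentinels, so 3 sentinels in total including the local one.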