Describe the bug
Redis and Sentinel communicate via hostnames; failover fails.
To reproduce
Background: Redis 6.2.7, Kubernetes deployment, 1 master (pod-0) + 2 slaves (pod-1/pod-2) + 3 sentinels.
Sentinel and Redis both use hostnames to communicate: Redis is configured with replica-announce-ip, and Sentinel is configured with resolve-hostnames yes, announce-hostnames yes, and announce-ip.
- Problem scenario: kubectl cordon the master's node + kubectl delete the master pod, so the master pod stays Pending (the master hostname no longer resolves at this point), but Sentinel does not elect a new master.
- Sentinel log: +fix-slave-config and -failover-abort-no-good-slave; the failover fails.
- Slave log: "master Host is unreachable" and "Unable to connect to MASTER: Invalid argument".
Expected behavior
Sentinel should promote one of the slaves to master, and the failover should succeed.
Additional information
I read the Sentinel code alongside the logs; my preliminary analysis:
- When Sentinel periodically sends INFO to the instances, it checks whether the IP resolved from the slave's configured master host has changed (if the hostname fails to resolve, it is treated as changed).
- Once a change is detected, Sentinel forces a SLAVEOF <master> on the slave and kills its connections (the link between the Sentinel and the slave is also forcibly closed, and Sentinel logs +fix-slave-config).
- Meanwhile, a majority of Sentinels mark the master as odown and start a failover, but when selecting a slave to promote, the leader checks its connection state to each slave (slave->link->disconnected).
- Because of the previous step, the Sentinel-to-slave connection is already closed (slave->link->disconnected = true), so the disconnected slaves are filtered out of the selection.
- In the end, no slave can be promoted to master, and the failover fails.
This does not happen every time; it only occurs when steps 2 and 3 coincide.
I am not sure whether this is a Redis bug; please help troubleshoot or share any information that would help.
The relevant logs and source code analysis are as follows:
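To illustrate the last two steps of the analysis, here is a rough Python sketch of the slave-selection filter. It is a hypothetical, heavily simplified mirror of what Sentinel's selection logic does in sentinel.c, not the real code; all names here are illustrative:

```python
# Hypothetical sketch: why a slave whose link was just force-closed by
# the +fix-slave-config step cannot be promoted.
from dataclasses import dataclass

@dataclass
class Slave:
    name: str
    disconnected: bool   # stands in for slave->link->disconnected
    repl_offset: int = 0

def select_promotable(slaves):
    """Return the best candidate, or None (-> -failover-abort-no-good-slave)."""
    candidates = [s for s in slaves if not s.disconnected]
    if not candidates:
        return None
    # Among the survivors, prefer the largest replication offset.
    return max(candidates, key=lambda s: s.repl_offset)

# Both slaves had their links killed by the forced SLAVEOF, so nothing
# survives the filter and the failover aborts:
slaves = [Slave("pod-1", disconnected=True, repl_offset=200),
          Slave("pod-2", disconnected=True, repl_offset=180)]
print(select_promotable(slaves))   # None
```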
Comment From: enjoy-binbin
Thanks for the report; it looks detailed.
@moticless I suppose you can take a look when you have time; maybe related to #10146.
Comment From: moticless
Hi @enjoy-binbin, I plan to get to it later today. Thanks.
Comment From: moticless
Hi @kaito-kidd, before we dive into the details, please help me understand the following points:
- What do you mean by the problem happening only "when steps 2 and 3 coincide"?
- Following on from the previous bullet, how often does it happen? Is there any pattern?
- Please verify that the redis-sentinel process has write permission to its configuration file, since Sentinel rewrites the file for persistence.
- Please compare your configuration with the one available here, which shows a simple hostname-based configuration that can run with docker-compose. It brings up 3 sentinels and 2 servers. (You can also play with it, for comparison.)
- Please supply the sentinel and server configurations.
Thank you
Comment From: kaito-kidd
Hi @moticless,
- The problem does not occur every time, and there is no fixed pattern. From my reading of the code, I believe it only occurs when step 2 and step 3 coincide.
- I confirmed that redis-sentinel has write permission to its configuration file.
- My configuration looks the same as the example you provided.
- Redis server and sentinel configs:
# master
replica-announce-ip "redis-sso-redis-demo-ss-0.redis-sso-redis-demo-headless.sso.svc.global.chongqing"
# slave1
replica-announce-ip "redis-sso-redis-demo-ss-2.redis-sso-redis-demo-headless.sso.svc.global.chongqing"
replicaof redis-sso-redis-demo-ss-0.redis-sso-redis-demo-headless.sso.svc.global.chongqing 6379
# slave2
replica-announce-ip "redis-sso-redis-demo-ss-2.redis-sso-redis-demo-headless.sso.svc.global.chongqing"
replicaof redis-sso-redis-demo-ss-0.redis-sso-redis-demo-headless.sso.svc.global.chongqing 6379
# sentinel1
sentinel resolve-hostnames yes
sentinel announce-hostnames yes
sentinel announce-ip "global-redis-sentinel-0.global-redis-sentinel-headless.sso.svc.global.chongqing"
sentinel monitor sso-redis-demo redis-sso-redis-demo-ss-0.redis-sso-redis-demo-headless.sso.svc.global.chongqing 6379 2
sentinel failover-timeout sso-redis-demo 120000
sentinel down-after-milliseconds sso-redis-demo 15000
# sentinel2
sentinel resolve-hostnames yes
sentinel announce-hostnames yes
sentinel announce-ip "global-redis-sentinel-1.global-redis-sentinel-headless.sso.svc.global.chongqing"
sentinel monitor sso-redis-demo redis-sso-redis-demo-ss-0.redis-sso-redis-demo-headless.sso.svc.global.chongqing 6379 2
sentinel down-after-milliseconds sso-redis-demo 15000
sentinel failover-timeout sso-redis-demo 120000
# sentinel3
sentinel resolve-hostnames yes
sentinel announce-hostnames yes
sentinel announce-ip "global-redis-sentinel-2.global-redis-sentinel-headless.sso.svc.global.chongqing"
sentinel monitor sso-redis-demo redis-sso-redis-demo-ss-0.redis-sso-redis-demo-headless.sso.svc.global.chongqing 6379 2
sentinel down-after-milliseconds sso-redis-demo 15000
sentinel failover-timeout sso-redis-demo 120000
Thank you
Comment From: moticless
slave1 & slave2 announce exactly the same name.
Comment From: kaito-kidd
@moticless, sorry, that was a copy-paste mistake; the correct Redis configuration is as follows:
# master
replica-announce-ip "redis-sso-redis-demo-ss-0.redis-sso-redis-demo-headless.sso.svc.global.chongqing"
# slave1
replica-announce-ip "redis-sso-redis-demo-ss-1.redis-sso-redis-demo-headless.sso.svc.global.chongqing"
replicaof redis-sso-redis-demo-ss-0.redis-sso-redis-demo-headless.sso.svc.global.chongqing 6379
# slave2
replica-announce-ip "redis-sso-redis-demo-ss-2.redis-sso-redis-demo-headless.sso.svc.global.chongqing"
replicaof redis-sso-redis-demo-ss-0.redis-sso-redis-demo-headless.sso.svc.global.chongqing 6379
Comment From: kaito-kidd
When this problem occurs, the sentinel log contains the key event +fix-slave-config; when +fix-slave-config does not appear, the problem does not occur either.
You can try invalidating the master hostname to reproduce this problem; note that it may take many attempts to reproduce it once.
Comment From: moticless
Regarding how often it reproduces: you said the problem does not occur every time, and now you are saying it takes many attempts to reproduce it once. And then there is the wrong configuration. I am sorry, but it is hard for me to follow. Please take the time to double-check your findings.
Comment From: kaito-kidd
@moticless, sorry, maybe my earlier statements were not accurate.
Allow me to state my question again.
The configuration of Redis and Sentinel is as follows:
# master
replica-announce-ip "redis-sso-redis-demo-ss-0.redis-sso-redis-demo-headless.sso.svc.global.chongqing"
# slave1
replica-announce-ip "redis-sso-redis-demo-ss-1.redis-sso-redis-demo-headless.sso.svc.global.chongqing"
replicaof redis-sso-redis-demo-ss-0.redis-sso-redis-demo-headless.sso.svc.global.chongqing 6379
# slave2
replica-announce-ip "redis-sso-redis-demo-ss-2.redis-sso-redis-demo-headless.sso.svc.global.chongqing"
replicaof redis-sso-redis-demo-ss-0.redis-sso-redis-demo-headless.sso.svc.global.chongqing 6379
# sentinel1
sentinel resolve-hostnames yes
sentinel announce-hostnames yes
sentinel announce-ip "global-redis-sentinel-0.global-redis-sentinel-headless.sso.svc.global.chongqing"
sentinel monitor sso-redis-demo redis-sso-redis-demo-ss-0.redis-sso-redis-demo-headless.sso.svc.global.chongqing 6379 2
sentinel failover-timeout sso-redis-demo 120000
sentinel down-after-milliseconds sso-redis-demo 15000
# sentinel2
sentinel resolve-hostnames yes
sentinel announce-hostnames yes
sentinel announce-ip "global-redis-sentinel-1.global-redis-sentinel-headless.sso.svc.global.chongqing"
sentinel monitor sso-redis-demo redis-sso-redis-demo-ss-0.redis-sso-redis-demo-headless.sso.svc.global.chongqing 6379 2
sentinel down-after-milliseconds sso-redis-demo 15000
sentinel failover-timeout sso-redis-demo 120000
# sentinel3
sentinel resolve-hostnames yes
sentinel announce-hostnames yes
sentinel announce-ip "global-redis-sentinel-2.global-redis-sentinel-headless.sso.svc.global.chongqing"
sentinel monitor sso-redis-demo redis-sso-redis-demo-ss-0.redis-sso-redis-demo-headless.sso.svc.global.chongqing 6379 2
sentinel down-after-milliseconds sso-redis-demo 15000
sentinel failover-timeout sso-redis-demo 120000
The problem does happen occasionally; I want to get to the root cause, and I am trying to reproduce it.
The way to reproduce it is to take the master down while its hostname is invalid, then observe whether the Sentinel failover succeeds.
I tried multiple times, and the problem can be reproduced. When it occurs, the sentinel log contains the key events +fix-slave-config and -failover-abort-no-good-slave.
If needed, I can write another simple example to help you reproduce it.
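For what it's worth, a quick way to spot those two key events in a captured sentinel log; the sample lines below are written in the standard sentinel log format, and the /tmp path is illustrative:

```shell
# Save a couple of sample lines and filter for the two tell-tale events.
cat > /tmp/sentinel.log <<'EOF'
1:X 10 Oct 2022 12:05:28.315 * +fix-slave-config slave redis-slave:6379 redis-slave 6379 @ mymaster redis-master 6379
1:X 10 Oct 2022 12:05:29.435 # +odown master mymaster redis-master 6379 #quorum 2/2
1:X 10 Oct 2022 12:05:29.622 # -failover-abort-no-good-slave master mymaster redis-master 6379
EOF
# Count the lines containing either event:
grep -cE '\+fix-slave-config|failover-abort-no-good-slave' /tmp/sentinel.log   # prints 2
```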
Comment From: moticless
Hi @kaito-kidd, I don't understand what "master down + hostname invalid" means. How exactly do you make the master unavailable? Shut down/pause the pod? Just change its hostname? Do the sentinels run on distinct pods?
Comment From: kaito-kidd
Hi @moticless, my steps:
step 1. kubectl cordon the master's node
step 2. kubectl delete the master pod
step 3. now the master pod is Pending and the master hostname no longer resolves
step 4. observe that Sentinel does not elect a new master
The sentinels run in different pods, distributed across different nodes.
Comment From: moticless
The Sentinel flow you are describing is a rather common one. I also let it run, only this specific flow, on docker-compose for two days without any issues.
Maybe you can try the docker-compose setup I gave you above and gradually change it to be an exact copy of your configuration, and see when/if it breaks. If it breaks, we will have common ground to start investigating. If it works perfectly on docker-compose, go in the opposite direction on k8s.
Comment From: kaito-kidd
Hi @moticless,
I used your docker-compose configuration and reproduced the issue as well.
My operation steps:
Step 1. Clone your repo and build the environment
git clone https://github.com/moticless/redis-network-testing.git ~/github
cd ~/github/redis-network-testing
git submodule init
git submodule update
./build_redis.py
Check that the Redis build succeeded:
➜ redis-network-testing git:(main) ✗ ll artifacts
total 92760
-r----x--t 1 ryetan staff 7.8M 10 9 17:53 redis-benchmark
-r----x--t 1 ryetan staff 7.7M 10 9 17:53 redis-cli
-r----x--t 1 ryetan staff 15M 10 9 17:53 redis-sentinel
-r----x--t 1 ryetan staff 15M 10 9 17:53 redis-server
Step 2. Modify the .env file, then start Redis and the sentinels with docker-compose
➜ hostname-based git:(main) ✗ cat ~/github/redis-network-testing/docker-compose-setups/hostname-based/.env
BIN_DST_PATH=/usr/local/bin/
# The test will use other values. Overriden with env-var.:
IMAGE="ubuntu:20.04"
# my env var
BIN_SRC_PATH=~/github/redis-network-testing/artifacts
Start the service:
➜ hostname-based git:(main) ✗ docker-compose up -d
Creating network "hostname-based_main" with the default driver
Creating hostname-based_replica2_1 ... done
Creating hostname-based_instance_standby_1 ... done
Creating hostname-based_sentinel1_1 ... done
Creating hostname-based_sentinel2_1 ... done
Creating hostname-based_replica1_1 ... done
Creating hostname-based_sentinel3_1 ... done
Started successfully:
➜ hostname-based git:(main) ✗ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
076052c78ed9 ubuntu:20.04 "bash -c 'redis-serv…" 34 seconds ago Up 32 seconds 0.0.0.0:8764->8764/tcp hostname-based_sentinel3_1
2ebac9ab0b3b ubuntu:20.04 "bash -c 'redis-serv…" 34 seconds ago Up 32 seconds hostname-based_sentinel2_1
7d69aa962ac5 ubuntu:20.04 "redis-server /test/…" 34 seconds ago Up 32 seconds hostname-based_replica1_1
455e14848074 ubuntu:20.04 "bash -c 'redis-serv…" 34 seconds ago Up 33 seconds hostname-based_sentinel1_1
a3ecfc6e54ad ubuntu:20.04 "bash -c 'while true…" 34 seconds ago Up 32 seconds hostname-based_instance_standby_1
035eea666c51 ubuntu:20.04 "redis-server /test/…" 34 seconds ago Up 33 seconds hostname-based_replica2_1
The status of master and slave is ok:
➜ hostname-based git:(main) ✗ docker exec -it hostname-based_instance_standby_1 redis-cli -h redis-master info replication
# Replication
role:master
connected_slaves:1
slave0:ip=redis-slave,port=6379,state=online,offset=2190,lag=1
master_failover_state:no-failover
master_replid:f945eb62e11227d0f88ec9848e84e5f7d56caec0
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:2474
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:2474
Step 3. Execute the docker stop $master command
docker stop hostname-based_replica1_1
Step 4. Observe the sentinel logs and find that the failover failed
# sentinel-1
1:X 10 Oct 2022 12:05:02.945 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 10 Oct 2022 12:05:02.945 # Redis version=255.255.255, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 10 Oct 2022 12:05:02.945 # Configuration loaded
1:X 10 Oct 2022 12:05:02.946 * monotonic clock: POSIX clock_gettime
1:X 10 Oct 2022 12:05:02.946 * Running mode=sentinel, port=26379.
1:X 10 Oct 2022 12:05:02.976 * Sentinel new configuration saved on disk
1:X 10 Oct 2022 12:05:02.976 # Sentinel ID is c7c59ae8fb334f663289d46b8866e33f1f60fd0e
1:X 10 Oct 2022 12:05:02.976 # +monitor master mymaster redis-master 6379 quorum 2
1:X 10 Oct 2022 12:05:02.978 * +slave slave redis-slave:6379 redis-slave 6379 @ mymaster redis-master 6379
1:X 10 Oct 2022 12:05:02.996 * Sentinel new configuration saved on disk
1:X 10 Oct 2022 12:05:04.910 * +sentinel sentinel 8c19832232d462b9330a41d8a3103d2049ed7912 redis-sentinel2 26379 @ mymaster redis-master 6379
1:X 10 Oct 2022 12:05:04.931 * Sentinel new configuration saved on disk
1:X 10 Oct 2022 12:05:04.996 * +sentinel sentinel 40b9669e83e1b991de1b3cc7461b918331ee0376 redis-sentinel3 26379 @ mymaster redis-master 6379
1:X 10 Oct 2022 12:05:05.024 * Sentinel new configuration saved on disk
1:X 10 Oct 2022 12:05:25.360 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:28.314 * +fix-slave-config slave redis-slave:6379 redis-slave 6379 @ mymaster redis-master 6379
1:X 10 Oct 2022 12:05:29.320 * +fix-slave-config slave redis-slave:6379 redis-slave 6379 @ mymaster redis-master 6379
1:X 10 Oct 2022 12:05:29.365 # +sdown master mymaster redis-master 6379
1:X 10 Oct 2022 12:05:29.411 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:29.460 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:29.477 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:29.492 * Sentinel new configuration saved on disk
1:X 10 Oct 2022 12:05:29.492 # +new-epoch 1
1:X 10 Oct 2022 12:05:29.506 * Sentinel new configuration saved on disk
1:X 10 Oct 2022 12:05:29.506 # +vote-for-leader 8c19832232d462b9330a41d8a3103d2049ed7912 1
1:X 10 Oct 2022 12:05:30.494 # +odown master mymaster redis-master 6379 #quorum 3/2
1:X 10 Oct 2022 12:05:30.494 # Next failover delay: I will not start a failover before Mon Oct 10 12:05:39 2022
1:X 10 Oct 2022 12:05:30.571 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:30.583 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:31.602 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:31.619 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:32.645 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:32.665 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:33.700 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:33.709 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:34.749 # Failed to resolve hostname 'redis-master'
# sentinel-2
1:X 10 Oct 2022 12:05:02.879 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 10 Oct 2022 12:05:02.879 # Redis version=255.255.255, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 10 Oct 2022 12:05:02.879 # Configuration loaded
1:X 10 Oct 2022 12:05:02.880 * monotonic clock: POSIX clock_gettime
1:X 10 Oct 2022 12:05:02.890 * Running mode=sentinel, port=26379.
1:X 10 Oct 2022 12:05:02.905 * Sentinel new configuration saved on disk
1:X 10 Oct 2022 12:05:02.905 # Sentinel ID is 8c19832232d462b9330a41d8a3103d2049ed7912
1:X 10 Oct 2022 12:05:02.905 # +monitor master mymaster redis-master 6379 quorum 2
1:X 10 Oct 2022 12:05:02.908 * +slave slave redis-slave:6379 redis-slave 6379 @ mymaster redis-master 6379
1:X 10 Oct 2022 12:05:02.920 * Sentinel new configuration saved on disk
1:X 10 Oct 2022 12:05:04.953 * +sentinel sentinel c7c59ae8fb334f663289d46b8866e33f1f60fd0e redis-sentinel1 26379 @ mymaster redis-master 6379
1:X 10 Oct 2022 12:05:04.972 * Sentinel new configuration saved on disk
1:X 10 Oct 2022 12:05:04.996 * +sentinel sentinel 40b9669e83e1b991de1b3cc7461b918331ee0376 redis-sentinel3 26379 @ mymaster redis-master 6379
1:X 10 Oct 2022 12:05:05.016 * Sentinel new configuration saved on disk
1:X 10 Oct 2022 12:05:25.363 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:28.315 * +fix-slave-config slave redis-slave:6379 redis-slave 6379 @ mymaster redis-master 6379
1:X 10 Oct 2022 12:05:29.314 * +fix-slave-config slave redis-slave:6379 redis-slave 6379 @ mymaster redis-master 6379
1:X 10 Oct 2022 12:05:29.382 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:29.382 # +sdown master mymaster redis-master 6379
1:X 10 Oct 2022 12:05:29.435 # +odown master mymaster redis-master 6379 #quorum 2/2
1:X 10 Oct 2022 12:05:29.435 # +new-epoch 1
1:X 10 Oct 2022 12:05:29.435 # +try-failover master mymaster redis-master 6379
1:X 10 Oct 2022 12:05:29.465 * Sentinel new configuration saved on disk
1:X 10 Oct 2022 12:05:29.465 # +vote-for-leader 8c19832232d462b9330a41d8a3103d2049ed7912 1
1:X 10 Oct 2022 12:05:29.477 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:29.508 # c7c59ae8fb334f663289d46b8866e33f1f60fd0e voted for 8c19832232d462b9330a41d8a3103d2049ed7912 1
1:X 10 Oct 2022 12:05:29.510 # 40b9669e83e1b991de1b3cc7461b918331ee0376 voted for 8c19832232d462b9330a41d8a3103d2049ed7912 1
1:X 10 Oct 2022 12:05:29.516 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:29.537 # +elected-leader master mymaster redis-master 6379
1:X 10 Oct 2022 12:05:29.537 # +failover-state-select-slave master mymaster redis-master 6379
1:X 10 Oct 2022 12:05:29.622 # -failover-abort-no-good-slave master mymaster redis-master 6379
1:X 10 Oct 2022 12:05:29.685 # Next failover delay: I will not start a failover before Mon Oct 10 12:05:40 2022
1:X 10 Oct 2022 12:05:30.453 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:30.572 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:31.526 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:31.619 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:32.598 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:32.665 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:33.621 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:33.700 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:34.661 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:34.754 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:35.748 # Failed to resolve hostname 'redis-master'
# sentinel-3
1:X 10 Oct 2022 12:05:02.991 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 10 Oct 2022 12:05:02.991 # Redis version=255.255.255, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 10 Oct 2022 12:05:02.991 # Configuration loaded
1:X 10 Oct 2022 12:05:02.994 * monotonic clock: POSIX clock_gettime
1:X 10 Oct 2022 12:05:02.995 * Running mode=sentinel, port=26379.
1:X 10 Oct 2022 12:05:03.008 * Sentinel new configuration saved on disk
1:X 10 Oct 2022 12:05:03.008 # Sentinel ID is 40b9669e83e1b991de1b3cc7461b918331ee0376
1:X 10 Oct 2022 12:05:03.008 # +monitor master mymaster redis-master 6379 quorum 2
1:X 10 Oct 2022 12:05:03.011 * +slave slave redis-slave:6379 redis-slave 6379 @ mymaster redis-master 6379
1:X 10 Oct 2022 12:05:03.023 * Sentinel new configuration saved on disk
1:X 10 Oct 2022 12:05:04.910 * +sentinel sentinel 8c19832232d462b9330a41d8a3103d2049ed7912 redis-sentinel2 26379 @ mymaster redis-master 6379
1:X 10 Oct 2022 12:05:04.928 * Sentinel new configuration saved on disk
1:X 10 Oct 2022 12:05:04.953 * +sentinel sentinel c7c59ae8fb334f663289d46b8866e33f1f60fd0e redis-sentinel1 26379 @ mymaster redis-master 6379
1:X 10 Oct 2022 12:05:04.975 * Sentinel new configuration saved on disk
1:X 10 Oct 2022 12:05:28.320 * +fix-slave-config slave redis-slave:6379 redis-slave 6379 @ mymaster redis-master 6379
1:X 10 Oct 2022 12:05:29.361 * +fix-slave-config slave redis-slave:6379 redis-slave 6379 @ mymaster redis-master 6379
1:X 10 Oct 2022 12:05:29.382 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:29.411 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:29.453 # +sdown master mymaster redis-master 6379
1:X 10 Oct 2022 12:05:29.477 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:29.494 * Sentinel new configuration saved on disk
1:X 10 Oct 2022 12:05:29.494 # +new-epoch 1
1:X 10 Oct 2022 12:05:29.509 * Sentinel new configuration saved on disk
1:X 10 Oct 2022 12:05:29.509 # +vote-for-leader 8c19832232d462b9330a41d8a3103d2049ed7912 1
1:X 10 Oct 2022 12:05:29.509 # +odown master mymaster redis-master 6379 #quorum 2/2
1:X 10 Oct 2022 12:05:29.509 # Next failover delay: I will not start a failover before Mon Oct 10 12:05:40 2022
1:X 10 Oct 2022 12:05:30.453 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:30.583 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:31.526 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:31.602 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:32.596 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:32.645 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:33.621 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:33.709 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:34.661 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:34.790 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:35.748 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:35.907 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:36.798 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:36.938 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:37.872 # Failed to resolve hostname 'redis-master'
1:X 10 Oct 2022 12:05:37.981 # Failed to resolve hostname 'redis-master'
I see that sentinel-2 is the leader, but the failover failed. The important log lines:
1:X 10 Oct 2022 12:05:29.537 # +failover-state-select-slave master mymaster redis-master 6379
1:X 10 Oct 2022 12:05:29.622 # -failover-abort-no-good-slave master mymaster redis-master 6379
1:X 10 Oct 2022 12:05:29.685 # Next failover delay: I will not start a failover before Mon Oct 10 12:05:40 2022
I expected the slave container to be promoted to master, but it wasn't.
I would like to ask: after docker stop on the master container, why is the slave container not selected as master?
Please help troubleshoot the problem, thank you very much.
If you cannot reproduce it, please try several times, as it does not occur every time.
Comment From: moticless
Hi @kaito-kidd, I managed to reproduce the problem.
It looks like during the failover scenario, when the master is down but Sentinel still manages to resolve the master hostname, the flow passes as expected. But if it fails to resolve the hostname, the failover fails.
I will see what it takes to fix it.
Thank you.
Comment From: moticless
Hi @kaito-kidd, can you please test my PR #11419 and see if it resolves your issue?
Thank you.
Comment From: kaito-kidd
Hi @moticless,
I have tested this PR; the test passed and the Sentinel failover succeeded as expected, thank you very much.
Will this PR be backported to 6.x? I am using version 6.2.7 now, and what I'm hoping for is a minor version upgrade.
Comment From: moticless
@yossigo FYI
Comment From: moticless
https://github.com/redis/redis/pull/11419 is going to be backported to 6.2 & 7.0.