Describe the bug
I'm running Redis Sentinel on K8S using spotahome/redis-operator (though the problem shouldn't be related to the operator itself).
I have 3 Redis and 3 Sentinel instances using the image redis:6.2.6-alpine.
redis.conf:

```
slaveof 127.0.0.1 6379
port 6379
tcp-keepalive 60
save 900 1
save 300 10
user pinger -@all +ping on >pingpass
masterauth pass
requirepass pass
```
sentinel.conf:

```
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 5000
sentinel parallel-syncs mymaster 2
```
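For context on the configuration above: Sentinel uses the configured quorum (2 here) only to declare the master objectively down; to actually authorize a failover, a leader Sentinel must be elected by a majority of all known Sentinels. A minimal sketch of that arithmetic (the `failover_possible` helper is my own illustration, not Redis code):

```python
def failover_possible(total_sentinels: int, reachable: int, quorum: int) -> bool:
    """Illustrative only: a failover needs `quorum` Sentinels agreeing the
    master is down AND a majority of all Sentinels reachable to elect a leader."""
    majority = total_sentinels // 2 + 1
    return reachable >= quorum and reachable >= majority

# With 3 Sentinels and quorum 2, losing one Sentinel still allows a failover:
print(failover_possible(3, 2, 2))  # True
# With only one Sentinel reachable, no leader can be elected:
print(failover_possible(3, 1, 2))  # False
```

This is why restarting Sentinels one at a time normally keeps failover available, but overlapping restarts can leave a window where no leader can be elected.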
K8S PDBs:

```
NAME                         MIN AVAILABLE  MAX UNAVAILABLE  ALLOWED DISRUPTIONS
rfr-redis-routes-sentinels   2              N/A              1
rfs-redis-routes-sentinels   2              N/A              1
```
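The PDBs above permit only one pod of each group to be voluntarily evicted at a time. Roughly (a hypothetical helper illustrating `minAvailable` semantics, not Kubernetes code):

```python
def allowed_disruptions(replicas: int, min_available: int) -> int:
    # A PDB with minAvailable permits at most replicas - min_available
    # voluntary evictions at any given moment.
    return max(replicas - min_available, 0)

print(allowed_disruptions(3, 2))  # 1
```

Note that PDBs only gate *voluntary* disruptions (drains, evictions); they do not serialize a rolling image update across both the Redis and Sentinel groups at once.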
During an update of the image to v6.2.12, I ran into the following situation:

- Sentinel A restarted (13:22:25)
- Redis-0 restarted (13:22:37)
- Sentinel B restarted (13:23:08)
- Redis-1 restarted (13:23:41)
- Sentinel C restarted (13:23:47)
- Redis-0 lost the connection with its master (Redis-2) and became master (13:24:34)
  - From Redis-0 logs: `Connection with master lost` ... `Caching the disconnected master state` ... `Discarding previously cached master state`
  - From Redis-2 logs: `Connection with replica 172.16.89.182:6379 lost`
  - From sentinel logs: `Executing user requested FAILOVER of 'mymaster'` ... `+new-epoch 7` ... `+elected-leader master mymaster 172.16.55.123 6379` (Redis-2). Note that here the failover process doesn't finish with `failover-end`.
- Redis-1 became slave of Redis-0 (13:25:04)
- Redis-1 lost the connection with its master (Redis-0) and became master (13:25:10)
- Redis-2 restarted (13:25:15)
- Redis-2 can't start correctly
Here are the full logs: sentinels-logs.txt, redis-0-logs.txt, redis-1-logs.txt, redis-2-logs.txt
Scaling down replicas to 0 and scaling up again solved the issue.
I would like to understand how I could have ended up with two masters and one of the Redis instances unable to start. How can this situation be fixed, or avoided in the future?
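One mitigation worth considering (an assumption on my part, not something verified against this incident) is to stop a disconnected master from accepting writes when it has no reachable replicas, which limits the damage of a split-brain window during failovers:

```
# redis.conf (sketch): refuse writes unless at least 1 replica is
# connected and lagging by at most 10 seconds
min-replicas-to-write 1
min-replicas-max-lag 10
```

This doesn't prevent two nodes from simultaneously believing they are master, but it reduces the chance of diverging writes while the cluster converges again.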
Steps to reproduce the behavior and/or a minimal code sample.
N/A
A description of what you expected to happen.
I would expect to still have just one master.
Comment From: moticless
Hi @txabman42. In order to proceed, it's important that you provide clear steps for your scenario, as it's not entirely clear from your description what you're simulating and what unexpected behavior you're experiencing. I suggest starting by seeking assistance from spotahome/redis-operator to determine whether others are also experiencing this issue, and whether it's specific to certain versions or has existed from day one. If, with the help of other contributors, you're able to confidently identify this as a core Redis issue and have clear steps to reproduce it, please let us know.
BTW, I peeked at the logs and saw some write errors to the configuration file. Please check that; I didn't look further.
Thanks, Moti
Comment From: txabman42
Thank you so much, I will move this issue to spotahome/redis-operator.