My failover process looks like this.
PHP clients connect to Redis via HAProxy.
1) Sentinel detects a failure.
2) Sentinel decides which replica to promote to master.
3) Sentinel performs the role switch, promoting that replica to the new master.
4) Sentinel runs a client-reconfig-script, which records the IP address and port of the new master in the Consul KV store.
(https://github.com/pospelov-v/redis-sentinel-submit-master-to-cosnsul-kv-store)
5) consul-template runs on a separate host. It watches the key in the Consul KV store and, when the key changes, renders the HAProxy configuration template and runs the `service haproxy reload` command.
Thus, HAProxy always has exactly one address in its configuration: the address of the current Redis master.
6) Sentinel also reconfigures the old master as a replica of the new master. It does this the next time it can reach the old master; until then, it keeps retrying the connection with a certain timeout.
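The consul-template side could look roughly like this (a hypothetical template; the key name must match whatever the reconfig script writes):

```
# redis.ctmpl (hypothetical): rendered into haproxy.cfg, followed by `service haproxy reload`
listen redis
    bind *:6379
    server current-master {{ key "service/redis/mymaster/master" }} check
```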
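For reference, here is a minimal sketch of what such a client-reconfig-script could look like (a hypothetical example, not the script from the linked repository; the KV key layout and the local Consul agent address are assumptions):

```python
import sys
import urllib.request

CONSUL_KV = "http://127.0.0.1:8500/v1/kv/"  # local Consul agent (assumption)

def kv_entry(master_name, ip, port):
    """Key/value pair recorded for the new master (hypothetical key layout)."""
    return f"service/redis/{master_name}/master", f"{ip}:{port}"

def publish(key, value):
    """PUT the new master address into the Consul KV store."""
    req = urllib.request.Request(CONSUL_KV + key, data=value.encode(), method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.status == 200

if __name__ == "__main__" and len(sys.argv) >= 8:
    # Sentinel invokes the script with the arguments:
    # <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
    name, _role, _state, _from_ip, _from_port, to_ip, to_port = sys.argv[1:8]
    publish(*kv_entry(name, to_ip, to_port))
```

It would be wired up in sentinel.conf with a `sentinel client-reconfig-script <master-name> <path>` line.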
The problem is a network failure of short duration, for example 3 to 8 seconds.
When the old master comes back on the network before step 6 has completed, a short split-brain occurs. The old master receives requests that were still queued in HAProxy. It then receives the `SLAVEOF ...` command from Sentinel and those entries are deleted.
As a result, the PHP client added some keys to Redis and received an OK response, but a little later it cannot read some of those keys back, since they landed on the old master and were deleted when it was reconfigured as a replica.
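The lost-write window can be modeled with a toy sketch (assuming the demoted old master full-resyncs from the new master and drops the extra keys it accepted):

```python
# Toy model: two dicts stand in for the two Redis instances.
new_master = {"k1": "v1"}
old_master = dict(new_master)

# Split-brain window: HAProxy still routes to the old master, which
# acknowledges a write that never reaches the new master.
old_master["k2"] = "v2"  # the client got +OK for this SET

# Step 6 finally runs: SLAVEOF <new-master> triggers a full resync,
# discarding everything the old master accepted during the window.
old_master = dict(new_master)

print("k2" in old_master)  # prints False: the acknowledged write is lost
```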
If Sentinel had something like a pre-failover-script, which would work exactly like the client-reconfig-script but run before the failover process, it would be possible to attach a mechanism to it that isolates PHP clients from the old master even before the switch starts. Something like STONITH (https://en.wikipedia.org/wiki/STONITH).
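As an illustration, such a hook could fence the old master through HAProxy's runtime API (a sketch only; the admin socket path, backend name, and server name are assumptions, and HAProxy must be started with an admin-level `stats socket`):

```python
import socket

HAPROXY_SOCKET = "/var/run/haproxy.sock"  # HAProxy admin socket (assumption)

def maint_command(backend, server):
    """Runtime-API command that drains a server before failover starts."""
    return f"set server {backend}/{server} state maint\n"

def fence_old_master(backend="redis", server="current-master"):
    """Put the old master into maintenance so HAProxy stops sending it traffic."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(HAPROXY_SOCKET)
        s.sendall(maint_command(backend, server).encode())
        return s.recv(1024).decode()
```

Run from a pre-failover hook, this would stop new writes from reaching the old master before Sentinel begins the role switch.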
Do you plan to add something like a pre-failover-script to Sentinel that would work exactly like the client-reconfig-script, but run before the failover process?
Comment From: hwware
Hello @pospelov-v, based on your description I think this is a write-loss problem when using Sentinel to do failover: https://github.com/redis/redis/issues/6062. Currently the Sentinel failover is not orchestrated, therefore the old master can still receive requests, causing write loss after it switches into a replica. For the pre-failover-script use case, I think it makes sense to have this, but I am still not sure whether it is enough to eliminate the problem. What do you think, @yossigo?
Comment From: yossigo
@hwware I agree with you, there are ways to minimize write loss but not prevent it completely.
As for a pre-failover-script, I think it might be useful in other situations so I'm not against it, unless it creates other problems or introduces unexpected complexity.
Comment From: AlexeyBoiler
@yossigo Hello. Are you planning to implement this feature? We have made our own observations and concluded that it could solve the split-brain problem.
Comment From: yossigo
Hi @AlexeyBoiler, I think using a pre-failover script to address the original problem is a limited workaround, but that doesn't mean we shouldn't have this capability which could also be useful in other cases.
Comment From: AlexeyBoiler
Thanks for the quick response. Such a feature would let us build a much better mechanism around the split-brain problem. A similar hook in the Orchestrator project for MySQL helped us solve many cluster-management tasks.
Comment From: AlexeyBoiler
@yossigo Thank you and the team for working on such a great project. I was looking through the latest Redis releases and could not find anything in this direction. Are you planning to do something similar in the near future?
Comment From: yossigo
@AlexeyBoiler As I mentioned above, I don't see a reason to reject this but given the priority and everything else on our plate I think it can only happen if driven by someone in the community.