Discussed in https://github.com/redis/redis/discussions/9616
Originally posted by **PingXie** October 7, 2021
Hi,
This is a question about the graceful failover behavior in a Redis 6.x cluster.
I noticed that, after a successful failover, the old primary doesn't unpause clients immediately. It waits until client_pause_end_time is reached, which is set to 10s (CLUSTER_MF_TIMEOUT * CLUSTER_MF_PAUSE_MULT) into the future in pauseClients(). As a result, an unlucky write request that arrives on the old primary somewhere between pauseClients() and unpauseClients() can get stuck for quite some time. In the worst case, the writer is blocked for the full 10s.

Assuming my understanding is correct, does anyone have any insight into this behavior? And what would be the potential pitfalls if the old primary released the paused clients as soon as it learns that one of its replicas has won the election?
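To make the timing concrete, here is a rough sketch of the arithmetic. It is not the actual Redis code: the two constants use the values I believe cluster.h defines (5000 ms and 2, which matches the 10s above), and `pause_deadline_ms()` and `blocked_for_ms()` are hypothetical helpers made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: 5000 ms timeout and a multiplier of 2, giving the 10s
 * pause window described in the post. */
#define CLUSTER_MF_TIMEOUT 5000
#define CLUSTER_MF_PAUSE_MULT 2

/* Hypothetical helper: the pause deadline for a manual failover that
 * starts at start_ms (all times in milliseconds). */
static int64_t pause_deadline_ms(int64_t start_ms) {
    return start_ms + (int64_t)CLUSTER_MF_TIMEOUT * CLUSTER_MF_PAUSE_MULT;
}

/* Hypothetical helper: how long a write arriving at arrival_ms stays
 * blocked if clients are only released when the deadline expires. */
static int64_t blocked_for_ms(int64_t start_ms, int64_t arrival_ms) {
    assert(arrival_ms >= start_ms); /* the write arrives after the pause began */
    int64_t deadline = pause_deadline_ms(start_ms);
    return arrival_ms >= deadline ? 0 : deadline - arrival_ms;
}
```

Under these assumptions, a write that arrives right as the pause begins waits the full 10000 ms, while one that arrives 9000 ms in waits 1000 ms, regardless of how quickly the replica actually won the election.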
Thanks,