Description: We have a Redis cluster consisting of 3 master nodes and 9 slave nodes, distributed across two data centers:
• DC1: 6 nodes
• DC2: 6 nodes

When transitioning operations to DC2, we perform a manual failover process to make DC2 the primary data center. This is achieved using the CLUSTER FAILOVER FORCE command, which is executed sequentially on all nodes in DC2 that we want to promote as masters, with a 2-second interval between commands.

While the first failover worked smoothly, the second attempt encountered the following issues:
• One of the master nodes in DC1 did not transition to DC2.
• Slave nodes were redistributed unevenly:
  • The first master node ended up with 4 slaves.
  • The second master node retained only 2 slaves.
  • All slaves for the third master node were located in DC2.

This unexpected behavior caused an imbalance in the cluster and disrupted our intended configuration.

Steps to Reproduce:
1. Set up a Redis cluster with 3 master nodes and 9 slave nodes across two data centers (6 nodes per DC).
2. Perform a manual failover using the CLUSTER FAILOVER FORCE command on DC2 nodes sequentially, with a 2-second interval.
3. Observe the cluster's state and node distribution after the failover.
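Step 2 could be sketched as follows. This is a minimal sketch, assuming `redis-cli` is on the PATH and that polling `INFO replication` for `role:master` is an acceptable way to confirm each promotion. `CLUSTER FAILOVER FORCE` replies OK when the failover is accepted, not when it has completed, so this variant waits for each node to report itself as master instead of pausing for a fixed 2 seconds. The function names, timeout, and poll interval are illustrative and not part of our actual tooling.

```python
import subprocess
import time


def redis_cli(host, port, *args):
    """Run one redis-cli command against host:port and return its stdout."""
    result = subprocess.run(
        ["redis-cli", "-h", host, "-p", str(port), *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def is_master(info_replication):
    """True if the INFO replication output contains the line role:master."""
    return any(line.strip() == "role:master"
               for line in info_replication.splitlines())


def failover_and_wait(host, port, run=redis_cli, timeout=30.0, poll=0.5,
                      sleep=time.sleep):
    """Send CLUSTER FAILOVER FORCE to one replica, then poll until it is master.

    Returns True if the node reports role:master within `timeout` seconds.
    `run` and `sleep` are injectable so the logic can be tested offline.
    """
    run(host, port, "CLUSTER", "FAILOVER", "FORCE")
    waited = 0.0
    while waited < timeout:
        if is_master(run(host, port, "INFO", "replication")):
            return True
        sleep(poll)
        waited += poll
    return False


def failover_all(targets, **kwargs):
    """Promote targets one at a time; return the first (host, port) that
    did not become master, or None if every promotion was confirmed."""
    for host, port in targets:
        if not failover_and_wait(host, port, **kwargs):
            return (host, port)
    return None
```

Calling `failover_all([("10.0.2.1", 6379), ("10.0.2.2", 6379), ("10.0.2.3", 6379)])` (hypothetical DC2 addresses) would stop at the first node that fails to take over, rather than continuing with the remaining promotions.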

Expected Behavior: All master nodes should transition to DC2, and the slave nodes should be evenly distributed among the masters.

Actual Behavior:
• One master node remained in DC1.
• Slave nodes were unevenly distributed among the masters.
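The distribution described above can be checked from the output of `CLUSTER NODES`. The sketch below is one possible way to do it, assuming the standard line layout (`<id> <ip:port@cport> <flags> <master-id> ...`); the function name is illustrative.

```python
from collections import defaultdict


def replicas_per_master(cluster_nodes_output):
    """Map each master's address to the sorted addresses of its replicas.

    Expects the text returned by CLUSTER NODES, one node per line.
    """
    masters = {}                   # node ID -> address
    replicas = defaultdict(list)   # master node ID -> replica addresses
    for line in cluster_nodes_output.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue               # skip blank or truncated lines
        node_id, addr, flags, master_id = fields[:4]
        addr = addr.split("@")[0]  # drop the cluster bus port
        flag_set = flags.split(",")
        if "master" in flag_set:
            masters[node_id] = addr
        elif "slave" in flag_set:
            replicas[master_id].append(addr)
    return {addr: sorted(replicas.get(node_id, []))
            for node_id, addr in masters.items()}
```

Comparing each master address against the known DC1 and DC2 address ranges then shows both which masters failed to move and how many slaves each one ended up with.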

Environment Details:
• Redis version: 6.0.16
• Cluster setup before the failover: DC1 has 3 masters and 3 slaves; DC2 has 6 slaves. Every master has 3 slaves.

Additional Context: The imbalance in the slave distribution and the failure of a master node to transition is affecting cluster reliability and performance.
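One way to restore an even distribution after the fact is to reassign slaves with `CLUSTER REPLICATE <master-id>`, run on the slave that should move. The sketch below only computes which moves would even out the counts; it is a simple greedy approach under the assumption that any slave may follow any master, and it deliberately ignores data center placement. The function name and node labels are hypothetical.

```python
def plan_replica_moves(replicas_by_master):
    """Return (replica, from_master, to_master) moves that even out counts.

    `replicas_by_master` maps a master identifier to a list of its replicas.
    Moves go from the master with the most replicas to the one with the
    fewest, until the counts differ by at most one.
    """
    counts = {m: list(r) for m, r in replicas_by_master.items()}
    moves = []
    if not counts:
        return moves
    while True:
        most = max(counts, key=lambda m: len(counts[m]))
        least = min(counts, key=lambda m: len(counts[m]))
        if len(counts[most]) - len(counts[least]) <= 1:
            return moves
        replica = counts[most].pop()
        counts[least].append(replica)
        moves.append((replica, most, least))
```

For the 4/2/3 split described in this report, the plan contains a single move from the first master to the second. Each planned move would then be applied by sending `CLUSTER REPLICATE` with the target master's node ID to the chosen slave.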

How can we best manage this process? What is the best method to handle a manual cluster failover process?

Comment From: ShooterIT

Hi @ynsdll, have you enabled cluster-allow-replica-migration?

Comment From: ynsdll

Hi @ShooterIT, no, I didn't enable cluster-allow-replica-migration, but I set a high value for cluster-migration-barrier.