We are running a 6-node Redis cluster (version 7.2.0) in a Docker environment with one replica per master (three masters, three replicas). When we stop one of the master nodes, the CLUSTER SHARDS command returns empty slots for that specific shard on the other nodes. The node from the failed shard returns the correct output for the CLUSTER SHARDS command.

Output with an empty slots array, taken from a container that is not part of the failed shard:

1) 1) "slots"
   2) (empty array)
   3) "nodes"
   4) 1)  1) "id"
          2) "b1c9fe739d6e8d0c519f98ac5e8ebcd1e52cfbe3"
          3) "port"
          4) (integer) 6379
          5) "ip"
          6) "172.18.0.3"
          7) "endpoint"
          8) "172.18.0.3"
          9) "role"
         10) "master"
         11) "replication-offset"
         12) (integer) 224
         13) "health"
         14) "fail"
      2)  1) "id"
          2) "42baa4b4da6a0cc4ef9e18c43c2f86403822b72b"
          3) "port"
          4) (integer) 6379
          5) "ip"
          6) "172.18.0.5"
          7) "endpoint"
          8) "172.18.0.5"
          9) "role"
         10) "master"
         11) "replication-offset"
         12) (integer) 224
         13) "health"
         14) "online"
2) 1) "slots"
   2) 1) (integer) 5461
      2) (integer) 10922
   3) "nodes"
   4) 1)  1) "id"
          2) "7baa9f314205d4047655830f45e4014187918e0c"
          3) "port"
          4) (integer) 6379
          5) "ip"
          6) "172.18.0.6"
          7) "endpoint"
          8) "172.18.0.6"
          9) "role"
         10) "master"
         11) "replication-offset"
         12) (integer) 322
         13) "health"
         14) "online"
      2)  1) "id"
          2) "0c51817bbfb0879bf4aaad66a1244b76d1a64d2b"
          3) "port"
          4) (integer) 6379
          5) "ip"
          6) "172.18.0.7"
          7) "endpoint"
          8) "172.18.0.7"
          9) "role"
         10) "replica"
         11) "replication-offset"
         12) (integer) 322
         13) "health"
         14) "online"
3) 1) "slots"
   2) 1) (integer) 0
      2) (integer) 5460
   3) "nodes"
   4) 1)  1) "id"
          2) "5fa7cec07512060397bcfda7bbb1cec73052a905"
          3) "port"
          4) (integer) 6379
          5) "ip"
          6) "172.18.0.2"
          7) "endpoint"
          8) "172.18.0.2"
          9) "role"
         10) "master"
         11) "replication-offset"
         12) (integer) 308
         13) "health"
         14) "online"
      2)  1) "id"
          2) "98d6ec2bd84ae48527f6c67148464d5e8d55afb1"
          3) "port"
          4) (integer) 6379
          5) "ip"
          6) "172.18.0.4"
          7) "endpoint"
          8) "172.18.0.4"
          9) "role"
         10) "replica"
         11) "replication-offset"
         12) (integer) 308
         13) "health"
         14) "online"

Correct output from one of the replicas (part of the same failed shard):

1) 1) "slots"
   2) 1) (integer) 5461
      2) (integer) 10922
   3) "nodes"
   4) 1)  1) "id"
          2) "7baa9f314205d4047655830f45e4014187918e0c"
          3) "port"
          4) (integer) 6379
          5) "ip"
          6) "172.18.0.6"
          7) "endpoint"
          8) "172.18.0.6"
          9) "role"
         10) "master"
         11) "replication-offset"
         12) (integer) 12222
         13) "health"
         14) "online"
      2)  1) "id"
          2) "0c51817bbfb0879bf4aaad66a1244b76d1a64d2b"
          3) "port"
          4) (integer) 6379
          5) "ip"
          6) "172.18.0.7"
          7) "endpoint"
          8) "172.18.0.7"
          9) "role"
         10) "replica"
         11) "replication-offset"
         12) (integer) 12222
         13) "health"
         14) "online"
2) 1) "slots"
   2) 1) (integer) 0
      2) (integer) 5460
   3) "nodes"
   4) 1)  1) "id"
          2) "98d6ec2bd84ae48527f6c67148464d5e8d55afb1"
          3) "port"
          4) (integer) 6379
          5) "ip"
          6) "172.18.0.4"
          7) "endpoint"
          8) "172.18.0.4"
          9) "role"
         10) "replica"
         11) "replication-offset"
         12) (integer) 12208
         13) "health"
         14) "online"
      2)  1) "id"
          2) "5fa7cec07512060397bcfda7bbb1cec73052a905"
          3) "port"
          4) (integer) 6379
          5) "ip"
          6) "172.18.0.2"
          7) "endpoint"
          8) "172.18.0.2"
          9) "role"
         10) "master"
         11) "replication-offset"
         12) (integer) 12208
         13) "health"
         14) "online"
3) 1) "slots"
   2) 1) (integer) 10923
      2) (integer) 16383
   3) "nodes"
   4) 1)  1) "id"
          2) "42baa4b4da6a0cc4ef9e18c43c2f86403822b72b"
          3) "port"
          4) (integer) 6379
          5) "ip"
          6) "172.18.0.5"
          7) "endpoint"
          8) "172.18.0.5"
          9) "role"
         10) "master"
         11) "replication-offset"
         12) (integer) 224
         13) "health"
         14) "online"
      2)  1) "id"
          2) "b1c9fe739d6e8d0c519f98ac5e8ebcd1e52cfbe3"
          3) "port"
          4) (integer) 6379
          5) "ip"
          6) "172.18.0.3"
          7) "endpoint"
          8) "172.18.0.3"
          9) "role"
         10) "master"
         11) "replication-offset"
         12) (integer) 210
         13) "health"
         14) "fail"

Steps to reproduce the behavior:

  • Start the cluster using the attached docker-compose.yml.txt (a minimal sketch of such a compose file follows this list).
  • Create the cluster with the command below:
    docker exec -it redis-node-0 redis-cli --cluster create redis-node-0:6379 redis-node-1:6379 redis-node-2:6379 redis-node-3:6379 redis-node-4:6379 redis-node-5:6379 --cluster-replicas 1
  • Stop one of the containers and check the output of the CLUSTER SHARDS command from different containers:
    docker stop redis-node-2
    docker exec -it redis-node-0 redis-cli -h redis-node-5 CLUSTER SHARDS
    docker exec -it redis-node-0 redis-cli -h redis-node-0 CLUSTER SHARDS
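
For context, here is a minimal compose sketch of the kind of setup the first step describes; the actual attached docker-compose.yml.txt may differ, and the image tag, network name, and server flags below are assumptions:

services:
  redis-node-0:                  # repeat this service block for redis-node-1 .. redis-node-5
    image: redis:7.2.0           # assumed tag matching the reported Redis version
    command: >
      redis-server
      --cluster-enabled yes
      --cluster-config-file nodes.conf
      --cluster-node-timeout 5000
      --appendonly yes
    networks:
      - redis-cluster            # assumed user-defined bridge network so node names resolve
networks:
  redis-cluster:
    driver: bridge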

CLUSTER SHARDS should return the slots correctly from all of the nodes in the cluster. Please let us know if we are doing something wrong or if this is expected behavior.
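
As a side note, slot coverage can also be cross-checked from any node with redis-cli --cluster check or the CLUSTER SLOTS command; the container and host names below simply reuse the ones from the reproduction steps:

    docker exec -it redis-node-0 redis-cli --cluster check redis-node-0:6379
    docker exec -it redis-node-0 redis-cli -h redis-node-5 CLUSTER SLOTS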

Comment From: sundb

@patademahesh was the replica of the failed node promoted to master?

Comment From: patademahesh

Yes, here is the CLUSTER NODES command output:

$ docker exec -it redis-node-0 redis-cli -h redis-node-5 CLUSTER NODES
7baa9f314205d4047655830f45e4014187918e0c 172.18.0.6:6379@16379 master - 0 1720880745097 2 connected 5461-10922
0c51817bbfb0879bf4aaad66a1244b76d1a64d2b 172.18.0.7:6379@16379 myself,slave 7baa9f314205d4047655830f45e4014187918e0c 0 1720880742000 2 connected
98d6ec2bd84ae48527f6c67148464d5e8d55afb1 172.18.0.4:6379@16379 slave 5fa7cec07512060397bcfda7bbb1cec73052a905 0 1720880743086 1 connected
42baa4b4da6a0cc4ef9e18c43c2f86403822b72b 172.18.0.5:6379@16379 master - 0 1720880744092 7 connected 10923-16383
5fa7cec07512060397bcfda7bbb1cec73052a905 172.18.0.2:6379@16379 master - 0 1720880743000 1 connected 0-5460
b1c9fe739d6e8d0c519f98ac5e8ebcd1e52cfbe3 172.18.0.3:6379@16379 master,fail - 1720863593778 1720863590000 3 connected

Comment From: sundb

@patademahesh Since the old master is in the fail state, it will no longer receive messages from other nodes, so you need to manually remove it (CLUSTER FORGET) and add it back as a replica of the new master. I'm not an expert with cluster, so please correct me if I'm wrong, thanks.
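
For reference, the manual recovery described above would look roughly like the following. This is only a sketch: it assumes redis-node-2 is the stopped container whose node ID is b1c9fe73..., and that 42baa4b4... is the master now serving its slots, as the CLUSTER NODES output above suggests.

    # Make every remaining node forget the failed old master (b1c9fe73...).
    docker exec -it redis-node-0 redis-cli -h redis-node-0 CLUSTER FORGET b1c9fe739d6e8d0c519f98ac5e8ebcd1e52cfbe3
    # (repeat the CLUSTER FORGET against each of the other live nodes)

    # Bring the old node back and attach it as a replica of the new master (42baa4b4...).
    docker start redis-node-2
    docker exec -it redis-node-2 redis-cli CLUSTER REPLICATE 42baa4b4da6a0cc4ef9e18c43c2f86403822b72b
    # Depending on what the restarted node still has in its nodes.conf, a CLUSTER RESET
    # followed by CLUSTER MEET may be needed before CLUSTER REPLICATE will succeed.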

Comment From: patademahesh

Hi @sundb, the issue is that all nodes except the newly elected master show empty slots for that shard.