I'm trying to run a Redis cluster on Docker Swarm. I have two components: the Redis nodes themselves and an initialization script that runs when the stack is created:
```yaml
# Redis cluster deploy
redis-deploy:
  image: redis:7.0.2-alpine
  entrypoint: >
    /bin/sh -c "
    redis-cli \\
    --cluster \\
    create \\
    $$(let i=1; while [ \"$$i\" -le ${REDIS_REPLICAS} ]; do echo redis-$$i:6379; let i=i+1; done) \\
    --cluster-yes \\
    --cluster-replicas ${REDIS_REPLICATION_FACTOR}
    "
  depends_on:
    - redis
  deploy:
    restart_policy:
      condition: on-failure

# Redis cluster
redis:
  image: redis:7.0.2-alpine
  hostname: redis-{{.Task.Slot}}
  command: >
    --port 6379
    --cluster-enabled yes
    --cluster-config-file master.conf
    --cluster-node-timeout 5000
    --appendonly yes
  volumes:
    - redis-data:/data
  restart: unless-stopped
  deploy:
    replicas: ${REDIS_REPLICAS}
    resources:
      limits:
        cpus: '0.1'
        memory: 100M
```
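The `$$`-escaped loop in the `redis-deploy` entrypoint is Compose escaping: each `$$` becomes a literal `$` inside the container, so the node list is built at container runtime. A sketch of the equivalent plain-shell logic (with busybox `let` swapped for POSIX arithmetic, and an example replica count):

```shell
#!/bin/sh
# Build the "host:port" list that `redis-cli --cluster create` expects,
# one entry per swarm task (redis-1 ... redis-N). This mirrors the
# $$-escaped loop in the compose entrypoint above.
REDIS_REPLICAS=3   # example value; the stack takes this from the environment

i=1
nodes=""
while [ "$i" -le "$REDIS_REPLICAS" ]; do
  nodes="$nodes redis-$i:6379"
  i=$((i + 1))
done
nodes="${nodes# }"   # strip the leading space

echo "$nodes"
```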
This setup works well when deploying the stack for the first time. But after tearing down and rebuilding the stack, the setup script seems to brick the cluster.

Deploy script logs:

```
redis-deploy.1.qa9efr3zedev@desktop | [ERR] Node redis-1:6379 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.
redis-deploy.1.lmxvu66vblor@desktop | [ERR] Node redis-1:6379 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.
redis-deploy.1.mfva73tfarcz@desktop | [ERR] Node redis-1:6379 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.
redis-deploy.1.u9mr85wzq0xv@desktop | [ERR] Node redis-1:6379 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.
```
Cluster node logs:

```
redis.4.7c0v2uedes7k@desktop | 1:S 25 Jun 2022 13:10:21.956 # Error condition on socket for SYNC: Connection refused
redis.4.7c0v2uedes7k@desktop | 1:S 25 Jun 2022 13:10:22.958 * Connecting to MASTER 10.0.1.4:6379
redis.4.7c0v2uedes7k@desktop | 1:S 25 Jun 2022 13:10:22.958 * MASTER <-> REPLICA sync started
redis.4.7c0v2uedes7k@desktop | 1:S 25 Jun 2022 13:10:22.958 # Error condition on socket for SYNC: Connection refused
redis.4.7c0v2uedes7k@desktop | 1:S 25 Jun 2022 13:10:23.961 * Connecting to MASTER 10.0.1.4:6379
redis.4.7c0v2uedes7k@desktop | 1:S 25 Jun 2022 13:10:23.961 * MASTER <-> REPLICA sync started
redis.4.7c0v2uedes7k@desktop | 1:S 25 Jun 2022 13:10:23.961 # Error condition on socket for SYNC: Connection refused
redis.4.7c0v2uedes7k@desktop | 1:S 25 Jun 2022 13:10:24.963 * Connecting to MASTER 10.0.1.4:6379
redis.4.7c0v2uedes7k@desktop | 1:S 25 Jun 2022 13:10:24.963 * MASTER <-> REPLICA sync started
```
What's the best way to deal with this? And should the cluster be bricked simply by running an initialization script twice?
Comment From: filipecosta90
@tamis-laan notice that the cluster state is preserved on the volume across stack rebuilds. I suggest checking the cluster state in the script: if all slots are already assigned, there is no need to create the cluster again.
You can check `cluster_state`:

> `cluster_state`: State is `ok` if the node is able to receive queries. `fail` if there is at least one hash slot which is unbound (no node associated), in error state (node serving it is flagged with FAIL flag), or if the majority of masters can't be reached by this node.
here's the relevant documentation: https://redis.io/commands/cluster-info/
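A minimal sketch of that guard for the deploy script, under the assumption that querying the first node is enough (the `cluster_ok` helper is ours, not part of redis-cli):

```shell
#!/bin/sh
# Hypothetical guard: only run `redis-cli --cluster create` when the
# cluster is not already formed.

# Returns success when a CLUSTER INFO payload reports a healthy cluster,
# i.e. the node answers queries and no slot is unbound.
cluster_ok() {
  printf '%s\n' "$1" | grep -q '^cluster_state:ok'
}

# Live check; skipped here when redis-cli is not on PATH.
if command -v redis-cli >/dev/null 2>&1; then
  info="$(redis-cli -h redis-1 -p 6379 cluster info)"
  if cluster_ok "$info"; then
    echo "cluster already formed, skipping create"
  else
    echo "cluster not formed, safe to run the create command"
  fi
fi
```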
Comment From: tamis-laan
Something like this:

```shell
redis-cli --cluster call redis-1:6379 CLUSTER INFO | grep cluster_state:ok | wc -l
```
I could script this out, but I don't believe I should have to.
It would be good practice for Redis to first check whether all nodes are online and not already part of an existing cluster, instead of just going ahead and bricking my cluster.
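Scripted out, the one-liner amounts to comparing the number of nodes reporting `cluster_state:ok` against the expected replica count. A sketch (the `count_ok` helper and the sample output are ours; the real input would come from the `redis-cli --cluster call` pipeline):

```shell
#!/bin/sh
# Count how many "cluster_state:ok" lines appear in aggregated
# CLUSTER INFO output, one line per node.
count_ok() {
  printf '%s\n' "$1" | grep -c 'cluster_state:ok'
}

REDIS_REPLICAS=3   # example value
# Sample aggregated output, standing in for the real redis-cli call.
sample="redis-1:6379: cluster_state:ok
redis-2:6379: cluster_state:ok
redis-3:6379: cluster_state:ok"

if [ "$(count_ok "$sample")" -eq "$REDIS_REPLICAS" ]; then
  echo "all nodes healthy, skip create"
fi
```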
Comment From: madolson
It is odd that the CLI's create doesn't do any validation before updating state. I agree that you should probably check that the cluster is up before running create, but we will take a look to see whether there is any way to harden the tooling.