I'm having difficulty getting Redis to work in a Docker Swarm setup. At first it works, but after a while (probably after a service restart) I'm getting these errors:
```
03 Mar 2020 13:41:09.748 * Connecting to MASTER redis-master:6379
03 Mar 2020 13:41:09.749 * MASTER <-> REPLICA sync started
03 Mar 2020 13:41:09.749 # Error condition on socket for SYNC: Connection refused
03 Mar 2020 13:41:10.751 * Connecting to MASTER redis-master:6379
…
```
These go on forever.
I'm deploying with a docker-compose stack file: 1 master, plus a replica on each of the 3 servers. The setup doesn't use Sentinels. My thinking is that if the master fails, Docker restarts the service and it reads its config back in via the shared volume used by both the master and the replicas. The relevant parts:
```yaml
  redis-master:
    image: "${CI_REGISTRY_IMAGE}:redis_master-${CI_COMMIT_REF_SLUG}"
    networks:
      - mynetwork
    volumes:
      - redis:/opt/scripts
    ports:
      - 6379:6379
    command: sh -c 'redis-server /usr/local/etc/redis/redis.conf --bind $$(hostname -i)'
    deploy:
      replicas: 1
      update_config:
        parallelism: 1
        delay: 10s
        order: stop-first
        failure_action: rollback
      rollback_config:
        parallelism: 1
        delay: 10s
        order: stop-first
      restart_policy:
        condition: on-failure
        delay: 10s
        max_attempts: 5
        window: 180s
    healthcheck:
      test: /usr/local/bin/healthcheck.sh
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 1m

  redis-replica:
    image: "${CI_REGISTRY_IMAGE}:redis_replica-${CI_COMMIT_REF_SLUG}"
    networks:
      - mynetwork
    volumes:
      - redis:/opt/scripts
    ports:
      - 6380:6380
    command: sh -c 'redis-server /usr/local/etc/redis/redis.conf --bind $$(hostname -i) --replica-announce-ip $$(hostname -i) --port 6380 --replicaof redis-master 6379'
    depends_on:
      - redis-master
    deploy:
      mode: global
      update_config:
        parallelism: 1
        delay: 10s
        order: stop-first
        failure_action: rollback
      rollback_config:
        parallelism: 1
        delay: 10s
        order: stop-first
      restart_policy:
        condition: on-failure
        delay: 10s
        max_attempts: 5
        window: 180s
    healthcheck:
      test: /usr/local/bin/healthcheck.sh
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 1m
```
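For completeness, the services above imply top-level `networks` and `volumes` definitions roughly like the following. This is a sketch filling in the parts I omitted; the driver options shown are the common defaults for a Swarm stack, not necessarily exactly what my file contains:

```yaml
# Sketch of the top-level definitions referenced by the services
# (names match the snippet; driver options are illustrative).
networks:
  mynetwork:
    driver: overlay   # overlay is required for cross-node service discovery in Swarm
volumes:
  redis: {}           # a named volume; note this is per-node, not shared across servers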
I have tested this setup and it seems to work. I've also tested rebooting each of the 3 servers, and after a while Redis reconnects to all of the instances fine, so they do seem to find each other.
But after a while this breaks. I don't know exactly when; by the time I notice it, my log is full of reconnect messages.
If I bind to 0.0.0.0, everything seems to go well (at least for longer periods of time), but that leaves my database wide open, so it's not feasible. I have a feeling the problem has something to do with the binding, or that a restart of the service gets a new IP, but I don't know.
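If binding to 0.0.0.0 turns out to be the only stable option, one common mitigation is to rely on Redis's own access controls instead of the bind address. A minimal redis.conf sketch (the password is a placeholder, and whether this fits your threat model is an assumption on my part):

```
# Sketch: accept connections on any interface, but gate them with auth.
bind 0.0.0.0
protected-mode yes
requirepass CHANGE_ME   # placeholder; replicas would then also need: masterauth CHANGE_ME
```

Combined with `expose` instead of published `ports`, 0.0.0.0 is then only reachable from containers on the same overlay network rather than from the host's public interfaces.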
Any help much appreciated!
Comment From: Monokai
Update. I have now changed the bind to 0.0.0.0, and used expose: 6379 instead of ports: "6379:6379" to expose the port to other services on the overlay network without publishing the container's port on the host.
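Concretely, the change on the master service looks like this (a sketch of the diff):

```yaml
  redis-master:
    # ports:
    #   - 6379:6379   # old: published on every Swarm node's host interface
    expose:
      - 6379          # new: reachable only by services on the same overlay network
```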
Again, everything looks OK and after a while I'm now getting:
```
Mar 2020 10:52:21.924 # Unable to connect to MASTER: Resource temporarily unavailable
Mar 2020 10:52:22.927 * Connecting to MASTER redis-master:6379
Mar 2020 10:52:22.928 # Unable to connect to MASTER: Resource temporarily unavailable
Mar 2020 10:52:23.931 * Connecting to MASTER redis-master:6379
…
```
It repeats every second.
Comment From: Monokai
Update. This might have been an out-of-memory error. I've upgraded the server with more memory and haven't had any issues in the two weeks since.
I would like to hear whether this Docker Swarm setup is the right way to go, or whether I need to put Redis in cluster mode or something. It's all a bit vague to me how a Redis master/replica setup should ideally be run under Docker Swarm. I only use Redis for caching purposes, and it doesn't matter all that much if some data is lost when a Redis instance restarts.
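Since this instance is cache-only, one option worth considering (a sketch; the limit value is illustrative and should match the container's memory) is to configure Redis as a pure LRU cache with persistence disabled, which keeps memory bounded and makes lost-on-restart data a non-issue:

```
# Cache-only redis.conf sketch: bound memory and evict instead of failing.
maxmemory 512mb                # illustrative limit; size to the container
maxmemory-policy allkeys-lru   # evict least-recently-used keys when full
save ""                        # disable RDB snapshots; cached data is disposable
appendonly no                  # no AOF persistence either
```

A bounded `maxmemory` would also explain and prevent the suspected out-of-memory crashes above.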