I have been trying to set up Redis in sentinel mode using a docker-compose file. Below are the contents of my compose file:
```yaml
version: '3.3'
services:
  redis-master:
    image: redis:latest
    deploy:
      replicas: 1
    networks:
      - Overlay_Network
  redis-slave:
    image: redis:latest
    command: redis-server --slaveof redis-master 6379
    depends_on:
      - redis-master
    deploy:
      replicas: 2
    networks:
      - Overlay_Network
  sentinel:
    image: sentinel:latest
    environment:
      - SENTINEL_DOWN_AFTER=5000
      - SENTINEL_FAILOVER=5000
      - REDIS_MASTER=redis-master
    depends_on:
      - redis-master
      - redis-slave
    deploy:
      replicas: 3
    networks:
      - Overlay_Network
networks:
  Overlay_Network:
    external:
      name: Overlay_Network
```
Here I am creating three services: redis-master, redis-slave and sentinel (a local Docker image that starts Redis in sentinel mode based on the environment variables passed in). I followed this repository for creating the sentinel image: https://gitlab.ethz.ch/amiv/redis-cluster/tree/master
When I use docker-compose to run the services, it works fine:
```shell
docker-compose -f docker-compose.yml up -d
```
It starts all services with a single instance of each. Later I manually scale redis-slave to 2 instances and sentinel to 3 instances. Then, when I stop the container for redis-master, sentinel notices it and promotes one of the slave nodes to master. It works as expected.
The issue happens when I run it in swarm mode, deploying the same compose file with docker stack deploy:
```shell
docker stack deploy -c docker-compose.yml <stack-name>
```
It starts all the services: 1 instance of redis-master, 2 of redis-slave and 3 of sentinel, using the overlay network. But when I stop the container for redis-master, sentinel cannot promote any of the slave nodes to master. It seems sentinel cannot actually reach the slave nodes: it adds them and then shows them in a down state. Here is a snippet from the sentinel log file:
```
1:X 04 Jul 2019 14:31:36.465 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 04 Jul 2019 14:31:36.465 # Redis version=5.0.5, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 04 Jul 2019 14:31:36.465 # Configuration loaded
1:X 04 Jul 2019 14:31:36.466 * Running mode=sentinel, port=26379.
1:X 04 Jul 2019 14:31:36.466 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:X 04 Jul 2019 14:31:36.468 # Sentinel ID is e84a635f6cf4c0ee4454922a557a7c0fba00fadd
1:X 04 Jul 2019 14:31:36.468 # +monitor master mymaster 10.0.22.123 6379 quorum 2
1:X 04 Jul 2019 14:31:36.469 * +slave slave 10.0.22.125:6379 10.0.22.125 6379 @ mymaster 10.0.22.123 6379
1:X 04 Jul 2019 14:31:38.423 * +sentinel sentinel f92b9499bff409558a2eb985ef949dfc7050c528 10.0.22.130 26379 @ mymaster 10.0.22.123 6379
1:X 04 Jul 2019 14:31:38.498 * +sentinel sentinel 6e32d6bfea4142a0bc77a74efdfd24424cbe026b 10.0.22.131 26379 @ mymaster 10.0.22.123 6379
1:X 04 Jul 2019 14:31:41.538 # +sdown slave 10.0.22.125:6379 10.0.22.125 6379 @ mymaster 10.0.22.123 6379
```
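For quick triage, the addresses sentinel has marked subjectively down can be pulled straight out of a log like the one above. This is just a small sketch of mine (the `sdown_slaves` helper is not from the thread), with the field position matching the Redis 5.x log format shown:

```shell
#!/bin/bash
# Sketch: extract the <ip>:<port> of replicas that sentinel logged as +sdown,
# so stale addresses stand out at a glance.
sdown_slaves() {
  grep '+sdown slave' | awk '{print $9}'
}

# Demo on the log line from above; in practice you would pipe in the real log,
# e.g. the output of `docker service logs <stack>_sentinel`.
echo '1:X 04 Jul 2019 14:31:41.538 # +sdown slave 10.0.22.125:6379 10.0.22.125 6379 @ mymaster 10.0.22.123 6379' | sdown_slaves
```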
I thought it could be due to the start order of the containers, but the depends_on field is ignored in stack mode and I could not find any other way to define the start order there.
When I run docker network inspect on the overlay network, here is the relevant part of the output:
```json
"Containers": {
    "57b7620ef75956464ce274e66e60c9cb5a9d8b79486c5b80016db4482126916b": {
        "Name": "sws_sentinel.3.y8sdpj8609ilq22xinzykbxkm",
        "EndpointID": "a95ab07b07c68a32227be3b5da4d378b82f24aab4279bfaa13899a2a7184ce09",
        "MacAddress": "02:42:0a:00:16:84",
        "IPv4Address": "10.0.22.132/24",
        "IPv6Address": ""
    },
    "982222f1b87e1483ec791f382678ef02abcdffe74a5df13a0c0476f7f3a599a7": {
        "Name": "sws_redis-slave.1.uxwkndhkdnizyicwulzli964r",
        "EndpointID": "f5f8fa056622b1529351355c3760c3f45357c7b3de3fe4d2ee90e2d490328f2a",
        "MacAddress": "02:42:0a:00:16:80",
        "IPv4Address": "10.0.22.128/24",
        "IPv6Address": ""
    },
    "c55376217215a1c11b62ac9d22d28eaa1bcda89484a0202b208e557feea4dd35": {
        "Name": "sws_redis-slave.2.s8ha5xmvx6sue2pj6fav8bcbx",
        "EndpointID": "6dcb13e23a8b4c0b49d7dc41e5813b317b8d67377ac30a476261108b8cdeb3f8",
        "MacAddress": "02:42:0a:00:16:7f",
        "IPv4Address": "10.0.22.127/24",
        "IPv6Address": ""
    },
    "cd6d72547ef3fb34ece45ad0201555124505379182f7445373025e1b9a115554": {
        "Name": "sws_redis-master.1.3rhfihzqip2a44xq2uerhqkjt",
        "EndpointID": "9074f9c911e03de0f27e4fb6b75afdf6bb38a111a511738451feb5e64c8dbff3",
        "MacAddress": "02:42:0a:00:16:7c",
        "IPv4Address": "10.0.22.124/24",
        "IPv6Address": ""
    },
    "lb-SA_Monitor_Overlay": {
        "Name": "SA_Monitor_Overlay-endpoint",
        "EndpointID": "2fb84ac75f5eee015b80b55713da83d1afb7dfa7ed4c1f5eda170f4b8daf8884",
        "MacAddress": "02:42:0a:00:16:7d",
        "IPv4Address": "10.0.22.125/24",
        "IPv6Address": ""
    }
}
```
Here I see the slaves running on IPs 10.0.22.128 and 10.0.22.127, but in the sentinel log it is trying to add a slave using IP 10.0.22.125. Why is that? Could this be the issue?
Let me know if any more detail is required.
Comment From: kuldeepsidhu88
Solution
I concluded that it was caused by the Docker swarm default load balancer. Sentinel gets information about the slaves from the master node, but the slaves were not registered with their actual IP addresses on the Docker network; they appeared under a load-balanced (virtual) IP instead. Sentinel could not reach the slaves at that IP, so it reported them as down.
This is also mentioned in the official documentation:
https://redis.io/topics/replication [Configuring replication in Docker and NAT]
https://redis.io/topics/sentinel [Sentinel, Docker, NAT, and possible issues]
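A note on why that 10.0.22.125 shows up: in swarm mode each service gets a virtual IP (VIP), and that VIP is what a plain DNS lookup of the service name returns; the special name tasks.&lt;service&gt; resolves to the real task IPs instead. The sketch below illustrates the idea; the getent lines are comments because they need a running swarm, and `task_ips` is just an illustrative helper of mine:

```shell
#!/bin/bash
# Inside any container attached to the overlay network:
#   getent hosts redis-slave        # -> the service VIP (e.g. 10.0.22.125)
#   getent hosts tasks.redis-slave  # -> real task IPs (10.0.22.127, 10.0.22.128)
# If the master registers replicas under the VIP, sentinel pings an address
# that does not belong to any single replica and marks them down.

# Illustrative helper: keep only the address column of getent output.
task_ips() { awk '{print $1}'; }

# Demo on captured output, since the real lookup needs a running swarm:
printf '10.0.22.127 tasks.redis-slave\n10.0.22.128 tasks.redis-slave\n' | task_ips
```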
As a solution, I made a custom Dockerfile to start the redis-slave nodes. It uses a redis.conf template and an entrypoint.sh script. In entrypoint.sh I look up the container's real IP, write it into redis.conf and, as a last step, start redis-server using the updated redis.conf:
```
slave-announce-ip <CONTAINER_IP_ADDRESS>
slave-announce-port 6379
```
You can also do similar steps for the sentinel nodes. Now the slaves register with their real container IP address and port, and sentinel is able to communicate with them.
Comment From: spacepirate0001
@kuldeepsidhu88 is it possible to share your files for reproducibility? Thanks
Comment From: PhilPhonic
@kuldeepsidhu88 could you please share your redis.conf and entrypoint.sh ?
Comment From: kuldeepsidhu88
@Haythamamin @PhilPhonic Please find files below for reference.
Dockerfile
```dockerfile
FROM redis:5
COPY replica/redis.conf /etc/redis/redis.conf
RUN chown redis:redis /etc/redis/redis.conf
COPY replica/redis-entrypoint.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/redis-entrypoint.sh
EXPOSE 6379
ENTRYPOINT ["redis-entrypoint.sh"]
```
redis.conf
```
replicaof {{REDIS_MASTER}} 6379
replica-announce-ip {{REPLICA_CONTAINER_IP}}
replica-announce-port 6379
```
entrypoint.sh
```shell
#!/bin/bash
# get the container id from /proc/self/cgroup
CONTAINER_ID_LONG=$(grep 'docker' /proc/self/cgroup | sed 's/^.*\///' | tail -n1)
# search for the id in /etc/hosts; it uses only the first 12 characters
CONTAINER_ID_SHORT=${CONTAINER_ID_LONG:0:12}
DOCKER_CONTAINER_IP_LINE=$(grep "$CONTAINER_ID_SHORT" /etc/hosts)
# extract the ip address
THIS_DOCKER_CONTAINER_IP=$(echo "$DOCKER_CONTAINER_IP_LINE" | grep -o '[0-9]\+[.][0-9]\+[.][0-9]\+[.][0-9]\+')
# set as environment variable
export DOCKER_CONTAINER_IP=$THIS_DOCKER_CONTAINER_IP
# replace the placeholders in redis.conf with the environment variables
sed -i 's,{{REDIS_MASTER}},'"${REDIS_MASTER}"',g' /etc/redis/redis.conf
sed -i 's,{{REPLICA_CONTAINER_IP}},'"${DOCKER_CONTAINER_IP}"',g' /etc/redis/redis.conf
# start redis
exec docker-entrypoint.sh redis-server /etc/redis/redis.conf
```
sentinel nodes can also be configured in similar way. Hope it helps. Let me know if you have any further questions.
Comment From: PhilPhonic
Thanks @kuldeepsidhu88, I have built an entrypoint in a similar way. It works in general, but sadly not in Docker swarm.
Comment From: kuldeepsidhu88
@PhilPhonic Yes, things get tricky in Docker swarm. I hope the Redis team releases official documentation on how to make things work in Docker swarm environments.
Comment From: collabnix
```yaml
version: '3'
services:
  redis-master:
    image: 'bitnami/redis:latest'
    ports:
      - '6379:6379'
    environment:
      - REDIS_REPLICATION_MODE=master
      - REDIS_PASSWORD=laSQL2019
      - REDIS_EXTRA_FLAGS=--maxmemory 100mb
    volumes:
      - 'redis-master-volume:/bitnami'
    deploy:
      mode: replicated
      replicas: 2
  redis-slave:
    image: 'bitnami/redis:latest'
    ports:
      - '6379'
    depends_on:
      - redis-master
    volumes:
      - 'redis-slave-volume:/bitnami'
    environment:
      - REDIS_REPLICATION_MODE=slave
      - REDIS_MASTER_HOST=redis-master
      - REDIS_MASTER_PORT_NUMBER=6379
      - REDIS_MASTER_PASSWORD=laSQL2019
      - REDIS_PASSWORD=laSQL2019
      - REDIS_EXTRA_FLAGS=--maxmemory 100mb
    deploy:
      mode: replicated
      replicas: 2
  redis-sentinel:
    image: 'bitnami/redis:latest'
    ports:
      - '16379:16379'
    depends_on:
      - redis-master
    volumes:
      - 'redis-sentinel-volume:/bitnami'
    entrypoint: |
      bash -c 'bash -s <<EOF
      "/bin/bash" -c "cat <<EOF > /opt/bitnami/redis/etc/sentinel.conf
      port 16379
      dir /tmp
      sentinel monitor master-node redis-master 6379 2
      sentinel down-after-milliseconds master-node 5000
      sentinel parallel-syncs master-node 1
      sentinel failover-timeout master-node 5000
      sentinel auth-pass master-node laSQL2019
      EOF"
      "/bin/bash" -c "redis-sentinel /opt/bitnami/redis/etc/sentinel.conf"
      EOF'
    deploy:
      mode: replicated
      replicas: 3
volumes:
  redis-master-volume:
    driver: local
  redis-slave-volume:
    driver: local
  redis-sentinel-volume:
    driver: local
```
Comment From: jaschaio
@collabnix this is amazing; not sure why it is so deeply hidden in a GitHub issue, as it is the only configuration of Redis sentinel on Docker swarm that actually seems to work.
Anyway, quick question, as I am struggling to adapt your entrypoint script to use a Docker secret for the password instead of writing it in plain text.
Assuming I have a Docker secret for the password mounted at /run/secrets/password, I guess I need to export it into an environment variable via export PASSWORD="$(</run/secrets/password)" and then use it within your entrypoint script.
Here is my attempt, which doesn't work:
```yaml
entrypoint: |
  bash -c 'bash -s <<EOF
  "/bin/bash" -c "export PASSWORD=$$(</run/secrets/password) && \
  echo $PASSWORD && \
  cat <<EOF > /opt/bitnami/redis/etc/sentinel.conf
  port 16379
  dir /tmp
  sentinel monitor master-node master 6379 2
  sentinel down-after-milliseconds master-node 5000
  sentinel parallel-syncs master-node 1
  sentinel failover-timeout master-node 5000
  sentinel auth-pass master-node $PASSWORD
  EOF"
  "/bin/bash" -c "redis-sentinel /opt/bitnami/redis/etc/sentinel.conf"
  EOF'
```
Maybe you have an idea, as you seem to be more experienced with writing bash scripts.
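For what it's worth, one way around the nested-heredoc problem (the outer shell expands $PASSWORD before the inner one runs) is to do the templating in a plain script instead of inline in the compose file. A rough sketch under my own assumptions; `render_sentinel_conf` is an illustrative helper, and the demo substitutes a temp file for the real /run/secrets/password:

```shell
#!/bin/bash
# Sketch: read the password from a Docker-secret-style file and render
# sentinel.conf from a shell function, avoiding nested heredocs entirely.
render_sentinel_conf() {
  local password="$1"
  cat <<EOF
port 16379
dir /tmp
sentinel monitor master-node redis-master 6379 2
sentinel auth-pass master-node ${password}
EOF
}

# Demo: a temp file stands in for /run/secrets/password here.
SECRET_FILE="$(mktemp)"
printf 'laSQL2019' > "$SECRET_FILE"
PASSWORD="$(cat "$SECRET_FILE")"
render_sentinel_conf "$PASSWORD" | grep auth-pass
```

In a real deployment the script would be mounted into the container (e.g. via configs) and set as the entrypoint, writing to /opt/bitnami/redis/etc/sentinel.conf before exec-ing redis-sentinel.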
Comment From: hedleyroos
I had to add this line to the sentinel conf file to get it to work:
```
sentinel resolve-hostnames yes
```
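For context (an assumption about the setup, but it matches the hostname support added in Redis 6.2): resolving and announcing hostnames usually go together, e.g.:

```
sentinel resolve-hostnames yes
sentinel announce-hostnames yes
```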
Comment From: adshin21
@hedleyroos did you check whether sentinel can elect a new master from the slaves if you scale the master down to 0? For me, it just says "can't resolve hostname redis-master".
Comment From: macrokernel
@adshin21, have you tried enabling Redis data persistence? I am not using the solution provided by @collabnix, but in my deployment with data persistence enabled, master election goes well after scaling down and restoring all Redis server instances.
Comment From: eazylaykzy
@macrokernel, do you mind sharing your setup? I'm currently facing the same problem as @adshin21.
Comment From: macrokernel
@eazylaykzy, Sure, please check my repo: https://github.com/macrokernel/redis-ha-cluster.
Comment From: Luk7c
Hi,
I used @adshin21's docker-compose (but I have only 1 slave), and I added appendonly true to my sentinel.conf.
But I'm facing a problem: I pause my master and my slave becomes the new master -> OK. I unpause my old master and it synchronises with redis-slave (which is now the master) -> OK. I pause redis-slave, and redis-master cannot be promoted to master -> KO.
I'm using Swarm to deploy my docker-compose.
Here are the logs from my master:
```
Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
```
Here is my sentinel.conf:
```
# Generated by CONFIG REWRITE
sentinel monitor mymaster <ip redis-master> 6379 2
sentinel known-replica mymaster <ip redis-slave> 6379
sentinel known-replica mymaster redis-master 6379
```
I don't understand why I have sentinel known-replica mymaster redis-master 6379.
Comment From: jganeshpai1994
Hello everyone,
I have found a solution for this issue.
In a Docker Swarm environment the IPs change once a container goes down and is recreated. This confuses sentinel, as it keeps the older IPs alongside the new ones.
With bitnami/redis I found another issue: when sentinel is initialized, the sentinel container does not yet have the IPs of the replicas or of the other sentinels, which causes sentinel to go into tilt mode when an election happens.
To avoid all of the above I created a shell script with the following checks:
```shell
#!/bin/bash
echo "Sleeping 20 seconds before running checks"
sleep 20
while true; do
    replica_count=$(grep -o 'known-replica' /opt/bitnami/redis/etc/sentinel.conf | wc -l)
    sentinel_count=$(grep -o 'known-sentinel' /opt/bitnami/redis/etc/sentinel.conf | wc -l)
    echo "Replica Count : $replica_count"
    echo "Sentinel Count : $sentinel_count"

    echo "=====Check Replica Count===="
    if [ "$replica_count" -gt 3 ]; then
        echo "=========== Before sentinel.conf (start)========"
        cat /opt/bitnami/redis/etc/sentinel.conf
        echo "=========== Before sentinel.conf (end) ========"
        redis-cli -p 16379 SENTINEL RESET master-node
        redis-cli -p 16379 SENTINEL FAILOVER master-node
        redis-cli -p 16379 SENTINEL RESET master-node
        echo "Reset done Sentinel"
        echo "=========== After sentinel.conf (start)========"
        cat /opt/bitnami/redis/etc/sentinel.conf
        echo "=========== After sentinel.conf (end) ========"
    fi

    # Check if sentinel has no replica
    if [ "$replica_count" -eq 0 ]; then
        echo "Zero Replica Count"
        redis-cli -p 16379 SHUTDOWN
    fi

    echo "=====Check Sentinel Count===="
    if [ "$sentinel_count" -lt 2 ]; then
        echo "=========== Before sentinel.conf (start)========"
        cat /opt/bitnami/redis/etc/sentinel.conf
        echo "=========== Before sentinel.conf (end) ========"
        redis-cli -p 16379 SENTINEL FAILOVER master-node
        echo "Failing over Sentinel"
        echo "=========== After sentinel.conf (start)========"
        cat /opt/bitnami/redis/etc/sentinel.conf
        echo "=========== After sentinel.conf (end) ========"
    elif [ "$sentinel_count" -gt 2 ]; then
        echo "Reseting..."
        redis-cli -p 16379 SENTINEL RESET master-node
    fi
    sleep 10
done
```
This shell script runs in the sentinel container and checks the replica count and the sentinel count. In my case I have 1 Redis master and 2 slaves, which is why the first condition triggers when the count is greater than 3: that means the config still contains older IPs which no longer exist, so the script resets and fails over. The SENTINEL RESET tells sentinel to fetch the latest IPs of the master and replicas, and the failover avoids any older IP being selected as master.
The second condition, a replica count of zero, means sentinel was not initialized correctly and needs to be restarted, hence the SHUTDOWN.
The third condition checks the sentinel count: I have 3 sentinels, so if this one sees fewer than 2 others it fails over, and if it sees more than 2 it resets.
Below is my stack yml
```yaml
version: '3.7'
services:
  redis-commander:
    image: ghcr.io/joeferner/redis-commander:latest
    ports:
      - "8081:8081"
    environment:
      - SENTINEL_HOST=redis-sentinel:16379
      - SENTINEL_NAME=master-node
    networks:
      - overlay_net
    deploy:
      mode: replicated
      replicas: 1
  redis-master:
    image: bitnami/redis:6.2.13
    environment:
      - REDIS_REPLICATION_MODE=master
      - ALLOW_EMPTY_PASSWORD=yes
      - REDIS_EXTRA_FLAGS=--maxmemory 100mb
      - REDIS_SENTINEL_MASTER_NAME=master-node
      - REDIS_SENTINEL_HOST=redis-sentinel
      - REDIS_SENTINEL_PORT_NUMBER=16379
    volumes:
      - ./metadata_cache:/bitnami/redis/data
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.labels.node_name == node-1
    command: /opt/bitnami/scripts/redis/run.sh --min-replicas-to-write 1 --min-replicas-max-lag 10
    networks:
      - overlay_net
  redis-slave1:
    image: bitnami/redis:6.2.13
    depends_on:
      - redis-master
    volumes:
      - ./metadata_cache:/bitnami/redis/data
    environment:
      - REDIS_REPLICATION_MODE=slave
      - REDIS_MASTER_HOST=redis-master
      - ALLOW_EMPTY_PASSWORD=yes
      - REDIS_MASTER_PORT_NUMBER=6379
      - REDIS_EXTRA_FLAGS=--maxmemory 100mb
      - REDIS_SENTINEL_MASTER_NAME=master-node
      - REDIS_SENTINEL_HOST=redis-sentinel
      - REDIS_SENTINEL_PORT_NUMBER=16379
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.labels.node_name == node-2
    command: /opt/bitnami/scripts/redis/run.sh --min-replicas-to-write 1 --min-replicas-max-lag 10
    networks:
      - overlay_net
  redis-slave2:
    image: bitnami/redis:6.2.13
    depends_on:
      - redis-master
    volumes:
      - ./metadata_cache:/bitnami/redis/data
    environment:
      - REDIS_REPLICATION_MODE=slave
      - REDIS_MASTER_HOST=redis-master
      - ALLOW_EMPTY_PASSWORD=yes
      - REDIS_MASTER_PORT_NUMBER=6379
      - REDIS_EXTRA_FLAGS=--maxmemory 100mb
      - REDIS_SENTINEL_MASTER_NAME=master-node
      - REDIS_SENTINEL_HOST=redis-sentinel
      - REDIS_SENTINEL_PORT_NUMBER=16379
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.labels.node_name == node-3
    command: /opt/bitnami/scripts/redis/run.sh --min-replicas-to-write 1 --min-replicas-max-lag 10
    networks:
      - overlay_net
  redis-sentinel:
    image: bitnami/redis:6.2.13
    depends_on:
      - redis-master
    configs:
      - source: sentinel_check.sh
        target: /opt/bitnami/redis/sentinel_check.sh
        mode: 0755
    entrypoint: |
      bash -c 'bash -s <<EOF
      "/bin/bash" -c "cat <
```
As you can see, the shell script is added as a config and run as a background process in the sentinel container. overlay_net is an overlay network created externally, and redis-commander is included so you can inspect the data.
The Redis master and slaves are deployed on the swarm cluster with placement labels, as we are using 3 VM instances; you can remove those labels or add your own as required.
So when the master goes down you will see an election happening. One more thing: in sentinel you will get the error "Unable to resolve redis-master", but don't worry, this is only a warning, as in Docker it will try to resolve the redis-master hostname.
On the client side we do retries if a failure occurs on sentinel.
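The client-side retry idea can be sketched as a small shell helper. In practice the wrapped command would be something like `redis-cli -p 16379 SENTINEL get-master-addr-by-name master-node`, but the demo below uses a stand-in command so the sketch runs anywhere:

```shell
#!/bin/bash
# Sketch: retry a command a fixed number of times with a delay between
# attempts, as a client would when a sentinel query fails mid-failover.
retry() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    sleep "$delay"
  done
  return 1
}

# Demo: a stand-in command that fails on its first call and succeeds after.
marker="$(mktemp -u)"
flaky() { [ -f "$marker" ] || { touch "$marker"; return 1; }; }
retry 3 0 flaky && echo "succeeded after retry"
```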