Crash report

*** FATAL CONFIG FILE ERROR (Redis 6.0.8) ***
Reading the configuration file, at line 17
>>> 'sentinel known-replica redisProd 10.0.0.1 6389'
Wrong hostname or port for replica.

Additional information

  1. Official Redis docker image 6.0.8

  2. Steps to reproduce

I have a 3-host setup, with each host running redis and sentinel as containers. This is basically example no. 2 from the official docs.

This is the config (read from the startup of the sentinel) from host1. Similar config for the other two sentinels.

=== Redis-sentinel configuration ===
port 26389
dir /tmp
sentinel monitor redisProd host1 6389 2
sentinel down-after-milliseconds redisProd 5000
sentinel parallel-syncs redisProd 1
sentinel failover-timeout redisProd 5000
sentinel announce-ip host1
sentinel announce-port 26389
logfile "/logs/redis-sentinel.log"
loglevel debug
maxmemory 16MB
maxmemory-policy noeviction
=== !Redis-sentinel configuration ===

Problem: When I start the sentinels, all is well, but after working for a day or two, the sentinels start crashing and throwing this error:

*** FATAL CONFIG FILE ERROR (Redis 6.0.8) ***
Reading the configuration file, at line 17
>>> 'sentinel known-replica redisProd 10.0.0.1 6389'
Wrong hostname or port for replica.

I have noticed that I have the same entry twice (I am going to write down the line numbers at the beginning of the line):

[15] sentinel leader-epoch redisProd 0
[16] sentinel known-replica redisProd 10.0.0.2 6389
[17] sentinel known-replica redisProd 10.0.0.2 6389
[18] sentinel known-replica redisProd 10.0.0.3 6389
[19] sentinel known-replica redisProd 10.0.0.3 6389
[20] sentinel known-sentinel redisProd 10.0.0.2 26389 6c8a91568c5468bcbb4d52b24b6
[21] sentinel known-sentinel redisProd 10.0.0.3 26389 270184c487a8b2f4cc495611475

I have no idea how this is happening or whether it is the actual cause. Any help would be appreciated.

Comment From: patpatbear

I'm guessing you are starting redis-sentinel with the redis-server command. Try redis-sentinel redis.conf

Comment From: ivanovaleksandar

I am indeed starting it with redis-server --sentinel. Will try this and report back.

Comment From: hwware

Hello @ivanovaleksandar, thank you for reporting this. This is a config error rather than a server crash. It happens because the config contains duplicate replica entries with the same IP and port. Sentinel uses the IP and port to identify each replica, so a duplicate entry cannot be added to the master->replicas dict (please see https://github.com/redis/redis/blob/9dcdc7e79a25968fcdfde09c7ca72a2012a1febf/src/sentinel.c#L1224), and Sentinel returns the error you show above. You can simply remove the duplicate sentinel known-replica lines under the same master, and this issue should be resolved. Thanks
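
A quick way to check a sentinel config for the duplicates described above might look like the sketch below. The helper name find_duplicate_replicas is made up for illustration, the path in the usage comment is an assumption, and the check compares whole lines, so it only catches duplicates that are written identically.

```shell
# Sketch: print each "sentinel known-replica" line that occurs more than once.
# Reads a sentinel config on stdin. find_duplicate_replicas is a hypothetical
# helper name, not part of Redis.
find_duplicate_replicas() {
  grep '^sentinel known-replica ' | sort | uniq -d
}

# Example (the path is an assumption; adjust it to your setup):
#   find_duplicate_replicas < /etc/redis/redis-sentinel.conf
```

No output means no identical known-replica lines were found.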

Comment From: hwware

Also, may I ask whether you know how the duplicate config entries got there? Did you add them by mistake, or did they appear without you adding them? Thanks!

Comment From: ivanovaleksandar

Hi, I tested @patpatbear's suggestion, but after 3 days of running, the same issue happened.

@hwware When I start the service, every component (server or sentinel) is set only once, but after a while, the sentinel restarts and registers the same redis instances twice, hence the duplicated known-replica entries. I don't know how that happens or what triggers it, because it runs correctly for a couple of days. Even if I restart a random sentinel or redis instance, a new redis master is assigned and all the sentinels work as intended.

Any suggestions on how can I troubleshoot this, so I can help gather more info and report it?

EDIT: I just saw the PR. I will try and build + deploy it.

Comment From: yossigo

@ivanovaleksandar Do you use replica-announce-ip with a hostname on the Redis instances?

Comment From: ivanovaleksandar

@yossigo TL;DR I tried both approaches (hostnames and IPs only) but ended up with the same result.

Here is the config. I modified the docker entrypoint so I can dump the config. (Note: the IPs/names are redacted/changed.)

=== Redis-sentinel configuration ===
port 26389
dir /tmp
sentinel monitor redisProd 10.0.0.1 6389 2
sentinel down-after-milliseconds redisProd 5000
sentinel parallel-syncs redisProd 1
sentinel failover-timeout redisProd 5000
sentinel announce-ip 10.0.0.1
sentinel announce-port 26389
logfile "/logs/redis-sentinel.log"
loglevel debug
maxmemory 16MB
maxmemory-policy noeviction
=== !Redis-sentinel configuration ===

After the sentinel fails, the config that it dumps is the following:

=== Redis-sentinel configuration ===
port 26389
dir "/tmp"
sentinel myid 57366eeeccb93d0ff50f5fe5341964b0a29b91a2
sentinel deny-scripts-reconfig yes
sentinel monitor redisProd 10.0.0.3 6389 2
sentinel down-after-milliseconds redisProd 5000
sentinel failover-timeout redisProd 5000
sentinel config-epoch redisProd 0
logfile "/logs/redis-sentinel.log"
loglevel debug
maxmemory 16mb
maxmemory-policy noeviction
user default on nopass ~* +@all
sentinel leader-epoch redisProd 0
sentinel known-replica redisProd 10.0.0.2 6389
sentinel known-replica redisProd 10.0.0.1 6389
sentinel known-replica redisProd 10.0.0.1 6389
sentinel known-replica redisProd 10.0.0.2 6389
sentinel known-sentinel redisProd 10.0.0.2 0 2b27038be2a5f37d3c333350a7ae5f1c87d7f87a
sentinel known-sentinel redisProd 10.0.0.2 26389 f5f57bfac67df38cf2b1f10eae44ec1390756258
sentinel known-sentinel redisProd 10.0.0.3 26389 a5f8f65f9237afab419e91fd23175296f2a1259b
sentinel current-epoch 0
sentinel announce-ip "10.0.0.1"
sentinel announce-port 26389
=== !Redis-sentinel configuration ===

*** FATAL CONFIG FILE ERROR (Redis 6.0.8) ***
Reading the configuration file, at line 18
>>> 'sentinel known-replica redisProd 10.0.0.1 6389'
Wrong hostname or port for replica.

Comment From: yossigo

@ivanovaleksandar Just to be sure I was clear, when referring to IPs I was referring to the Redis configuration and not the Sentinel configuration (there's a known issue with it, that's why I'm asking).

Another thing that could help identify the problem here is looking at the Sentinel logs and following the chain of events from startup up to the point where the wrong configuration file gets written.

Comment From: ivanovaleksandar

@yossigo Indeed the config has hostnames instead of IPs. Initially, the sentinel config was set with hostnames only, but the result was the same.

Now, I am testing again with all config set with IPs instead of hostnames for both redis and sentinel. Once I have relevant feedback, I will report back.

Comment From: ivanovaleksandar

After deploying the IP-only setup, I have not been able to reproduce the issue. Now I am wondering whether this issue occurs only with Docker setups or is a general bug.

Anyways, @patpatbear @hwware @yossigo thank you for the help. This can be considered as resolved.

I will close this for now, but I would like to link the "known issue" as well.

Comment From: ivanovaleksandar

The issue reappeared after a week. The logs are set to loglevel debug, but I am not getting any substantial info out of them, just a dump of the config lines (which I already posted before).

Comment From: yossigo

@ivanovaleksandar Do you see any spurious +slave lines in the logs?
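
One way to pull those lines out of a Sentinel log is sketched below. The helper name sentinel_slave_events is hypothetical, and the usage comment assumes the logfile path from the config posted earlier in this thread.

```shell
# Sketch: print only the "+slave" event lines from a Sentinel log read on stdin.
# sentinel_slave_events is a hypothetical helper name.
# -F matches the pattern as a fixed string; the trailing space keeps other
# events such as "+slave-reconf-sent" out of the result.
sentinel_slave_events() {
  grep -F '+slave '
}

# Example (logfile path taken from the posted config):
#   sentinel_slave_events < /logs/redis-sentinel.log
```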

Comment From: ivanovaleksandar

No, there are no log lines containing +slave.

Comment From: yossigo

@ivanovaleksandar Does the most recent case include a duplicate known-replica, known-sentinel or both? Do you have a corrupted config example?

Comment From: ivanovaleksandar

I have this in the docker-entrypoint to make it easier for troubleshooting:

echo "=== Redis-sentinel configuration ==="
cat /etc/redis/redis-sentinel.conf | grep -v "^#" | grep -v '^$'
echo "=== !Redis-sentinel configuration ==="

And the results that I am getting:

=== Redis-sentinel configuration ===
port 26389
dir "/tmp"
sentinel myid f5f57bfac67df38cf2b1f10eae44ec1390756258
sentinel deny-scripts-reconfig yes
sentinel monitor redisProd 10.0.0.3 6389 2
sentinel down-after-milliseconds redisProd 5000
sentinel failover-timeout redisProd 5000
sentinel config-epoch redisProd 0
logfile "/logs/redis-sentinel.log"
loglevel debug
maxmemory 16mb
maxmemory-policy noeviction
user default on nopass ~* +@all
sentinel leader-epoch redisProd 0
sentinel known-replica redisProd 10.0.0.1 6389
sentinel known-replica redisProd 10.0.0.2 6389
sentinel known-replica redisProd 10.0.0.2 6389
sentinel known-replica redisProd 10.0.0.1 6389
sentinel known-sentinel redisProd 10.0.0.3 26389 6c002234ed9971639acb49f31c7bbe69ca84abdb
sentinel known-sentinel redisProd 10.0.0.1 26389 94a1a19cb40f410d72d0a1ec3ddc45b395486e9b
sentinel known-sentinel redisProd 10.0.0.1 0 57366eeeccb93d0ff50f5fe5341964b0a29b91a2
sentinel known-sentinel redisProd 10.0.0.3 0 a5f8f65f9237afab419e91fd23175296f2a1259b
sentinel current-epoch 0
sentinel announce-ip "10.0.0.2"
sentinel announce-port 26389
=== !Redis-sentinel configuration ===

*** FATAL CONFIG FILE ERROR (Redis 6.0.8) ***
Reading the configuration file, at line 18
>>> 'sentinel known-replica redisProd 10.0.0.2 6389'
Wrong hostname or port for replica.

I have the same exact output in the log output file, except the error message.
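
As a stopgap, the repeated known-replica lines could be dropped before Sentinel reads the file, in line with the earlier suggestion to remove the duplicate entries. The following is only a sketch: dedupe_known_replicas is a hypothetical helper name, the entrypoint usage in the comment is an untested assumption, and this only hides the symptom without explaining how the duplicates get written.

```shell
# Sketch: drop repeated "sentinel known-replica" entries, keeping the first
# occurrence of each master/ip/port triple and leaving all other lines as-is.
# Reads a sentinel config on stdin and writes the result to stdout.
# dedupe_known_replicas is a hypothetical helper name.
dedupe_known_replicas() {
  awk '
    $1 == "sentinel" && $2 == "known-replica" {
      key = $3 " " $4 " " $5      # master name, ip, port
      if (seen[key]++) next       # already printed once: skip this repeat
    }
    { print }
  '
}

# Possible use in an entrypoint, before Sentinel starts (untested assumption):
#   conf=/etc/redis/redis-sentinel.conf
#   dedupe_known_replicas < "$conf" > /tmp/sentinel.dedup && cat /tmp/sentinel.dedup > "$conf"
```

Keying on master name, IP, and port follows the explanation earlier in the thread that Sentinel identifies each replica by its IP and port.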