Hi!

I would like to know when the sentinelFailoverSelectSlave function is called, and in which cases the sentinelSelectSlave function returns NULL.

void sentinelFailoverSelectSlave(sentinelRedisInstance *ri) {
    sentinelRedisInstance *slave = sentinelSelectSlave(ri);

    /* We don't handle the timeout in this state as the function aborts
     * the failover or go forward in the next state. */
    if (slave == NULL) {
        sentinelEvent(LL_WARNING,"-failover-abort-no-good-slave",ri,"%@");
        sentinelAbortFailover(ri);
    } else {
        sentinelEvent(LL_WARNING,"+selected-slave",slave,"%@");
        slave->flags |= SRI_PROMOTED;
        ri->promoted_slave = slave;
        ri->failover_state = SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE;
        ri->failover_state_change_time = mstime();
        sentinelEvent(LL_NOTICE,"+failover-state-send-slaveof-noone",
            slave, "%@");
    }
}

In this case (1x Master (R1), 2x Slaves (R2, R3)) I get a -failover-abort-no-good-slave message in the log file:

  • R2 is DOWN
  • R3 is DOWN
  • R1 is DOWN
  • R3 comes back UP as a slave, but the promotion process fails (meaning that R3 is still a slave)

I have one more question that bothers me:

  • R1 has 192.168.10.10, R2 has 192.168.10.20 and R3 has 192.168.10.30
  • f647de705536775591595dfb543a739924ce4364 is RunID of R3

In the above case I get the following in the log file:

+new-epoch 5802
+try-failover master mymaster 192.168.10.10 6379
+vote-for-leader f647de705536775591595dfb543a739924ce4364 5802
ef58a52e53566fde8106b9112ea4b9689023e35e voted for f647de705536775591595dfb543a739924ce4364 5802
c8e2591af9d8437bdafd78ccdc6c5b9f618613d6 voted for f647de705536775591595dfb543a739924ce4364 5802
+elected-leader master mymaster 192.168.10.10 6379
+failover-state-select-slave master mymaster 192.168.10.10 6379
-failover-abort-no-good-slave master mymaster 192.168.10.10 6379
Next failover delay: I will not start a failover before Mon Sep 21 07:19:48 2020
  • all running Sentinels (3x) voted for R3

But why do the following messages keep the address of R1 (the master, which is DOWN)?

+elected-leader master mymaster 192.168.10.10 6379
+failover-state-select-slave master mymaster 192.168.10.10 6379
-failover-abort-no-good-slave master mymaster 192.168.10.10 6379

Shouldn't it be R3 (192.168.10.30)?

Thanks a lot.

Comment From: hwware

Hello @trimstray, slaves are filtered out based on the following rules:

1. Slaves in the S_DOWN (subjectively down) or O_DOWN (objectively down) state are filtered out.
2. Slaves with a disconnected link are filtered out.
3. Slaves that have not replied to a Sentinel ping within the last 5 seconds (by default) are filtered out.
4. Slaves with priority 0 are filtered out.
5. Slaves whose info_refresh is more than 5 seconds old while the master is in the S_DOWN state are filtered out.
6. Slaves whose master link down time is greater than the master down time + 10 * down_after_period are filtered out.

These rules are checked one by one after the leader Sentinel has been elected. If any of them is true for a slave, that slave is removed from the candidate list.
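A minimal sketch of the filtering rules above, written as a standalone model and not as the actual sentinelSelectSlave code. The struct names, field names and the replica_is_candidate helper are hypothetical stand-ins invented for illustration; only the 1s ping period and the rules themselves come from this thread. Rule 5 is modeled only for the case where the master is in S_DOWN, since that is the only case described here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constant mirroring SENTINEL_PING_PERIOD (1s). */
#define PING_PERIOD_MS 1000

/* Hypothetical, simplified view of a slave as seen by Sentinel. */
typedef struct {
    bool s_down;                   /* subjectively down */
    bool o_down;                   /* objectively down */
    bool link_disconnected;        /* link to the instance is disconnected */
    long long last_avail_ms;       /* time of the last valid ping reply */
    int priority;                  /* 0 means "never promote" */
    long long info_refresh_ms;     /* time of the last INFO reply */
    long long master_link_down_ms; /* how long its link to the master is down */
} replica_model;

/* Hypothetical, simplified view of the master. */
typedef struct {
    bool s_down;                    /* master is subjectively down */
    long long s_down_since_ms;      /* when the master entered S_DOWN */
    long long down_after_period_ms; /* down-after-milliseconds */
} master_model;

/* Returns true only if the slave passes all six rules at time now_ms. */
bool replica_is_candidate(const master_model *m, const replica_model *r,
                          long long now_ms) {
    assert(m != NULL && r != NULL);

    /* Rule 6 limit: master down time + 10 * down_after_period. */
    long long max_master_down = m->down_after_period_ms * 10;
    if (m->s_down) max_master_down += now_ms - m->s_down_since_ms;

    if (r->s_down || r->o_down) return false;                         /* 1 */
    if (r->link_disconnected) return false;                           /* 2 */
    if (now_ms - r->last_avail_ms > PING_PERIOD_MS * 5) return false; /* 3 */
    if (r->priority == 0) return false;                               /* 4 */
    if (m->s_down &&
        now_ms - r->info_refresh_ms > PING_PERIOD_MS * 5) return false; /* 5 */
    if (r->master_link_down_ms > max_master_down) return false;       /* 6 */
    return true;
}
```

If every slave fails at least one of these checks, no candidate is left, which corresponds to the -failover-abort-no-good-slave case.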

Regarding your second question: currently the Sentinel log events from failover-state-select-slave onward only show the master's information. Since the failover has not finished yet, they still carry the old address and port. After the failover finishes successfully, these are replaced by the IP and port of the promoted slave. Thanks!

Comment From: trimstray

Hi @hwware,

Thanks a lot for your answers!

One more thing: are we able to control the time from points 3, 5 and 6? Or any of them? Is there a variable in redis-sentinel.conf for that?

Comment From: hwware

Hello @trimstray ,

Point 6 is tunable: down_after_period is the down-after-milliseconds setting in the Sentinel configuration. https://github.com/redis/redis/blob/1c71038540f8877adfd5eb2b6a6013a1a761bc6c/src/sentinel.c#L4214

Points 3 and 5 are currently not tunable in the Sentinel configuration, since they use the SENTINEL_PING_PERIOD constant (1s). To modify them you would have to change this value in the source code, but be careful: SENTINEL_PING_PERIOD is also used in other logic, so changing it may have side effects. https://github.com/redis/redis/blob/1c71038540f8877adfd5eb2b6a6013a1a761bc6c/src/sentinel.c#L4203 https://github.com/redis/redis/blob/1c71038540f8877adfd5eb2b6a6013a1a761bc6c/src/sentinel.c#L4213 Thanks.
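To make the effect of down-after-milliseconds on point 6 concrete, here is a small arithmetic sketch. The max_master_link_down_ms helper is hypothetical and only restates the formula from the rule (master down time + 10 * down_after_period); it is not a function from sentinel.c.

```c
#include <assert.h>

/* Hypothetical helper: the largest master link down time (in ms) a slave
 * may report and still pass rule 6.
 *   down_after_ms        - the down-after-milliseconds setting
 *   master_s_down_for_ms - how long the master has been in S_DOWN */
long long max_master_link_down_ms(long long down_after_ms,
                                  long long master_s_down_for_ms) {
    assert(down_after_ms >= 0 && master_s_down_for_ms >= 0);
    return master_s_down_for_ms + 10 * down_after_ms;
}
```

For example, with down-after-milliseconds set to 5000 and a master that has been in S_DOWN for 20 seconds, max_master_link_down_ms(5000, 20000) gives 70000, so a slave whose link to the master has been down for more than 70 seconds would be filtered out. Raising down-after-milliseconds widens this window, but it also delays the detection of a failed master.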

Comment From: trimstray

Hi @hwware!

Nice, I understand everything now.

I am grateful for your help!

Comment From: hwware

@trimstray my pleasure!

Comment From: hwware

@trimstray can you please close this issue? thanks!