Hello Experts,
My Redis version is 7.0.10, and I want to reduce the failover time in Sentinel mode.
Initially, sentinel down-after-milliseconds was set to 3000ms and a failover took around 4.5s, which is too long. After lowering it to 200ms, which I already consider a fairly small PING delay tolerance, a failover still takes around 1.5s.
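For reference, this is the setting I changed (mymaster is just a placeholder for the actual monitored master name). In sentinel.conf:

sentinel down-after-milliseconds mymaster 200

Or applied at runtime:

SENTINEL SET mymaster down-after-milliseconds 200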
Please let me know if there are any ways to further reduce this time.
After studying and analyzing sentinel.c over the past few days, I think there may be a few points that can be optimized:
sentinelTimer: increase server.hz during failover
void sentinelTimer(void) {
    sentinelCheckTiltCondition();
    sentinelHandleDictOfRedisInstances(sentinel.masters);
    sentinelRunPendingScripts();
    sentinelCollectTerminatedScripts();
    sentinelKillTimedoutScripts();

    /* Check whether any monitored master is subjectively/objectively
     * down or has a failover in progress. */
    dictIterator *di = dictGetIterator(sentinel.masters);
    dictEntry *de;
    int needFrequent = 0;
    while ((de = dictNext(di)) != NULL) {
        sentinelRedisInstance *ri = dictGetVal(de);
        if (ri && (ri->flags & (SRI_O_DOWN | SRI_S_DOWN | SRI_FAILOVER_IN_PROGRESS))) {
            needFrequent = 1;
            break;
        }
    }
    dictReleaseIterator(di);

    /* During a failover, raise the timer frequency so Sentinel reacts
     * faster; otherwise keep the default randomized frequency, which
     * desynchronizes Sentinels from each other to avoid split votes. */
    if (needFrequent) {
        server.hz = 100 + rand() % CONFIG_DEFAULT_HZ;
    } else {
        server.hz = CONFIG_DEFAULT_HZ + rand() % CONFIG_DEFAULT_HZ;
    }
}
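For context, my understanding of why this helps: sentinelTimer() is driven by serverCron(), which reschedules itself based on server.hz, so raising hz shortens the Sentinel polling interval from ~100ms (at the default hz of 10) to ~10ms (at hz around 100). Roughly paraphrased from server.c in 7.0, not the verbatim source:

int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    /* ... */
    /* Run the Sentinel timer if we are in sentinel mode. */
    if (server.sentinel_mode) sentinelTimer();
    /* ... */
    /* serverCron reschedules itself based on server.hz:
     * hz = 10  -> next call in 1000/10  = 100ms
     * hz = 100 -> next call in 1000/100 = 10ms */
    return 1000/server.hz;
}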
sentinelSendPeriodicCommands: send INFO more frequently
    /* If this is a replica of a master that is down or failing over,
     * send INFO every 10ms instead of every 1000ms so that the
     * promotion (role change) is detected much sooner. */
    if ((ri->flags & SRI_SLAVE) &&
        ((ri->master->flags & (SRI_O_DOWN|SRI_FAILOVER_IN_PROGRESS)) ||
         (ri->master_link_down_time != 0)))
    {
        info_period = 10; /* was 1000 (ms) */
    } else {
        info_period = sentinel_info_period;
    }
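For context, info_period is the minimum interval between INFO commands sent to a non-Sentinel instance, so lowering it lets Sentinel notice a promoted replica's role change much sooner. Roughly paraphrased from sentinelSendPeriodicCommands() in 7.0, not verbatim:

if ((ri->flags & SRI_SENTINEL) == 0 &&
    (ri->info_refresh == 0 ||
     (now - ri->info_refresh) > info_period))
{
    /* Ask the instance for INFO; the reply callback parses the role. */
    retval = redisAsyncCommand(ri->link->cc,
        sentinelInfoReplyCallback, ri, "%s",
        sentinelInstanceMapCommand(ri,"INFO"));
    if (retval == C_OK) ri->link->pending_commands++;
}

The obvious trade-off is more INFO traffic while the master is down, which is why the change is scoped to that condition only.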
This is the main optimization: with it, the failover time drops from around 1.5s to around 500ms. Please help me check whether this code change is OK. Looking forward to your expert opinions. Best regards to all of you.