## redis version
4.0.2
## problem
In a big-dataset scenario, when the master restarts within a short time, Sentinel thinks the master is still available because the master replies "LOADING" to Sentinel's "PING". Failover is not triggered because Sentinel only registers the restart of the master, and this leaves the service unavailable until the master finishes loading.
## reproduce
- 3 Sentinels, 1 master + 1 slave, AOF enabled. The client gets the master address from Sentinel.
- kill -9 redis master, then start the redis master.
- sentinel log: 1091433:X 13 Dec 20:03:42.712 * +reboot master mymaster 100.64.0.25 6379
- The client gets stuck on "Redis is loading the dataset in memory" errors because failover is not triggered.
In my test with a 10 GB dataset, the client waited almost 200 s during the master restart. Client code: https://github.com/scenbuffalo/redis/blob/master/utils/cli_redis_stl.py
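For reference, here is a minimal reproduction sketch roughly along the lines of the linked script, assuming redis-py is installed; the Sentinel address, master name, and key are placeholders:

```python
# Minimal reproduction sketch (assumes redis-py; host/port, master name, and key
# are placeholders). While the master reloads its dataset, Sentinel keeps
# returning the same master and every write fails with a LOADING error.
import time

from redis.exceptions import BusyLoadingError, ConnectionError
from redis.sentinel import Sentinel

sentinel = Sentinel([("100.64.0.25", 26379)], socket_timeout=0.5)
master = sentinel.master_for("mymaster", socket_timeout=0.5)

start = time.time()
while True:
    try:
        master.set("probe", "1")
        print("write succeeded after %.1fs" % (time.time() - start))
        break
    except (BusyLoadingError, ConnectionError) as exc:
        # No failover is triggered, so this loops until the master finishes loading.
        print("master unavailable: %s" % exc)
        time.sleep(1)
```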
## suggestion
Add a loading-as-pong option, see https://github.com/scenbuffalo/redis/commit/840df3c9dc88de671ae294f4942ec5fd62c456b7. In my test with a 10 GB dataset, the unavailable time drops to down-after-milliseconds + 2 s when loading-as-pong is set to no. If it is necessary, I will make a PR. @antirez
Comment From: perlun
Any updates on this one @scenbuffalo? I just ran into slightly similar exceptions when connecting to a Redis Sentinel cluster via the lettuce driver for Java:
```
io.lettuce.core.RedisLoadingException: LOADING Redis is loading the dataset in memory
    at io.lettuce.core.ExceptionFactory.createExecutionException(ExceptionFactory.java:132)
    at io.lettuce.core.LettuceFutures.awaitOrCancel(LettuceFutures.java:128)
    at io.lettuce.core.FutureSyncInvocationHandler.handleInvocation(FutureSyncInvocationHandler.java:69)
    at io.lettuce.core.internal.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:80)
    at com.sun.proxy.$Proxy54.set(Unknown Source)
```
I agree that it would be worth considering whether Redis should wait to listen on its client port until it's done loading the data. I haven't looked at the implementation though, so I can't really judge whether this would be "easy" or "very challenging" to implement. @antirez?
Comment From: antirez
Hi @perlun, after many years of practice we still believe that it's far better to let the client fail ASAP if Redis is not available. Turning this behavior into a blocking one is simple if really needed (just retry in a loop or similar; since the loading takes many seconds or sometimes more, waiting 100 ms, 200 ms, 400 ms, ... between PINGs is not terrible). On the other hand, the majority of applications want to fail immediately in order to communicate the failure upstream: show an error page or the like, re-check the local discovery system for server updates, and so forth.
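To illustrate the retry approach described above, here is a minimal sketch assuming redis-py; the backoff values mirror the 100 ms, 200 ms, 400 ms sequence mentioned, while the cap, host, and key names are arbitrary:

```python
# Sketch of the client-side retry loop described above (assumes redis-py;
# delays and cap are illustrative). PING is retried with exponential backoff
# until the server stops replying with a LOADING or connection error.
import time

import redis
from redis.exceptions import BusyLoadingError, ConnectionError


def wait_until_loaded(client, first_delay=0.1, max_delay=5.0):
    delay = first_delay
    while True:
        try:
            client.ping()
            return
        except (BusyLoadingError, ConnectionError):
            time.sleep(delay)
            delay = min(delay * 2, max_delay)  # 100 ms, 200 ms, 400 ms, ...


client = redis.Redis(host="127.0.0.1", port=6379)
wait_until_loaded(client)
client.set("foo", "bar")
```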
Comment From: perlun
Thanks for the reply @antirez. I understand the reasoning here, and we'll add more retrying support at our end. We already have it in some parts of the system, but we'll have to add more now that we are adding support for connecting to Redis Sentinel clusters.
How about this problem that @scenbuffalo is describing though, isn't this an issue? Wouldn't it be better to optionally let the failover start under such circumstances (to let some of the slave nodes become master)? Or are there any major disadvantages to this that you perceive? One thing I can think of myself: if the master does have data that hasn't been propagated to the slaves yet, letting the master be unavailable for some time (hopefully not more than a few seconds) can in some circumstances be better from a data integrity standpoint than to let the node fail.
Comment From: yaxing
Hi, we recently hit the same issue, where a long loading time (~5 minutes for 20 GB) left the master unavailable for writes. Can we seriously consider @perlun's suggestion below?
> How about this problem that @scenbuffalo is describing though, isn't this an issue? Wouldn't it be better to optionally let the failover start under such circumstances (to let some of the slave nodes become master)?
Regarding the concern here:
> Or are there any major disadvantages to this that you perceive? One thing I can think of myself: if the master does have data that hasn't been propagated to the slaves yet, letting the master be unavailable for some time (hopefully not more than a few seconds) can in some circumstances be better from a data integrity standpoint than to let the node fail.
Failing over to a slave might make data more durable than relying on a master reboot. The maximum slave replication lag we've been observing is 1-2 seconds, and if fsync is not done on every query (I believe very few use cases do that), chances are that either AOF or RDB has an equal or higher chance of data loss than replication. So we should probably just fail over and at least get higher write availability.
Comment From: shanezhiu
I hit the same issue as @yaxing. I agree with @scenbuffalo.
Comment From: eduardobr
I've been wondering why we can't have a mode where the replica keeps serving stale data while the full sync is happening in the background. That could cost twice the memory and maybe twice the disk space for a short period, but I think it is definitely worth it and would get rid of the LOADING status in most situations. In my use case there is no need for sharding (so no Redis Cluster) and it's OK to have the master down for about a minute with replicas serving stale data, so this would be very useful. Sometimes a standalone Redis master with a few replicas can be a solid setup, if we solve this kind of issue. That of course makes things more Kubernetes-friendly without needing to pay for Redis Operator.
@yaxing In the event of crashes, depending on how critical they are, you may not have any replica left to tell the story, so that's a case where it's important not to rely on replication to reduce data loss, but to use AOF persistence instead. Also, considering the master is the only point of pressure when fsync=always is used, it's fine to use it, and replicas can live happily with only RDB persistence to allow PSYNC when they restart. That's on a standalone Redis setup, of course.
Comment From: daniel-house
At its root, this issue is the same as https://github.com/redis/redis/issues/1297
Comment From: yossigo
Duplicate of #9438