There are maybe 3 main reasons to choose one over the other:

1. The diskless master sends a slightly different format to the replica (using the EOF marker), so in the past we couldn't be sure whether the replica can read that format.
2. If the disk is faster than the network, we'll get less COW (copy-on-write from the fork) with disk-based sync, and if the disk is slower than the network, we'll get less COW with diskless sync, since the sooner the child finishes, the less memory the fork ends up duplicating.
3. We may want to use the created RDB for persistence purposes too, i.e. use the opportunity that we have to generate an RDB file for the replica, and update the server.lastsave time to postpone the next BGSAVE.

  1. Since this was introduced we added the REPLCONF capa handshake, so we now know whether the replica is capable of reading the EOF-marker format or not.
  2. I think these days, in the majority of cases, the replica and master are on the same fast network, and the disks are slower than the network.
  3. In normal operation, a replica doing a full re-sync is rare (compared to the periodic save), so we may want to dismiss this concern. However, note the pathological case described in #9991.
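
For reference, the two master-side directives this proposal touches look roughly like this in redis.conf (a sketch showing the current defaults as discussed here; the comments are just my summary of the above):

```
# Stream the RDB straight to replica sockets instead of writing it to disk first.
# This issue proposes flipping this default from "no" to "yes".
repl-diskless-sync no

# With diskless sync, how long the master waits for more replicas to show up,
# so that they can all share a single fork/child (current default is 5 seconds).
repl-diskless-sync-delay 5
```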

Comment From: oranagra

@redis/core-team please comment

Comment From: madolson

I think we should do this. ElastiCache has used this as the default for something like 4-5 years now (we merged it before it was merged upstream), and we have seen nothing but faster and more reliable syncs. One of the reasons is that it allows some workloads to run fully without disk, since not everyone saves RDB files.

Comment From: soloestoy

I think it's OK.

But don't change repl-diskless-load, since it may consume double the memory on the replica side.
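
To clarify which knob that is, the replica-side setting looks like this (directive and values as in redis.conf; the memory note in the comment is the concern above):

```
# Diskless loading on the replica side is a separate setting and stays as-is.
# "swapdb" keeps the old dataset in RAM while the new one is being loaded,
# which is where the double-memory risk on the replica comes from.
repl-diskless-load disabled   # other values: on-empty-db, swapdb
```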

Comment From: yossigo

I think this makes sense.

BTW, strictly speaking "disks are slower than the network" is probably not correct (a typical NVMe drive will beat a typical Ethernet link), but a typical workload is probably running on a VM with virtualized network-attached storage, so it's effectively bound by network speed.

Comment From: itamarhaber

Makes total sense to me

Comment From: oranagra

There's another side to that story. Diskless master depends on repl-diskless-sync-delay, which currently defaults to 5 seconds. This means that for simple small databases (which can take less than a second to replicate) we add a 5-second delay. Setting it shorter will of course mean that if several replicas connect around the same time, each will create its own fork instead of sharing one.

I've attempted to run the test suite with repl-diskless-sync yes and repl-diskless-sync-delay 0; the tests actually took longer (5:28 vs 5:22) because some tests with multiple replicas used separate forks.
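
In config terms, the two extremes look roughly like this (a sketch; delay 0 is what the test run above used, 5 is the current default):

```
# Start the transfer immediately: small databases sync fast, but replicas that
# connect moments apart each trigger their own fork:
repl-diskless-sync-delay 0

# Or wait for more replicas to attach, so they can all share a single child,
# at the cost of a fixed delay before every diskless sync:
# repl-diskless-sync-delay 5
```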

Comment From: yossigo

@oranagra I don't think the test suite is an indicator of a real-world production workload. In reality, where replicas don't continuously re-sync, I don't think the extra 5 seconds will matter. In fact, I think big datasets that take a long time to BGSAVE, and thus open up a bigger window for replicas to join, might be more of an issue.

Maybe we should add a note recommending that the repl-diskless-sync default be reconsidered in deployments that involve multiple replicas?
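
Something along these lines next to the directive, perhaps (only a sketch of possible wording, nothing final):

```
# repl-diskless-sync now defaults to "yes". In deployments with multiple
# replicas, consider raising repl-diskless-sync-delay so that replicas which
# (re)connect close together can share a single replication child, or set
# repl-diskless-sync back to "no" if disk-based sync works better for you.
repl-diskless-sync yes
```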

Stretching this further, another option could be an auto mode, which would default to yes but fall back to no if we encounter multiple concurrent replicas. But that feels like too much of an over-engineering effort.

Comment From: oranagra

@yossigo I know the test suite is no indication of anything, I was just stating my observation.

So you suggest to: 1. change the default, and 2. add a note telling people they may need to consider reverting it.

Comment From: oranagra

We discussed this in a core-team meeting again, and concluded we want to proceed with changing the default, and that these 5 seconds shouldn't be a problem for most. Since there's a config, anyone who suffers from it can set it back to no.