I have a (maybe silly) question about the replication backlog. When a new empty slave joins and starts replicating from the master, the master sends it a full sync snapshot, and meanwhile the backlog stores the new writes starting from the snapshot point, correct?
But what if the network is too slow (or the snapshot is too big, e.g. 512GB, assuming maxmemory is large) for the slave to download the whole snapshot before the backlog wraps around? Then that slave would never catch up with the master: it would trigger another RDB creation and download, fail again, and so on, which seems like a dead loop?
Why doesn't Redis build replication on top of the AOF? Then it wouldn't have to store new writes in a backlog at all, and even a slow slave could eventually finish the full sync.
Comment From: trevor211
It is not the backlog that stores the new writes starting from the snapshot point; it's the client output buffer. The backlog accumulates new writes all the time when enabled.
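For reference, both buffers are tunable in `redis.conf`. A sketch of the relevant directives (values are the usual defaults, shown for illustration; the class keyword is `slave` in older versions and `replica` in newer ones):

```
# Size of the replication backlog, used for partial resyncs (PSYNC).
repl-backlog-size 1mb

# Limits on the per-replica client output buffer, which holds writes
# accumulated during a full sync: hard limit 256mb, or soft limit
# 64mb sustained for 60 seconds, after which the replica is disconnected.
client-output-buffer-limit slave 256mb 64mb 60
```

If a full sync keeps failing because of buffered writes, raising the `client-output-buffer-limit` for replicas is the usual remedy.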
It is possible for such a big node, with e.g. 512GB of used memory, to run into the dead loop you mentioned. We usually don't recommend running such a big node; keep it smaller, e.g. 10GB.
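A back-of-the-envelope check makes the dead loop concrete. All numbers below are illustrative assumptions, not measurements: compare how long the RDB transfer takes against how long it takes the new writes to overflow the replica's output buffer.

```python
# Illustrative numbers (assumptions, not real measurements):
rdb_size_gb = 512            # snapshot size
transfer_mb_per_s = 100      # network throughput to the replica
write_mb_per_s = 10          # rate of new writes buffered during the sync
buffer_limit_mb = 256        # hard client-output-buffer-limit for replicas

# Time to ship the whole RDB to the replica.
transfer_s = rdb_size_gb * 1024 / transfer_mb_per_s   # ~5243 s (~87 min)

# Time until the replica output buffer hits its hard limit.
buffer_full_s = buffer_limit_mb / write_mb_per_s      # 25.6 s

# The buffer overflows long before the transfer finishes, so the master
# disconnects the replica and the full sync restarts: the dead loop.
print(transfer_s, buffer_full_s, buffer_full_s < transfer_s)
```

With these assumed rates the buffer fills in under half a minute while the transfer needs well over an hour, so the sync can never complete without shrinking the dataset or raising the buffer limit.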
The reason Redis does not build replication on top of the AOF is that the AOF itself can be changed by an AOF rewrite, which produces a different history.
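A small illustration of why a rewrite breaks history-based replication (commands are illustrative):

```
# Original AOF: the literal history of operations
INCR counter
INCR counter
INCR counter

# After BGREWRITEAOF: the same final state, but a different history
SET counter 3
```

A replica streaming from a byte offset in the old file has no valid position in the rewritten one, whereas the in-memory backlog is an append-only stream whose offsets stay stable.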