Redis Replication: partial resync after link goes down then up.

This is an excerpt from a message I wrote on the mailing list explaining a possible solution to this problem. Currently, after the connection between master and slave goes down, a full resync is needed. Redis is able to resync in a very short time, but with big datasets and short link downtimes a full resync is still far from ideal, so in the future I want to fix this properly. The following is a description of the solution I have in mind:

Excerpt from the Google group

If I understand correctly, what you want is faster synchronization between master and slaves after the replication link goes down? You propose your solution (which is useful in other contexts, for a possible unification of RDB and AOF in the future), but we are implementing a better replication resync with a different approach from the one proposed.

Basically, when a slave is no longer connected, the master will still accumulate writes for it instead of freeing the client structure associated with the slave.

Then the slave will reconnect, but will issue an optional "SYNC " command, specifying the absolute offset it reached in the master -> slave chat of the previous connection. If we still have the buffer in memory (this depends on the amount of time and on the amount of data written to the server while the connection is down, since after some time we have to give up and free the client structure) we start a partial synchronization.

Also, to do that we need to be sure that the slave instance chatting with us is the same as before. For this we'll assign a unique Redis instance ID to every instance at every run. This instance ID will be communicated in the course of SYNC requests.
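The decision the master has to make on reconnection can be sketched roughly as follows. This is only an illustration of the idea described above, not Redis code: `RetainedSlaveState`, `choose_resync`, and all field names are hypothetical, and it assumes the slave sends both its instance ID and the offset it reached.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetainedSlaveState:
    """What the master keeps after a slave disconnects (hypothetical layout)."""
    slave_instance_id: str  # instance ID the slave announced on the previous connection
    first_offset: int       # absolute offset of the oldest byte still buffered
    end_offset: int         # absolute offset just past the newest buffered byte


def choose_resync(state: Optional[RetainedSlaveState],
                  slave_instance_id: str,
                  slave_offset: int) -> str:
    """Return "partial" or "full" for a reconnecting slave."""
    if state is None:
        # The master already gave up and freed the client structure.
        return "full"
    if state.slave_instance_id != slave_instance_id:
        # Not the same instance run as before, so the offset means nothing.
        return "full"
    if not (state.first_offset <= slave_offset <= state.end_offset):
        # The requested position is no longer (or not yet) in the buffer.
        return "full"
    return "partial"
```

A slave that was restarted while the link was down gets a new instance ID, so under these assumptions it always falls back to a full resync even if its offset happens to be in range.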

Some low-level details

To do what is described above, in the case of slave client structures the objects should not be deleted once sent to the client, but accumulated (up to a given limit) in a per-client list of objects already sent.

This, plus keeping a per-client counter of the total number of bytes sent, will make it possible for us to rewind to the right point and restart our master -> slave chat just after the latest command processed by the slave in the course of the previous connection.
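A minimal sketch of such a per-client buffer, assuming a simple byte limit as the "given limit" and treating each sent object as a `bytes` chunk. The class `SentBuffer` and its methods are hypothetical names used only for illustration:

```python
from collections import deque
from typing import Optional


class SentBuffer:
    """Data already written for one slave, kept up to max_bytes (hypothetical)."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.chunks = deque()  # bytes objects, oldest first
        self.buffered = 0      # bytes currently held in chunks
        self.total_sent = 0    # per-client counter of all bytes in the stream so far

    def append(self, data: bytes) -> None:
        """Record data written to the master -> slave stream."""
        self.chunks.append(data)
        self.buffered += len(data)
        self.total_sent += len(data)
        # Drop the oldest chunks once the limit is exceeded.
        while self.buffered > self.max_bytes and self.chunks:
            self.buffered -= len(self.chunks.popleft())

    def replay_from(self, offset: int) -> Optional[bytes]:
        """Return everything from absolute `offset` on, or None if it is gone."""
        start = self.total_sent - self.buffered  # offset of the oldest retained byte
        if offset < start or offset > self.total_sent:
            return None
        return b"".join(self.chunks)[offset - start:]
```

In this sketch the buffer keeps accumulating while the link is down, and a `None` result from `replay_from` corresponds to the case where a full resync is the only option.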

Comment From: yoav-steinberg

I think this was implemented years ago: PSYNC. This can probably be closed.