We ran into the typical problem of replication sync going into a loop. At one point the master logged a client output buffer full message for one of the replicas.

1601:M 16 Nov 2022 16:24:46.469 # Client id=17860315 addr=198.19.224.11:44001 fd=13 name= age=1942388 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=4814 omem=268550992 events=rw cmd=replconf scheduled to be closed ASAP for overcoming of output buffer limits.

After that, the connection to the replica was forcibly closed and the slave requested a partial sync. But due to lack of backlog, the partial sync was not possible.

1601:M 16 Nov 2022 16:24:48.469 * Unable to partial resync with replica 198.19.224.11:6379 for lack of backlog (Replica request was: 9795787051600).

A full sync then starts and the master begins creating the dump for the first phase, but meanwhile the output buffer fills up again, the replica is disconnected again, and another full sync starts. This goes on in a loop, and the full sync is never able to finish because of the constant write load on the master.

We had the same situation with the second slave a few seconds later.

Our configuration:
repl-backlog-size 1mb
client-output-buffer-limit 256mb

The issue is now almost clear to us, but the question remains: what buffer values should we choose so that a partial sync, rather than a full sync, is likely in a buffer-full, high-load situation? From the buffer-full event I estimated that the buffer was filling at 64 mbps.
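To make the sizing question concrete, here is a rough back-of-the-envelope sketch of how long an output buffer lasts under a given fill rate. It is only an illustration: the function name is made up, the drain rate of zero is a worst-case assumption, and the unit of "64 mbps" is not certain, so both readings (megabytes and megabits per second) are shown.

```python
MIB = 1024 * 1024


def seconds_until_limit(limit_bytes: int, fill_rate: float, drain_rate: float = 0.0) -> float:
    """Seconds until a buffer of limit_bytes overflows.

    fill_rate and drain_rate are in bytes per second. Hypothetical helper,
    not part of Redis; it assumes both rates stay constant.
    """
    net_rate = fill_rate - drain_rate
    if net_rate <= 0:
        # The replica drains at least as fast as the master writes.
        return float("inf")
    return limit_bytes / net_rate


limit = 256 * MIB  # client-output-buffer-limit from the configuration above

# Assumption 1: "64 mbps" means 64 MiB per second.
as_megabytes = seconds_until_limit(limit, 64 * MIB)      # 4.0 seconds

# Assumption 2: "64 mbps" means 64 megabits per second, i.e. 8 MiB per second.
as_megabits = seconds_until_limit(limit, 64 * MIB / 8)   # 32.0 seconds

print(as_megabytes, as_megabits)
```

Either way the limit is reached within seconds, which is far shorter than a full sync of a large dataset is likely to take under constant write load.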

My understanding is that, in the situation above, if we want a partial sync to be possible after the output buffer fills up, we have to set repl-backlog-size higher than client-output-buffer-limit. Is that true? I have not been able to find the answer anywhere. Suppose we set a higher repl-backlog-size so that the slave can find its offset in the backlog; there is still a possibility that the output buffer fills up again because of a slow network while the backlog is being synced.
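The reasoning above can be sketched as a simple offset-window check. This is a simplified model, not the actual Redis code: it only tests whether the offset requested by the replica still falls inside the range of bytes the backlog holds, and it ignores the replication ID check and any version-specific behaviour. The function name and the master offset in the example are hypothetical; the requested offset and the omem value are taken from the log lines above.

```python
MIB = 1024 * 1024


def can_partial_resync(requested_offset: int, backlog_first_offset: int, backlog_histlen: int) -> bool:
    """True if requested_offset is still covered by the backlog window.

    Simplified model: the backlog holds backlog_histlen bytes starting at
    backlog_first_offset. Replication ID matching is not modelled.
    """
    if backlog_histlen <= 0:
        return False
    window_end = backlog_first_offset + backlog_histlen
    return backlog_first_offset <= requested_offset <= window_end


def backlog_window(master_offset: int, backlog_size: int) -> tuple[int, int]:
    """First offset and length of a full backlog ending at master_offset."""
    return master_offset - backlog_size + 1, backlog_size


requested = 9795787051600          # "Replica request was" from the master log
lag = 268550992                    # omem from the log, used as an assumed lag
master_offset = requested + lag    # hypothetical master offset at reconnect

# With the original 1mb backlog the requested offset is long gone.
first, histlen = backlog_window(master_offset, 1 * MIB)
print(can_partial_resync(requested, first, histlen))   # False

# With a backlog larger than the lag the offset is still in the window.
first, histlen = backlog_window(master_offset, 512 * MIB)
print(can_partial_resync(requested, first, histlen))   # True
```

In this model the backlog has to be at least as large as the amount of data the replica is behind, which is why a 1mb backlog cannot cover a replica that fell behind by roughly 256 MB.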

Note: I checked a simple scp from one Redis host to the other and it reaches just 38 mbps, which I agree is very slow. On the other hand, when we monitor the Redis server port via Grafana, it shows a data load of 130 mbps.

Comment From: bhaveshrenpara

Logs from the day we had the issue: redis_vfde_6381.log

Comment From: bhaveshrenpara

We tested this in our load and performance environment: if repl-backlog-size is set higher than client-output-buffer-limit, the client-output-buffer-limit setting is ignored.

Also, as per our testing, a partial sync is not possible at all once client-output-buffer-limit has been reached. If there is any way to make it possible, please suggest.

Based on our testing, we will set the values as below:
repl-backlog-size 256mb # If the slave disconnects temporarily (the output buffer is lost in this situation), there is a high chance that a partial sync will be possible.
client-output-buffer-limit 512mb # If the slave is temporarily slow (the network or the slave machine itself) and its speed is restored before the buffer limit is crossed, syncing again is possible without losing data.