Hi, I have a Redis instance that always fails on dump with 'Error reading bulk length while SYNCing'.

Sometimes it fails with 'I/O Error reading RDB payload from socket' instead.

I don't know what to do.

Comment From: oranagra

please share a bit more info: which version of redis-cli you are using (redis-cli -v), some info from redis (redis-cli INFO), and the redis configuration, specifically: redis-cli config get repl-diskless*

Comment From: proggga

I've done a full recheck and found an interesting and very sad case.

I have Redis installed on a 1-CPU host with an 8 GB memory limit. Redis occupies 6 GB. When I tried to dump with redis-cli --rdb, I got this error:

Client id=107837 addr=127.0.0.1:55124 fd=17 name= age=192 idle=192 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=16351 oll=8974 omem=692824400 events=rw cmd=sync scheduled to be closed ASAP for overcoming of output buffer limits.
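For anyone reading a line like this, here is a minimal Python sketch that splits it into its key=value fields and converts omem (the client's output buffer memory) into MiB. The helper name parse_client_fields is made up for illustration, and the comparison against a 256mb hard limit assumes the stock "client-output-buffer-limit replica 256mb 64mb 60" setting was in effect, which this thread does not confirm.

```python
import re

# The client description copied from the log line above.
LOG_LINE = (
    "Client id=107837 addr=127.0.0.1:55124 fd=17 name= age=192 idle=192 "
    "flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=16351 "
    "oll=8974 omem=692824400 events=rw cmd=sync scheduled to be closed ASAP "
    "for overcoming of output buffer limits."
)

# Assumption: the default replica hard limit (256mb) was configured.
ASSUMED_HARD_LIMIT_BYTES = 256 * 1024 * 1024


def parse_client_fields(line):
    """Return the key=value fields of a client description as a dict of strings.

    Hypothetical helper, not part of Redis or redis-cli.
    """
    return dict(re.findall(r"(\S+?)=(\S*)", line))


fields = parse_client_fields(LOG_LINE)
omem_bytes = int(fields["omem"])
omem_mib = omem_bytes / (1024 * 1024)

print(f"cmd={fields['cmd']} flags={fields['flags']} age={fields['age']}s")
print(f"output buffer: {omem_mib:.1f} MiB")
print(f"over assumed hard limit: {omem_bytes > ASSUMED_HARD_LIMIT_BYTES}")
```

Under that assumption the buffer is well past the hard limit, which matches the "scheduled to be closed ASAP" message.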

So I started tuning client-output-buffer-limit for the replica class, and after a small increase I got an Out of Memory in redis-server.

I thought I had made a mistake, but in the end I found that Redis behaves wrongly: first, for redis-cli --rdb, it calls bgsave and saves all data to rdb.dump. Second, Redis pushes this file over the socket to redis-cli. Because I have only 1 CPU it works very slowly, so the buffer keeps expanding until OOM. I don't know why Redis pushes the file so fast; it's a simple file transfer. I changed the configuration to 2 CPUs and the problem was solved. There is a problem here, and we should discuss how to fix it.

Comment From: oranagra

@proggga thank you for the additional detail. i still don't know which version of redis and which version of redis-cli you're using.

the problem you saw in the log is a known issue, or actually a design decision. redis uses fork to generate a snapshot that it sends to a replica, and while the fork child process is producing the snapshot, redis needs to buffer all write commands so that it can send them to the replica when it's done reading the rdb data. in addition to that, there's an accumulation of CoW during that process (when there's write traffic to the parent). in your case, redis-cli mimics a replica in order to obtain the rdb data.
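to get a feel for the numbers, here is a rough Python sketch of that buffering, using the age and omem values from the client line logged earlier in this thread. it assumes the buffer grew linearly from the moment the client connected, and it assumes a 256mb hard limit (the stock replica default, not confirmed for this setup). both function names are made up for illustration.

```python
# Values taken from the logged client line: age=192 (seconds), omem=692824400.
AGE_SECONDS = 192
OMEM_BYTES = 692824400

# Assumption: default hard limit for the replica class.
ASSUMED_HARD_LIMIT_BYTES = 256 * 1024 * 1024


def average_growth_rate(buffered_bytes, elapsed_seconds):
    """Average bytes per second accumulated in the output buffer.

    Toy model: treats growth as linear, which real traffic rarely is.
    """
    return buffered_bytes / elapsed_seconds


def seconds_until_limit(rate_bytes_per_s, limit_bytes):
    """Time for a buffer growing at a constant rate to reach the limit."""
    return limit_bytes / rate_bytes_per_s


rate = average_growth_rate(OMEM_BYTES, AGE_SECONDS)
eta = seconds_until_limit(rate, ASSUMED_HARD_LIMIT_BYTES)

print(f"average growth: {rate / (1024 * 1024):.2f} MiB/s")
print(f"assumed hard limit reached after about {eta:.0f} s")
```

the point of the model is only that any snapshot plus transfer that takes longer than the second number will hit the limit at that write rate, no matter how the limit is tuned short of removing it.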

however, there are two improvements you can make.

first, if you have a slow disk (slower than the network to where redis-cli is executed), you can use diskless replication, assuming you're using a fairly new redis-cli (6.0 and up). when you enable the repl-diskless-sync config on redis, it'll cause redis to stream the rdb file content directly to the redis-cli socket, without going through the disk.

secondly, if you use both redis and redis-cli v6.2, then redis-cli will send a REPLCONF RDB-ONLY command to redis, which will tell redis not to keep the replica buffers (the ones that caused the disconnection and log message you mentioned). i.e. as i mentioned, redis-cli mimics a replica in order to get the rdb file, but unlike a normal replica, it's not interested in the command stream that follows, and in v6.2 it can hint redis not to keep them.

note that you'll still suffer from CoW during that process, and that can cause the kernel OOM killer to kill one of the processes (if you use the new oom-score-adj config you can make the kernel prefer to kill the child rather than the parent).
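the version requirements for these two improvements can be sketched as a small Python check. it takes version strings such as the output of redis-cli -v and the redis_version field of INFO, and reports which improvements should apply. the function names and the labels "diskless" and "rdb-only" are made up for illustration, and treating newer releases as also qualifying is an assumption.

```python
import re


def parse_version(text):
    """Extract (major, minor) from strings like 'redis-cli 6.2.6' or '6.0.9'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def available_improvements(server_version, cli_version):
    """List the improvements from this comment that the versions allow.

    Hypothetical helper; the labels are illustrative, not Redis terms.
    """
    server = parse_version(server_version)
    cli = parse_version(cli_version)
    improvements = []
    # Diskless transfer to redis-cli needs redis-cli 6.0 and up.
    if cli >= (6, 0):
        improvements.append("diskless")
    # REPLCONF RDB-ONLY needs both redis and redis-cli at 6.2
    # (assumed here to include later versions).
    if server >= (6, 2) and cli >= (6, 2):
        improvements.append("rdb-only")
    return improvements


print(available_improvements("6.2.6", "redis-cli 6.2.6"))
print(available_improvements("6.0.9", "redis-cli 6.0.9"))
print(available_improvements("5.0.7", "redis-cli 5.0.7"))
```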