Redis halts/idles for more than 10 seconds after BGSAVE finishes

Discussed in https://github.com/redis/redis/discussions/9457

Originally posted by **nbari** on September 2, 2021

I am running a Sentinel-managed setup of 3 nodes (1 master, 2 replicas) on Redis version `6.2.5`, OS FreeBSD 13. The dataset is approximately 70GB, the system has 320GB RAM and SSD disks, and the servers (dedicated, not VMs) are in the same network/datacenter, so hardware is probably not the problem.

I started to notice that after `BGSAVE` finishes there is a lag of approximately 10 seconds. Because of this, the applications randomly get this error: `NOREPLICAS Not enough good replicas to write`. For now, I have increased the value of `min-replicas-max-lag` to `20`, but I would like to better understand why this delay happens and how to prevent it.

Below are some metrics from when I started to notice this issue:

[Screenshot 2021-09-02 at 18 54 21]

This is the current configuration:

appendonly no
daemonize yes
databases 8
dbfilename dump.rdb
dir /var/db/redis
min-replicas-max-lag 20
min-replicas-to-write 1
pidfile /var/run/redis/redis.pid
protected-mode no

maxmemory 255092mb

save 900 1
save 300 10
save 60 10000

io-threads 13
io-threads-do-reads yes

client-output-buffer-limit replica 16gb 16gb 60
repl-backlog-size 4gb
repl-timeout 3600
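
To illustrate why the `min-replicas-*` settings above turn a multi-second stall into `NOREPLICAS` errors, here is a simplified model of the check, not Redis's actual code. The function name `write_allowed` and the lag values are hypothetical; the sketch assumes the master counts a replica as "good" when its last acknowledgement is no older than `min-replicas-max-lag` seconds, and that 10 is the default value of that setting.

```python
def write_allowed(replica_lags, min_replicas_to_write=1, min_replicas_max_lag=20):
    """Simplified model of the min-replicas check (hypothetical helper).

    replica_lags: seconds since each replica last acknowledged the master.
    """
    # The feature is treated as disabled when either setting is 0.
    if min_replicas_to_write == 0 or min_replicas_max_lag == 0:
        return True
    good = sum(1 for lag in replica_lags if lag <= min_replicas_max_lag)
    return good >= min_replicas_to_write


# Both replicas appear ~12 s behind during the stall (illustrative values).
stalled = [12, 12]
print(write_allowed(stalled, min_replicas_max_lag=10))  # rejected with max-lag 10
print(write_allowed(stalled, min_replicas_max_lag=20))  # accepted with max-lag 20
```

Under this model, raising `min-replicas-max-lag` to `20` only hides a stall of roughly 12 seconds; it does not remove it.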

For testing, I used `redis-cli` with the `--latency` option. Before `BGSAVE` finishes, this is the output:

min: 18, max: 45, avg: 27.01

But after `BGSAVE` finishes, I get the lag:

min: 18, max: 10912, avg: 27.01

The pattern I started to notice is that this happens on all nodes (master/slave), so it probably has nothing to do with replication, but it always happens after `BGSAVE` finishes (output of `truss -dD -p 86400`):

86400: 1.299656368 0.000064812 write(7,"xL\^S#\M-*'\M-D \^Q@*\M-`\^D\^Q`"...,131072) = 131072 (0x20000)
86400: 1.300105565 0.000022102 write(7,"-\M^R \M-'\^De(a,b\M-`\^B\M^D"...,37480) = 37480 (0x9268)
86400: 1.300168366 0.000017520 fsync(7)          = 0 (0x0)
86400: 1.300232401 0.000018878 close(7)          = 0 (0x0)
86400: 1.325321507 0.024984899 rename("temp-86400.rdb","dump.rdb") = 0 (0x0)
86400: 1.325500433 0.000018393 write(224,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,40) = 40 (0x28)
86400: 13.687191004 12.361617823 exit(0x0)
86400: 13.687230002 12.361656821 process exit, rval = 0

There is a delay between the write and exit of ~12 seconds:

86400: 1.325321507 0.024984899 rename("temp-86400.rdb","dump.rdb") = 0 (0x0)
86400: 1.325500433 0.000018393 write(224,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,40) = 40 (0x28)
86400: 13.687191004 12.361617823 exit(0x0)
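
The gap can be checked mechanically. The following is a minimal sketch that parses the three `truss -dD` lines above and reports the largest interval between consecutive syscalls. It assumes the second column is the absolute timestamp in seconds and the third is the per-call delta; the helper name `largest_gap` is hypothetical.

```python
import re

# The three trace lines quoted above, copied verbatim.
TRACE = r'''
86400: 1.325321507 0.024984899 rename("temp-86400.rdb","dump.rdb") = 0 (0x0)
86400: 1.325500433 0.000018393 write(224,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,40) = 40 (0x28)
86400: 13.687191004 12.361617823 exit(0x0)
'''

# pid, absolute timestamp, delta, syscall name
LINE = re.compile(r"^(\d+):\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\w+)")


def largest_gap(trace):
    """Return (gap_seconds, previous_syscall, next_syscall) for the widest gap."""
    events = []
    for line in trace.splitlines():
        m = LINE.match(line)
        if m:
            events.append((float(m.group(2)), m.group(4)))
    best = None
    for (t0, name0), (t1, name1) in zip(events, events[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, name0, name1)
    return best


gap, before, after = largest_gap(TRACE)
print(f"{gap:.3f} s between {before} and {after}")
```

On this excerpt the widest gap is the one between `write` and `exit`, which matches the ~12 seconds noted above.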

This could probably be related to the [latency generated by fork](https://redis.io/topics/latency). I am running the tests with `vm.pmap.pg_ps_enabled=0` in `/boot/loader.conf`.

[![redis halts](https://img.youtube.com/vi/rYTXaJ_Nehs/0.jpg)](https://youtu.be/rYTXaJ_Nehs?t=63)

From the video, you can see that `write` is halting the whole server. Any idea how this could be improved?

Comment From: oranagra

In theory, the freeze in the child process shouldn't affect the parent. Can you please set `latency-monitor-threshold` to something like 10, and then get `LATENCY LATEST` after the freeze?

Can you please also try reproducing this with the unstable branch? Specifically, #9409 can help if you have a single string key that's really big (because we break large writes into smaller ones, not because of `sync_file_range`, which is only used on Linux).

Comment From: nbari

Hi @oranagra, this is the output of `LATENCY LATEST` after the freeze:

1) 1) "fast-command"
   2) (integer) 1631702879
   3) (integer) 8231
   4) (integer) 8231
2) 1) "fork"
   2) (integer) 1631702942
   3) (integer) 2159
   4) (integer) 2159
3) 1) "command"
   2) (integer) 1631703798
   3) (integer) 20
   4) (integer) 8231
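
For readers unfamiliar with this reply format, here is a small sketch that labels the fields. It assumes the four values per event are, in order, the event name, the Unix timestamp of the latest spike, the latest latency in milliseconds, and the all-time maximum latency in milliseconds, as described in the Redis documentation for `LATENCY LATEST`. The reply is hard-coded from the output above; `describe` is a hypothetical helper.

```python
from datetime import datetime, timezone

# Reply copied from the LATENCY LATEST output above.
REPLY = [
    ["fast-command", 1631702879, 8231, 8231],
    ["fork", 1631702942, 2159, 2159],
    ["command", 1631703798, 20, 8231],
]


def describe(reply):
    """Turn raw LATENCY LATEST rows into (event, UTC time, latest_ms, max_ms)."""
    rows = []
    for event, ts, latest_ms, max_ms in reply:
        when = datetime.fromtimestamp(ts, tz=timezone.utc)
        rows.append((event, when.strftime("%Y-%m-%d %H:%M:%S"), latest_ms, max_ms))
    return rows


for event, when, latest_ms, max_ms in describe(REPLY):
    print(f"{event:<13} last spike {when} UTC  latest={latest_ms} ms  max={max_ms} ms")

# Event with the highest all-time maximum (first one wins on ties).
worst = max(REPLY, key=lambda row: row[3])
print("worst event:", worst[0], worst[3], "ms")
```

Read this way, the parent process recorded spikes of about 8.2 seconds for `fast-command` and about 2.2 seconds for `fork`.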

I will now test with the unstable branch (redis_build_id:a5be941cf9fc2e25 - f560531). I notice that it has `rdb-save-incremental-fsync yes` by default:

1) "rdb-save-incremental-fsync"
2) "yes"

This is the output of LATENCY LATEST after the freeze:

1) 1) "command"
   2) (integer) 1631706691
   3) (integer) 18
   4) (integer) 6192
2) 1) "fork"
   2) (integer) 1631706691
   3) (integer) 1114
   4) (integer) 1114
3) 1) "fast-command"
   2) (integer) 1631706691
   3) (integer) 15
   4) (integer) 15

I didn't see any improvement, still idle for some seconds after BGSAVE finishes:

min: 0, max: 6191, avg: 0.58 (15010 samples)