Describe the bug
Note 1: This is a theoretical bug that I haven't verified yet, but I wanted to document it ASAP.
Note 2: This came up while handling https://github.com/redis/redis/pull/9968#discussion_r773761633
If a slave, for some reason, enters a loop of requesting full syncs from the master, the master will fork continuously to generate RDBs (either diskless or disk-based). This can legitimately happen, for instance, if the slave repeatedly fails to load the RDB because of memory issues.
The problem is that the master allows only a single fork at a time, and that fork will constantly be used to generate the RDB bulk transfer for replication. All cron jobs related to persistence (RDB snapshots and AOF rewrites) will be starved because the "fork lock" is always held by replication. A faulty slave can therefore effectively disable persistence on the master.
To reproduce
Run redis-cli --rdb mydump.rdb in a loop and write data to a server configured with repl-diskless-sync yes, repl-diskless-sync-delay 0, and some save configuration. If a dump file is never produced, the bug is reproduced.
Expected behavior
I'd expect the snapshot or AOF rewrite mechanisms to take precedence in such a case. One way to do this is to flag a pending snapshot or rewrite operation when it cannot fork, and, once the replication fork is done, fork for it if the flag is set (before any new sync command is processed).
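To make the starvation and the proposed precedence rule concrete, here is a minimal, hypothetical simulation in C. It is not Redis source code: server_model, tick, simulate, the fixed two-tick child lifetime, and the ordering inside a tick (replication forks before the save cron runs) are all assumptions chosen to mirror the behavior described above. The save condition is reduced to "at least one unsaved change", and elapsed time is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the single fork slot; NOT taken from Redis source. */
typedef enum { CHILD_NONE = 0, CHILD_REPL, CHILD_SAVE } child_kind;

enum { CHILD_DURATION_TICKS = 2 }; /* assumed lifetime of any forked child */

typedef struct {
    child_kind child;       /* what currently holds the only fork slot */
    int child_ticks_left;   /* ticks until that child exits */
    long dirty;             /* unsaved changes (like a dirty counter) */
    long dirty_at_fork;     /* changes covered by the running save */
    bool save_pending;      /* a save wanted the slot but found it busy */
    bool use_pending_flag;  /* false: reported behavior, true: proposal */
    int saves_completed;
    int repl_syncs_completed;
} server_model;

static void start_child(server_model *s, child_kind kind) {
    assert(s->child == CHILD_NONE); /* only one fork may exist at a time */
    s->child = kind;
    s->child_ticks_left = CHILD_DURATION_TICKS;
    if (kind == CHILD_SAVE) s->dirty_at_fork = s->dirty;
}

/* One event-loop iteration. Assumed order: reap the child, then (proposal
   only) run a pending save, then serve sync requests, then the save cron. */
static void tick(server_model *s, bool replica_wants_sync, bool write) {
    if (write) s->dirty++;

    /* Reap a child whose work has finished. */
    if (s->child != CHILD_NONE && --s->child_ticks_left == 0) {
        if (s->child == CHILD_SAVE) {
            s->saves_completed++;
            s->dirty -= s->dirty_at_fork; /* keep writes made during the save */
        } else {
            s->repl_syncs_completed++;
        }
        s->child = CHILD_NONE;
    }

    /* Proposal: a flagged save takes the free slot before any new sync. */
    if (s->use_pending_flag && s->save_pending && s->child == CHILD_NONE) {
        s->save_pending = false;
        start_child(s, CHILD_SAVE);
    }

    /* With repl-diskless-sync-delay 0, a sync request forks immediately. */
    if (replica_wants_sync && s->child == CHILD_NONE)
        start_child(s, CHILD_REPL);

    /* Save cron (simplified: any unsaved change triggers a save). */
    if (s->dirty > 0) {
        if (s->child == CHILD_NONE)
            start_child(s, CHILD_SAVE);
        else if (s->child == CHILD_REPL && s->use_pending_flag)
            s->save_pending = true; /* remember instead of silently skipping */
    }
}

/* A replica re-requests a full sync every tick; one write happens at tick 0. */
server_model simulate(bool use_pending_flag, int ticks) {
    server_model s = {0};
    s.use_pending_flag = use_pending_flag;
    for (int t = 0; t < ticks; t++)
        tick(&s, true, t == 0);
    return s;
}
```

In this model, simulate(false, 100) reproduces the reported behavior: every time the replication child exits, the next sync request grabs the slot before the save cron runs, so saves_completed stays 0 and the write is never persisted. simulate(true, 100) applies the pending-flag idea: the save runs as soon as the first replication child exits, and dirty drops back to 0 while replication continues afterwards.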
Comment From: yoav-steinberg
Update: I reproduced this as follows:
rm dump.rdb
redis-cli config set repl-diskless-sync-delay 0
redis-cli config set repl-diskless-sync yes
redis-cli config set save "5 1"
while true; do timeout 0.05 redis-cli --rdb /tmp/bla.rdb; done
Then I wrote something to the database with redis-cli set x abc and verified that nothing was saved.
Once I stopped the while loop above, dump.rdb was created.