I think we should rewrite the blocking command infrastructure. Today it works by marking the client as blocked on a key with a bunch of special variables; when that key is written to, we use those stored variables to complete the request. This has several issues:
- We maintain a lot of duplicate code to process the requests, and the code as written is sort of spaghetti code.
- We may still have to re-block after the command is executed, which requires special handling that should be unnecessary.
- Commands need explicit rewriting for replication.

Instead, I think we should merge the blocking framework with the mechanism that was outlined for the background threads. If a client blocks on a non-existent key, it should just remove itself from the event handler and register that it is blocked on that key. When that key is touched, the blocked client will be unblocked and will attempt to re-execute the command that blocked it in the first place. This shouldn't introduce any major compatibility issues and should throw the same exceptions we do today. It can then inline the command it has within the replication system. It should hopefully remove a lot of the weird checks that we have to do that are included in the CR.
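
To make the proposed flow concrete, here is a rough sketch of the "block, then re-execute on touch" idea. It is a toy model written in Python, not Redis code: `ToyServer`, `execute`, and `_key_touched` are hypothetical names, only `RPUSH` and `BLPOP` are modeled, and timeouts and replication are left out.

```python
from collections import defaultdict, deque


class ToyServer:
    """Toy model of re-executing a blocked command (hypothetical, not Redis internals)."""

    def __init__(self):
        self.lists = defaultdict(deque)    # key -> list value
        self.blocked = defaultdict(deque)  # key -> FIFO of (client, command) waiting on it
        self.replies = defaultdict(list)   # client -> replies delivered so far

    def execute(self, client, command):
        """Single entry point, used for first execution and for every retry."""
        name, key, *args = command
        if name == "RPUSH":
            self.lists[key].extend(args)
            self.replies[client].append(len(self.lists[key]))
            self._key_touched(key)
        elif name == "BLPOP":
            if self.lists[key]:
                self.replies[client].append((key, self.lists[key].popleft()))
            else:
                # Nothing to pop: remember the original command and wait.
                self.blocked[key].append((client, command))
        else:
            raise ValueError(f"unsupported command: {name}")

    def _key_touched(self, key):
        """Unblock every waiter on the key and re-run its original command."""
        waiters = self.blocked.pop(key, deque())
        for client, command in waiters:
            # A waiter that still cannot proceed re-blocks through the
            # normal BLPOP path above, so no special re-block code is needed.
            self.execute(client, command)
```

In this model, if two clients issue `BLPOP q` and a third then runs `RPUSH q a`, the first waiter is served and the second simply ends up back in the blocked queue, because its retry went through the same code path as its first attempt.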
I think this refactor is worth it, either for this change or so that it is easier to make these sorts of changes in the future. (If we agree it's worth it for this change, I'm okay with this specific PR getting merged and a separate one for the refactor.)
Followup from: https://github.com/redis/redis/pull/6929