Redis Explicit propagation of commands

Currently server.dirty has two purposes: 1. Deciding when to save an RDB 2. Deciding if the command that was just executed should be propagated

This problem is that the two don't always come together: 1. Sometimes I want to increment server.dirty but avoid propagation: Tha happens whenever a module command changes the dataset because the module subsystem is in charge of propagation) 2. Sometimes I want to increment server.dirty but avoid RDB saving: For example, a module CONFIG SET command that the module wishes to propagate. See #11292

It is probably true that most of the propagation issues regarding server.dirty somehow have to do with the fact that modules use explicit propagation, but that's for a good reason: Modules should have the flexibility to replicate a custom command to the replica (even Redis does that some times: SPOP propagates as SREM)

I suggest dropping the implicit propagation idea and having only one way of propagating commands, which is explicit propagation because it is the most flexible. That means that server.dirty will be used only for RDB saving and that each command would have to call something like alsoPropagateVerbatim if it wants to be propagated.

Pros: 1. Same code path for module/native propagation 2. Getting rid of a lot of code: CMD_CALL_PROPAGATE_* flags, the ugly if in call to propagate the current command, rewriteClientCommandVector, replaceClientCommandVector, preventCommandPropagation, and possibly more. 3. Much simpler and easy-to-follow code.

Cons: 1. A lot of calls to alsoPropagateVerbatim (but I think we can take this con and turn it into a pro: having a keyChanged function that wraps things like signalModifiedKey and notifyKeyspaceEvent that always go together. That reduces the chances of forgetting to call any of them)

Comment From: oranagra

for the record, I agree with all the arguments.

One other important con: * it'll make Backports, cherry-picks, merges of old PRs and to other forks very risky.

@redis/core-team / others, please provide feedback before we proceed to a POC.

Comment From: itamarhaber

FWIW, this seems like a good chore to have.

Comment From: JimB123

I'd like to add that we should do some explicit naming changes.

In the old days... There was the current command, and then sometimes additional things which needed to be propagated. We would "propagate" the current command, and then, if the list of things to "also propagate" wasn't empty, we'd send those too.

In Redis 7, this got reworked a bit. Functionally, it's much nicer. But some of the legacy naming was carried over. The old "propagate" routine was renamed "propagateNow". It was made static and is only used internally. We eliminated the concept of a current command as separate from the list to "also propagate". But now, the function "alsoPropagate" has been made the API for propagation.

I'd suggest that: * "alsoPropagate" should be renamed to simply "propagate". Or maybe "addPendingPropagation" - which is more explicit? This is the API for propagation. * "propagateNow" should be renamed to something a little clearer. "propagateNow" implies (to me) functionality like is provided by "propagatePendingCommands". Maybe use a name like "propagateSinglePendingCommand" - or something which better conveys its purpose. (BTW - I love that this is static. I wish there was more widespread use of static functions in Redis.)

Comment From: madolson

I conceptually agree with this path forward. I really like having modules and redis commands be propagated through more similar code paths.

Comment From: oranagra

conceptually approved in a core-team meeting

Comment From: madolson

@guybe7 Are you planning on working on this anytime soon?

Comment From: guybe7

I'm currently working on the reply schema, which will take a while, I can take a break and work on this smaller task

@oranagra @itamarhaber @yossigo any objections?

Comment From: guybe7

Perhaps we want to wait till https://github.com/redis/redis/pull/11199 is merged?

Comment From: oranagra

i think i'd rather push the reply schema first since you're already deeply invested in it. if for some reason it'll not reach completion, it'll be a huge loss of time (unlike this task that hasn't started yet and is just a cleanup). maybe post a summary in the reply schema with status and what's left..

Comment From: madolson

I ask only since someone from AWS was interested in taking a stab at it if you weren't actively working on it.

Comment From: oranagra

sure, if you have someone else with the bandwidth to handle that go ahead.

Comment From: guybe7

@MeirShpilraien i guess the order in which we want things to happen is: 1. execution of the command 2. propagation of the command 3. KSN + propagation of stuff from KSN CB

right?

for the record, currently, the order (for the vast majority of commands, maybe all except SPOP) is: 1. execution 2. KSN 3. propagation of stuff from KSN CB + propagation of the command (i.e. wrong order - or did we fix it?)

CMIIW

Comment From: guybe7

note for implementation: we want to add dirty++ in expireIfNeeded (it makes sense that lazy-expire should affect BGASVE) now we can't do it because it'll cause the implicit propagation of the read-command that triggered lazy-expire, and there's a test that would fail.

Update: maybe we want to add dirty++ to the more general deleteKeyAndPropagateLogic so that it would affect active-expire, eviction, etc.

see https://github.com/redis/redis/pull/11572

Comment From: MeirShpilraien

@MeirShpilraien i guess the order in which we want things to happen is: execution of the command propagation of the command KSN + propagation of stuff from KSN CB right? for the record, currently, the order (for the vast majority of commands, maybe all except SPOP) is: execution KSN propagation of stuff from KSN CB + propagation of the command (i.e. wrong order - or did we fix it?) CMIIW

@guybe7 I believe we solved the propagation order here (with the post notification job api): https://github.com/redis/redis/pull/11199

Comment From: guybe7

@MeirShpilraien so from now on the official policy is not to change the dataset from within the KSN CB, but rather use post execution-unit job (let's call it PEJ) right?

so the order is: 1. logical execution of a command 2. KSN (except SPOP and maybeother, where it happens before the logical execution) 3. add the command to also_propagate 4. PEJ (which may cause more command to be added to als_propagate) 5. emitting all propagations (so eveyrthing is in the right order)

CMIIW

Comment From: MeirShpilraien

@MeirShpilraien so from now on the official policy is not to change the dataset from within the KSN CB, but rather use post execution-unit job (let's call it PEJ) right? so the order is: logical execution of a command KSN (except SPOP and maybeother, where it happens before the logical execution) add the command to also_propagate PEJ (which may cause more command to be added to als_propagate) emitting all propagations (so eveyrthing is in the right order) CMIIW

Right.

Comment From: oranagra

but what does it matter if 2 and 3 are swapped? we don't care... p.s. keep in mind that sometimes the command fires multiple KSNs (e.g. a loop of pops and eventually a deletion), so it could be more complicated than that.

Comment From: MeirShpilraien

The only case where swap 2 and 3 might be different is on lazy expiration that was triggered as a result of the notification, I am not sure though if this can be logically different somehow and if we decide to disable lazy expiration on key space notification (like we already discussed before) then it should be totally the same.

Comment From: oranagra

we did disable lazy expire in module KSN in #9406