Redis [NEW] Extend keyspace notification API to include meta data of what portions of the key have changed

The problem/use-case that the feature addresses

The current keyspace notification callback only tells which key has changed. It does not carry information about what portions of the key have changed. For example, an "hset" event doesn't tell which field or fields have been modified. I have two use cases in my module under development:

1. When a hash key is mutated, I'd like to know precisely which fields are mutated so that I can efficiently perform some action on the modified fields. This applies to the hash commands "hset", "hdel", "hincrby", "hincrbyfloat" and "hsetnx".
2. My module also creates a new data type that is a composite data structure containing multiple fields. Each write command of this data type fires a REDISMODULE_NOTIFY_GENERIC keyspace event. I'd like the event to carry information about which fields are mutated, so that the subscriber can perform some action on the mutated fields.

Note that use case #1 is for hooking into redis data types and actions (not 3rd party modules). Use case #2 is for supporting 3rd party modules for the same feature. In other words, if a 3rd party module creates data type X, the proposed mechanism can also apply to type X.

Description of the feature

I'm proposing to extend the keyspace notification API to include meta data of what portions of the key have changed. To maintain backward compatibility, we can add new APIs like the following:

int RM_SubscribeToKeyspaceEvents_V2(RedisModuleCtx *ctx, int types, RedisModuleNotificationFuncV2 callback, void *privdata)

typedef int (*RedisModuleNotificationFuncV2) (RedisModuleCtx *ctx, int type, const char *event, RedisModuleString *key, void *privdata);

Note that "privdata" is just a rough idea of how to make the new KSN interface carry meta data of the changes. For example, this privdata could be just a list of strings, with each element being the name of a field that has changed. Of course, we need to design the privdata struct in a generic way so that it applies not only to hash, but also to other data types such as stream, sorted set, etc. I'll leave those details to the design, if this proposal is accepted.

If privdata is NULL, it is essentially the same as the current API.

Note that I'm not proposing the event notification to carry "changed data", which could be large. Rather, I'm proposing to carry "meta data of the change", e.g., names of hash fields being modified, which should be small.
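
To make the "list of strings" idea concrete, here is a minimal sketch of what such metadata could look like for a hash. It is only an illustration: `KeyChangeMeta`, `key_change_meta_has` and `count_touched` are hypothetical names that do not exist in the Redis module API, and a real design would still need to cover other data types.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical metadata handed to the callback as privdata.
 * It lists names only (e.g. hash field names), never the values. */
typedef struct KeyChangeMeta {
    size_t count;        /* number of changed sub-elements */
    const char **names;  /* names of the changed sub-elements */
} KeyChangeMeta;

/* Returns 1 if `field` is listed as changed, 0 otherwise.
 * A NULL meta means "no metadata available", as with the current API. */
int key_change_meta_has(const KeyChangeMeta *meta, const char *field) {
    if (meta == NULL) return 0;
    for (size_t i = 0; i < meta->count; i++) {
        if (strcmp(meta->names[i], field) == 0) return 1;
    }
    return 0;
}

/* Example subscriber logic: count how many watched fields were touched. */
size_t count_touched(const KeyChangeMeta *meta,
                     const char **watched, size_t nwatched) {
    size_t touched = 0;
    for (size_t i = 0; i < nwatched; i++) {
        touched += (size_t)key_change_meta_has(meta, watched[i]);
    }
    return touched;
}
```

With metadata of this shape, a subscriber only does work proportional to the number of changed names, not to the size of the key.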

Alternatives you've considered

An alternative approach, a much simpler one, is to fire two events per mutation, one before the mutation and one afterwards. Current keyspace events are all fired after the mutation. So, we can simply add another notification BEFORE the data is mutated. Then, it's the subscribers' responsibility to fetch data before and after the mutation and compute the diff. However, the main drawback is that the subscriber needs to fetch the value of the key twice and compute the diff.

This alternative approach is inferior because it incurs a severe performance penalty. Note that this is a per-key event notification. Fetching the entire key value twice and computing the diff is a very expensive operation. Imagine a hash key that has 10k fields and a size of 500MB. The alternative approach is just too expensive to be usable.
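
To illustrate where that cost comes from, here is a rough sketch of the diff a subscriber would have to compute from two snapshots of a hash. `FieldValue`, `snapshot_get` and `snapshot_diff` are hypothetical helpers, not Redis APIs, and the snapshots are plain arrays for simplicity. Even with a faster lookup than the linear scan used here, every field and value of both snapshots has to be read.

```c
#include <stddef.h>
#include <string.h>

/* One field/value pair of a hash snapshot (hypothetical type). */
typedef struct FieldValue {
    const char *field;
    const char *value;
} FieldValue;

/* Look up a field by linear scan; returns its value or NULL if absent. */
const char *snapshot_get(const FieldValue *snap, size_t n, const char *field) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(snap[i].field, field) == 0) return snap[i].value;
    }
    return NULL;
}

/* Stores the names of added, modified and removed fields in `out`
 * (up to `cap` entries) and returns how many changed fields were found. */
size_t snapshot_diff(const FieldValue *before, size_t nb,
                     const FieldValue *after, size_t na,
                     const char **out, size_t cap) {
    size_t found = 0;
    /* Added or modified: present in `after` with a new or different value. */
    for (size_t i = 0; i < na; i++) {
        const char *old = snapshot_get(before, nb, after[i].field);
        if (old == NULL || strcmp(old, after[i].value) != 0) {
            if (found < cap) out[found] = after[i].field;
            found++;
        }
    }
    /* Removed: present in `before` but missing from `after`. */
    for (size_t i = 0; i < nb; i++) {
        if (snapshot_get(after, na, before[i].field) == NULL) {
            if (found < cap) out[found] = before[i].field;
            found++;
        }
    }
    return found;
}
```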

Comment From: oranagra

a few things i don't understand.
1. is the request also about clients getting that info (via SUBSCRIBE), or just modules getting that info when core data types are changed?
2. i don't think i understand your proposal completely. even assuming the only "subscriber" is a module, the interface should be generic enough so that many types of operations on very different data types can all work with it, and i don't even understand how a generic listener can be prepared in advance (static interface) for various data types that will come in the future.

@itamarhaber i'm sure there were requests for such a feature for the benefit of clients in the past, maybe you can comment on that. @MeirShpilraien maybe you can comment on the module API part.

Comment From: itamarhaber

This is indeed a recurring theme of requests.

The initial ones started after the introduction of keyspace notifications, of course, and included AFAICR:
* The value on expiry
* Differentiation between create and update
* Old and new values for write operations

The main argument against these, besides complexity vs. benefit analysis, was the potential size of the messages as they include data of arbitrary size (e.g., expiring a hash with 11K fields).

To expand and contextualize @oranagra's no. 2, we have a similar challenge with Redis server-assisted client-side caching (CSC). It is one thing to notify about a change to a key. It is another, much harder, thing to describe what happened to the data (structure). I don't believe there's a generic way to do it other than the existing Redis DSL, which means hooking into the commands' ingress (hooks, which IIRC we have for modules) or the replication egress.

Comment From: joehu21

In my use case, the module (I'm developing) is the subscriber to keyspace events. Upon initialization, the module subscribes to keyspace notification. It is interested in the changes to hash data, the delta of before and after the hash key is mutated. The sole purpose of the proposal is to enable subscribers to know the diff between "before the data is mutated" and "after the data is mutated". However, I'm not proposing the event notification to carry "changed data", which could be large. Rather, I'm proposing to carry "meta data of the change", e.g., names of hash fields being modified, which should be small.

Note that the proposed feature should not only be limited to hash data. It should be a generic interface that applies to all other data types such as set, list, sorted set, etc.

I acknowledge the challenge of providing such a generic interface as well as maintaining backward compatibility at the same time. Hence, I'm thinking of an alternative:

Comment From: oranagra

ok, so:
1. now i understand that this issue is specifically discussing modules (not generic KSN for clients, this wasn't clear)
2. i still don't understand the interface you suggested at the top (with privdata), but since you now seem to have proposed another interface, i'll drop it.
3. i (think) i understand that it mainly means to hook into redis data types and actions (not 3rd party modules)

looking at your last proposal, i can argue that it already exists in some (uncomfortable) way, the command filter can be a pre-modification hook, and the KSN the post one. of course you'll have to know all the commands and data types, but i don't think that's any different in your recent proposal. the only real disadvantage is that it only works on commands (i.e. it won't work on RM_HashSet)

Comment From: joehu21

> i still don't understand the interface you suggested at the top (with privdata), but since you now seem to have proposed another interface, i'll drop it.

I brought up the "alternative approach" only for brainstorming and comparing to the original proposal, not as a replacement proposal. This alternative approach is inferior because it incurs a severe performance penalty. Note that this is a per-key event notification. Fetching the entire key value twice and computing the diff is a very expensive operation. Imagine a hash key that has 10k fields and a size of 500MB. The alternative approach is just too expensive to be usable.

That being said, I still favor the original proposal - making the event notification carry meta data of the changes.

The "privdata" is just a rough idea of how to make the new KSN interface carry meta data of the changes. For example, this privdata could be just a list of strings, with each element being the name of a field that has changed. Of course, we need to design the privdata struct in a generic way so that it applies not only to hash, but also to other data types such as stream, sorted set, etc. I'll leave those details to the design, if this proposal is accepted.

> i (think) i understand that it mainly means to hook into redis data types and actions (not 3rd party modules)

My use case #1 is for hooking into redis data types and actions (not 3rd party modules). However, use case #2 is for supporting 3rd party modules. In other words, if a 3rd party module creates data type X, the proposed mechanism can also apply to type X.

Comment From: oranagra

ok, so we'll drop the alternative idea, but the original one is an incomplete design, lacking a solution to the main complication. and i don't yet see how it can be designed to be generic enough, so until someone comes up with a proposal, i don't think we can proceed.

Comment From: joehu21

> I don't believe there's a generic way to do it other than the existing Redis DSL, which means hooking into the commands' ingress (hooks, which IIRC we have for modules) or the replication egress.

> the command filter can be a pre-modification hook, and the KSN the post one.

I think the above is the best outcome of this issue. My gut feeling is that the command filter should satisfy my use case. By hooking into command ingress, we can inspect its argv, which will tell us what will happen to the structure of the data. We can save this information in a thread local object (thread local, because all callbacks are executed on the main thread). Then, at the post-modification KSN, we know what portions of the key have changed.
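
A rough sketch of the argv inspection step, written against plain C strings so it stands on its own. In a real module the arguments would presumably come from the command filter context (RedisModule_CommandFilterArgsCount and RedisModule_CommandFilterArgGet, inside a callback registered with RedisModule_RegisterCommandFilter), and the recorded state would be read back in the keyspace notification callback. `PendingChange` and `record_hash_change` are hypothetical names, and `strcasecmp` assumes a POSIX system.

```c
#include <stddef.h>
#include <strings.h>  /* strcasecmp (POSIX) */

#define MAX_PENDING_FIELDS 64

/* State saved by the pre-modification hook for the command in flight.
 * The pointers are borrowed from argv; a real module would have to
 * keep the strings alive until the keyspace event fires. */
typedef struct PendingChange {
    const char *key;
    const char *fields[MAX_PENDING_FIELDS];
    size_t nfields;
} PendingChange;

/* Inspect a command's argv and record which hash fields it targets.
 * Returns 1 for a recognised hash write command, 0 otherwise. */
int record_hash_change(PendingChange *pc, const char **argv, int argc) {
    pc->key = NULL;
    pc->nfields = 0;
    if (argc < 3) return 0;

    int step;  /* distance between consecutive field names in argv */
    if (strcasecmp(argv[0], "HSET") == 0) {
        step = 2;     /* HSET key field value [field value ...] */
    } else if (strcasecmp(argv[0], "HDEL") == 0) {
        step = 1;     /* HDEL key field [field ...] */
    } else if (strcasecmp(argv[0], "HSETNX") == 0 ||
               strcasecmp(argv[0], "HINCRBY") == 0 ||
               strcasecmp(argv[0], "HINCRBYFLOAT") == 0) {
        step = argc;  /* single field at argv[2] */
    } else {
        return 0;
    }

    pc->key = argv[1];
    for (int i = 2; i < argc && pc->nfields < MAX_PENDING_FIELDS; i += step) {
        pc->fields[pc->nfields++] = argv[i];
    }
    return 1;
}
```

The recorded names are only candidates: the filter runs before execution, so a command may still fail or be a no-op (for example HSETNX on an existing field). The post-modification KSN confirms that the key actually changed. As noted earlier in the thread, this only covers changes made through commands, not through RM_HashSet.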

Comment From: oranagra

FYI, this one just came in: https://github.com/redis/redis/issues/12073

Comment From: itamarhaber

And these before it: #1697, #2057, #3186, #6973 (wow, almost our port) :)