The problem

Redis treats its dataset as a set of uniquely identified keys of different types. For users, however, the dataset usually carries more semantic meaning: often it is a set of entities of different types and the relations between them.

Every application that directly accesses the dataset needs to be aware of this structure and avoid corrupting the data. Redis plays no role in enforcing integrity.

One workaround is to write Lua scripts that operate on the data and provide a more encapsulated interface, but handling everything through Lua is not very efficient, and Lua scripts cannot do everything.

In the RDBMS world this is handled by schemas, views, triggers and many other well-recognized constructs. All of these come at the price of performance and flexibility, which are also well-recognized tradeoffs.

Redis evolved from something different, and trying to make it look like a relational database is NOT the goal here. Instead, Key Annotations (KA) are an attempt to propose a different solution to some of these problems, one that is more compatible with the Redis view.

Key Annotations

Key annotations make it possible to programmatically encode more information about the dataset structure into Redis.

Key annotations are not part of the dataset and are stored as separate configuration, much like ACLs. How to handle this configuration is still an open issue; see Open Issues|Metadata below for more discussion.

Key annotations apply to keys matching a pattern, and define actions that Redis automatically performs when the state of a matching key changes.

We can think of key annotations as an extension to keyspace notifications, where the notification triggers an immediate action. All actions performed by key annotations are propagated (to replicas and the AOF) as user commands. For this reason, key annotations apply only to commands received directly from users and DO NOT APPLY to commands received through replication or loaded from the AOF.
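
The rule that annotations fire only for user commands is what keeps replicas consistent. The primary propagates both the user command and the actions it triggered, so a replica that fired annotations again would apply every action twice. The following minimal Python sketch illustrates that argument. The `Node` class, the `Origin` enum and the hard-coded annotation `item:* ON CREATE DO INCR stats:items` are all hypothetical and are not part of the proposal.

```python
from enum import Enum


class Origin(Enum):
    USER = "user"                # command received directly from a client
    REPLICATION = "replication"  # command read from the primary's stream


class Node:
    """Toy node with one hard-coded (hypothetical) annotation:
    item:* ON CREATE DO INCR stats:items
    """

    def __init__(self, fire_on_replicated=False):
        self.data = {}
        self.stream = []  # commands this node propagates to replicas / AOF
        # True models the WRONG behavior: firing KAs on replicated commands.
        self.fire_on_replicated = fire_on_replicated

    def _apply(self, cmd, key, value=None):
        if cmd == "SET":
            self.data[key] = value
        elif cmd == "INCR":
            self.data[key] = int(self.data.get(key, 0)) + 1
        else:
            raise ValueError(f"unsupported command {cmd}")

    def execute(self, cmd, key, value=None, origin=Origin.USER):
        created = cmd == "SET" and key not in self.data
        self._apply(cmd, key, value)
        self.stream.append((cmd, key, value))
        fire = origin is Origin.USER or self.fire_on_replicated
        if fire and created and key.startswith("item:"):
            # The action runs and is propagated like any other user command.
            self._apply("INCR", "stats:items")
            self.stream.append(("INCR", "stats:items", None))


primary = Node()
for key, value in [("item:1", "a"), ("item:2", "b"), ("item:1", "c")]:
    primary.execute("SET", key, value)

replica, naive_replica = Node(), Node(fire_on_replicated=True)
for cmd, key, value in primary.stream:
    replica.execute(cmd, key, value, origin=Origin.REPLICATION)
    naive_replica.execute(cmd, key, value, origin=Origin.REPLICATION)

print(primary.data["stats:items"], replica.data["stats:items"],
      naive_replica.data["stats:items"])  # 2 2 4
```

The replica that ignores annotations on replicated commands ends up identical to the primary. The naive one counts every creation twice.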

There are many nuances around the integration of KA with existing flows. See Open Issues|KA Integration for more discussion.

KA Configuration

The KA configuration is manipulated using explicit KADD and KDEL commands to add and remove annotations, and KLIST to list configured key annotations.

The details of this mechanism still need to be worked out, and may also depend on the outcome of the metadata discussion (see Open Issues|Metadata).
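
The exact semantics of KADD, KDEL and KLIST are not settled yet, so the following is only a hypothetical Python sketch of the bookkeeping behind them. It assumes that an annotation is identified by its exact text. It also borrows the 1/0 reply convention of SADD/SREM. Both choices are assumptions made for illustration.

```python
class AnnotationRegistry:
    """Hypothetical in-memory store behind KADD / KDEL / KLIST."""

    def __init__(self):
        self._annotations = []  # kept in insertion order for KLIST

    def kadd(self, text: str) -> int:
        # Only a shallow shape check; full parsing follows the grammar in "Definition".
        if " ON " not in text or " DO " not in text:
            raise ValueError("expected '<keys-spec> ON <key-condition> DO <action>'")
        if text in self._annotations:
            return 0  # already configured
        self._annotations.append(text)
        return 1

    def kdel(self, text: str) -> int:
        if text not in self._annotations:
            return 0
        self._annotations.remove(text)
        return 1

    def klist(self) -> list[str]:
        return list(self._annotations)  # a copy, so callers cannot mutate state


registry = AnnotationRegistry()
registry.kadd("user:% ON DEL DO DEL user:%:friends")
print(registry.klist())  # ['user:% ON DEL DO DEL user:%:friends']
```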

Examples

Let’s assume we have a user:<id> hash and a user:<id>:friends set that points to other user keys, and we want to make sure they are always removed together:

KADD user:% ON DEL DO DEL user:%:friends
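
The proposal does not yet define exactly what % matches. The Python sketch below assumes that % captures one non-empty key segment without ':', so user:% matches user:1234 but not user:1234:friends. It also assumes that a trailing * matches any suffix. `spec_to_regex` and `CascadingKeyspace` are hypothetical names. The sketch models only DEL actions and ignores %% escaping.

```python
import re


def spec_to_regex(spec: str) -> re.Pattern:
    """Compile a key-spec under the assumed semantics of '%' and '*'."""
    parts = []
    for ch in spec:
        if ch == "%":
            parts.append("([^:]+)")  # assumption: one segment, no ':'
        elif ch == "*":
            parts.append(".*")       # trailing wildcard
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


class CascadingKeyspace:
    def __init__(self):
        self.data = {}
        self.on_del = []  # (compiled key-spec, target key template)

    def kadd_on_del_do_del(self, spec: str, target_template: str) -> None:
        self.on_del.append((spec_to_regex(spec), target_template))

    def delete(self, key: str) -> list[str]:
        """User-issued DEL; returns the commands actually executed."""
        if self.data.pop(key, None) is None:
            return []  # nothing deleted, so ON DEL does not fire
        executed = [f"DEL {key}"]
        for regex, template in self.on_del:
            match = regex.fullmatch(key)
            if match:
                captured = match.group(1) if regex.groups else ""
                target = template.replace("%", captured)
                # Deleted directly, without re-checking annotations (no nesting).
                self.data.pop(target, None)  # DEL on a missing key is a no-op
                executed.append(f"DEL {target}")
        return executed


ks = CascadingKeyspace()
ks.data = {"user:1234": {"name": "ann"}, "user:1234:friends": {"user:42"},
           "user:42": {"name": "bob"}}
ks.kadd_on_del_do_del("user:%", "user:%:friends")
print(ks.delete("user:1234"))  # ['DEL user:1234', 'DEL user:1234:friends']
```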

Suppose we maintain a user:<id> hash and many volatile user:<id>:something:<some-id> keys, and we want to track them easily without using KEYS:

KADD user:%:something:* ON CREATE DO SADD user:%:somethings $key
KADD user:%:something:* ON DEL DO SREM user:%:somethings $key
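
Together, the two annotations keep user:<id>:somethings in sync with the matching keys. Below is a hypothetical Python model of the two actions. It assumes that % captures one ':'-free segment. Like Redis, it removes the index set once the set becomes empty. It models only explicit deletion, because the proposal does not spell out whether key expiration should also trigger ON DEL.

```python
import re

# 'user:%:something:*' under the assumption that '%' is one ':'-free segment.
TRACKED = re.compile(r"user:([^:]+):something:.*")


class TrackedKeyspace:
    def __init__(self):
        self.data = {}

    def set(self, key, value):
        created = key not in self.data
        self.data[key] = value
        match = TRACKED.fullmatch(key)
        if created and match:
            # ON CREATE DO SADD user:%:somethings $key
            index = f"user:{match.group(1)}:somethings"
            self.data.setdefault(index, set()).add(key)

    def delete(self, key):
        if self.data.pop(key, None) is None:
            return
        match = TRACKED.fullmatch(key)
        if match:
            # ON DEL DO SREM user:%:somethings $key
            index = f"user:{match.group(1)}:somethings"
            members = self.data.get(index)
            if members is not None:
                members.discard(key)
                if not members:
                    del self.data[index]  # Redis deletes sets that become empty


t = TrackedKeyspace()
t.set("user:1:something:a", "x")
t.set("user:1:something:b", "y")
t.set("user:1:something:a", "z")  # UPDATE, so no SADD
t.delete("user:1:something:a")
print(t.data["user:1:somethings"])  # {'user:1:something:b'}
```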

For more complex logic, we can also invoke Lua scripts.

KADD user:%:something:* ON UPDATE DO EVAL "some-lua-script" 1 $key

Definition

<annotation> ::= <keys-spec> "ON" <key-condition> "DO" <action>

<keys-spec> ::= <key-spec> <keys-spec> | <key-spec>

<key-spec> ::= name 
    | name-prefix "%"
    | name-prefix "%" name-suffix
    | <key-spec> "*"

<key-condition> ::= "CREATE" | "UPDATE" | "DEL"

<action> ::= redis-command
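
The grammar is small enough to check with a regular expression. The following Python sketch is one possible reading of it, under these assumptions:

- Key-specs are separated by whitespace and contain no spaces.
- A key-spec contains at most one %, and * may appear only at its end.
- Everything after DO is kept verbatim as the action.

`parse_annotation`, `check_key_spec` and `Annotation` are hypothetical names.

```python
import re
from dataclasses import dataclass

CONDITIONS = {"CREATE", "UPDATE", "DEL"}
ANNOTATION = re.compile(
    r"\s*(?P<specs>.+?)\s+ON\s+(?P<cond>\S+)\s+DO\s+(?P<action>.+?)\s*", re.S
)


@dataclass
class Annotation:
    key_specs: list[str]
    condition: str
    action: str


def check_key_spec(spec: str) -> None:
    base = spec.rstrip("*")  # <key-spec> "*" may only add stars at the end
    if not base or "*" in base:
        raise ValueError(f"bad key-spec {spec!r}: '*' only allowed as a suffix")
    if base.count("%") > 1:
        raise ValueError(f"bad key-spec {spec!r}: at most one '%'")


def parse_annotation(text: str) -> Annotation:
    match = ANNOTATION.fullmatch(text)
    if not match:
        raise ValueError("expected '<keys-spec> ON <key-condition> DO <action>'")
    if match["cond"] not in CONDITIONS:
        raise ValueError(f"unknown key-condition {match['cond']!r}")
    specs = match["specs"].split()
    for spec in specs:
        check_key_spec(spec)
    return Annotation(specs, match["cond"], match["action"])


a = parse_annotation('user:%:something:* ON UPDATE DO EVAL "some-lua-script" 1 $key')
print(a.key_specs, a.condition, a.action)
```

Keeping the action as an opaque string defers its validation to the command allowlisting described under Actions.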

Actions

When possible, it is preferable for an action to invoke a built-in Redis command directly rather than a Lua script. The commands that may be invoked should be allowlisted based on special command flags.

Basic substitution rules will apply to executed commands:

  • The % character is replaced by the part of the key that matched % in the key-spec. To use a literal % in the action, escape it as %%.
  • The $ character introduces a KA variable. To use a literal $ in the action, escape it as $$. Initially we support a single variable, $key, which expands to the name of the key that triggered the annotation.
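
The following is a direct Python transcription of these two rules. It assumes that the caller already knows the text captured by % from matching the key-spec. It rejects any $ sequence other than $key and $$. `expand_action` is a hypothetical name.

```python
def expand_action(action: str, captured: str, key: str) -> str:
    """Expand '%', '%%', '$key' and '$$' in a KA action template."""
    out = []
    i = 0
    while i < len(action):
        if action.startswith("%%", i):
            out.append("%")       # escaped literal percent
            i += 2
        elif action[i] == "%":
            out.append(captured)  # text matched by '%' in the key-spec
            i += 1
        elif action.startswith("$$", i):
            out.append("$")       # escaped literal dollar
            i += 2
        elif action.startswith("$key", i):
            out.append(key)       # name of the key that triggered the KA
            i += 4
        elif action[i] == "$":
            raise ValueError(f"unknown KA variable at offset {i}")
        else:
            out.append(action[i])
            i += 1
    return "".join(out)


print(expand_action("SREM user:%:somethings $key", "1234", "user:1234:something:9"))
```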

Atomicity

Redis should guarantee that KA is executed atomically, as if the user explicitly used MULTI around the original command and the resulting KA actions.
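
One visible consequence of this guarantee concerns propagation: the user command and its KA actions must reach replicas and the AOF as a single unit. The Python helper below is a hypothetical sketch of that rule. It assumes that commands are lists of strings. It also assumes that a command already running inside the user's own MULTI needs no extra wrapping.

```python
Command = list[str]


def propagation_block(user_command: Command, ka_actions: list[Command],
                      in_multi: bool = False) -> list[Command]:
    """Commands to append to the replication stream / AOF for one user command."""
    if not ka_actions or in_multi:
        # Nothing triggered, or the user's MULTI/EXEC already encloses everything.
        return [user_command, *ka_actions]
    return [["MULTI"], user_command, *ka_actions, ["EXEC"]]


print(propagation_block(["DEL", "user:1234"], [["DEL", "user:1234:friends"]]))
```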

Open Issues

Metadata

Key annotation configuration is a new type of metadata that does not map directly to existing Redis concepts. We have several choices:

Annotations as configuration

One way is to treat it as plain configuration, like ACLs, keyspace notification settings, etc. This kind of configuration already exists, and it is the user’s responsibility to replicate it to replicas, cluster nodes, etc.

The fact that key annotations apply only at the source supports the idea that they are pure configuration that needs to be propagated manually, since replicas have no use for this configuration (as long as they are not promoted).

An argument against this approach is that key annotations affect the data far more than ACLs or other redis.conf directives do. We could compare them to scripts, but the "contract" around scripts is looser:

  1. A client executing EVALSHA for a missing script gets feedback (a NOSCRIPT error), whereas a missing KA goes unnoticed.
  2. As part of the contract, clients are expected to fall back to EVAL if the script is missing.

Annotations as first-class metadata

An alternative is to introduce the concept of first-class metadata which Redis can store along with the keyspace.

Users still have to propagate KA configuration across cluster nodes manually, but once loaded it is persisted and becomes part of the dataset.

This will require some additional changes:

  1. Differentiate between a dataset-only FLUSHDB/FLUSHALL and a complete FLUSHDB/FLUSHALL that also removes metadata.
  2. Extend the RDB format to support metadata.
  3. Abandon support for AOF files without the RDB preamble, or come up with a generic mechanism to serialize metadata to AOF files.

KA Integration

Integrating KA into all existing data mutation flows in Redis involves dealing with many complex special cases and nuances. We'll try to discuss all of them here, but this list is probably incomplete.

MULTI/EXEC

KA actions triggered inside MULTI/EXEC should execute as part of the MULTI/EXEC. They should be transparent from the user’s point of view, since the user does not expect any additional replies. Some issues need further exploration, such as:

  • What happens when a KA action fails with an error?
  • How do KA actions interact with watched keys?
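
The transparency requirement can be pinned down in a small Python model of EXEC: it returns the replies to queued user commands and discards the replies to the KA actions those commands trigger. The questions above remain open. As one possible policy, this sketch records KA action errors separately instead of surfacing them. `exec_with_ka`, `run` and `actions_for` are hypothetical names.

```python
def exec_with_ka(queued, run, actions_for):
    """Return (replies the client sees, errors raised by KA actions)."""
    replies, ka_errors = [], []
    for cmd in queued:
        try:
            reply = run(cmd)
        except ValueError as exc:
            # As in Redis, a failing command does not abort the rest of EXEC.
            replies.append(exc)
            continue  # the key did not change, so no KA fires
        replies.append(reply)
        for action in actions_for(cmd, reply):
            try:
                run(action)  # reply intentionally dropped
            except ValueError as exc:
                ka_errors.append((action, exc))
    return replies, ka_errors


data = {"user:1": "hash", "user:1:friends": "set"}


def run(cmd):
    op, key, *args = cmd
    if op == "SET":
        data[key] = args[0]
        return "OK"
    if op == "DEL":
        return 1 if data.pop(key, None) is not None else 0
    raise ValueError(f"ERR unknown command '{op}'")


def actions_for(cmd, reply):
    # Hypothetical annotation: user:% ON DEL DO DEL user:%:friends
    op, key = cmd[0], cmd[1]
    if op == "DEL" and reply == 1 and key.startswith("user:") and key.count(":") == 1:
        return [("DEL", key + ":friends")]
    return []


replies, ka_errors = exec_with_ka([("SET", "a", "1"), ("DEL", "user:1")], run, actions_for)
print(replies, ka_errors)  # ['OK', 1] []
```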

Nested Lua

The KA specification allows KA actions to execute Lua scripts. A KA can also be triggered from inside a Lua script (e.g. a script calls redis.call("del", "user:1234") while a KA is defined on user:%), which would result in nested Lua execution.

This is currently not supported and will require some refactoring of scripting to make it possible, though that refactoring may have positive side effects overall.

Nested KAs

A KA whose action triggers another KA may result in infinite recursion. We propose that, by definition, KAs are never nested, but we need to understand how this impacts common use cases.
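
Never nesting KAs can be enforced with a single flag that is set while actions run, so that writes made by an action do not trigger further annotations. In the Python sketch below, the self-referential annotation (every new log:* key creates log:<key>) would otherwise recurse forever. `NoNestingEngine` and the lambda-based annotations are hypothetical.

```python
class NoNestingEngine:
    def __init__(self, annotations):
        self.data = {}
        self.annotations = annotations  # list of (predicate, action) for ON CREATE
        self._in_ka = False

    def set(self, key, value):
        created = key not in self.data
        self.data[key] = value
        if not created or self._in_ka:
            return  # not a CREATE, or we are already running KA actions
        self._in_ka = True
        try:
            for matches, action in self.annotations:
                if matches(key):
                    action(self, key)
        finally:
            self._in_ka = False


engine = NoNestingEngine([
    (lambda k: k.startswith("log:"), lambda e, k: e.set("log:" + k, "copy")),
])
engine.set("log:a", "x")
print(engine.data)  # {'log:a': 'x', 'log:log:a': 'copy'}
```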

Modules

Modules and KAs may interact in different ways and we need to evaluate all possible flows and define the expected behavior.

As a high level guideline, we propose that modules do not have KA actions automatically triggered for operations they perform. This pattern was already adopted in other cases, like replication.

Simple modules that mostly perform RM_Call() operations may need a context flag to enable KAs, so that they behave like any other client.
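
Combined with the rule that KAs apply only to commands received directly from users, these guidelines reduce to a small decision function. The Python sketch below is hypothetical. The `Origin` values are illustrative, and `module_opted_in` stands for the proposed context flag, which has no name or API yet.

```python
from enum import Enum, auto


class Origin(Enum):
    USER = auto()         # command received directly from a client
    REPLICATION = auto()  # command applied from the primary's stream
    AOF_LOAD = auto()     # command replayed while loading the AOF
    MODULE = auto()       # command issued by a module, e.g. via RM_Call()


def should_fire_ka(origin: Origin, module_opted_in: bool = False) -> bool:
    if origin is Origin.USER:
        return True
    if origin is Origin.MODULE:
        # Off by default; a simple RM_Call()-based module may opt in.
        return module_opted_in
    # Replicated and AOF commands already include the propagated KA actions.
    return False
```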

KA Conditions

We currently propose three basic ON conditions: CREATE, UPDATE and DEL. We need to validate this proposal against the different key mutation scenarios handled by different commands and data types, to confirm that they can be mapped reasonably. Even for trivial commands the mapping is not necessarily obvious: does SET imply CREATE or UPDATE?
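
Instead of deciding the mapping per command, one option is to derive it mechanically by comparing the key's state before and after the command. In this hypothetical Python sketch, SET on a missing key is a CREATE and SET on an existing key is an UPDATE. Commands that leave the key unchanged fire nothing. An aggregate key that Redis removes because it became empty counts as a DEL.

```python
from typing import Optional


def classify(existed_before: bool, exists_after: bool, modified: bool) -> Optional[str]:
    """Map a key's state transition onto CREATE / UPDATE / DEL (or nothing)."""
    if not existed_before and exists_after:
        return "CREATE"
    if existed_before and not exists_after:
        return "DEL"
    if existed_before and exists_after and modified:
        return "UPDATE"
    return None  # no visible change, e.g. SADD of a member that is already present


print(classify(False, True, True))  # prints CREATE (SET on a missing key)
```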

To avoid extra complexity, we may need to explicitly exclude special conditions. For example:

  • Should RENAME trigger a DEL followed by a CREATE?
  • Should we instead create an ON RENAME condition?
  • If we do create an ON RENAME condition, does that require every KA to consider it and specify it explicitly?
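
The first option above, treating RENAME as a DEL followed by a CREATE, can be spelled out as follows. This is a hypothetical Python sketch. It reports an overwritten destination as a DEL of its own. It also relies on the fact that, since Redis 3.2, renaming a key onto itself succeeds without changing anything.

```python
def rename_events(src: str, dst: str, dst_existed: bool) -> list[tuple[str, str]]:
    """KA conditions fired by RENAME src dst, if RENAME maps to DEL + CREATE."""
    if src == dst:
        return []  # renaming a key onto itself changes nothing
    events = [("DEL", src)]
    if dst_existed:
        events.append(("DEL", dst))  # RENAME overwrites an existing destination
    events.append(("CREATE", dst))
    return events


print(rename_events("user:1", "user:2", dst_existed=False))
```

An ON RENAME condition would replace this expansion with a single event. Annotations that only know DEL and CREATE would then miss renames unless they listed it explicitly.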

Comment From: gavrie

This looks very interesting. A few comments:

  • The distinction between CREATE and UPDATE does not seem very Redis-like, and neither do the terms themselves. After all, commands such as SET make no such distinction.
  • Maybe we should define categories of commands to trigger on? E.g. define a list of commands, give it a name, and define a KA on that named list. Predefined lists could be added, e.g. all commands that cause a key to be modified or deleted.
  • Since each action is considered part of a MULTI/EXEC transaction anyway, it would be useful to allow specifying multiple commands to be executed in a row instead of just one (without requiring a Lua script for that).
  • I'm definitely in favor of "first-class metadata" that is stored in the keyspace, since the KAs are intimately related to the data.
  • Since the KA concept is comparable to RDBMS triggers, it seems like we're missing something much more basic: an analog to referential integrity. How about allowing integrity rules to be defined that prevent execution of a command if specific criteria are not met? For example, if a specific field is added to a hash, a matching field should exist (or be added at the same time) in a specific set.

Comment From: bpo

Is enforcing the semantic integrity of data a goal for Redis, broadly, long-term? The proposal looks at a constrained part of that larger problem, but I think it would be useful to hear more about how much of the larger problem will be "in scope".

Comment From: madolson

I think bpo called out my main concern, which is that it'll be hard for Redis to guarantee the integrity of the data. I also don't really think it fits the Redis model that well; "schemas" fit much better into the relational data model.

A different thought I've had for a while is better interoperation between keys and collection structures. One of the problems you called out is that a common use case is a keyspace of individual keys that hold the full data, with references to those keys stored in a collection. Maybe we should make that type of reference a first-class citizen?

SET data::key1 foo -> OK
ZADDREF leaderboard 100 data::key1 -> OK
ZRANGE leaderboard 0 -1 WITHDEREF -> ['data::key1', 'foo']
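
To make the suggested semantics concrete, here is a toy Python model of a sorted set whose members are references. ZADDREF and WITHDEREF are hypothetical commands from the comment above and are not part of Redis. The sketch assumes that a dangling reference dereferences to None, which is exactly the kind of case such a model would need to define.

```python
class RefStore:
    """Toy model of strings plus sorted sets whose members reference keys."""

    def __init__(self):
        self.strings = {}
        self.zsets = {}  # key -> {member: score}

    def set(self, key, value):
        self.strings[key] = value
        return "OK"

    def zaddref(self, key, score, ref):
        self.zsets.setdefault(key, {})[ref] = score
        return "OK"

    def zrange(self, key, start, stop, withderef=False):
        members = sorted(self.zsets.get(key, {}).items(), key=lambda kv: (kv[1], kv[0]))
        n = len(members)
        # Redis-style inclusive indices; negative values count from the end.
        if start < 0:
            start += n
        if stop < 0:
            stop += n
        start, stop = max(start, 0), min(stop, n - 1)
        if start > stop:
            return []
        reply = []
        for member, _score in members[start:stop + 1]:
            reply.append(member)
            if withderef:
                reply.append(self.strings.get(member))  # None if the key is gone
        return reply


store = RefStore()
store.set("data::key1", "foo")
store.zaddref("leaderboard", 100, "data::key1")
print(store.zrange("leaderboard", 0, -1, withderef=True))  # ['data::key1', 'foo']
```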

Instead of saying there is a "schema" that Redis takes care of magically, we make it user defined.

Comment From: borg286

From an operational point of view, I favor limiting recursion. One proposal is to allow each KA to be applied at most once within the context of a user command. Recursive algorithms should be implemented in Lua or in a client.
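
The at-most-once rule differs from forbidding nesting outright. Chains of different annotations still work, but no single annotation can fire twice for the same user command, which bounds the work at one execution per annotation. A hypothetical Python sketch follows; the annotation names and lambda-based actions are for illustration only.

```python
class AtMostOnceEngine:
    def __init__(self, annotations):
        self.data = {}
        self.annotations = annotations  # list of (name, predicate, action) for ON CREATE
        self._fired = set()  # annotations already fired for the current user command

    def user_set(self, key, value):
        self._fired = set()  # new user command: every annotation may fire once
        self.write(key, value)

    def write(self, key, value):
        created = key not in self.data
        self.data[key] = value
        if not created:
            return
        for name, matches, action in self.annotations:
            if matches(key) and name not in self._fired:
                self._fired.add(name)
                action(self, key)


engine = AtMostOnceEngine([
    ("a-to-b", lambda k: k.startswith("a:"), lambda e, k: e.write("b:" + k[2:], "v")),
    ("b-to-c", lambda k: k.startswith("b:"), lambda e, k: e.write("c:" + k[2:], "v")),
    ("self-copy", lambda k: k.startswith("log:"), lambda e, k: e.write("log:" + k, "v")),
])
engine.user_set("a:1", "x")    # a-to-b, then b-to-c: the chain is allowed
engine.user_set("log:1", "x")  # self-copy fires once, then stops
print(sorted(engine.data))  # ['a:1', 'b:1', 'c:1', 'log:1', 'log:log:1']
```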

Comment From: yossigo

@bpo @madolson These are good questions; maybe the scope of this idea should be better defined so it's clearer.

I think it's clear that Redis will never get anywhere near a relational DB, so schemas, full semantic/referential integrity, normalization, etc. are not the goal.

But users use Redis for more purposes and in more ways, and at least from what I see they end up storing and managing more complex structures.

This complexity is generally handled by the client app, which is not always ideal for several reasons: code duplication across platforms, inefficient implementations, etc. Lua is another option, but it has its own set of limitations.

The idea behind key annotations is to provide a way to offload some of this to Redis - but not aim to hand over the problem entirely.

This comes up often. For example, users keep asking for hash field expiration, and I keep asking myself why they don't keep those fields as separate keys in the first place. In many cases it's simply because a hash key represents an application object, and mapping that object to multiple keys raises too many issues.

@gavrie I think your example of enforcing referential integrity is perhaps where we should draw the line and define what's not in scope.

Key annotations as they are defined now augment user commands with pre-defined automatic commands, but don't otherwise modify the behavior of existing Redis commands. For example, I don't see how we can deal with an HSET failing due to integrity rules. Do we suddenly have a whole new category of errors to handle at the client level? How does this affect MULTI? Lua? Modules? I could imagine something more similar to a WATCH/MULTI/EXEC for this purpose, but it would still be a real stretch.

@madolson As for first class references, I did think about that as well but I think it's potentially much more complex because it involves changing all the existing Redis constructs instead of creating a new construct on top. Basically every hash field/set/sorted set/list element could be a reference, and we'd need to consider how to deal with it in all cases.

@borg286 I agree that recursion could be a hard operational issue (and potentially also for developers to keep track of things), this is already mentioned in the "Nested KAs" section.

Comment From: bpo

Thanks for the additional context, @yossigo. I think it's a real issue that comes up often, as you say, and deserves investigation.

Here are a few concerns with the proposed solution:

  • Generic metadata parameters have a lot of potential but will make Redis harder to reason about. For example, #8384 could be handled with an annotation specifying the capacity of a list. Could I annotate (max) ziplist entries per hash/zset key? etc. If the goal is just triggers, why not call it "triggers"?

  • This change has the potential to turn SET FOO BAR from ~O(1) to O(???) which should give pause. One of the strengths of Redis is to make complexity obvious by providing fundamental operations to the user rather than wrapper APIs that do many things. Triggers hide complexity.

  • LATENCY and COMMANDSTATS will probably need reworking (e.g. how and where is the cost of the new operations accounted for?)
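
One possible answer to the accounting question keeps two views:

- the user command's own time;
- an inclusive figure that also covers the KA actions it triggered.

Each action is also counted under its own name. The Python sketch below is hypothetical. The `ka|` and `|inclusive` naming is invented for illustration and does not correspond to existing INFO commandstats fields.

```python
import time
from collections import defaultdict


class CommandStats:
    def __init__(self):
        self.calls = defaultdict(int)
        self.usec = defaultdict(float)

    def record(self, name, seconds):
        self.calls[name] += 1
        self.usec[name] += seconds * 1e6


def run_user_command(stats, name, fn, ka_actions):
    """Run fn() and its KA actions [(name, callable)], recording their cost."""
    start = time.perf_counter()
    fn()
    own = time.perf_counter() - start
    for action_name, action in ka_actions:
        t0 = time.perf_counter()
        action()
        stats.record(f"ka|{action_name}", time.perf_counter() - t0)
    stats.record(name, own)
    stats.record(f"{name}|inclusive", time.perf_counter() - start)


stats = CommandStats()
data = {"user:1": "hash", "user:1:friends": "set"}
run_user_command(stats, "del", lambda: data.pop("user:1", None),
                 [("del", lambda: data.pop("user:1:friends", None))])
print(dict(stats.calls))  # {'ka|del': 1, 'del': 1, 'del|inclusive': 1}
```

Keeping the inclusive figure separate leaves the existing per-command numbers comparable to today's. The inclusive view shows where a cheap-looking SET actually spends its time.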

Comment From: yossigo

@bpo Forgot to mention, I'm not entirely happy with "key annotations". Technically "triggers" might be more accurate but it also comes with poor connotations to some of us... :) But definitely should consider a better name.

Regarding the added complexity, you're right of course but I think Redis has already crossed this line with some other features, client side caching as an example.

I've started thinking about and listing some aspects of integrating this with the rest of Redis; LATENCY and COMMANDSTATS are good additions to that list.

Comment From: abrookins

@yossigo This seems like a potential way to add data constraints in the form of ad-hoc validation logic within Lua scripts. Or does that, like referential integrity, imply too much additional complexity around client error handling?

I didn't quite understand from the current discussion what happens when a KA fails (e.g., one that calls a Lua script) -- does the command then fail too? That would open up a lot of use cases...

Comment From: yossigo

@abrookins I see two main issues with this type of data constraints:

  1. On the client side, it would be too much of a breaking change unless we wrap it up in a new construct (e.g. a new kind of MULTI/EXEC where those constraints apply).
  2. This will require some kind of rollback capabilities on the server side.

Comment From: abrookins

Ah, right -- now I see the points above on how this would fit into the existing MULTI/EXEC atomicity mechanism. Thanks!