Redis [NEW] Redis module configuration

Problem Statement

At AWS we use modules internally to build out functionality outside of the main Redis engine, and we imagine that other managed services/users do this as well. One of the drawbacks of this is that we don't have a single mechanism for configuring redis engine features and features provided by the module. So we would like to propose the following improvement:

Proposal

Add 4 (bool, sds, enum, numeric) new Redis Module functions for dynamically creating standard configurations for the module. This needs to be called in the onLoad function of the Module. These functions will look like RM_RegisterConfig*.
- “moduleName:” is added to the module config’s name on creation to forcefully maintain a naming pattern. (We'll use this later as well.)
- Pointers to the variables in the module are passed as pointers to the new Module config register function to use for the standard config value address.
- Enum configs will take an array of keys and values to define them.
Dynamically grow the standard config array - During the onLoad of the module, new configurations will be added to the standard config array in config.c
On Module Unload - The standard configurations are searched for and removed by default using the module configuration naming pattern.
On Redis Server Startup - Defer the loading of both modules and module configs until all of the config file has been loaded. This is possible because module names have a defined syntax with the ":":
Module configs will also be able to register the update and is_valid callbacks, these will be optional.

Example

An example, if we have a module, we'll call it replication-manager and it wants to register a config called repl-batch-size that is a numeric type. When the module loads, it will register the config repl-batch-size using RM_RegisterConfigLongLong("repl-batch-size", flags, long long **pointer, <constraints>, callbacks);. You would be able to update this config by calling config set replication-manager:repl-batch-size 12.

Why not use argv

Using the argv only works on startup, and we want to be able to tune existing values.

Why not use a new command

That's a lot of work to have a "config" command per module. You can make the argument that you can use filtering to make it transparent to the caller, but that is a lot of work to maintain separate infrastructure in lots of different modules.

Comment From: oshadmi

Is it worth considering adding an optional token to express whether a specific config has effect during runtime? (since some configs may only take effect during module init) For example, something like config set replication-manager:on-init:repl-batch-size 12 And can also have an enhanced is_valid callback with an additional param which is the current life-cycle phase (on-init, during-runtime, etc.) to allow to fail/override depending on the actual phase. If omitted, the config is effective during all life-cycle phases.

Comment From: madolson

@oshadmi Not sure I fully understand what you mean. AFAIK there are only two modes, on-init and at-runtime, which is currently being handled by the standard configuration system used internally within Redis. Do you have suggestions for more types of configs?

Comment From: oshadmi

@madolson What I am suggesting is to be able to specify for a config also the mode in which it can be configured. If a mode is specified, then the config can be configured only in this specific mode, e.g., config set replication-manager:on-init:repl-batch-size 12, will allow configuring repl-batch-size only on init. If mode is omitted then there's no mode constraint.

Comment From: madolson

@madolson I think that can be handled by flags? Today we have IMMUTABLE and MUTABLE flags for if the config can be changed at runtime or only set at startup.

Comment From: oranagra

The suggestion sounds good to me as long as we can still fail redis at startup when a bad configuration is present. So considering we defer the validations of these to after the config parsing, but validate them at init after loading the modules and before loading persistence files or binding the socket that should be ok (p.s. we do a similar thing with Sentinel these days).

However, i don't like the idea of passing a pointer to the variable (sounds dangerous of being incorrectly used), i think just using callbacks is better (so the module is the one who writes to the variable). If we use callbacks, we need a get and set ones (the deferred data structure can be dismissed after startup), and we don't need the is_valid one (the set can just fail). this is slightly harder to use by the module, but still serves the purpose of one config mechanism that's exposed to users.

In fact, if we go this way, redis doesn't actually need to be aware of each and every config the module wants to register and its type. The module can just register a single pair of get/set callbacks, and parse / format the value on its own (possibly with some helper APIs from redis to handle memory types etc). This is maybe similar to RM_RegisterInfoFunc, we'll let the module be part of the generic config mechanism, but we won't couple it with our config table implementation.

other notes: - maybe we should re-think the separator (:) which can maybe be used as a separator between key and value in some systems. e.g. for sub-commands we use | in ACL, and also in commandstats soon. and the usual convention for module commands is a . separator.

Comment From: madolson

We originally didn't suggest the idea of completely deferring to the module (RM_RegisterConfigFunc), because you can do achieve the same result with a filter to detect your module domain and call a new config command introduced by the module. You don't get the other generic benefits of listing available configs and re-using the standard configuration infrastructure for parsing arguments. I don't think coupling the module commands to our config table is all that significant, since the Redis code will barely notice.

The usage of ":" was making me think of namespaces, but I suppose '.' achieves the same purpose if that is clearer.

Comment From: oranagra

The command filter trick is hackish (and also doesn't solve the startup time config parsing). I prefer to provide a proper mechanism for modules to integrate with this basic feature of Redis. Same thing was for INFO... There where hackish ways, and also modules can provide their own INFO commands, but other tools are using the standard ways for config and info, and we should certainly let modules be part of that.

But that doesn't mean the interface should be one way or the other.

My issue with providing pointer to a variable to write into is not because it'll complicate redis code (it won't), but because: 1. I think passing pointer to a variable is dangerous, I can see how this is incorrectly used and causes issues on certain platforms (e.g. Big-endian, or certain platforms and ABI issues) 2. I think it makes the API coupled with the current implementation in Redis, which can mean that we can't easily change the redis implementation, or the module is limited by its capabilities. Also a certain module may work differently with different versions of Redis (something that the api should avoid).

It's right that exposing the interface you described will be easier for redis, and probably easier to use by modules, but I'm not certain that right. I am certain we wanna expose an interface that makes it possible, but if the module needs to work a bit harder to use it, that may be OK.

@yossigo waiting for your input...

Comment From: madolson

I'll defer to the Redis Ltd folks here who have a lot more experience with modules. I don't fully see the ABI issues causing an issue.

One way to get around that might be to be to have the config function look like (long long *) registerNumericConfig(whatever), and the core will allocate the memory and just return a pointer for the module to reference. That would be more decoupled from the internals of Redis, but still expose the same type of interface.

Comment From: yossigo

Registering module parameters into the current config subsystem intuitively feels a bit on the over-coupled side, but trying break it down and analyze it I'm not sure it really is so:

We already have various register-something functions that hold a pointer to a module-managed structure (which is presumably a bit more potentially risky than just holding a pointer to a .text function address).
Using the existing mechanism for both core and module parameters introduces coupling at the implementation level only. If we have to, we can always refactor that in the future - I don't think the API is binding us to anything specific.

Quality wise I think we have a lot to gain by offloading as much work from the module as we can, so the trivial cases get validation, proper error messages, rewrite working as expected, etc.

BTW I think this applies even if we choose to re-use the existing mechanism but refer to a set/get callback rather than directly modifying memory values. So maybe the alternatives are not all that different - it's probably best to come up with a mock API next.

Comment From: oranagra

@yossigo where in the existing API redis keeps a pointer to module allocated variables? AFAIK the only thing we have is function pointers (and module APIs taking struct pointers as input but not keeping or writing to them). i.e. we never write to module memory, and the only ones we read from are structs that we define.

Comment From: yossigo

@oranagra you're right, I thought we hold a reference to some function structs but we don't. And definitely not writing to module memory.

I think I'm still OK with having an explicit set/get callback, but still rely on the rest of the configuration mechanism and rather than push all the responsibilities to the module.

Comment From: madolson

Ok, so based on my understanding, the proposal is then something like this?

#define MODULE_CONFIG_MUTABLE 0
#define MODULE_CONFIG_IMMUTABLE 0<<1
#define MODULE_CONIFIG_SENSITVE 1<<2
#define MODULE_CONIFIG_DEBUG 1<<3

#define MODULE_CONIFIG_NUMERIC_MEMORY 1<<16

/* Called when the value needs to be fetched from the module. *privdata is the way the module
 * knows where to find the value, . */
typedef (RedisModuleString *) (*RedisModuleConfigGetStringFunc)(const char *name, void *privdata);
typedef long long (*RedisModuleConfigGetNumericFunc)(const char *name, void *privdata);
typedef int (*RedisModuleConfigGetBoolFunc)(const char *name, void *privdata);
typedef int (*RedisModuleConfigGetEnumFunc)(const char *name, void *privdata);

/* Called when the value needs to be stored into the module. 
 * Context will either be REDIS_MODULE_CONFIG_ON_STARTUP or REDIS_MODULE_CONFIG_SET.
 * on REDISMODULE_OK, the value is accepted, on REDISMODULE_ERR the value is rejected.
 * Perhaps we should be able to set and error code as well? */
typedef int (*RedisModuleConfigSetStringFunc)(RedisModuleString *new, int context, const char *name, void *privdata);
typedef int (*RedisModuleConfigSetNumericFunc)(long long new, int context, const char *name, void *privdata);
typedef int (*RedisModuleConfigSetBoolFunc)(int new, int context, const char *name, void *privdata);
typedef int (*RedisModuleConfigSetEnumFunc)(int new, int context, const char *name, void *privdata);

/* Register the configs */
RM_RegisterStringConfig(const char *name, int flags, RedisModuleConfigGetStringFunc *getfn, RedisModuleConfigSetStringFunc *setfn, *privdata);
RM_RegisterNumericConfig(const char *name, int flags, long long min, long long max, RedisModuleConfigGetNumericFunc *getfn, RedisModuleConfigSetNumericFunc *setfn, void *privdata);
RM_RegisterBoolConfig(const char *name, int flags, RedisModuleConfigGetBoolFunc *getfn, RedisModuleConfigSetBoolFunc *setfn, void *privdata);
RM_RegisterEnumConfig(const char *name, int flags, const char **enum_values, RedisModuleConfigGetEnumFunc *getfn, RedisModuleConfigSetEnumFunc *setfn, void *privdata);
*/

It's a little bit verbose, but allows us to support the trivial cases we identified in Redis with the 4 types (strings, bools, numeric long longs and enums. If someone wants to do something fancier, they can always just register a string config and then do all of their own custom parsing.

Comment From: oranagra

so the set callback can fail, so that comes instead of the is_valid callback. And I suppose in flags we expose all the flags we currently have in config.c?

But note that we have two sets of flags in config.c, one global (mutable, etc), and one for numerics (memory, and soon a few others). For numeric, we may also need a sign flag to be able to handle 64bit signed and unsigned.

I think the set one may nee to get a hint to distinguish between startup and CONFIG SET. And I think it may be useful to pass the config names to the callbacks (in case the module wants to register the same callback and privdata (can be more convenient in some cases).

P. S. Another interesting case is a module that's wrapping some library that has a config interface and the module isn't aware of all, or doesn't care to register them one by one. In that case if we allow some prefix to be registered and then the callback can just trim it and forward the call.

Comment From: madolson

We can re-use the flags for both generic configs and type flags. As mentioned before I don't think we want to strictly couple the implementation detail.

Having a bit for signed/unsigned makes me feel a bit uncomfortable, since the module side has to be careful about making sure they are doing the casting right. I would almost prefer only support long longs, and then push modules to handle the unsigned bit if they really believe they need it.

No strong preference about also passing int the name, so let's do that as well.

Ack about the startup/steady state set, I forgot about that.

Comment From: oranagra

since we pass min and max values as input, a module that wants to handle unsigned long longs is gonna be forced to use strings, i don't like that.. (sounds like Java, not C). p.s. we could also consider a double config (maybe use this opportunity to introduce on in redis infra).

anyway, i guess we agree on many of the details, so next step would be to try to code it and see if it makes sense.

Comment From: yossigo

@madolson The API makes sense to me. I think CONFIG REWRITE support will require some additional facilities, like determine default values or something even more explicit.

Comment From: JimB123

I'm not sure the "get" callbacks are needed. Redis will need to maintain all of the values in string form - because if the module isn't loaded yet, we don't know the type. So, basically, if we see a config value "foo.param1 = 64" we would need to store that somewhere until (and if) the "foo" module gets loaded. Furthermore, if the module gets unloaded, we still need to maintain that configuration in Redis, in case the module is loaded again.

So we'd need to do something like: 1. If the config name does NOT contain a ., handle it as we do today. 2. If the config name contains a ., add the key/value into a hash table for modules.

When a module is loaded and registers for its config, NOW we know the type of the data. We can parse the value, given the type, and then use the appropriate "set" callback to provide the value to the module. Validation is performed by the module, at this point.

If we ever need to rewrite the config, no interaction with the module is necessary - we have all of the string values already in the dict. We just iterate over the dict and spill the items. No type conversion needed.

When processing a CONFIG SET command, if the module is active, we could call the module's "set" function for validation before updating our dict of string values. This would allow us to reject an improper CONFIG SET. Of course, if the module is not loaded, the CONFIG SET would just be accepted and the key/value would be stored in the dict - any validation error would be detected when the module is loaded.

When processing a CONFIG GET, the string value can be returned from the dict. There's no reason to interact with the module.

Comment From: madolson

@JimB123 Your proposal seems to artificially introduce constraints into the system. Today, both GET/REWRITE can return a value different than what was "set", such as the aliasing support (slave -> replica for example) or human readable values -> raw byte values. It seems more logical to delegate all this work to the module and keep the get callback.

Secondly, I don't see why we need to "keep track" of the configuration value if the module wasn't loaded.

Comment From: oranagra

i don't think we need to maintain the values after startup. the only reason we need to keep the values is because we first parse the config file and command line args and then load modules, so so from the user's perspective, the same config file defines modules and their configs and redis needs to handle that config file in the right order (same thing we do for sentinel today).

But i don't think these values from the config file should be also kept in case the module is later loaded with the MODULE command, or is unloaded and re-loaded. Furthermore, I think that if at startup we found that one of the configs we parsed from the config file isn't matching any loaded module, or was rejected by one, we should fail the startup (call exit(1))

I also don't think that caching the value and using it instead of a get callback is wise. maybe in some cases the module also has some other command that can override the config or it did some post config clamping.

bottom line, i think we do need the get callback, and i think we only hold the config values in redis for a short period during startup, nothing more.

Comment From: JimB123

@madolson @oranagra I understand your viewpoint. My thought was that this introduces somewhat weird/unexpected behavior.

Let's say I'm experimenting with a module. I create a config file which includes configuration for the module.

Things that seem weird: * If I don't load the module at startup, the system should crash? * If I load the module at startup, and then unload/reload the module (possibly changing code) - then the configuration is missing? * If I unload the module and then write out my config data, the module configuration which was originally in the config file is removed/lost?

maybe in some cases the module also has some other command that can override the config or it did some post config clamping.

Do we have any existing cases in Redis where if I load a config file, and then do a rewrite, that the output file is different? (Other than changes caused by CONFIG SET?) Or would this feature be adding a new, and possibly unexpected, behavior?

Comment From: oranagra

If I don't load the module at startup, the system should crash?

it won't crash, it'll refuse to start, just like any other config file syntax error or conflicting configs (e.g. activedefrag).

If I load the module at startup, and then unload/reload the module (possibly changing code) - then the configuration is missing?

that's a very good argument, but currently for many modules (the ones which register data type) the only unload / upgrade path is to restart redis. for other modules when you unload them, they leave no trace in the system (other than keys they changed), i don't think redis should be their configuration store in case they'll get reloaded.

maybe we do need to add some specific mechanism for in-place module upgrades, in which we'll add some mechanism for hand-off of data between the old code and the new code. i.e. something like serializing some globals into a string and then parsing it in the new instance, or passing a pointer to some globally allocated memory left behind from the module being upgraded. this will solve both the configuration problem as well as other globals and state that needs to be preserved during upgrade, and the module will be more aware of the fact it's loading an upgrade state (in which some variables may need adjustment for a new instance) rather than just being given some config (which may not be valid for the new version).

If I unload the module and then write out my config data, the module configuration which was originally in the config file is removed/lost?

yes, that seems logical, note that after the change in #4848 the module itself will no longer be in the config file, or otherwise, new modules that weren't there will appear.

maybe in some cases the module also has some other command that can override the config or it did some post config clamping.

Do we have any existing cases in Redis where if I load a config file, and then do a rewrite, that the output file is different? (Other than changes caused by CONFIG SET?) Or would this feature be adding a new, and possibly unexpected, behavior?

i'm quite sure there are quite a few cases of clamping, one that comes to mind is maxclients, but i'm sure there are others. in any case, i think it'll be a bad idea that both redis and the module will hold (different copies) of the config, i.e. the module will have some integer in a global variable, and redis will have a string, one will be used for actual work and the other just for GET and REWRITE.. sounds to me like a bad idea.

bottom line, i think i wanna stick to the plan of keeping the configs in redis just for a deferring startup, and maybe some day pick up the upgrade issue and handle it properly.

Comment From: madolson

I agree with everything that Oran said.

To touch on one of Jim's points, module reload is not in a great state today, since there are conditions that prevent the unload (adding a type) and there is no way to preserve the reload state easily. We should probably tackle that as a single issue in the future.

Comment From: madolson

Just to post a quick update here, we found some weird behavior around rewrite and how you load a module after startup. Someone from AWS is working on this, and should be able to post a PR in time for RC1.