Redis Module Configuration API forces modules to have string configurations as RedisModuleString

https://github.com/redis/redis/blob/60250f50c2f308d301fb0bfcb6d07faa2560ce60/src/module.c#L11593

https://github.com/redis/redis/blob/60250f50c2f308d301fb0bfcb6d07faa2560ce60/src/module.c#L11411

getModuleStringConfig() calls the get_string() callback and gets a RedisModuleString. This function will not decrement refcount of the object. So, the module is supposed to keep configurations as RedisModuleString and just pass the pointer in this callback.

If the module has a regular C string for the config, it's impossible to use this API. On each get_string() callback, we can create a RedisModuleString from C string and return it here but it will leak as refcount is not decremented.

It'd be nice to change this API to decrement refcount of the string. That way module can hold configurations as C string or RedisModuleString. It is a breaking change though.

Comment From: madolson

My expectation for strings was that that module developers would just always keep the object as a RedisModuleString. They have the benefit of being binary safe, and implicitly contain a null terminator so that they can be used as a C string. I presume the real problem here is that it's not trivial to update your config system to use a RedisModuleString instead of the C pointer?

When I was thinking about this, my expectation is that for a type where it doesn't directly map to a Redis type, you would maintain two copies. The "encoded" copy, a C string in your case, and the raw copy, the RedisStringObject version. Internally you use the encoded copy, but when interacting with Redis you use the raw version. Taking a completely different example, a double which has no native type, can be implemented by doing a stroll on the raw version and storing it some place. Since you don't really want to be lossy on converting back, when doing get you just return the raw RedisStringObject copy. The intuition here is that there are very few configs, so keeping a second "copy" is generally free and you only need a way to encode your object from the raw RedisString. The downside is that there are two copies, which may be problematic for some more esoteric use cases.

I think there is some disagreement amongst the core folks about this though. Redis has some configs which could do this but don't, the latency metric "latency-tracking-info-percentiles" comes readily to mind. There are some that do follow this paradigm, "requirepass" for example keeps the hashed string copy as well as a plaintext copy.

Just to address your suggestion, it makes a lot of sense in the "latency-tracking-info-percentiles" approach, where you need to generate a response each time that will be consumed since we don't keep a string copy anywhere. If my proposed suggestion works well enough for you, let's stick with it, if not I don't have a strong objection to your proposal since it's just a minor change to the "normal" case. (Calling RedisModule_RetainString before returning). FWIW we were trying to simplify the normal cases as much as possible.

Comment From: oranagra

IIRC the API was designed this way for several reasons: 1. RedisModuleString is the native way to store dynamic strings between redis and the module (we usually use C strings for constant ones) 2. using C strings would mean we need to either carry an additional size_t to keep them binary safe, or add another config type (one for C strings and one for buffers). 3. the ownership model is designed so that by default, for simple modules, it'll be easy to use without risk for leaks.

Comment From: tezc

@oranagra If we make this change, modules can decide whether they want to keep their strings as C string or not. The suggested way can still be RedisModuleString. In my case, I don't have any reason to have RedisModuleString configs inside the module. So, currently, each time I access a config, I need to make another function call to get C string pointer. It gets ugly quickly.

@madolson Keeping two copies of the configuration feels like a workaround (might be a good one if we decide not to make this change) but I just feel like we should avoid that one as well. Examples you mentioned (configs that need to be generated on the fly) are probably another reason to make this change. So, modules can avoid holding a reference to RedisModuleString object that they pass in the callback (if preferred).

Just an example, I have a filename config. I don't have any reason to keep this as RedisModuleString inside the module because I'm going to use it with C library functions only. So, each access to that config will require another function call to get C string pointer. Sure we can use it that way but we can easily avoid this issue by making this change.

In summary, this change will decouple API and impl. The suggested way to store the config might still be RedisModuleString. In my case, it bothers me because it requires boilerplate code (I don't have a single config that I want to store as RedisModuleString). I guess @madolson 's examples are another reason to make this change. So, if you don't see this one as a breaking change, it'd be nice to have this option.

Comment From: oranagra

@tezc I'm not certain how clean it'll look for either plain strings or buffers. And how easy it'll be to work with without risks for leaks. I invite you to make a POC branch to look into that, maybe by looking at the code chances and example modules we'll have a better understanding. Maybe extend the examples with other use cases and we'll be able to see how these look based on the current API vs the proposed one..

I also not sure we can afford to make this change now, we might just be forced to support both types side by side.

Comment From: tezc

@oranagra Not sure if we are on the same page. Perhaps, I wasn't clear. I just suggest calling decRefCount() after get_string() callback. Please take a look at this commit, also I added an example:

https://github.com/tezc/redis/commit/88731352993d4961a1e1893c3ca1e0c100dede90

API stays same. This change is just about ownership of the string object that we pass in that callback. If module keeps configs as RedisModuleString, it will call RM_RetainString() : https://github.com/tezc/redis/commit/88731352993d4961a1e1893c3ca1e0c100dede90#diff-e5587acda3c5525531f28a5955185ca88602d205342029ae71748acd172d3716R43

Otherwise, module generates RedisModuleString on the fly: https://github.com/tezc/redis/commit/88731352993d4961a1e1893c3ca1e0c100dede90#diff-e5587acda3c5525531f28a5955185ca88602d205342029ae71748acd172d3716R46

Comment From: oranagra

ok, now i think i realize what you're after (i thought the scope is bigger). but i must say it looks a bit odd that you "retain" a string for the purpose of returning it, just because you rely on the caller to free it later.

usually, it works the other way around, i.e. the caller calls a method with a string, and the callee decides if it wants to just look at it, or retain it for later, and then when the call returns the caller can free the string if it was a temp one.. here since it's a callback it works differently and the module doesn't get a chance to run code after redis is done with that pointer.

i suppose it's no different than the RedisModuleString **err argument, where the module creates that string knowing that redis is gonna release it. maybe had we used RM_HoldString or RM_CreateStringFromString it would have looked a bit better.

I see 3 bad options to proceed: 1. do the change you suggest and document it as an ABI breaking change in 7.0.1 (basically means that modules can't use this feature of 7.0.0) 2. add some flag on config registration that let's the module choose between one way and the other (if we go this way, maybe it'll just be better to add another type and callback function that's specifically designed for C strings?). 3. keep the current code, and the module can use the strtok ugly trick of storing it in a global and recycling it later time.

@yossigo @madolson i'd like you hear other suggestions / opinions.

Comment From: yossigo

@oranagra I lean towards [2] (flag / different type). I don't think we should break the ABI, and I'm not very happy with leaving this unsolved.

Comment From: tezc

@yossigo @oranagra If we are not making this breaking change, probably we shouldn't try to solve this. Seems like option-2 will make API confusing a bit. I think I rather apply a workaround in the module itself at this point. Sorry, I have mixed feelings :) I feel like solving the issue but no need to make API more complex. Only good solution is the breaking change, so, I guess I give up.

Comment From: oranagra

A flag can be confusing, but on the other hand sometimes a API that looks simple is worse than a confusing one if the simple one is misleading. I.e an extra argument can force people to read and understand the consequences.. However a flag will probably be overlooked. So I agree.

The only thing then that maybe worth another thought it adding another type, one meant for modules that want to use plain null terminated C strings (non binary safe). Maybe that'll solve the problem for some modules, and be more convenient...?

Comment From: tezc

I initially thought it would be confusing for module developers to decide which type to use but maybe it's not that bad. As you said, modules can use that type for C string configurations. For other types (buffer or non-null terminated values), we can document and suggest using RedisModuleString.

Comment From: tezc

@madolson What do you think about adding another type for null terminated strings?

Comment From: madolson

@tezc I don't agree with it. I didn't respond earlier, but the design I outlined isn't a workaround, it was a deliberate design to encourage users to not have to have "serialization" and "deserialization" code for their configs.

If feels like a large push to optimize for a use case that is trivial to handle, a RedisModuleString contains a null terminated C string. All you have to do is extract it while using it.

EDIT: For the main Redis repo, I want to get rid of all of the non-sds strings and make all of the strings more normative so it's much easier to reason about.

Comment From: madolson

@tezc I don't really mind the the approach where we add a flag that changes the behavior, that seems like a really minor change though.

Comment From: tezc

@madolson I didn't like that flag solution. So, I'm not pushing for it. We'll go with a workaround in our module. Maybe we revisit this later if we hear more about this issue.

Comment From: iddm

@tezc I don't agree with it. I didn't respond earlier, but the design I outlined isn't a workaround, it was a deliberate design to encourage users to not have to have "serialization" and "deserialization" code for their configs.

If feels like a large push to optimize for a use case that is trivial to handle, a RedisModuleString contains a null terminated C string. All you have to do is extract it while using it.

EDIT: For the main Redis repo, I want to get rid of all of the non-sds strings and make all of the strings more normative so it's much easier to reason about.

I can't express how wrong I think this is. A module can, need, want, you name it, simply store an integer or a floating point value for its internal purposes, but the user-facing options might be preferred to have as strings or any other "serialised" way. Not to mention that the current API assumes the floating-point numbers don't exist in this universe, meaning we constantly have to fight the Redis server API to work around the problem and serialise those to a string and back. However, this leads to a memory leak in this case. You artificially limited the user (the module developer) so much, that we need to fight with the API rather than use it.

This is a must-have, as well as many other things w.r.t. this "API".

P.S. We don't store the configuration options for the user, we store those for the module and its operation, it is internal to the module. The user may be allowed to see the options, but the user shouldnt' dictate how the options are going to be stored, not to mention what types are allowed to be stored (see the non existing floating-point numbers again).