I've been thinking about how users should be loading and updating functions. One thing that struck me is that many users run Redis with either just replication or AOF without fsync on every write, so they might have tail data loss. If a function was updated, it might not be immediately clear from automation that a function "update" was lost, so I've been thinking about ways we can make function loading a bit more robust. We could add an optional "version" for a function, which has two uses:

1. FUNCTION LOAD [VERSION N] [REPLACE] ... would only replace the function if the version number is higher than the currently stored version. You can then verify in your CI/CD system that all of the function versions are at least the version you put. Additionally, if no version is provided, one is generated automatically. You can then do optimistic locking by fetching the function, getting the version, and putting it back with version + 1. This adds some safety, since you can't WATCH functions today.
2. FCALL function [VERSION N] ... would only execute if the version is at least the provided version. If no version is provided, any version can be called. This is more a defense against a node going AWOL and then coming back after a while (or maybe some disaster recovery scenario?).

I've also been thinking about ephemeral Redis workloads; a lot of them use EVALSHA to execute an exact version of a script. This idea could be extended to make sure that if the cluster loses data, and someone else populates it with an older version of the script, you could update it with an even more up-to-date version of the script using the version: you do FCALL, get a version error, and then put your more up-to-date version.
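For illustration, the optimistic-locking flow in point 1 could look like this (the VERSION argument and the error message are hypothetical; they are part of this proposal, not an existing command):

```
# CI/CD pushes version 5 while the stored version is 4; the update is applied
FUNCTION LOAD VERSION 5 REPLACE "<library code>"
OK

# A stale deploy later pushes version 3; it is rejected since 3 < 5
FUNCTION LOAD VERSION 3 REPLACE "<library code>"
(error) ERR stored library version is newer than the provided version
```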

I also think this would be useful for a future where we support consensus driven functions, since we have a way to reconcile multiple versions in flight. @MeirShpilraien @oranagra Thoughts?

Comment From: ranshid

@madolson I think that keeping a version for each library is a very good idea in general. It allows users to track the state of the library, and clients can adjust their logic according to the reported version. I only think that we should consider the different use cases:

1. In case there is a need to make sure clients are using the most up-to-date version of a library (for example a bug in the library code, or an important logical change), the updater of the function can also call WAIT to make sure the change propagated to all the replicas. I do not expect library changes to happen that often, so the overhead of waiting for the replicas might not be that painful. We could also provide FUNCTION LOAD with a 'sync' flag that makes the command return only after being synced to all the replicas.
2. In case there are functional changes to the library (changing function signatures, or adding/removing functions), usually clients will also need to be synced with that logic. Your proposal enables clients using the new version of the library to safely fail executing an "old" library version, but there is also the case of clients using the "old" version of the library. In your suggestion you wrote:

FCALL function [VERSION N] ... will only execute if the version is at least the provided version.

I think this is problematic, since there might still be clients using the old version of the library that could fail executing some functions in case they were altered/removed/renamed. To support that, the user can make the library version part of the library name, which enables keeping two versions of the library side by side. This is a bit ugly, and your solution basically offers a better way to manage versions - I am just wondering whether multiple versions should be managed, and whether the ability to deprecate (remove) a library by version is also needed.

Comment From: oranagra

before trying to suggest a solution or respond to the suggestions, i'd like to respond to some of the arguments presented here.

it might not be immediately clear from automation that a function "update" was lost

You can then verify in your CI/CD system that all of the function versions are at least the version you put.

That's easy to do already: embed a version number in the library, and register a function that returns it.
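A sketch of that pattern, using the redis.register_function API discussed later in this thread (the library and function names are just examples):

```lua
-- Hard-code the library version as a constant in the library code...
local LIB_VERSION = '1.2.0'

-- ...and expose it through a function, so CI/CD (or any client)
-- can verify the deployed version with: FCALL mylib_version 0
redis.register_function('mylib_version', function(keys, args)
    return LIB_VERSION
end)
```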

You can then do optimistic locking by fetching the function, getting the version, and putting it back with version + 1. This adds some safety, since you can't watch functions today.

Are you referring to a case where there are two separate workloads trying to update the same library, one using a new version and the other using an old version? Doing that would be the absolutely wrong thing to do (considering the philosophy of functions, being responsibility of an admin, and not part of the app).

we can also provide a function load with 'sync' flag that will make sure this command will return after being synced to all the replicas.

the WAIT approach you mention should already be working today, no need for an extra command feature IMHO

Your proposal enable clients using the new version of the library to safely fail executing an "old" library version. but there is also the case of clients using the "old" version of the library

right, that FCALL suggestion (testing >=) solves one problem. we could maybe also add <=, etc, but i think that's too complex. in theory we could suggest users do both FCALL and LOAD together with some check atomically, inside an EVAL script, but the fact is that you can't run scripts inside scripts, so if we wanna add these atomic checks we must make them part of the command.

however, since functions are intended to provide an API for the client application, i'd expect heavy users to plan that API carefully, and introduce changes in a non-breaking way (that's one of the reasons we added named arguments, so that they can have a default value when not provided)

To support that the user can make the library version part of the library name which will enable supporting 2 versions of the library

yes, last time when we discussed adding version support, we concluded that that's what people should do if they want them side by side, or introduce non-backwards compatible changes. i.e. add a v2 suffix to their interfaces. this doesn't have to mean an additional library.. in some cases it's just an additional function interface in the same library.

the bottom line is

i think some of the use cases or considerations presented here are wrong use cases that we don't want to promote. i think users can already expose a version number from their code via a function that returns it. and since there should be only one entity responsible for loading them, there's no need for updates to be atomic with the check.

i'd expect heavy users to maintain their API without breaking changes, or introduce a v2 interface (in which case they'll simply fail if the function update was lost). that means there's no need for an atomic version check on execution.

regarding the opening statement about losing the tail of the changes, including an update to functions, i think there are two use cases to discuss:

1. an ephemeral case, where you lost all the data including the functions. in this case i think the user is likely to spin up a new pod with the functions preloaded. i'll even argue that that's how they'll do upgrades of their function code (spin up a new pod and decommission the old one).
2. a case with persistence or replication, where just the tail of the updates is lost (let's consider an RDB snapshot). in this case, since functions are part of the data, if you lost updates to the data it means you'll also lose the function updates. i.e. you reverted your data to the old schema, and the functions you have still match the old data. a user recovering from such a case may have more to do than just update the functions; he may need to re-run some schema update function code.

Comment From: QuChen88

@oranagra I agree that we should keep things simple unless there is a valid reason otherwise. Adding versioning to the FUNCTION LOAD command is complexity that is not entirely justified at this point in time if you want to discourage that kind of use case. In that case, can you please make sure the documentation is clear about the recommended ways users should load functions into Redis?

Speaking of which, I am not sure that adding functions itself brings significant added value on top of what people can already do with Lua scripts today. It is essentially the same as Lua in principle - correct me if I am wrong. Is it really worthwhile to add in Redis 7.0?

Comment From: oranagra

@QuChen88 i think it is already documented. I did try my best to stress the differences between functions and EVAL: what each should be used for, and more importantly what it should not be used for (some text is repeated in Redis Programmability, Introduction to Eval Scripts, and Introduction to Redis Functions, in order to try to direct people to the right place, even if they came to the wrong one). maybe there's room for improvement (feel free to suggest).

Functions indeed don't really enable anything that wasn't possible with EVAL; it's just that EVAL was very uncomfortable for some use cases, and functions are here to better fit these use cases.

I think a good way to think of them is to think of modules, they're intended to provide a solid API and unlike EVAL, decouple the application from the redis-native types.

They do also provide named arguments, which are intended to better handle API changes while keeping backwards compatibility. Also, unlike EVAL, they support the concept of libraries (so several functions can share code, with language-native interfaces). so they actually do provide some advantages over EVAL, but nothing really ground-breaking.

Comment From: QuChen88

OK sounds good.

Re: so they actually do provide some advantages over EVAL, but nothing really ground breaking.

I wonder if it makes sense for functions to be a separate module then instead of a core redis functionality in 7.0 (i.e. first class citizen)?

Comment From: yossigo

I agree we don't need to support multiple concurrent versions of the same function, and if that's necessary due to API changes then it should probably be reflected in its name.

The other concern is about extending FUNCTION LOAD ... REPLACE to support a more complex constraint that allows replacement only if the supplied version is newer. As @oranagra mentioned, we envision functions being used in a way that doesn't require that - but I think we can be flexible and recognize that there could be other ways to use them.

Because of that, I'm not strongly against adding an optional version to functions and an UPDATE-IF-NEWER kind of modifier when loading. However, I'm not 100% sure that's a real pain point, or whether this is overly simplified and may end up missing the point in too many cases.

Comment From: soloestoy

  1. a case with persistence or replication, where just the tail of the updates is lost (let's consider an RDB snapshot). in this case, since functions are part of the data, if you lost updates to the data it means you'll also lose the function updates. i.e. you reverted your data to the old schema, and the functions you have still match the old data. a user recovering from such a case may have more to do than just update the functions; he may need to re-run some schema update function code.

I agree, and from this POV, adding a version for a library is unnecessary.

But I think we should provide a way for users to check the library's "version" (maybe naming it identity or digest is better), just like the SHA1 of a Lua EVAL script; then users can easily make sure the current library is what they want (no need to check the whole library_code). For instance, adding a field named sha1 to FUNCTION LIST's reply:

127.0.0.1:6379> function list
1) 1) "library_name"
   2) "mylib"
   3) "engine"
   4) "LUA"
   5) "description"
   6) (nil)
   7) "sha1"
   8) "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
   9) "functions"
  10) 1) 1) "name"
         2) "noop"
         3) "description"
         4) (nil)
         5) "flags"
         6) (empty list or set)

Comment From: madolson

Summary of the core discussion:

1. The usage of FCALL <function> [VERSION N] was rejected because it's not that useful, and is likely to cause confusion, since some people might want to call a function at a specific version, which is not what we want to support.
2. We want to think of FUNCTION LOAD [VERSION N] [REPLACE] as an extension of the REPLACE argument to allow for safer atomic updates of functions. The exemplar use case is multiple applications that want to update their functions on startup and need to make sure they're only pushing the library version forward. The version basically acts as a CAS or optimistic lock to make sure we only increment the version of the functions. We thought this was a useful condition.

Comment From: MeirShpilraien

Sorry for the long reply here but I had some concern about the suggested API that I wanted to raise. I also have some less related topics about function that I wanted to raise (we can decide after that they deserve their own issues and get the discussion elsewhere).

First, I am not against function versions (though I am not sure they are that important), but I do not think that the version should be part of the FUNCTION LOAD command. I believe the version should be set by the library code (otherwise users might upload the same library code with different versions; it will be hard to know which version is really loaded, and it can get messy). After discussion with @oranagra we suggest the following API, which will be available to the function load initialisation code:

* redis.set_version - sets the library version
* redis.get_old_version - gets the version of the replaced library

Those 2 API's will allow the user to verify the library version and fail the loading if needed, example:

local version = 2

-- Refuse to load unless we are strictly newer than the replaced library.
local old_version = redis.get_old_version()
if version <= old_version then
    error("A newer version of the library already exists")
end
redis.set_version(version)

Setting and checking the library version is optional. If a version is set, it will be returned by the FUNCTION INFO command. Notice that we might want to reconsider the library name: maybe this should also be part of the library code? I guess that if we look at modules, then it does make sense (the module provides its own name). But this will force the library to set its name at the beginning; it could look something like this:

local lib_ctx = redis.create_library('foo')
lib_ctx.register_function(....)

I do not have a strong opinion about the name thing so I would like to hear your opinions.

Some other topics I wanted to raise (maybe they deserve their own issues and discussions):

Function Globals

It is basically possible to create globals today; those globals are not added to the Lua global dictionary and are only visible to the functions created by the library. Example:

local my_global = 'foo'

redis.register_function('f1', function(k,a) return my_global end)
redis.register_function('f2', function(k,a) return my_global end)

From the functions' POV, my_global is a global value that is accessible to all the functions. My question: do we encourage this pattern? should we document it? The main disadvantage (which is easy for users to miss) is that those globals are not persisted and their value will be reset after a restart; maybe we should just document this disadvantage? This is also related to the next topic I wanted to raise, which is FUNCTION LOAD arguments.

Function Load Arguments

The use case is basically controlling function behaviour with some load-time arguments (same as modules, which get load-time arguments). Imagine the example library from our documentation that adds a __last_update__ field to a hash. What if we want the field name to be configurable at library load time instead of hard-coding it in the library code? I am not yet suggesting any API changes/additions; I would first like to hear your opinions on whether or not such a feature makes sense.
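To make the idea concrete, a hypothetical sketch (redis.get_load_args does not exist; both it and the ARGS syntax are made up for illustration):

```lua
-- Hypothetical: FUNCTION LOAD ... ARGS field=__modified__
local args = redis.get_load_args()
local field_name = args['field'] or '__last_update__'  -- fall back to a default

redis.register_function('my_hset', function(keys, argv)
    -- Stamp the hash with the configured field name on every update.
    redis.call('HSET', keys[1], field_name, redis.call('TIME')[1])
    return redis.call('HSET', keys[1], unpack(argv))
end)
```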

Passing Information Between Function Versions

Assuming we do want to add a library version, on upgrade we might want to allow passing information from the old version to the new version. If we look at the previous topic (Function Load Arguments), the old version will probably want to pass the __last_update__ field name so that the new version continues with the same name. Again, I would like to hear your opinions before proceeding with formal definitions.

Comment From: oranagra

Thanks Meir. my preferences:

1. as you suggested, i'd have the function version and version check be part of the code and not the command arguments.
2. i think the globals thing should either be documented with its limitations, or, if we decide it's undesired, document that we recommend against it (better than letting people guess).
3. thinking of functions as modules, it's nice to let them get load-time arguments, and maybe also load-time initialization, but i'm afraid it'll open the door for more things (like asking for the capability to store out-of-keyspace data, like module aux records in the rdb).
4. if we do promote the use of globals (specifically if we provide load-time arguments), we must also allow the upgrade code to access the old function globals in some way.

Comment From: madolson

@MeirShpilraien I kind of like the version approach outlined; it seems better to embed it within the function itself so you don't have multiple different places to manage it. It feels clunky, since it has to be implemented by all the future engines instead of by Redis (even though they don't exist yet), but I guess we are pretty entrenched in that design paradigm anyway, making lots of functions available within the script load.

i think the globals thing should be either documented with it's limitations, or if we decide it's undesired, document that we recommend against it (better than letting people guess).

Globals in functions, at least how I think about them, feels really wrong. Stored procedures typically don't have persisted state outside of what they are storing in the database. They are just functions you can call. It feels a lot like we're expanding the scope of functions deep into modules territory, which I'm not convinced is what we want to do.

thinking of functions as modules, it's nice to let them get load time arguments, and maybe also load time initialization, but i'm afraid it'll open the door for more things (like asking for capability to store out of key-space data, like module aux records in the rdb)

Again, I don't think of functions as lightweight modules, but as stored procedures. They don't need configuration on loading, since they are defined entirely by the user. I would advocate we should try to keep functions as streamlined as possible. If we get feedback people want to build functionality more akin to modules with them, we can reconsider that.

Comment From: oranagra

TLDR: keep the current globals support and document something about it, and add api for setting and getting version, and also use it in FUNCTION INFO.


I now realize that my last message started with "my preferences:", which I initially intended to be a list, but I ended up listing a bunch of random thoughts.

I certainly don't think that functions should persist out of keyspace data into the rdb... I was trying to point out that supporting globals, and load time arguments, leads that way...

I'm certain I don't wanna go there (rdb persistence and upgrade passing of globals), but I'm not confident that globals and startup arguments are wrong.

The fact that users can control the function code better than they can control modules doesn't mean that configuration needs to be hard coded. I remind us that for eval scripts, one of the worst abuse patterns is embedding constants in the code rather than passing them as arguments (there it's because changing them messes up the script cache).

On the other hand, functions don't have script cache, and we also said they're supposed to be loaded by just one entity (I'll argue it's an admin), so in that case even if the application wants to set some configuration, it can't (too late). So maybe I just convinced myself that there's actually no need for startup arguments (since they'll always have to be provided by whoever loads the code, and not the one that uses it), so the constants can be hard coded.

Maybe we'll revisit that some day, but for now I think we can agree that we want to discourage using globals, though I don't think we wanna block them. Maybe users will want to cache something there, which they can recompute after an upgrade. Also, maybe it's an extension of the fact that the library is a unit that is free to share script-language-native code between functions, so we don't care what they do inside that unit (we only control the interfaces and sandbox).
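The caching pattern mentioned above could look like this (key and function names are made up); the point is that losing the global on a restart or upgrade is harmless, because it is always recomputed from the keyspace. A real library would also need to invalidate the cache when the underlying data changes:

```lua
-- A library-level global used purely as a cache.
local config_cache = nil

redis.register_function('get_config', function(keys, args)
    if config_cache == nil then
        -- First call after a (re)load: recompute from the keyspace.
        config_cache = redis.call('HGETALL', 'app:config')
    end
    return config_cache
end)
```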

Anyway, I think we can agree that the next step is to expose a versioning infrastructure that uses api calls from within the library registration code. It's much better that the version is part of the code, and not some metadata that's provided elsewhere. In my eyes it's also better since it hides this mechanism a bit (I didn't actually want it).

Agreed?

Comment From: MeirShpilraien

@oranagra one disadvantage of our suggestion is that you cannot downgrade if the need comes: if the library code checks the version and raises an error, there is no way to tell it not to do so at load time (another use case for load-time arguments, maybe?). Also, what do you think about the library name? should it also move to the library code?

Comment From: MeirShpilraien

I think that maybe a better approach here is that redis.set_version will raise an error if there is already a newer version, but if some force-replace argument was given (FORCEREPLACE) then redis.set_version will not raise an error even if there is already a newer version.

Comment From: oranagra

i'd rather let the library loading code itself do all these checks and error out. if the user wants to downgrade, he should comment out or modify that check in the lib code. i don't like FORCEREPLACE; if anything, this campaign means that maybe we can remove the REPLACE feature.

regarding library name, since that's mandatory, i'd rather keep what we have now.

Comment From: gkorland

@oranagra I think it shouldn't be part of the code; setting such metadata in the code just forces all users to copy some boilerplate code. I bet we'll find more metadata/tags we want to set in the future.

I think we should move all the meta data to the code header:

# version: 1
# name: mylibrary
# engine: Lua
# description: this is my library

Comment From: oranagra

@gkorland we thought of that too.. it's still part of the code (blob), which is good, but it means we have to parse it ourselves, which is something we wanted to avoid.

Comment From: gkorland

@oranagra

but it means we have to parse it ourselves, which is something we wanted to avoid.

If that is the right thing to do then we should do it, much cleaner

Comment From: oranagra

I should mention that we just had an internal meeting about this topic (Yossi, Meir, Oran, GuyK). we didn't reach any conclusions yet, but we are considering removing the following arguments from the FUNCTION LOAD command, and instead putting them as meta fields at the top of the function code blob:

* engine
* library name
* description

the reason is that we think these fields are the responsibility of the function developer, not of the admin that loads it.

Comment From: oranagra

We had another lengthy discussion about that. We're quite certain that this metadata should be part of the payload and not provided externally (name, engine, description and version, are the responsibility of the programmer to define, not the admin that loads the library).

Although the driving force for this change is versioning, we feel that we can postpone these for some future version of redis, as long as we plan for a design that can conceptually support that.

What we must do now is get rid of all the extra arguments for FUNCTION LOAD since doing that later would be a breaking change, or an awkward backwards compatible one.

we considered quite a few concerns and alternatives (packaging, and various ways of parsing the metadata), which i'm not gonna describe, since the arguments for and against them are long. I must mention that we didn't all agree on the exact solution, but to me it's clear what i wanna do, so i'll describe that and wait for feedback / objections.

The changes i think should be done are:

* remove engine, name and description from the FUNCTION LOAD command
* have redis extract the engine and name (mandatory args) from a shebang comment in the payload

now we have a few options:

1. allow the shebang to include multiple key+value pairs, and extract the description from there too.
2. add another mechanism to extract additional optional fields from a second comment line (this can either be a constant # comment, or a language-native comment).
3. trim the description feature from our supported set of features, and deal with this dilemma in the future.

i'm leaning towards 1, but 3 is also ok with me.
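For reference, the shebang form that Redis 7.0 ultimately shipped goes in this direction: the token right after #! is the engine, followed by key=value pairs (currently just the mandatory name), and everything below the first line is ordinary library code:

```lua
#!lua name=mylib

redis.register_function('noop', function(keys, args)
    return nil
end)
```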

Comment From: oranagra

10500 was merged, which would probably mean that this versioning concern can be handled later with relative ease.

we discussed it in a core-team meeting and decided to take this off the table for now, and re-consider it in the future.