Hi all, especially @oranagra, @soloestoy, @yossigo (and possibly others) that in the past dealt with related issues.

Over the last few days I tried to find the link between different issues that we had in the past, and more recently since the introduction of PSYNC2, as well as the issues you recently fixed while I was on vacation regarding the "DEL before script" issue. I made certain considerations, together with a general conviction I have about the future of Redis: when we discover bugs, I think we should not just fix them, but understand whether they are due to a design error.

All those considerations make me think that there is currently a design error in Redis. I believe that we should only allow effects replication, and remove entirely the old mechanism that replicates the script verbatim. The major drawback of this is the use of more bandwidth. Performance-wise, however, this may turn out to be at least as fast as replicating the script as we do now, especially since @soloestoy fixed the problem with large pipelines. Dispatching commands from the network should not be slower than the Lua invocation of commands.
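To make the difference concrete, here is a minimal Python sketch (a toy model, not Redis source; names like `run_script_with_effects` are invented for illustration) of what effects replication means: instead of sending the script text to slaves, the master sends the writes the script actually performed, wrapped in MULTI/EXEC so they apply atomically:

```python
# Toy sketch contrasting verbatim vs. effects replication.
# A "script" is modeled as a function that issues commands against a store.

def run_script_with_effects(script, store):
    """Execute the script, capturing every write it performs."""
    effects = []

    def call(*cmd):
        name, args = cmd[0].upper(), cmd[1:]
        if name == "SET":
            store[args[0]] = args[1]
            effects.append(cmd)        # only writes go to the replication stream
        elif name == "GET":
            return store.get(args[0])  # reads produce no effects
        return None

    script(call)
    # Effects replication: slaves receive the concrete writes, wrapped
    # in MULTI/EXEC so they apply atomically, like the script did.
    return [("MULTI",)] + effects + [("EXEC",)] if effects else []

store = {}
stream = run_script_with_effects(
    lambda call: call("set", "key", call("get", "missing") or "default"),
    store,
)
```

Note how a non-deterministic read (a key that may or may not exist) is resolved on the master, and the slave only ever sees the resulting concrete write; this is why the ordering and randomness restrictions on scripts disappear.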

The amount of simplification possible after this change is remarkable, especially now that the problem of slaves expiring keys is also fixed: slaves become more and more completely dependent on the master's stream of commands. So we would gain:

  1. No longer any need to explain script side effects to users: they can do whatever they want.
  2. No need for a cache tracking which scripts we did or did not send to the slaves.
  3. No need to sort the output of certain commands inside scripts (SMEMBERS and others): this both simplifies the code and improves speed.
  4. No need to store scripts inside the RDB file in order to start up correctly.
  5. No problems with evicting keys during script execution.

Many of the points above resulted in bugs in the past. It's not just a matter of code complexity, but also of the future bugs we risk every time a new feature interacts with scripting. Executing a complex script on a different instance and ending up with exactly the same dataset is a hard problem in general...

I believe that we should boldly go in that direction, and this should be just the start of a series of simplifications to key Redis mechanisms, in order to reduce complexity and improve our long-term ability to keep Redis bug-free.

Comment From: itamarhaber

:+1:

Comment From: catwell

For what it's worth, I agree. If you can make scripting in replicated contexts as powerful as single-instance and replication code much faster, I think it's worth using a little more bandwidth.

Comment From: soloestoy

  1. No longer any need to explain script side effects to users: they can do whatever they want.
  2. No need for a cache tracking which scripts we did or did not send to the slaves.
  3. No need to sort the output of certain commands inside scripts (SMEMBERS and others): this both simplifies the code and improves speed.
  4. No need to store scripts inside the RDB file in order to start up correctly.
  5. No problems with evicting keys during script execution.

Hi @antirez, I think we should adopt a new propagation mechanism for Lua scripts, and I have implemented one in PR #4966. I think it can fix points 1 and 2, and maybe 4 as well.

About point 5, in PR #5250 we had a lot of discussion, and I have now updated the PR to remove the freeMemoryIfNeeded call.

There are also some other fixes for inconsistencies caused by Lua scripts, like #5208 (forbid non-deterministic write commands) and #4835 (mark some commands as random).

Comment From: soloestoy

And here is the main idea of PR #4966, "A new propagation mechanism for Lua scripts":

My main idea is to implement a new propagation mechanism for Lua scripts; the current one is a bit complicated and wastes memory and CPU time.

Firstly, let's look at the current Lua script propagation mechanism (ignoring redis.replicate_commands for the time being):

  1. When does redis propagate Lua scripts explicitly?

    The SCRIPT LOAD command propagates the Lua script to slaves and the AOF, and adds the script to server.lua_scripts.

    At the same time it means that we can call EVALSHA even after a reboot or failover.

  2. When does redis propagate Lua scripts implicitly?

    i. EVAL command with a write Lua script.

    • Then redis propagates the Lua script to slaves and the AOF, and adds the script to server.lua_scripts, just as SCRIPT LOAD does.

    • Problem 1: we cannot propagate a read-only Lua script via EVAL, and therefore we cannot call EVALSHA for it after a reboot or failover.

    ii. EVALSHA command with a new sha.

    • As we know, the new sha is not in server.repl_scriptcache_dict (but it is already in server.lua_scripts), so we have to rewrite EVALSHA as EVAL with the original script, and propagate it to slaves and the AOF.

    • Problem 2: we have to propagate the original script even if it was already propagated by SCRIPT LOAD or EVAL, which is unnecessary.

    • Problem 3: we have to propagate the original script even if it's a read-only script like redis.call('keys','*'); it's a waste of time for a slave to execute EVAL "redis.call('keys','*')" 0 or for the master to load it from the AOF.

    • Problem 4: replicationScriptCacheFlush is called on AOF rewrite and full resync, which means that all shas called via EVALSHA are flushed, and every EVALSHA has to be propagated as EVAL again.

    • Problem 5: an old sha is evicted when we reach the maximum number of server.repl_scriptcache_fifo entries.

But remember that since Redis 4.0 we already persist all Lua scripts into the RDB file, so I'm wondering: can we implement a new propagation mechanism for Lua scripts that is simple and effective?
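Problems 4 and 5 above can be seen in a minimal Python sketch (a toy model of the master-side replication script cache described above, not the actual Redis code; `ReplScriptCache`, `propagate_evalsha`, and the cache limit are invented for illustration):

```python
import hashlib
from collections import deque

MAX_ENTRIES = 3  # toy stand-in for the real cache size limit

class ReplScriptCache:
    """Toy model of server.repl_scriptcache_dict + server.repl_scriptcache_fifo."""
    def __init__(self):
        self.known = set()   # shas we believe every slave already has
        self.fifo = deque()  # insertion order, for eviction

    def add(self, sha):
        if len(self.fifo) >= MAX_ENTRIES:
            self.known.discard(self.fifo.popleft())  # problem 5: eviction
        self.fifo.append(sha)
        self.known.add(sha)

    def flush(self):
        # Problem 4: called on full resync / AOF rewrite, forgetting everything.
        self.known.clear()
        self.fifo.clear()

def propagate_evalsha(cache, script):
    """Return what the master would send to slaves for an EVALSHA call."""
    sha = hashlib.sha1(script.encode()).hexdigest()
    if sha in cache.known:
        return ("EVALSHA", sha)
    cache.add(sha)
    return ("EVAL", script)  # must resend the whole script body
```

After a `flush()` or an eviction, the full script body has to travel down the replication link again, even though every slave already executed it before.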

I think Lua scripts should be regarded as part of the instance's state as well: data that does not belong to any DB but to the whole instance. So what I did is:

  1. Propagate the EVAL command as SCRIPT LOAD the first time we see a script.

    This means that even read-only scripts are propagated to slaves and the AOF via EVAL, so slaves can stay in sync with the master's script state.

  2. Disable the SCRIPT FLUSH|LOAD and EVAL commands on slaves.

    Because these commands could break the replicated Lua scripting state.

  3. Persist all Lua scripts in the AOF and RDB files.

    Then slaves stay in sync with the master's state, and the AOF file contains all Lua scripts even after a rewrite.

  4. Remove server.repl_scriptcache_dict and server.repl_scriptcache_fifo.

We can guarantee that even after a FULL RESYNC or AOF REWRITE redis holds all Lua scripts, so those two structures are no longer needed. Scripts are no longer propagated in duplicate, and slaves no longer execute read-only Lua scripts received from the master.
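The rewrite described in step 1 can be sketched as follows (a toy Python model of the idea, not the PR #4966 code; the `Master` class and its fields are invented for illustration):

```python
import hashlib

class Master:
    """Toy model of the proposed EVAL -> SCRIPT LOAD + EVALSHA rewrite."""
    def __init__(self):
        self.lua_scripts = {}  # sha -> body; part of instance state, persisted

    def eval(self, script, *args):
        """Return the commands this EVAL call emits to the replication stream."""
        sha = hashlib.sha1(script.encode()).hexdigest()
        out = []
        if sha not in self.lua_scripts:
            # First time we see this body: load it everywhere, even if it
            # is a read-only script, so EVALSHA keeps working after failover.
            self.lua_scripts[sha] = script
            out.append(("SCRIPT", "LOAD", script))
        out.append(("EVALSHA", sha) + args)
        return out
```

Because `lua_scripts` itself is persisted (RDB/AOF in the proposal), no separate per-slave cache bookkeeping is needed: once a body has been loaded, every later invocation replicates as a short EVALSHA.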

Comment From: antirez

@soloestoy thanks for the comments, all the info makes sense, but IMHO it does not answer the original question: if we can avoid having a script replication mechanism at all, why should we bother with one?

Comment From: soloestoy

@antirez sorry for that... just for reference

Comment From: antirez

Oh don't say sorry at all @soloestoy! What I mean is that I'm very interested in your opinion about the negative impacts of having just effects replication.

Comment From: yossigo

Hi @antirez, this move makes perfect sense to me, but I'd suggest doing it in two steps: for some time make effects replication the default, but leave an option to undo that if it turns out to break existing Lua code due to bandwidth, etc.

Comment From: soloestoy

Hi @antirez, I understand your idea: we can set lua_always_replicate_commands to 1 and remove the script cache (server.repl_scriptcache_dict and server.repl_scriptcache_fifo), so that only write commands are propagated to slaves and the AOF, wrapped in MULTI/EXEC as a transaction. Then everything falls into place.

Actually, when I tried to implement a new Lua script replication mechanism, at first I wanted to do it exactly as you describe. Everything works, except for one problem:

Normally people call EVALSHA after SCRIPT LOAD and everything is fine, but some naughty boys don't do that: they call EVALSHA after EVAL, and it works, like this test case in scripting.tcl 😂:

    test {EVAL - is Lua able to call Redis API?} {
        r set mykey myval
        r eval {return redis.call('get',KEYS[1])} 1 mykey
    } {myval}

    test {EVALSHA - Can we call a SHA1 if already defined?} {
        r evalsha fd758d1589d044dd850a6f05d52f2eefd27f033f 1 mykey
    } {myval}

And if we just enable lua_always_replicate_commands and remove the script cache, a sha generated by EVAL rather than by SCRIPT LOAD will no longer work after a reboot or failover.

Finally, if we don't care about that problem, then everything is fine as is. But if we do care about it, I think PR #4966 is a good choice; the only difference from your idea is that I rewrite EVAL as SCRIPT LOAD + EVALSHA.

Comment From: oranagra

@antirez this is a brave move, and it is hard to predict whether any use cases will be badly affected by this, and how much. so as much as i want to just trim complicated code from the sources, i support what Yossi suggested: start by just making this the default, for two obvious reasons:

  1. if some use cases are badly affected, they'll be able to just change the default, with no need to switch to an old version or wait for a fix.
  2. if we find out it causes too much damage in many use cases, we won't need to revert tons of deletions and deal with merge conflicts.

so in that case, it looks like it leaves us with all the problems and actually doesn't solve anything, since all the problems script propagation causes remain. and we're left with three options: 1. try to solve these problems for the users who decide to disable command propagation. 2. document the limitations (which have always existed), and fix nothing. 3. some compromise between the above, by fixing just the ones that are easy to fix, or the ones that cause the most damage. for instance, we can decide that scripts that use command propagation call freeMemoryIfNeeded, and scripts that use script propagation don't do that at all.

Comment From: antirez

@soloestoy fortunately clients should always expect to receive an error when calling EVALSHA, and in that case they should just re-send the script, so at least from that POV we don't break any existing protocol. After all, before we started persisting scripts into the RDB file, a reboot of the master created the same situation anyway.
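The client-side contract described here can be sketched like this (a hypothetical helper, not the API of any particular client library; `ToyServer`, `safe_eval`, and `NoScriptError` are invented names, and `ToyServer` stands in for a Redis instance whose script cache is emptied by a restart):

```python
import hashlib

class NoScriptError(Exception):
    """Stand-in for the NOSCRIPT error reply."""

class ToyServer:
    """Toy instance: a 'restart' is simulated by creating a fresh object."""
    def __init__(self):
        self.cache = {}  # sha -> script body

    def evalsha(self, sha, *args):
        if sha not in self.cache:
            raise NoScriptError("NOSCRIPT No matching script")
        return ("ran", sha)

    def eval(self, script, *args):
        sha = hashlib.sha1(script.encode()).hexdigest()
        self.cache[sha] = script  # EVAL executes and stores
        return ("ran", sha)

def safe_eval(server, script, *args):
    """Try the cheap EVALSHA first; on NOSCRIPT, fall back to full EVAL."""
    sha = hashlib.sha1(script.encode()).hexdigest()
    try:
        return server.evalsha(sha, *args)
    except NoScriptError:
        return server.eval(script, *args)
```

A client written this way is unaffected by a reboot or failover wiping the script cache: the first call after the restart simply pays the cost of resending the body once.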

@yossigo @oranagra yes, I agree that this should be done in smaller steps rather than immediately... I have the feeling that it will not create any problems at all, but I want to run a few tests as well. My idea is that we could do the following:

  1. Enable effects replication by default in Redis 5, and make it impossible to disable from the configuration, but leave all the code untouched, so that we can roll back if needed.
  2. If things go well and nobody has issues, remove the dead code in Redis 6.

If instead there are issues, we revert just the few lines that are needed to force the effects replication.

Comment From: oranagra

I like the fact that we won't need to fix problems in code we want to abandon: by making it dead code, the bugs are gone and there's no need to worry about eviction (#5250), random commands (#4835) and so on.

but on the other hand, if some users find this unusable after upgrading, they'll be forced to downgrade (which they probably can't do, due to the rdb format).

maybe leave it as a deprecated feature with a warning about the various problems it has?

p.s. if a user decides after the upgrade that he wants to revert, it probably means that his use case didn't hit any of the script replication problems (since he was using redis 4.0 so far). so in that respect, it is ok to leave a deprecated feature full of "old" bugs.

Comment From: antirez

@oranagra sorry, I gave the wrong impression with my comment. What I mean is that even if the code is hidden and the code path no longer reachable, we still fix all the issues as if it were usable, but those are just theoretical fixes that will be used only if we revert. Then, if we do not revert, we delete everything. I agree that it's a bit of an uneasy position to keep the code around for potential reverts while it no longer handles the corner cases it might have.

Comment From: soloestoy

fortunately clients should always expect to receive an error when calling EVALSHA, and in that case, they should just re-send the script, so we don't break any existing protocol from that POV at least.

OK, that's lucky for us. Then I'm thinking: how about treating EVAL just as EVAL, not as SCRIPT LOAD + EVAL? That would mean EVAL no longer affects the server.lua_scripts dict.

See commit 9065f2848d8c648ffe828e6dc2167a7af1b6652e.

Comment From: antirez

@soloestoy this would be against the fundamental design of the scripting feature. SCRIPT LOAD exists just as a convenience; the main mechanism from the POV of the client is the following: you are free to call EVALSHA, and if it does not work, you fall back to EVAL to execute and store the script, and later you can try EVALSHA again (without any guarantee anyway). One of the important things in these semantics is that there is no guarantee that you are "storing" a given script in the server cache.

Comment From: yossigo

@antirez I think this is a good opportunity to relax the guarantees about the lifetime of a script. Currently this is what the documentation indicates:

Executed scripts are guaranteed to be in the script cache of a given execution of a Redis instance forever. This means that if an EVAL is performed against a Redis instance all the subsequent EVALSHA calls will succeed.

I think the way you've described it above is better, so a client must make no assumptions about the cache and should always be ready to fall back from EVALSHA to EVAL, even in the context of the same run/connection/whatever.

Comment From: antirez

Hello @yossigo, I think that we should give such guarantees from the POV of the local connection still being active (but not across instance reboots / failovers), because otherwise there is no way to make sure that MULTI/EXEC transactions using EVALSHA are sane...

Comment From: yossigo

@antirez of course it's needed inside MULTI/EXEC. I argue that clients shouldn't assume a script is still there just because the connection is still open, for a few reasons: 1. Proxies. 2. It takes away the liberty of managing script cache resources at will, or at least makes it more complex than it should be. 3. From the client's perspective it probably makes no real difference.

Comment From: antirez

@yossigo I was not clear. If we relax the contract that within a given connection the script stays cached, you can't do the following:

    SCRIPT LOAD ...
    MULTI
    EVALSHA ....
    EVALSHA ...
    EXEC

Right now that's the pattern to make sure the transaction will not fail because the script goes away.
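A toy illustration of why this pattern depends on the connection-level guarantee (invented names, not Redis source; in this sketch the sha is checked at execution time, so a script vanishing between SCRIPT LOAD and EXEC breaks the whole transaction):

```python
class ToyTransaction:
    """Toy MULTI/EXEC: commands are queued, then run together at EXEC."""
    def __init__(self, script_cache):
        self.script_cache = script_cache  # shared with the "server"
        self.queue = []

    def evalsha(self, sha):
        self.queue.append(sha)  # queued, not validated yet

    def exec(self):
        results = []
        for sha in self.queue:
            if sha not in self.script_cache:
                # Without the guarantee that a loaded script survives for the
                # duration of the connection, this failure mode is possible.
                raise RuntimeError("NOSCRIPT inside EXEC: transaction broken")
            results.append(("ran", sha))
        return results
```

As long as a script loaded on a connection cannot disappear while that connection is alive, EXEC can never hit the NOSCRIPT path, which is exactly the property the SCRIPT LOAD + MULTI pattern relies on.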

Comment From: yossigo

@antirez that's a good one. It makes the documentation a bit inaccurate. When a proxy is present, it would also require explicit replication of SCRIPT LOAD (and of EVAL, which would translate to SCRIPT LOAD towards the slave).

Comment From: antirez

@yossigo documentation-wise, what we could do is relax the script caching guarantees and state that a script is only guaranteed to be remembered in the context of a given connection, even if the actual current guarantee is that scripts are persisted as long as a given instance is alive and not restarted (and the script cache is not reset manually). However, from the client's POV it is hard to check whether it is still talking to the same instance that was never restarted, so IMHO it makes more sense to relax the guarantee to what is trivially client-verifiable. And after all, the "same connection" rule covers the pattern above. Besides, Redis connections are pretty stateful anyway, so on reconnection the client should know that a number of things potentially changed (selected DB, MULTI state, ...).

Comment From: antirez

Effects replication is now the default in Redis 5 RC5.

Comment From: yossigo

@antirez it's a good contract definition; it also leaves the door open to garbage-collecting scripts in the future if we must (provided we make sure their connections are gone). I can't really see how this would become a priority, though...

Comment From: oranagra

Handled in #9812.