Redis Support rollback in Lua script

The problem/use-case that the feature addresses

My company uses Redis as its main database. We rely heavily on Lua scripts to execute complicated business logic on Redis atomically. The scripts read and write a lot of data. As the business grows, the scripts keep growing and become more and more complicated. In addition, a script may abort in the middle of execution because some condition is not met. Redis does not automatically roll back the changes the script has already made. It is very hard to use Lua scripts without automatic rollback. We are wondering whether other people have run into the same problem.

Description of the feature

Add a new command "abort" that aborts the script explicitly and automatically rolls back all the changes made by the Lua script.

Alternatives you've considered

We have considered two solutions so far, but neither is easy to maintain at scale.

1. Separate the read and write logic. Perform all the read operations at the beginning of the script, check that all conditions are met, then perform the write operations. This avoids aborting a transaction halfway through. However, it is hard to enforce through code review as the team scales. It is also hard to maintain in the long term, because the business logic is split into a read half and a write half.

2. Keep track of the changes the script has made so far, and roll them back on abort. This approach is also hard to maintain because of its complexity. E.g. for the hset command, we need to read the current value before executing it, so that we can roll back to the old value on abort. It is very easy to miss some of the changes as the team grows. In addition, the write and rollback operations cause unintended side effects such as keyspace notifications. Ideally, keyspace notifications should only fire when the transaction succeeds.
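To make the first alternative concrete, here is a minimal sketch of the "validate everything, then write" pattern. It is written in Python against a plain dictionary standing in for Redis hashes, so the store layout, the `transfer` function, and the `Abort` exception are all hypothetical names chosen for illustration, not part of Redis or of our scripts.

```python
# Hypothetical in-memory stand-in for Redis hashes: key -> {field: value}.
store = {
    "account:alice": {"balance": 50},
    "account:bob": {"balance": 10},
}


class Abort(Exception):
    """Raised when a precondition fails; nothing has been written yet."""


def transfer(store, src, dst, amount):
    # Phase 1: reads and checks only. Aborting here leaves the store untouched.
    src_balance = store.get(src, {}).get("balance")
    if src_balance is None or dst not in store:
        raise Abort("missing account")
    if src_balance < amount:
        raise Abort("insufficient funds")
    dst_balance = store[dst].get("balance", 0)

    # Phase 2: writes only. No condition is checked past this point.
    store[src]["balance"] = src_balance - amount
    store[dst]["balance"] = dst_balance + amount
```

The weakness described above is visible even in this small example: the rule "no checks after the first write" is a convention that only code review can enforce, and the logic for each account is split between the two phases.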

Comment From: madolson

@xingbowang Definitely not a trivial feature to implement, but since Lua scripting is Redis' form of "transactions", this seems like something we should at least spend some time thinking about in more detail. I agree it is hard to orchestrate rollback yourself. It would be nice if there was some way to 'mark' a Lua script as something that can be aborted, presumably with the # flags we've been adding, and then allow users to pay the cost of us maintaining the log of changes. Simple transaction cases like optimistic locking, where you set a hash field value to a record version and then only update if the value hasn't changed, are trivially covered by the solution you outlined in option 1. So let's assume we're only targeting complex transactions where it's not possible to validate assumptions upfront.

We had a past PR for this, https://github.com/redis/redis/pull/2701, which is a bit naive in that it dumps the content of the keys to be able to restore everything. I think we could optimize this more based on the operations done (for example keeping track of the inverse mutations on the key).

@yossigo One thing we might want to consider is treating this as a candidate for first party modules, allowing custom data structure implementations that have advanced functionality like this. I'm not all that convinced about the rollback case, but I think there are likely cases to be made for companies wanting custom data structure implementations (RL and AWS's data tiering come to mind readily).

Comment From: hpatro

We should also think about the same thing for command execution within a MULTI/EXEC block, and cover operational commands as well. Customers might have orchestration around applying a series of ACL commands, with custom behaviour depending on the success or failure of each command.

Comment From: yossigo

@madolson I'm thinking about this issue in terms of MVCC and undo log, which is not trivial (big understatement). I'm not sure I understand your last comment in this context though. Do you refer to first party modules that provide alternative implementations to core data structures (e.g. an alternative to hashes / lists / etc.)?

Comment From: xingbowang

@yossigo I like the idea of MVCC and an undo log. I have some familiarity with it from my previous work with InnoDB, which supports MVCC by cloning the entire key-value pair record. As Redis data types include collections (hash, list, sorted set, etc.), which can have a very high memory footprint, it could be very expensive to create a full copy of a collection value. (I believe this is the method used in the PR @madolson found.) By contrast, tracking undo/redo operations between versions has a much lower memory cost. E.g. for the "hset key field new_value" command, the undo operation will be "hset key field old_value" if the field already existed, or "hdel key field" if it did not. The challenge would be defining an undo operation for each write command. With close to 100 write commands in Redis, this would be a huge effort. Maybe we could reduce the effort by supporting the most popular data structures first, say string, hash and list.
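As a rough illustration of the per-command undo idea for hset, the sketch below records the inverse of each write and replays the inverses in reverse order on rollback. It is Python over a dictionary standing in for Redis hashes; `UndoLog`, its methods, and the `_MISSING` sentinel are hypothetical names, and nothing here reflects how Redis would actually implement this.

```python
_MISSING = object()  # sentinel: the field did not exist before the write


class UndoLog:
    """Wraps a dict-of-dicts store and records how to undo each hset."""

    def __init__(self, store):
        self.store = store
        self.entries = []  # (key, field, old_value), oldest first

    def hset(self, key, field, value):
        # Read the old value first so the write can be reversed later.
        old = self.store.get(key, {}).get(field, _MISSING)
        self.entries.append((key, field, old))
        self.store.setdefault(key, {})[field] = value

    def rollback(self):
        # Undo in reverse order so repeated writes to one field unwind correctly.
        while self.entries:
            key, field, old = self.entries.pop()
            if old is _MISSING:
                # Inverse of creating a field: the equivalent of "hdel key field".
                self.store[key].pop(field, None)
                if not self.store[key]:
                    del self.store[key]  # mirror Redis dropping empty hashes
            else:
                # Inverse of overwriting: the equivalent of "hset key field old_value".
                self.store[key][field] = old
```

Only the touched fields are remembered, which is where the memory saving over a full copy comes from. The cost is that every other write command would need its own inverse defined in the same way.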

Comment From: madolson

@yossigo Yeah, two intermingled thoughts. I had the same concern about MVCC, in that it will add a lot of complexity, and for a feature that isn't that important for the current incarnation of Redis. The architecture is single-threaded, and we don't really need multiple versions if we are executing each block of code atomically. It might be more important for a future architecture though, where we have more concurrency.

My next thought was that we could support a naive version of MVCC/rollback in Lua, where we just copy all of the values accessed by a Lua script and then the script acts upon those cloned objects. If someone wanted to implement something more advanced to save on memory, as @xingbowang outlined, we could enable it via a first party module by allowing individuals to build custom functionality within the data structures to keep track of the multiple states. We would still need to think through how the MVCC state would look from the Redis side, but we wouldn't need to actually implement the optimal version within the data structures. My perception is that this MVCC state will have a cost, either memory or performance, that most people don't want to pay, so being able to add a plugin for it might make sense.
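A minimal sketch of that naive "copy what the script touches" approach might look like the following. It is Python over a dictionary standing in for the keyspace; `ScriptView` and its methods are hypothetical names for illustration only, and the deep copy is exactly the memory cost discussed above.

```python
import copy


class ScriptView:
    """Gives a script private clones of the keys it touches."""

    def __init__(self, store):
        self.store = store
        self.clones = {}  # key -> private deep copy of the hash

    def _clone(self, key):
        # Copy the whole value on first access; later accesses reuse the clone.
        if key not in self.clones:
            self.clones[key] = copy.deepcopy(self.store.get(key, {}))
        return self.clones[key]

    def hget(self, key, field):
        return self._clone(key).get(field)

    def hset(self, key, field, value):
        self._clone(key)[field] = value

    def commit(self):
        # Publish the clones; keys that ended up empty are removed.
        for key, value in self.clones.items():
            if value:
                self.store[key] = value
            else:
                self.store.pop(key, None)
        self.clones.clear()

    def abort(self):
        # Rollback is free: the real store was never modified.
        self.clones.clear()
```

Because writes only reach the real store in `commit`, side effects such as notifications could in principle be deferred to that point, but a large collection is copied in full even if the script changes a single field.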

Comment From: xingbowang

On second thought, I think MVCC could be useful as Redis modules support more and more data models. Modules already cover many new data models: JSON, Search, time series, Graph. Some queries (e.g. graph queries) are going to be very expensive in terms of CPU time. Meanwhile, as customers store more data in memory, instances get bigger, and bigger instances have more CPUs available. There will be more demand for multi-threading as Redis expands its functionality. MVCC would help improve parallelism, increasing overall Redis throughput and reducing latency.

Comment From: madolson

@oranagra @soloestoy Any input on this? I will tentatively add this to "next major backlog" as an item to consider. The feature has come up in the past, and if we can come up with a good implementation it might be worth implementing.

Comment From: oranagra

I agree we need to think about this again for the next version of redis, maybe together with some multi-threading improvements.

but I must also state that my gut feeling is that the overheads such a feature will add are not worth it for the pain it aims to solve. I have a feeling that although the current approach is not "mathematically" air-tight, and there are cases in which we could break atomicity, in practical terms it might be good enough for users (they can avoid the issues, and that even got better in 7.0 with the function flags), so adding memory and performance overheads, not to mention complexity, might be wrong.

Comment From: yossigo

@madolson I agree this is something to consider, maybe in a bigger scope of interactive transaction isolation and not necessarily just inside Lua.

Comment From: xingbowang

As @soloestoy mentioned in #10804, the atomicity guarantee provided by Redis can be confusing for some customers. Redis made a conscious decision on the trade-off between atomicity guarantees and implementation complexity a long time ago. Looking back, it was the right decision. It allowed Redis to grow fast. This has happened with other databases as well. E.g. MySQL InnoDB: InnoDB introduced MVCC in MySQL 5.6 in 2013, 12 years after InnoDB was first released in 2001. As Redis continues to grow, more and more people use it as their main database, beyond caching. We need to re-evaluate the earlier decision based on the new information we have obtained from our customers. I agree that this will increase complexity and cause more memory/CPU overhead. This feedback will guide us as we explore the solution space and help us make the right decision. I believe that for an in-memory first database, supporting MVCC would greatly expand Redis use cases in the long run.

Comment From: madolson

@xingbowang There was some consensus that a feature like this would be useful but obviously comes with a lot of potential tradeoffs, both unknown and known. The ambiguity here is very high, so a next step would be to implement a PoC or some design that can be evaluated more critically.