Use Case
In a forum or social media application, there may be a feature to show the last 100 users who posted a message or thread. One can implement it with a sorted set, adding entries with ZADD like this: ZADD homepage_threads unixtime thread_id.
Later, we can retrieve the most recent data with ZREVRANGE homepage_threads 0 99.
Addressed Problem
This probably won't become an issue for small to medium-scale applications. However, as the application grows and serves billions of threads every hour, getting the latest 100 threads becomes expensive, because we store an entry in the sorted set every time we get a new thread and the set keeps growing. One workaround is to remove old data before adding new data, but this adds complexity to how things are handled.
Feature Request
It would be better if we could provide a way to configure the maximum capacity of the sorted set that ZADD writes to.
However, this may not be as easy as it seems, because setting a maximum capacity means deciding what corrective action to take when the limit is hit.
Potential Solutions
I can imagine two alternatives for achieving this:
Alternative A
Adding ZRANGECAP and ZRANGECAPINFO.
Let's consider the following Redis command specs:
- ZRANGECAP key start stop [REV] max_capacity
- ZRANGECAPINFO key
Below is how things would look:
redis> ZADD homepage_threads 1689428699 "thread_id_one" 1689428700 "thread_id_two" 1689428701 "thread_id_three"
(integer) 3
redis> ZRANGECAPINFO homepage_threads
1) "+Inf"
redis> ZRANGECAP homepage_threads 1 2 2 // set maximum capacity to 2 records
1) "ok"
redis> ZRANGECAPINFO homepage_threads
1) "2"
redis> ZRANGE homepage_threads 0 -1 WITHSCORES
1) "thread_id_two"
2) "1689428700"
3) "thread_id_three"
4) "1689428701"
// We only have a maximum capacity of 2, but we want to add more.
// The proposed solution here is to remove the member with the lowest rank.
// That means thread_id_two (score 1689428700) will get removed.
redis> ZADD homepage_threads 1689428702 "thread_id_four"
(integer) 1
redis> ZRANGE homepage_threads 0 -1 WITHSCORES
1) "thread_id_three"
2) "1689428701"
3) "thread_id_four"
4) "1689428702"
Alternative B
Create a different data type that works identically to sorted sets (ZADD and so on), but with the maximum capacity as a mandatory option. The purpose of this alternative is to keep it clearly separate, for users, from the existing solution.
Goal
The main goal of this feature request is to improve Redis efficiency by adding a capacity limit on how much data we store.
The above use case is just one of many where we do not want Redis to store everything, but only part of the data, and where the natural order of the data is always the same.
As a side note, the above could be extended to other data structures and commands, like SADD, SET, and lists. In the author's opinion, this would be useful, and capping the maximum capacity would make Redis resource usage easier to anticipate.
I hope for sincere feedback from the Redis team.
Thank you.
Comment From: zuiderkwast
As you said, there are no limits for other data structures nor for the total number of keys. I think such config can break certain applications and is not desirable.
Perhaps we can add a trimming argument to ZADD, similar to the MAXLEN argument for XADD? XADD and ZADD are similar, but for different types of keys.
> However, this may not be as easy as it seems, because setting a maximum capacity means deciding what corrective action to take when the limit is hit.
Instead of trimming, maybe the ZADD (with MAXLEN argument) can fail with an error if the maximum size has been reached?
Comment From: rhzs
> As you said, there are no limits for other data structures nor for the total number of keys. I think such config can break certain applications and is not desirable.
> Perhaps we can add a trimming argument to ZADD, similar to the MAXLEN argument for XADD? XADD and ZADD are similar, but for different types of keys.
> > However, this may not be as easy as it seems, because setting a maximum capacity means deciding what corrective action to take when the limit is hit.
> Instead of trimming, maybe the ZADD (with MAXLEN argument) can fail with an error if the maximum size has been reached?
@zuiderkwast thanks for your feedback. We can certainly add that as well. Interestingly, I also found @madolson's feedback on other issues here about how things can be implemented by adding a new command.
Please also consider adding INFO for the data type after we implement a capacity restriction on certain data structures. We could add a command like my suggested ZRANGECAPINFO, meaning a unique command per data type, or we could add a general INFO for all data types in the future.
As a general way to see a key's data structure information, we could perhaps do the following:
redis> COMMAND KEYINFO homepage_threads
1) "datatype"
2) "sorted_set"
3) "max_capacity"
4) "2"
5) "consumed_memory_in_bytes"
6) "1024"
7) "last_added_at"
8) "RFC 3339 datetime format"
9) "last_modified_at"
10) "RFC 3339 datetime format"
11) "created_at"
12) "RFC 3339 datetime format"
How do we proceed with the above proposals? What's the guideline? Do we need a quorum from maintainers? (sorry I am new here)
Comment From: oranagra
i didn't read the details too deeply, but maybe Redis Functions is the right solution (instead of a generic command)
Comment From: rhzs
> i didn't read the details too deeply, but maybe Redis Functions is the right solution (instead of a generic command)
The Addressed Problem section explains the problem with the current implementation. This includes the Redis Functions / MULTI/EXEC / pipeline approaches.
Comment From: madolson
I also like the idea of a MAXLEN argument, ~~with two caveats~~. ZADD is not technically constrained since we can differentiate the score (which is numeric) from a new flag. Maybe something like:
ZADD key [MAXLEN 100|TX|TN] score
If you exceed the max length, it throws an error. If TX is provided, it trims the max value; if TN is provided, it trims the min value.
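The three behaviours described above (error by default, TX trims the max, TN trims the min) can be sketched with a small in-memory model. This is not Redis code and not an existing ZADD option: `CappedSortedSet`, the `maxlen` and `trim` parameters, and the choice of `OverflowError` are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


class CappedSortedSet:
    """In-memory model of a sorted set with a MAXLEN-style cap (not Redis code)."""

    def __init__(self) -> None:
        self._scores: Dict[str, float] = {}

    def _ordered(self) -> List[str]:
        # Ascending by score; ties broken by member name.
        return sorted(self._scores, key=lambda m: (self._scores[m], m))

    def zadd(self, score: float, member: str,
             maxlen: Optional[int] = None, trim: Optional[str] = None) -> int:
        """Add or update a member; return 1 if it is new and was kept, else 0.

        trim=None -> raise if a new member would exceed maxlen (hypothetical)
        trim="TN" -> enforce maxlen by dropping the lowest scores
        trim="TX" -> enforce maxlen by dropping the highest scores
        """
        if trim not in (None, "TN", "TX"):
            raise ValueError("trim must be None, 'TN' or 'TX'")
        is_new = member not in self._scores
        if maxlen is not None and trim is None and is_new \
                and len(self._scores) >= maxlen:
            raise OverflowError("sorted set has reached MAXLEN")
        self._scores[member] = score
        if maxlen is not None and trim is not None:
            ordered = self._ordered()
            excess = len(ordered) - maxlen
            if excess > 0:
                victims = ordered[:excess] if trim == "TN" else ordered[-excess:]
                for victim in victims:
                    del self._scores[victim]
        return int(is_new and member in self._scores)

    def zrange_withscores(self) -> List[Tuple[str, float]]:
        return [(m, self._scores[m]) for m in self._ordered()]


threads = CappedSortedSet()
for ts, tid in [(1689428699, "thread_id_one"), (1689428700, "thread_id_two"),
                (1689428701, "thread_id_three"), (1689428702, "thread_id_four")]:
    threads.zadd(ts, tid, maxlen=2, trim="TN")
print(threads.zrange_withscores())
# [('thread_id_three', 1689428701), ('thread_id_four', 1689428702)]
```

One consequence worth noting in this model: with TX, a new member that carries the highest score is trimmed immediately, so the add is effectively a no-op.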
> i didn't read the details too deeply, but maybe Redis Functions is the right solution (instead of a generic command)
Functions could also implement this, but this use case seems more generic than most that we hear around this.
Comment From: zuiderkwast
> setting maximum capacity as a mandatory option
Why do you want this to be mandatory? Your Redis admin doesn't trust the app developers? If this is why you want this, then I think Functions can be a solution. Using ACL, you can forbid the users to call commands directly and only allow them to call functions which are pre-defined by your Redis admin.
> How do we proceed with the above proposals? What's the guideline? Do we need a quorum from maintainers? (sorry I am new here)
It would be good to get some understanding from at least one of the maintainers before you write the implementation and submit a PR.
Another question: Can you use a stream instead of a sorted set? A stream has entries sorted by timestamps and it is easy to get the last 100 entries using XREVRANGE. You can add an entry and trim the stream at the same time using XADD with MAXLEN. It seems like a good fit for your use case.
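As a rough illustration of why a capped stream fits the "latest N by time" case, here is a minimal in-memory analogy using `collections.deque`. It is not Redis code: the `xadd` and `xrevrange` helpers are hypothetical stand-ins, and plain integers are used where real stream entry IDs would be.

```python
from collections import deque
from typing import Deque, List, Tuple

MAXLEN = 3  # small cap for the demo; the use case in this thread would use 100

# A deque with maxlen drops its oldest entry whenever a new one is appended.
stream: Deque[Tuple[int, str]] = deque(maxlen=MAXLEN)


def xadd(entry_id: int, thread_id: str) -> None:
    """Append an entry; entries are assumed to arrive in increasing ID order."""
    stream.append((entry_id, thread_id))


def xrevrange(count: int) -> List[Tuple[int, str]]:
    """Return up to `count` entries, newest first."""
    return list(stream)[::-1][:count]


for ts, tid in [(1689428699, "thread_id_one"), (1689428700, "thread_id_two"),
                (1689428701, "thread_id_three"), (1689428702, "thread_id_four")]:
    xadd(ts, tid)

print(xrevrange(2))
# [(1689428702, 'thread_id_four'), (1689428701, 'thread_id_three')]
```

The analogy only holds because insertion order and ranking order are the same thing here. It does not carry over to rankings by an arbitrary score.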
Comment From: madolson
> Why do you want this to be mandatory? Your Redis admin doesn't trust the app developers? If this is why you want this, then I think Functions can be a solution. Using ACL, you can forbid the users to call commands directly and only allow them to call functions which are pre-defined by your Redis admin.
Maybe we should introduce some extension of ACLs which defines policies like this. There are a lot of things like:
1. Max string size
2. Required arguments
3. Conditional arguments
Which could be useful to set. In AWS we have this system called "condition/context keys" which are often set throughout an organization. https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html Maybe something similar to that would be useful to us.
Comment From: rhzs
> I also like the idea of a MAXLEN argument, with two caveats. ZADD is not technically constrained since we can differentiate the score (which is numeric) from a new flag.
@madolson I see. Making it an option on the add operation would mean that we can resize the data structure at any time. Could you also consider making the data size immutable? That's just my thought.
> Which could be useful to set. In AWS we have this system called "condition/context keys" which are often set throughout an organization. https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html Maybe something similar to that would be useful to us.
Yes, this would definitely help to set the capacity at a global scale / throughout the app.
@zuiderkwast
> Why do you want this to be mandatory?
There are a few reasons why I think this is going to be useful:
* Mitigating abusive clients
* Better admin management
* Achieving efficiency where we want to keep only N items in a data structure.
> Your Redis admin doesn't trust the app developers? If this is why you want this, then I think Functions can be a solution. Using ACL, you can forbid the users to call commands directly and only allow them to call functions which are pre-defined by your Redis admin.
If my understanding is correct, ACL can only allow / disallow, you can't limit "the capacity of the data structure" or "the capacity of per key per data structure".
> It would be good to get some understanding from at least one of the maintainers before you write the implementation and submit a PR.
Got it, thank you.
> Another question: Can you use a stream instead of a sorted set? A stream has entries sorted by timestamps and it is easy to get the last 100 entries using XREVRANGE. You can add an entry and trim the stream at the same time using XADD with MAXLEN. It seems like a good fit for your use case.
Yes, it is true that the above use case might be implemented using XADD. But consider other use cases, such as a gaming leaderboard where we want to show only the top 100 players, or an e-commerce site where you want to show the 100 best-selling product variants out of 100 million. I think ZADD is a good fit for those use cases, because the ranking can't use a timestamp as the key.
Comment From: oranagra
setting a per-key configuration is not the redis way. i agree the MAXLEN argument can be a solution, but i think a simple function is a better solution, and leave the user more flexibility to do things a bit differently or also update other data structures at the same time (i.e. the function would be named "AddThread", not "ZADDandTRIM"). As far as i can tell, the function just needs to do ZADD+ZCARD+ZREMRANGEBYRANK. am i wrong?
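The ZADD+ZCARD+ZREMRANGEBYRANK sequence can be sketched as follows. This is a pure-Python model of the logic, not an actual Redis Function: `SortedSetModel` and its methods are hypothetical stand-ins that imitate the rank semantics of the real commands, and `add_thread` mirrors the "AddThread" name suggested above. In Redis the three steps would live inside a function or script so they run atomically.

```python
from typing import Dict, List


class SortedSetModel:
    """Tiny in-memory stand-in for one Redis sorted set (illustration only)."""

    def __init__(self) -> None:
        self._scores: Dict[str, float] = {}

    def _ordered(self) -> List[str]:
        # Rank order: ascending score, ties broken by member name.
        return sorted(self._scores, key=lambda m: (self._scores[m], m))

    def zadd(self, score: float, member: str) -> int:
        """Add or update a member; return 1 if it was new, else 0."""
        is_new = member not in self._scores
        self._scores[member] = score
        return int(is_new)

    def zcard(self) -> int:
        return len(self._scores)

    def zremrangebyrank(self, start: int, stop: int) -> int:
        """Remove ranks start..stop inclusive; negative ranks count from the end."""
        ordered = self._ordered()
        n = len(ordered)
        if start < 0:
            start = max(n + start, 0)
        if stop < 0:
            stop = n + stop
        stop = min(stop, n - 1)
        if start > stop:
            return 0
        victims = ordered[start:stop + 1]
        for member in victims:
            del self._scores[member]
        return len(victims)

    def zrevrange(self, start: int, stop: int) -> List[str]:
        """Members from highest to lowest score (non-negative indexes only)."""
        return self._ordered()[::-1][start:stop + 1]


def add_thread(zset: SortedSetModel, thread_id: str, unixtime: int,
               max_threads: int = 100) -> None:
    """Hypothetical AddThread: ZADD, then trim the oldest entries past the cap."""
    zset.zadd(unixtime, thread_id)
    overflow = zset.zcard() - max_threads
    if overflow > 0:
        # Ranks 0..overflow-1 hold the lowest scores, i.e. the oldest threads.
        zset.zremrangebyrank(0, overflow - 1)


homepage = SortedSetModel()
for offset in range(5):
    add_thread(homepage, f"thread_{offset}", 1689428699 + offset, max_threads=3)
print(homepage.zrevrange(0, 99))  # ['thread_4', 'thread_3', 'thread_2']
```

With negative ranks the ZCARD step can be folded away: removing ranks 0 through -(max_threads + 1) leaves only the max_threads highest-scored members.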
Comment From: zuiderkwast
> If my understanding is correct, ACL can only allow / disallow, you can't limit "the capacity of the data structure" or "the capacity of per key per data structure".
@rhzs You're right, but you can allow only FCALL and disallow all other commands. The users cannot create their own functions. They can only call the functions which are created by you (the admin) and there are no other functions they can use.
Comment From: rhzs
> setting a per-key configuration is not the redis way. i agree the MAXLEN argument can be a solution, but i think a simple function is a better solution, and leave the user more flexibility to do things a bit differently or also update other data structures at the same time (i.e. the function would be named "AddThread", not "ZADDandTRIM"). As far as i can tell, the function just needs to do ZADD+ZCARD+ZREMRANGEBYRANK. am i wrong?
You are right. Your solution is what I may have in production as well. My proposal above is more about enhancing the existing Redis capability of ZADD and the sorted set data structure. One could argue that we also should not have the MAXLEN param in XADD, since you can use XTRIM to cap the data, using the same approach that you suggested.
What is the impact on Redis users like me if we have a simpler way to do it? From what I see, the simplicity outweighs the extra steps we currently need to achieve a similar thing. If we can have a similar optional param like MAXLEN from XADD, I think it is ok to have it in ZADD as well.
> @rhzs You're right, but you can allow only FCALL and disallow all other commands. The users cannot create their own functions. They can only call the functions which are created by you (the admin) and there are no other functions they can use.
The goal of the proposal is not to allow X / Y commands per se, or to do workarounds/monkey patching to fit the use cases, but rather to improve the overall Redis capability of capping "per key" data structure size/capacity.
Comment From: madolson
> The users cannot create their own functions. They can only call the functions which are created by you (the admin) and there are no other functions they can use.
Worth mentioning that you can't scope down FCALL the way you are describing either. The way ACLs work requires the user to have access to all of the commands that the function calls, so you can't prevent them from misusing the APIs.
From AWS, we have overwhelmingly seen users creating the functions as well. I don't really think the distinction we make between "admins" and "users" is as common as we make it out to be.
Comment From: oranagra
> What is the impact on Redis users like me if we have a simpler way to do it? From what I see, the simplicity outweighs the extra steps we currently need to achieve a similar thing. If we can have a similar optional param like MAXLEN from XADD, I think it is ok to have it in ZADD as well.
I don't have an objection to adding that feature to ZADD if it is designed to be generic enough, and there are enough use cases to justify that. On the other hand, if it becomes too complex since it needs to support too many different options for too many different use cases, then maybe scripting is the right solution (allows more flexibility at a cost of some extra code in the app).
just to note that also, adding more and more options to an API can sometimes create confusion.
Comment From: someview
> setting a per-key configuration is not the redis way. i agree the MAXLEN argument can be a solution, but i think a simple function is a better solution, and leave the user more flexibility to do things a bit differently or also update other data structures at the same time (i.e. the function would be named "AddThread", not "ZADDandTRIM"). As far as i can tell, the function just needs to do ZADD+ZCARD+ZREMRANGEBYRANK. am i wrong?
We face the same problem: we need a capped zset, and if the zset evicts a member, we expect it to return that element, with a maxlen param. Right now we have to do this with three commands in a Lua script. It's expensive.
Comment From: madolson
> need a capped zset, and if the zset evicts a member, we expect it to return that element, with a maxlen param.
@someview Can you describe how you are using sorted sets and whether or not what you really want is a Stream?
Comment From: someview
> > need a capped zset, and if the zset evicts a member, we expect it to return that element, with a maxlen param.
> @someview Can you describe how you are using sorted sets and whether or not what you really want is a Stream?
This is just like a free chat room: everyone can enter the room freely, and the room can evict some members according to a max member limit or their activity level. In the application, we often use a local cache to avoid fetching the members every time a member speaks. So, if the members change, we should notify the service of that fact. We can design the members like this:
zset member score
A:members JACK activeScore
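A minimal sketch of the behaviour described here, where the add operation hands back whatever it evicted so the caller can update its local cache. This is a plain-Python model, not Redis code; `capped_add`, `max_members`, and the member names other than JACK are hypothetical, and evicting the lowest activeScore first is an assumption.

```python
from typing import Dict, List, Tuple


def capped_add(room: Dict[str, float], member: str, score: float,
               max_members: int) -> List[Tuple[str, float]]:
    """Add or update `member`, then evict the least active members.

    Returns the evicted (member, score) pairs, lowest score first.
    """
    room[member] = score
    excess = len(room) - max_members
    if excess <= 0:
        return []
    # Least active first; ties broken by name so the result is deterministic.
    ranked = sorted(room.items(), key=lambda kv: (kv[1], kv[0]))
    evicted = ranked[:excess]
    for name, _ in evicted:
        del room[name]
    return evicted


members: Dict[str, float] = {}
capped_add(members, "JACK", 10, max_members=2)
capped_add(members, "ROSE", 30, max_members=2)
gone = capped_add(members, "CAL", 20, max_members=2)
print(gone)             # [('JACK', 10)]
print(sorted(members))  # ['CAL', 'ROSE']
```

In this model a newcomer with the lowest score is evicted straight away, which is one way to express "rejecting the addition" without a separate error path.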
Comment From: someview
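This is a Lua script I have written:
-- KEYS[1]: hash holding the limits, KEYS[2]: the zset
-- ARGV[1]: limit field, ARGV[2]: score, ARGV[3]: member
local max = redis.call('HGET', KEYS[1], ARGV[1])
if not max then
  return -1 -- no limit configured
end
local maxCount = tonumber(max)
local added = redis.call('ZADD', KEYS[2], ARGV[2], ARGV[3])
local count = redis.call('ZCARD', KEYS[2])
if added == 0 then
  return count -- member already existed, only its score was updated
end
if count > maxCount then
  -- over capacity: undo the add and reject the new member
  redis.call('ZREM', KEYS[2], ARGV[3])
  return -2
end
return count -- ZCARD already includes the new member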
I strongly recommend implementing a capped zset that supports either automatic eviction or rejecting the addition of new elements.