Hi,
I'm making a webapp that makes intensive use of sorted sets. Many times I need to intersect subranges of sets so I use zrevrangebyscore and zinterstore. They work great. However since I just need to intersect subranges of given sets, and not complete sets, I need to first query the subranges with zrevrangebyscore and save the results in a temporary keys. Then I can use zinterstore to intersect the temporary keys. Initially I made the first step in memory in the client application. Now I'm using this Lua script:
local t = redis.call('zrangebyscore', KEYS[1], ARGV[1], ARGV[2], 'withscores')
local i=1
while(i<=#t) do
redis.call('zadd', KEYS[2], t[i+1], t[i])
i=i+2
end
return #t/2
Then I call the script: eval script 2 original_key result_key min max
So everything happens inside redis. However it would be great to have an optional STORE option in the zrangebyscore command because the execution of this script could result in thousands of zadd calls depending on the zset size. Maybe I'm missing a better way to do this, if it is the case please point me out with a better solution. For example I don't know if it's possible/recommended to use zadd with many score+member pairs inside the lua script. This is, using the variadic functionality of this command.
Thanks, and congratulations for redis :)
Comment From: n0nSmoker
I vote for this proposal. Using zadd traversing huge sets is too slow.
Comment From: PascalLeMerrer
I'm interested in this feature too.
Comment From: FGRibreau
+1
Comment From: AllenDou
+1
Comment From: ke5aux
Working on a project that could make heavy use of this feature. +1
Comment From: gimenete
I found a way to use the variadic version of zadd. The trick is to use unpack()
local t = redis.call('zrangebyscore', KEYS[1], ARGV[1], ARGV[2], 'withscores')
local i = 1
for i=1, #t-1, 2 do
t[i],t[i+1] = t[i+1],t[i] # first we change the order of the elements in the table
end
if #t > 0 then redis.call("zadd", key, unpack(t)) end
The unpack() function adds the whole table to the stack so it is like calling redis.call("zadd", key, t[1], t[2], t[3],.......). However there are limits in the stack size so you get the error "too many results to unpack" if the table is too big. Finally I have this function that calls the variadic version of the command in slices of 1000 elements
local function massive_redis_command(command, key, t)
local i = 1
local temp = {}
while(i <= #t) do
table.insert(temp, t[i+1])
table.insert(temp, t[i])
if #temp >= 1000 then
redis.call(command, key, unpack(temp))
temp = {}
end
i = i+2
end
if #temp > 0 then
redis.call(command, key, unpack(temp))
end
end
Why 1000 elements in each slice? Well, I found that there is a DEFAULT_STACK_SIZE in the Lua source code set to 1024 http://www.lua.org/source/4.0.1/src_llimits.h.html
Comment From: alex-nexus
+2
Comment From: codigo-pl
+1
Comment From: cxbyte
+1
Comment From: trompx
+1
Comment From: jwhitehorn
+1
Comment From: michael-grunder
There seems to be a decent amount of interest in this feature, so I'll take a crack at it if the Redis devs think it's worth adding. There is a concern I suppose about making ZRANGEBYSCORE and ZREVRANGEBYSCORE too complicated, but it could be done like allowing the call to be valid in the following two ways:
Z[REV]RANGEBYSCORE key min max [WITHSCORES] [LIMIT offset count]
Z[REV]RANGEBYSCORE key min max [STORE] key [LIMIT offset count]
That being said, it would make these commands variadic in terms of return type, and I feel like this is prohibited (at least I can't think of another command that does that). A second alternative would to do this in a separate command (possibly ZRANGEBYSCORESTORE which makes sense but is really long :smiley:
Cheers Mike
Comment From: itamarhaber
+1 for adding SORT-like STORE functionality for the Z[REV]RANGE* commands family.
@michael-grunder an unknown reply type is problematic so despite the syntactic simplicity of STORE there's no escape in my mind (:)) from adding the Z*STORE equivalents instead.
Comment From: mezis
@gimenete thanks for the Lua workaround. I have a similar use case where I need to run sequences of ZRANGE* and ZINTERSTORE, and the cost of the roundtrip isn't acceptable.
Another +1 from me :)
Out of curiosity @antirez, is having a general "store the result of the query in a key" syntax something you've considered?
Comment From: ducu
+1 and thanks for massive_redis_command function
Comment From: ptrofimov
+1, @sunheehnus - super!, we are looking forward
Comment From: yura3d
+1
Comment From: ghost
Definitely +1 on this.
Comment From: andrey-tsaplin
+1
Comment From: kellengreen
+1
Comment From: ghost
+1
Comment From: rovarghe
+1
Comment From: antirez
Hello @sunheehnus, thanks for providing a patch implementing this. Currently I'm evaluating if it's a good idea or not to add those commands, may I get some feedback about why you decided to go for the new commands instead of just creating a STORE option? Perhaps since other commands have this form (with the exception of SORT and a few that use STORE)? Maybe implementing this as a STORE option we can write less code, just the one to implement the STORE optionally, so that there will be no duplication. Another reason why you followed this route, could be, optimization perhaps.
Comment From: sunheehnus
Hello @antirez ,
I used to think that, if we use the SOTRE option, just as @itamarhaber said, it may have different return type (Array and Integer), but I find that SORT command can return Array or Integer, and a big part of my patch is a copy of ZRANGEBYSCORE. :-) So maybe the STORE option is a better way to avoid the code duplication.
Comment From: consatan
@gimenete in redis 3.2.4, lua version is 5.1
127.0.0.1:6379> EVAL "return _VERSION" 0
"Lua 5.1"
unpack maximum size is 7999
http://www.lua.org/source/5.1/luaconf.h.html#LUAI_MAXCSTACK
LUAI_MAXCSTACK defined 8000
Comment From: alejg
+1
Comment From: borg286
I propose a new command
ZSTOREBYSCORE dest key min max
This option would be useful for set operations. I find I would like to perform intersections of sets, but I want to start with a subset of one by filtering by its score first, and keeping the score of the second. WEIGHTS would be useful for the latter operation. I am in favor of not returning lots of data if the result is stored in redis in some destination key.
My use case is I would like to perform joins, and sorted sets combined with redis's ability to operate on multiple keys allows for that. I want to be able to perform interesctions on sorted sets using the score rather than the elements themselves. But to do the first filtering on the index I want to be able to specify the score (zstorebyscore) and then do the intersections (zinterstore).
Comment From: tenfortyeight
+1
Comment From: efkan
+1
@gimenete thanx for massive_redis_command function.
Comment From: Carbon401
+1
Comment From: marctinkler
+1
Comment From: 7-cat
+1
Comment From: simonecarriero
+1
Comment From: KrystalXu
+1
Comment From: kilianc
+1
Comment From: christophe-spaggiari
+1
Comment From: oranagra
I'd like to revisit this, get a quick decision on the API and hopefully merge a PR for redis 6.2.
looking at the 2 PRs one adds a new z[rev]rangebyscorestore (with a lot of code duplication), and the other adds a STORE argument to the existing commands.
the problem with the second approach is that it changes the command which was read-only till now to a write command, and also the response type of the command depends on the arguments. we do indeed have several other commands like that already (SORT and GEORADIUS), but they're causing some pain (e.g. GEORADIUS_RO was later added to mitigate that).
I think the right approach is to go with a new command, and make sure the implementation share code, like SUNION and SUNIONSTORE do.
@redis/core-team or anyone else, please share your feedback.
Comment From: mp911de
+1 for z[rev]rangebyscorestore as individual command to stay consistent with zunionstore as it makes it easier to distinguish between read-write/read-only commands.
Comment From: QuChen88
I also agree that we should not change the definition of z[rev]rangebyscore command to add a write variant to it in the sub-command. It is better to create an explicit write-type command z[rev]rangebyscorestore instead. That way we avoid the confusion of having a command which is "mostly" read-only but "sometimes" can also do write operations... Another good example in Redis today which I liked, is SUNION command being read-only, while SUNIONSTORE command being the equivalent write command to store the result.
Comment From: yossigo
Strong +1 on avoiding write extensions to an existing read-only command.
As for the new command, I think we should consider disregarding current convention and use ZRANGESTORE with additional arguments to support the different variants (BYLEX, BYSCORE, REV). The reasons for doing that would be:
- Restrict number of new commands.
- Avoid consistent but long-ish names like
ZREVRANGEBYSCORESTORE. - Z-commands are already somewhat inconsistent so this wouldn't be the first inconsistency.
Comment From: madolson
+1 to ZRANGESTORE, parameterization of existing commands seems better than introducing new ones. The suggestion makes it more like ZUNIONSTORE anyways.
Comment From: oranagra
It seems we have a decision. it requires a little bit of refactoring (many exiting commands would just be aliases to one function that can handle all of them). Will obviously also needs tests and documentation.
@jonahharris do you wanna tackle it?
Comment From: jonahharris
I'm happy to. I'll start putting something together next week.
Comment From: oranagra
@jonahharris any news about this task? run into some complications or just didn't find the time yet?
Comment From: jonahharris
Hey, @oranagra. Thanks for following-up. While I had some other time commitments that came up, I did take a cursory look at this and think the approach is doable. Hoping to get to it this week.
Comment From: jonahharris
So, after playing with this a couple different ways, one thing I tend to dislike in both the set code and in my previous PR for this is specialized logic for whether to reply vs store internally within the functions - it creates a lot of branches and is painful.
Alternatively, as there's not many distinct types of replies, I've written a specialized client wrapper structure that is instantiated in the top-level generic function, which sets the callbacks accordingly - either to a straightforward client-passthrough or an internal version that is used to store the result. In this way, the main zset code is much cleaner. Likewise, from a performance standpoint, I haven't really seen any degradation. I'll push my working code soon, but was wondering what your thoughts are on this approach. Similarly, it would work for the inter/unionstore as well.
Comment From: oranagra
@jonahharris you mean that the generic command implementation gets an object with a set of callbacks which in one case is implemented as adding replies to the client, and in the other case it is implemented as something that creates a zset object which is eventually added to the db?
As long as there's no performance overhead of copying the data twice, i think that can be a good approach, i.e. uses function pointers instead of branches, and can be a little cleaner (especially if the implementation is complex and the "reply or append to object" code is repeated many times.
Comment From: jonahharris
Submitting first-cut PR, which supports the following:
ZRANGESTORE destkey ZRANGE key start stop [WITHSCORES]
ZRANGESTORE destkey ZREVRANGE key start stop [WITHSCORES]
ZRANGESTORE destkey ZRANGEBYSCORE key min max [WITHSCORES] [LIMIT offset count]
ZRANGESTORE destkey ZREVRANGEBYSCORE key min max [WITHSCORES] [LIMIT offset count]
ZRANGESTORE destkey ZRANGEBYLEX key max min [LIMIT offset count]
ZRANGESTORE destkey ZREVRANGEBYLEX key max min [LIMIT offset count]
Design Goals 1. Eliminate duplication where possible. 2. Use an identical code path regardless of whether the result is consumed by a client or should be stored.
Overview of Changes In short, a result handler (zrange_result_handler) has been introduced, which either proxies the result to the client or stores it in a destination zset - this is determined by ZRANGE_CONSUMER_TYPE_CLIENT and ZRANGE_CONSUMER_TYPE_INTERNAL, respectively. Likewise, from the perspective of refactoring, and meeting the design goals, all zrange-related argument parsing and top-level execution has been consolidated into zrangeGenericCommand.
Key Changes * Exported Functions * Current z(rev)rangeXXXCommand Functions - Each set up a client version of zrange_result_handler to return the data to the client and call zrangeGenericCommand for handling the command itself. * New zrangestoreCommand Function - Sets up an internal version of zrange_result_handler to store the result into a zset (destkey) rather than returning it to the client and calls zrangeGenericCommand with the remaining arguments as if they were executed directly. * Static Functions * zrangeGenericCommand -The top-level call for zrange, zrevrange, zrangebyscore, zrevrangebyscore, zrangebylex, and zrevrangebylex - handles argument parsing and calling the appropriate handler: * genericZrangebylexCommand - Handles specific command execution. * genericZrangebyrankCommand - Handles specific command execution. * genericZrangebyscoreCommand - Handles specific command execution.
Emission of ZRANGE-related Results The genericZrangeXXXCommand functions now use beginEmission, emit, and finalizeEmission function pointers inside zrange_result_handler to send the result set rather than the addReply calls directly. These are basically:
- beginResultEmission
- If Result to Client - sets up addReplyDeferredLen
- If Result to Store - deletes old key (if any) and creates new object/key based on a simplified strategy (ziplist/skiplist) known at the calling site.
- emitResultFromXXX
- If Result to Client - calls addReplyArrayLen/addReplyXXX/addReplyDouble the way the current code works.
- If Result to Store - calls zsetAdd
- finalizeResultEmission
- If Result to Client - completes setDeferredArrayLen
- If Result to Store - returns count of items to client and performs notifications as necessary.
Remaining Work I'm not super happy with the naming on some of it yet and, as you'll see for t_zset.c, I have to merge it back into Redis formatting - I just don't normally work with Redis code in its natural state due to its formatting inconsistencies and rather haphazard declarations/definitions strewn all about. I also need to formalize my test scripts into actual tcl unit tests.
Regardless... I've spent a day on this and wanted to get it out for people to play with and get feedback on the approach.
Major open questions * Should zrangestore ALWAYS retain scores? At present, if someone just wants the members without scores, they get them. Accordingly, zrangestore was designed to store only what the client would've originally received. This means, unless withscores is passed (which isn't even possible in bylex variant), a score of 0.0 is used. The more I play with it, the more I'd personally prefer scores to always be retained... but I can see reasons for the user not caring whether all members are zeroed. From a performance perspective, the actual score was already fetched in almost all cases, even if it wasn't returned. As such, there isn't really any overhead to storing the original score in the destination zset. I'm interested in everyone's thoughts. * Should zrangestore's third argument be the full ZRANGE command? While the earlier comment discussed BYSCORE, BYLEX REV, etc., I used the full command name only because ZRANGE/ZREVRANGE don't have a differentiating token and I didn't want to invent one or have to ascertain whether it was ZRANGE based on argument counts. I can do that though, if that's what's desired.
Comment From: tzickel
Might be a wrong idea, but maybe instead of doing per-command thingy (and not limited just to Z commands), maybe have a global STORE
Comment From: borg286
One of the use cases I was hoping for was to be able to do set operations on the scores of a subset of the zset. If we do have the STORE option, I'd want a way that lets us do this strange migration of scores into members of a set.
On Wed, Sep 30, 2020 at 10:01 AM tzickel notifications@github.com wrote:
Might be a wrong idea, but maybe instead of doing per-command thingy (and not limited just to Z commands), maybe have a global STORE command that is actually a modifier for the next command that comes after it (like CLIENT REPLY is), and it can simply put the output the result into the specified key instead of returning it to the client (and maybe some generic infrastructure can be built around that to help commands do it) ?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/redis/redis/issues/678#issuecomment-701261064, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACBDEAJPMLJYF5D5XXNG353SILXVBANCNFSM4ABH3SYQ .
-- Integral z-squared dz from 1 to the cube root of 3 times the cosine of three pi over 9 equals log of the cube root of 'e'.
Comment From: oranagra
@tzickel i think that such a generic mechanism will be less efficient than what we can do when we have an explicit command, and it'll also face some challenges (some combinations may be more complicated, like ZRANGE be stored into a Set type, and ZRANGEWITHSCORE into Zset type).
Anyway for we can say that we already have a mechanism in place for such a thing, which is Lua scripts. IIRC one of the main triggers for the request of s specific command is that it's more efficient than using scripts. Scripts do also let you do more complicated or specific operations like deciding to store the scores rather than the member names, and even add some explicit conditions. so i think that should cover the use case @borg286 mentioned. doing a STORE command or command-variant for this many very specific use cases sounds wrong. that's exactly why we have scripts.
Comment From: oranagra
While working on the PR i run into a few dilemmas.
First, the new command syntax can look like any one of these:
ZRANGESTORE <dst-key> <src-key> ((BYSCORE <start> <stop>) | (BYLEX <start> <stop>) | (<min> <max>)) [REV] [WITHSCORES] [LIMIT offset count]
ZRANGESTORE <dst-key> <src-key> [BYSCORE | BYLEX] <min> <max> [REV] [WITHSCORES] [LIMIT offset count]
ZRANGESTORE <dst-key> <src-key> <min> <max> [BYSCORE | BYLEX] [REV] [WITHSCORES] [LIMIT offset count]
- The first one may be formally correct but it's hard to follow.
- The second one is a bit odd since there are positional arguments after optional arguments (and that can backfire if the
<min>value needs to be"bylex". - The Third one is a bit odd since you only define the BYSCORE / BYLEX after already defining the
<min>and<max>which may in some cases contains[and((valid only for BYLEX), so this means that redis is gonna skip these positional arguments, parse all the flags before making sense of these min/max args.
i vote for the third one, but would like to hear other opinions.
so my proposal is to have the following commands (i.e. also improving ZRANGE, and deprecating all the other variants):
ZRANGESTORE <dst> <src> <min> <max> [BYSCORE | BYLEX] [REV] [WITHSCORES] [LIMIT offset count]
ZRANGE <key> <min> <max> [BYSCORE | BYLEX] [REV] [WITHSCORES] [LIMIT offset count]
now the other dilemma is about the meaning of WITHSCORES and REV in the STORE variant. 1. if the user omits the WITHSCORES flag, should it create a sorted set with all sores set to 0? or maybe create a set (unsorted)? i'm leaning towards trimming this argument and have the STORE variant always store scores. 2. the only excuse for the REV argument would be that if you provide LIMIT, then the limit offsets are applied on the reverse order. i suppose there's no harm in keeping the REV argument. 3. maybe LIMIT itself doesn't make sense in the STORE variant? i suppose it was created for SCAN-like iteration on the sorted set, which doesn't make a lot of sense in the STORE variant. maybe we want to add an AMEND argument which will amend the new key into the existing one (so it is possible to iterate on a source or several source keys) and store them into one dest key?
Comment From: jonahharris
Third option seems best - inclined to keep REV and LIMIT. Agreed on WITHSCORES being default and removed as an option. Like the AMEND idea.
Comment From: madolson
The other option I would add for WITHSCORES, is that we could store the order of the member in the response as the rank. So if the client output is ['foo', 'bar'] we would store it as {'foo':1, 'bar':2}. Omitting it sounds like the safest option for now though.
I don't really like the amend idea. The proposed rational for slowly building up the response isn't that compelling to me. I suggested that maybe limit does make sense if you're trying to cache a snapshot of a query you are running, you might limit the number of items returned in the result and only want to cache that hottest set of items. So, I would be inclined to keep rev + limit as well.
Comment From: oranagra
ok, the popular decision seems to be to keep this syntax (option number 3 above, where min and max remain positional):
ZRANGESTORE <dst-key> <src-key> <min> <max> [BYSCORE | BYLEX] [REV] [LIMIT offset count]
Drop the WITHSCORES, and keep the REV and LIMIT.