Redis LIMIT ZREMRANGEBYSCORE - Nineya|java/go/python

Is it possible to add a limit for ZREMRANGEBYSCORE as in ZRANGEBYSCORE?

Comment From: antirez

Hello,

why would you do that? This seems like a bad idea. LIMIT makes sense in read-only queries, for instance to do pagination, but why limiting a write? If you really want to do that just use ZRANK to obtain the rank of a given element in a given range and then use ZREMRANGEBYRANK to remove the range of items you want to remove.

Closing the issue.

Comment From: sergray

I have a specific queue implemented with sorted set. I need to pop a limited number of items from it bellow given score boundary. I want such "pop" to be atomic, so use a transaction. Since it's not possible to limit ZREMRANGEBYSCORE, I use ZRANGE WITHSCORES and ZREMRANGEBYRANK with ZADD of items with scores greater than a boundary.

Can we give this issue another try?

Comment From: pietern

You can achieve the same goal using either WATCH/MULTI/EXEC or scripting. Consider the following approach: WATCH the key pointing to the sorted set; execute ZCOUNT with your score thresholds to find out how many items can be popped; adjust the number of items to pop with your limit; use this limit to pop using ZRANGE and ZREMRANGEBYRANK in a MULTI/EXEC. Alternatively, you can wrap this logic in a Lua script for Redis 2.6.

Comment From: sergray

Would not WATCH introduce the need for retry? Sorted set is used as a queue, which is accessed by many clients concurrently, but EVAL looks like an option, but not for 2.4.

Comment From: pietern

It would, yes. You can use some randomization in the time between retries to smooth things out. Otherwise, scripting in 2.6 is definitely the way to go.

Comment From: tlhunter

I realize this is an old thread, but here's a use-case for the LIMIT clause we've encountered I think is worth mentioning.

We've implemented a job distribution system where many homogenous processes poll for work to complete. It currently works with each process periodically issuing the following command:

MULTI
ZRANGEBYSCORE jobs -inf {timestamp}
ZREMRANGEBYSCORE jobs -inf {timestamp}
EXEC

A limitation of this system is that a single process grabs every job to be completed, potentially overloading it (e.g. a burst of jobs), leaving other processes without work to do.

What we'd like to do is the following, where we implement a sliding capacity value based on the amount of work a process knows it can handle:

MULTI
ZRANGEBYSCORE jobs -inf {timestamp} LIMIT 0 {capacity}
ZREMRANGEBYSCORE jobs -inf {timestamp} LIMIT 0 {capacity}
EXEC

However the lack of a LIMIT clause on ZREMRANGEBYSCORE prevents this solution.

Our jobs ZSET can have duplicate rank values (timestamps) which I believe prevents us from doing queries based on ranks.

Comment From: mattsta

What about appending the result of your ZRANGEBYSCORE into a list then letting your job system take jobs from the task list?

Then you can ZRANGEBYSCORE -> append to list -> ZREMRANGEBYSCORE.

Comment From: Overdrivr

@antirez This issue is old but it's definitely not a bad idea, basically it consists in being able to pop the key with the most recent timestamp, only if that timestamp is above a threshold. Very useful for having multiple workers consuming the set, just like @tlhunter mentionned.

Comment From: Overdrivr

@tlhunter I am using the following LUA script as a workaround, you've probably done the same

local entries = redis.call("zrangebyscore",KEYS[1],ARGV[1],ARGV[2]);
if table.getn(entries) > 0 then 
  redis.call("zrem",KEYS[1],count[1]);
  return entries[1];
else 
  return 0
end

Called with

eval "local entr....end" 'mySetName' '' 0 thresholdTimestamp

thresholdTimestamp is the minimum required timestamp to pop a job. The script needs some stuffing to return something better than just 0, but should be enough for anyone to get started

Comment From: jacoscaz

Another vote for LIMITing ZREMRANGEBYSCORE. My use case is very similar to @tlhunter's.

Comment From: egalpin

Here's a potential extension to @Overdrivr's solution:

function table.slice(tbl, first, last, step)
    --courtesy https://stackoverflow.com/a/24823383
    local sliced = {}

    for i = first or 1, last or #tbl, step or 1 do
        sliced[#sliced+1] = tbl[i]
    end

    return sliced
end


local min_score = tonumber(ARGV[1]);
local max_score = tonumber(ARGV[2]);
local rem_offset = tonumber(ARGV[3]) + 1;
local rem_limit = tonumber(ARGV[4]) - 1;

local entries = redis.call("zrangebyscore",KEYS[1],min_score,max_score);
local lim_entries = table.slice(entries, rem_offset, rem_offset + rem_limit);
local popped = {};

for index,value in ipairs(lim_entries) do
    popped[#popped+1] = value;
    redis.call("zrem",KEYS[1],value);
end

return popped

This implements removing with offset and limit. I think of it as ZPOPRANGEBYSCORE BYLEX (since members with the same score are ordered lexicographically). So, of the items that are obtained from ZRANGEBYSCORE, remove rem_limit items starting from index rem_offset. Call signature is now:

eval "<above_script>" 'mySetName' <minScore> <maxScore> <offset> <limit>

where <offset> and <limit> are 0-based and apply to the results of the ZRANGEBYSCORE. Returns the members removed from set, or empty set if no members were removed.

Hope it helps someone!

Comment From: wavenator

This is such an easy one and can improve using Redis as a queue so much. Why wouldn't we implement it?

Comment From: oranagra

@wavenator (and anyone else watching), if i read this discussion correctly, it resolves around pairing two commands (RANGE and REM) to achieve a POP like behavior. so for that to work both need to have a COUNT argument. to be honest it doesn't sounds compelling.

but what could make more sense to me is create one POP like command that will do that can do what you want as a single command.

this could be either: 1. ZPOPRANGEBYSCORE LIMIT - invent a new command to combine these 2. ZRANGEBYSCORE REM - modify the RANGE query to be able to do deletion. (bad idea to make a read command into a write one) 3. ZREMRANGEBYSCORE LIMIT GET - modify the REM command to return the data it removed, and add a LIMIT. (this would be similar to the GET argument we recently added to SET

@itamarhaber maybe you remember anything else that can be useful to this discussion.

Comment From: wavenator

Both 1 & 3 looks promising. I tend to support number 1 more since there is no need to use a pipeline. Number 3 though is so easy to implement and test that it might be easier to approve without getting into a deeper conversation.

Comment From: oranagra

@wavenator neither of them needs pipeline or transaction. (or am i missing something?). implementation wise, they're also all the same (we'll need to refactor things to share code).

thing is that we try to avoid adding more and more specific commands for each combination of arguments, and instead add more arguments for fewer commands. see https://github.com/redis/redis/issues/678#issuecomment-670644182 (where we intend to have one ZRANGE and ZRANGESTORE that takes REV/BYSCORE/etc as arguments)

Same as the reason we added a GET argument to SET (which already some flag arguments), rather than EX argument to GETSET.

But in this case, i think it would be wrong to add a REM argument to ZRANGE since it would turn a read command into a write one. instead we wanna have just one ZREMRANGE command, that takes all the options as arguments (BYSCORE, GET, etc).

@jonahharris FYI.

Comment From: yeka

I stumbled upon this post, having similar problem. ZPOPRANGEBYSCORE LIMIT would have been a nice addition. I ended up using eval to achieve similar result:

EVAL 'local c = redis.call("ZCOUNT", KEYS[1], 0, ARGV[1]); if c >= tonumber(ARGV[2]) then c = ARGV[2] end; return redis.call("ZPOPMIN", KEYS[1], c);' 1 <key> <max_score> <limit>

Comment From: itamarhaber

My 2c:

I'd go with with a new catch-all ZREMRANGE <key> <BYSCORE|BYRANK|BYLEX> <start> <end> [GET] [COUNT count].

BYRANK could be made default, thus rendering the mode modifier optional.
The optional GET could be named POP to reflect the operation's nature. Alternatively, we could make two commands: ZREMRANGE and ZPOPRANGE, each with its own distinct return type. Although I prefer this approach, given precedents of command modifiers changing the return type in the advent of RESP3 this is less important.
The COUNT is tricky, as usual, in terms of default/0/negative. Intuitively, I'd have the default value mean "everything" like the existing ZREMRANGEBYs. Given that 0 is a measure of work (even if it ends as a noop), it seems like a negative count value should be interpreted as "everything" in this case.

Comment From: enjoy-binbin

I am implementing it, the syntax will look like (default BYRANK, count = -1 means all): ZREMRANGE key start|min stop|max [BYRANK|BYSCORE|BYLEX] [GET] [COUNT count]

I was wonder the return value when we submit the GET option arg. I suppose it will look like ZRANGE? - Do we only return the members? - or needed add a WITHSCORES option arg like ZRANGE will return member+score pairs

I suppose we will only return the members? Since we don't have: - ZRANGE BYLEX WITHSCORES (WITHSCORES not supported in combination with BYLEX in ZRANGE) - we don't support ZRANGE BYRANK or ZRANGEBYRANK, the behave kind of like BYSCORE

So no sure about the WITHSCORES arg, but it feels like there will be some usage scenarios. It does not seem to be a bad thing to add it...

Comment From: oranagra

I think it should have all the capabilities of ZRANGE, some combinations of the arguments are not allowed, but that doesn't mean we should limit others.

Comment From: classicalguss

I too found a use case for this. I need to have a blocking command on a score with Limit. So far I'm not finding any good solution.