The problem/use-case that the feature addresses

In some use cases, the cardinality of the set intersection is required up until a specified limit after which the cardinality is just "too big". For instance, if each set contains users that visited a corresponding webpage, then we might want to ask whether at least N users visited pages A, B and C. Computing such query is costly if the actual intersection cardinality is large. Using a limit of N speeds up the computation.

To address these use cases, SQL offers LIMIT. The command SINTERCARD, however, does not support such functionality and the cardinality of the intersection is computed entirely.

Description of the feature

Enhance the command SINTERCARD such that it supports limit. This can be done in multiple ways:

  1. SINTERCARD key [key ...] [LIMIT limit]
  2. SINTERCARD [LIMIT limit] key [key ...]
  3. SINTERCARDLIMIT limit key [key ...]

If the intersection cardinality reaches limit partway through the computation, the algorithm exits and yields limit as the cardinality. Such implementation ensures a significant speedup for queries where the limit is lower than the actual intersection cardinality.

Alternatives you've considered

Possibly, such algorithm can be written in lua and used as a lua-redis script, however, the speed up will be lost. Other alternatives such as using different commands are not known.

Comment From: oranagra

@redis/core-team we may need a quick decision here since the current (yet to be released) syntax of SINTERCARD isn't extensible (we won't be able to add flags after that's realeased). I suppose we do need to change it to look like ZINTERCARD (which will make it different than SINTER).

@wattik i don't yet understand how this optimization is possible, in order to know the cardinality of the intersection we must scan all the inputs (i.e. imagine the last input key has only one member that doesn't exist in all the others). what we could possibly do is the other way around, i.e. stop the search as soon as the result cardinality is below some threshold.

Comment From: wattik

@oranagra As far as I recall, the set intersection algorithm in redis identifies the smallest set first and then iterates it. This iteration could be exited once the intermediate count of intersecting elements reaches limit. This would ensure that the actual cardinality is at least limit.

What you suggest is however also very interesting. In other words, this would provide that the actual cardinality is at most limit. Note, however, what I am proposing is different as it uses "at least".

Comment From: oranagra

@wattik imagine this case. set1: [a, b, c] set2: [a. b. d] set3: [e, f, g, h, i]

imagine you do SINTERCARD with limit of 2, can you exit after scanning set2? only when going though set3 you'll find that the intersection is empty.

maybe i'm missing something, but as far i can tell, since we are only constantly reducing the resulting set, we can only provide stop the search when we know the intersection is certainly gonna be at most limit, not at least limit.

Comment From: wattik

@oranagra Please correct me if I am wrong but the current algorithm works differently. The algorithm first selects the smallest set. Then this set is iterated. Inside the loop, if all remaining sets contain the current element, the intermediate cardinality counter is increased. If we compared the intermediate cardinality against limit, then we could break the loop once the limit has been reached. The check could occur right after the cardinality counter has been increased - somewhere in this place. Note the cardinality counter always provides a lower bound on the actual cardinality, hence after breaking the loop when reaching limit, the returned value would signal that the actual cardinality is at least limit.

Comment From: oranagra

@wattik forgive me... you're right. For some reason I remembered a different algorithm, and didn't bother to look at the code.

So do you think this LIMIT argument can also be useful for SINTER and SINTERSTORE, and also for ZINTER*?

Comment From: wattik

@oranagra It sounds reasonable. For example, SINTER with LIMIT could be used to peak into what elements are in the intersection without iterating the actual possibly large intersection. So if it helps implementation-wise, then why not. If it complicates things then I think implementing LIMIT into SINTERCARD only makes much more sense as it addresses a unique functionality (cardinality lower bound) that has multiple use cases in real world.

Comment From: itamarhaber

I suppose we do need to change it to look like ZINTERCARD (which will make it different than SINTER).

Makes sense, especially if we decide that LIMIT is applicable to zsets as well.

Comment From: enjoy-binbin

I suppose we do need to change it to look like ZINTERCARD (which will make it different than SINTER).

I am thinking of implementing it, so the new syntax will look like: sintercard numkeys [ ...] [LIMIT limit] sintercard 2 key1 key2 LIMIT 2

Comment From: oranagra

yes, but maybe we wanna wait to see if anyone else from the core team wants to argue against. also, maybe @wattik wanted to implement it?

Comment From: wattik

@oranagra I am not familiar with the codebase enough to implement it, unfortunately.

Comment From: madolson

We are great at being inconsistent about naming :D I'd rather new commands follow a better syntax than be consistent with similar commands.

I like having both of the commands look like <command> <numkeys> key1 [key2 ...] [LIMIT 10]

Comment From: oranagra

@enjoy-binbin seems that we have a quorum, go ahead 8-)