Like KEYS, but it would only return a count. Hopefully it would be much faster than KEYS (and it would certainly require less data transfer).
Also, it would be super nice if both this and KEYS were optimized to respond faster for prefix patterns.
Thanks!
Marc
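For illustration, a rough sketch of today's client-side workaround in Python with redis-py (the pattern user:* is just a placeholder), versus what a hypothetical NKEYS could return directly:

```python
from redis import Redis

r = Redis(decode_responses=True)

# Today: KEYS ships every matching key name to the client, and the
# count is computed client-side, so all the names cross the network.
count = len(r.keys("user:*"))

# A hypothetical NKEYS would return just the integer from the server:
#     NKEYS user:*   ->   (integer) 12345
print(count)
```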
Comment From: CloudMarc
I did some prototyping with sets, as suggested. I SET a key/value, set it to EXPIRE, then SADD the key name to a tracking set. This costs more time, which is acceptable, and more RAM, which isn't.
On the read side, I do an SMEMBERS, then an MGET, SREM the expired members, and return a count. BTW, if I don't do this cleanup, the set grows without limit. The resulting performance on this side is considerably worse than counting the results of KEYS.
Counting KEYS: 33.5s
SADD, MGET, cleanup: 49.8s
(this doesn't count the slowdown on the write side, but I'm using pipelining so I expect that to be minimal).
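For reference, the prototype looks roughly like this (a simplified sketch, not the exact code; the set name tracked_keys and the helper names are placeholders, and the write side is pipelined as mentioned above):

```python
from redis import Redis

r = Redis(decode_responses=True)
TRACKING_SET = "tracked_keys"  # placeholder name for the tracking set


def tracked_set(key: str, value: str, ttl: int) -> None:
    # Write side: SET the value with an EXPIRE, then SADD the key name
    # to the tracking set (costs extra time and RAM per write).
    pipe = r.pipeline()
    pipe.set(key, value, ex=ttl)
    pipe.sadd(TRACKING_SET, key)
    pipe.execute()


def tracked_count() -> int:
    # Read side: SMEMBERS the tracking set, MGET the values, SREM the
    # members whose keys have expired, and return the live count.
    members = list(r.smembers(TRACKING_SET))
    if not members:
        return 0
    values = r.mget(members)
    expired = [m for m, v in zip(members, values) if v is None]
    if expired:
        r.srem(TRACKING_SET, *expired)
    return len(members) - len(expired)
```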
I made a fork and plan to do an NKEYS prototype and report timings for that. I expect this benchmark to take about 10s for the same data set.
I still think that NKEYS should be considered, and that at some point an algorithm which speeds up prefix or suffix matching should be considered for both KEYS and NKEYS.
Comment From: 0xfff
Was NKEYS ever implemented? Is there another way to elegantly count the number of keys?
Comment From: itamarhaber
No, it wasn't. DBSIZE, as well as INFO's keyspace section, gives you a global key count.
For finer-grained counts, e.g. by pattern, you can use a Lua script that calls SCAN, for example.
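For example, from redis-py (a minimal sketch):

```python
from redis import Redis

r = Redis(decode_responses=True)

# Total number of keys in the current database.
total = r.dbsize()

# INFO's keyspace section reports per-database counts, e.g.
# {'db0': {'keys': 12345, 'expires': 67, 'avg_ttl': 0}}
keyspace = r.info("keyspace")
print(total, keyspace.get("db0", {}).get("keys", 0))
```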
Comment From: alexandernst
Is there any technical reason this was never implemented, or is it just a lack of man-hours?
Comment From: brianmaissy
In case anyone else reading this still wants a way to count keys matching a pattern, without using KEYS and without wasting the bandwidth of passing all the key names back from the server, here is a recipe which runs SCAN from a Lua script that returns only the number of matches in each chunk:
```python
from redis import Redis


def count_keys(redis: Redis, pattern: str, chunk_size: int = 1000) -> int:
    """Counts the number of keys matching a pattern.

    Doesn't use the redis KEYS command, which blocks and therefore can create
    availability problems.

    Instead of using SCAN and then counting the resulting keys, runs the SCAN
    via a lua script and counts the keys on the redis side, just to save
    bandwidth.

    We can't implement the entire scan loop in lua, because then it would block
    just as much as KEYS, because the entire lua script runs atomically.
    """
    scan_count = redis.register_script(
        """
        -- Run one SCAN step and replace the returned key list with its length,
        -- so only the cursor and a count travel back to the client.
        local result = redis.call('SCAN', ARGV[1], 'MATCH', ARGV[2], 'COUNT', ARGV[3])
        result[2] = #result[2]
        return result
        """
    )
    cursor, count = scan_count(args=["0", pattern, chunk_size])
    # The cursor comes back as bytes unless the client decodes responses.
    while cursor not in ("0", b"0"):
        cursor, count_delta = scan_count(args=[cursor, pattern, chunk_size])
        count += count_delta
    return count
```
Be careful about using too high a value for chunk_size, which would make each SCAN call block just as much as KEYS.
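For example, a minimal usage sketch (the connection settings and pattern are placeholders; decode_responses=True keeps the cursor as a str):

```python
from redis import Redis

r = Redis(host="localhost", port=6379, decode_responses=True)

# Count keys matching a pattern without transferring the key names.
print(count_keys(r, "user:*", chunk_size=1000))
```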