I've written a small Java program to copy and transform keys and their values from one Redis cluster to another. I have to process tens of millions of keys. To do the job properly I need to send a round-trip request to get the type of each key returned from SCAN, and with millions of keys that extra time and network traffic add up quickly.
I'd like to suggest a modification to the SCAN command to allow for a filter on type or other attributes so I can batch the copy by type and save the round trip. Something like:
scan 0 match * count 100 type hash
or
scan 0 match * count 100 filter type=hash
Either form would filter the scan results to return only keys of hash type, avoiding the per-key TYPE request. I could then parallelize the copy process, handling hash, set, zset, etc. as separate tasks, and eliminate the overhead of the TYPE call entirely.
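To make the round-trip savings concrete, here is a small self-contained Python simulation of the two workflows. No Redis connection is involved: `fake_db`, `scan`, and the copy helpers are invented stand-ins, not real client APIs, and the cursor logic is a simplified approximation of how SCAN pages through the keyspace.

```python
# Simulated keyspace: 50 hashes and 50 sets.
fake_db = {f"h:{i}": "hash" for i in range(50)}
fake_db.update({f"s:{i}": "set" for i in range(50)})

def scan(cursor, count=10, type_filter=None):
    """One simulated SCAN round trip: returns (next_cursor, keys).
    With type_filter set, non-matching keys are dropped server-side,
    mirroring the proposed SCAN ... TYPE behavior."""
    keys = sorted(fake_db)
    batch = keys[cursor:cursor + count]
    if type_filter is not None:
        batch = [k for k in batch if fake_db[k] == type_filter]
    next_cursor = cursor + count
    return (0 if next_cursor >= len(keys) else next_cursor), batch

def copy_hashes_without_filter():
    """Today's workflow: SCAN, then one TYPE round trip per key."""
    round_trips, cursor, hashes = 0, 0, []
    while True:
        cursor, batch = scan(cursor)
        round_trips += 1                 # the SCAN call itself
        for key in batch:
            round_trips += 1             # one TYPE call per key
            if fake_db[key] == "hash":
                hashes.append(key)
        if cursor == 0:
            return hashes, round_trips

def copy_hashes_with_filter():
    """Proposed workflow: filtering happens inside SCAN."""
    round_trips, cursor, hashes = 0, 0, []
    while True:
        cursor, batch = scan(cursor, type_filter="hash")
        round_trips += 1                 # SCAN only; no TYPE calls
        hashes.extend(batch)
        if cursor == 0:
            return hashes, round_trips
```

With 100 keys and COUNT 10, the unfiltered version costs 110 round trips (10 SCANs plus 100 TYPEs) while the filtered version costs 10, for the same 50 hash keys.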
I can imagine many kinds of filters, such as keys that will expire within the next hour, sets that contain more than nnn elements, lists with exactly one element, etc.
Perhaps filters could even be combined: filter type=list AND ttl<3600
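Conceptually, combined filters could be evaluated as AND-ed predicates over per-key metadata. A hypothetical sketch of that idea follows; the helper names, the metadata layout, and the sample keys are all invented for illustration, and any real filter syntax and semantics would of course be up to Redis.

```python
# Each filter is a predicate over a key's metadata dict.
def type_is(t):
    return lambda meta: meta["type"] == t

def ttl_less_than(seconds):
    # Only keys with a TTL actually set (ttl >= 0) can match.
    return lambda meta: 0 <= meta["ttl"] < seconds

def all_of(*preds):
    # The "AND" combinator: every predicate must hold.
    return lambda meta: all(p(meta) for p in preds)

# Hypothetical per-key metadata, as the server might see it.
keys = {
    "jobs":    {"type": "list", "ttl": 120},
    "session": {"type": "hash", "ttl": 1800},
    "queue":   {"type": "list", "ttl": 7200},
}

# filter type=list AND ttl<3600
wanted = all_of(type_is("list"), ttl_less_than(3600))
matches = [k for k, meta in keys.items() if wanted(meta)]
```

Here only "jobs" matches: "session" fails the type check and "queue" fails the TTL check.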
Comment From: itamarhaber
This has been partially addressed (support for TYPE) by #6116
Comment From: itamarhaber
Closing - feel free to reopen or create a new issue (for extended filtering)