The problem/use-case that the feature addresses

I want to integrate Redis with ClickHouse (https://github.com/ClickHouse/ClickHouse/pull/50150). The SCAN command (and the other scan commands) can produce duplicate keys, so I need to filter out duplicates, and for that I need to know whether duplication may have occurred.

I plan to determine it as follows:

  1. keep the previous scan result
  2. keep the previous scan state (whether Redis is rehashing)
  3. get the current scan state
  4. if the previous state is false and the current state is true, filter the previous and the current scan results
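The steps above could be sketched like this. This is a minimal, self-contained simulation: the per-page `rehashing` flag stands in for the state probe this issue requests (it does not exist in Redis today), and SCAN pages are plain lists:

```python
def merge_scan_pages(pages_with_state):
    """pages_with_state: list of (keys, rehashing) pairs, one per SCAN call.

    `rehashing` is the hypothetical "is Redis rehashing?" state observed
    after each call. Filtering is only applied while rehashing is (or has
    just been) in progress, since that is when duplicates become possible.
    """
    result, seen = [], set()
    prev_rehashing = False
    for keys, rehashing in pages_with_state:
        if rehashing or prev_rehashing:
            # Rehashing may return the same key twice across pages: filter.
            keys = [k for k in keys if k not in seen]
        result.extend(keys)
        seen.update(keys)
        prev_rehashing = rehashing
    return result

# Simulated scan: rehashing starts on the second call and "key:1" repeats.
pages = [(["key:1", "key:2"], False),
         (["key:3", "key:1"], True),
         (["key:4"], False)]
print(merge_scan_pages(pages))  # ['key:1', 'key:2', 'key:3', 'key:4']
```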

I tried to find a way to determine the Redis rehashing state, but did not find one (please correct me if I am wrong), so I think we should add it.

Describe the feature

Add a section to the INFO command output.

info dictionary

And we would get:

rehashidx: -1        // -1: not rehashing; 0..n: db is rehashing and the value is the bucket index
scaling: 0           // 0 for scaling and 1 for shrinking
ht_size_mask_0: 7    // hash table 0 size mask
ht_size_mask_1: 7    // hash table 1 size mask
ht_used_0:           // hash table 0 used bucket size
ht_used_1:           // hash table 1 used bucket size
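A client could then decide whether duplicates are possible by checking `rehashidx`. The field names below follow the proposal in this issue and are hypothetical, they are not part of Redis today; the parsing itself mirrors the usual `field:value` shape of INFO sections:

```python
def is_rehashing(info_section):
    """Return True if the proposed (hypothetical) rehashidx field says the
    db is rehashing, i.e. its value is not -1."""
    fields = {}
    for line in info_section.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return int(fields.get("rehashidx", -1)) != -1

sample = "rehashidx: 5\nscaling: 0\nht_size_mask_0: 7\nht_size_mask_1: 15"
print(is_rehashing(sample))               # True: rehashidx != -1
print(is_rehashing("rehashidx: -1"))      # False: not rehashing
```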

Alternative way

Add a flag to the scan result indicating whether the next scan may contain duplicated keys, so the application can use the flag directly.

The new scan result may look like:

redis 127.0.0.1:6379> scan 0
1) "17"
2)  1) "key:12"
    2) "key:8"
    3) "key:4"
    4) "key:14"
3) "0"

This is a stricter approach for this use case.
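A consumer of that flag might look like the sketch below. The reply shape (cursor, keys, duplicate flag) follows the proposal above and is hypothetical; the replies are simulated so the example is self-contained:

```python
def collect_keys(replies):
    """replies: list of (cursor, keys, dup_flag) tuples, one per SCAN call,
    where dup_flag == "1" means the NEXT scan may contain duplicates
    (a hypothetical field proposed in this issue)."""
    result, seen, filtering = [], set(), False
    for cursor, keys, dup_flag in replies:
        if filtering:
            # The previous reply warned us: drop keys we have already seen.
            keys = [k for k in keys if k not in seen]
        result.extend(keys)
        seen.update(keys)
        filtering = dup_flag == "1"
    return result

replies = [("17", ["key:12", "key:8"], "1"),
           ("0", ["key:8", "key:4"], "0")]
print(collect_keys(replies))  # ['key:12', 'key:8', 'key:4']
```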

Comment From: JackyWoo

If the idea is ok, I'd like to submit a PR.

Comment From: yossigo

@JackyWoo We should not expose or rely on such implementation details. Can you describe what you use the key names received by SCAN for and why duplicates are a problem?

Comment From: JackyWoo

@yossigo Thanks for your reply. I am trying to integrate Redis with ClickHouse, where Redis is treated as a backend of ClickHouse. When I send a query like

select * from table_redis

I will get duplicated data.

Comment From: yossigo

I don't have the context, but I imagine that even without duplicate keys a simple SCAN might not be what you're looking for. For example, during the scanning keys could be created and deleted, which means the resulting dataset will not represent a valid point in time.

Comment From: JackyWoo

@yossigo Thanks for your advice.

ClickHouse is an OLAP database, and I am trying to use Redis as a storage backend for it. For a database, scanning the whole dataset is a common case. In Redis there are two ways to do this:

  1. keys *
  2. scan

Since keys * is too heavy, I chose scan.

Admittedly, multiple scans are not atomic and cannot represent a valid point in time, but users are likely to be much more concerned about duplicated results, which is why I raised this issue.

I hope I have made the context clear.