Related issue: #7153
The problem/use-case that the feature addresses
Given an RDB produced on a standalone instance, if we reload that RDB on OSS cluster primaries we get orphaned keys. Should we allow an easy way of doing DEBUG RELOAD NOSAVE while keeping only the slots relevant to that node (even just for testing it would ease the process)?
To reproduce
Using a standalone instance, fill it with 1M keys. Assuming you have a standard dump.rdb in the project root folder:
memtier_benchmark --key-pattern=P:P --key-maximum 999999 -n allkeys --ratio=1:0
cd utils/create-cluster
./create-cluster start
./create-cluster create
for x in `seq 1 6`; do cp ../../dump.rdb dump-3000$x.rdb ; done
# force to reload the rdb on each shard
./create-cluster call debug reload nosave
We now have each shard with 1M keys:
./create-cluster call info keyspace
# Keyspace
db0:keys=1000000,expires=0,avg_ttl=0
# Keyspace
db0:keys=1000000,expires=0,avg_ttl=0
# Keyspace
db0:keys=1000000,expires=0,avg_ttl=0
# Keyspace
db0:keys=1000000,expires=0,avg_ttl=0
# Keyspace
db0:keys=1000000,expires=0,avg_ttl=0
# Keyspace
db0:keys=1000000,expires=0,avg_ttl=0
Given the above keyspace info we have multiple "owners" for the same key:
redis-cli --cluster check 127.0.0.1:30001 --cluster-search-multiple-owners
127.0.0.1:30001 (fc4cbf8b...) -> 1000001 keys | 5461 slots | 1 slaves.
127.0.0.1:30003 (7b3823ad...) -> 1000001 keys | 5461 slots | 1 slaves.
127.0.0.1:30002 (52155b20...) -> 1000001 keys | 5462 slots | 1 slaves.
[OK] 3000003 keys in 3 masters.
183.11 keys per slot on average.
>>> Performing Cluster Check (using node 127.0.0.1:30001)
M: fc4cbf8b8e93b10817ece6c8edca1b939c3ae358 127.0.0.1:30001
slots:[0-5460] (5461 slots) master
1 additional replica(s)
M: 7b3823ad330242bf64bb009d0a8268e60c41a9ce 127.0.0.1:30003
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
S: 1ec7b684f54c9ab030e5a76da574c883f0b04d68 127.0.0.1:30006
slots: (0 slots) slave
replicates 7b3823ad330242bf64bb009d0a8268e60c41a9ce
S: 83b9d3b734b2ee8552f9657e9e932dd2096d91ce 127.0.0.1:30004
slots: (0 slots) slave
replicates fc4cbf8b8e93b10817ece6c8edca1b939c3ae358
S: 41d710abe889a20dc8355d2b5a0e5f12fef12569 127.0.0.1:30005
slots: (0 slots) slave
replicates 52155b20230dd94346367436c9d6ed1a515686fc
M: 52155b20230dd94346367436c9d6ed1a515686fc 127.0.0.1:30002
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Check for multiple slot owners...
[WARNING] Slot 0 has 3 owners:
127.0.0.1:30001
127.0.0.1:30003
127.0.0.1:30002
[WARNING] Slot 1 has 3 owners:
127.0.0.1:30001
127.0.0.1:30003
127.0.0.1:30002
[WARNING] Slot 2 has 3 owners:
127.0.0.1:30001
127.0.0.1:30003
127.0.0.1:30002
[WARNING] Slot 3 has 3 owners:
127.0.0.1:30001
127.0.0.1:30003
127.0.0.1:30002
(...)
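For reference, you can confirm that a node holds orphaned keys by comparing a key's hash slot with the slots that node serves. A minimal check, assuming the default memtier- key prefix used above (exact slot numbers and counts will vary):
# hash slot of a sample key (the key-to-slot mapping is the same on every node)
redis-cli -p 30001 cluster keyslot memtier-1
# local key count for a slot that 30001 does NOT serve (it serves 0-5460);
# any value > 0 means this node holds orphaned keys
redis-cli -p 30001 cluster countkeysinslot 6000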
Expected behavior
Allow an easy way to specify that on reload we only want to retain the keys that belong to the given node, and that all the others will be deleted.
Comment From: madolson
In AWS we have functionality so that if 3 shards from the same cluster attempt to load a snapshot, they will discard keys that they do not own. We use this for restoring a cluster mode disabled snapshot into a cluster mode enabled cluster.
Is the intention here that we would build this as a feature, or just for testing?
Comment From: filipecosta90
Is the intention here that we would build this as a feature, or just for testing?
@madolson if users see value in it I would vouch for a feature request. I see tremendous value in this.
Comment From: filipecosta90
ccing @TBone542 and @shaharmor given they also shared interest in this in the past on https://github.com/redis/redis/issues/7153
Comment From: hpatro
IIUC, there are two problems here.
1) RDB generated from a cluster mode disabled (CMD) setup and loaded onto a cluster mode enabled (CME) setup. Possible solution: discard the keys while loading.
2) Incorrect slot migration leaving behind orphaned keys. Possible solution: build a command to bypass the cluster protocol and clean up the orphaned keys.
@madolson @filipecosta90 Should we address both of them ?
Comment From: filipecosta90
IIUC, there are two problems here.
1. RDB generated from a cluster mode disabled (CMD) setup and loaded onto a cluster mode enabled (CME) setup. _Possible solution: discard the keys while loading._
2. Incorrect slot migration leaving behind orphaned keys. _Possible solution: build a command to bypass the cluster protocol and clean up the orphaned keys._
@madolson @filipecosta90 Should we address both of them ?
@hpatro @madolson I believe the safest solution would be to have visibility for this via an INFO-exposed metric (like an orphaned keys count) and then a command that explicitly allows cleaning them up. WDYT?
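As a stop-gap until such a metric exists, something like the loop below can approximate the count on a given node. This is only a sketch for the reproduction above (it assumes 127.0.0.1:30001 serves slots 0-5460, so any local keys in slots 5461-16383 are orphaned) and is slow, since it issues one CLUSTER COUNTKEYSINSLOT call per slot:
# sum the keys held locally in slots this node does not serve
orphaned=0
for slot in $(seq 5461 16383); do
  n=$(redis-cli -p 30001 cluster countkeysinslot $slot)
  orphaned=$((orphaned + n))
done
echo "orphaned keys on 127.0.0.1:30001: $orphaned"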
Comment From: hpatro
I also agree with your suggested solution. This would also help the users who are already facing the issue of orphaned keys.
Comment From: filipecosta90
@redis/core-team WDYT? Can this be classified as state:help-wanted, or is there no consensus that this is a feature of interest?
Comment From: oranagra
Maybe I'm missing something, but I want to state that DEBUG RELOAD is not a feature in Redis; it's a hack for testing. Redis cannot load RDB files at runtime (only at startup or in a replica), and IIUC that's a design decision, not a mistake.
There are 3rd-party tools that allow importing data into Redis by parsing an RDB file and sending RESTORE commands. Maybe redis-cli can do that some day too (maybe when we turn rdb.c into librdb.so and expose some parsing interface).
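For what it's worth, redis-cli can already import into a cluster from a live standalone instance (just not from an RDB file); the source address below is only an example:
redis-cli --cluster import 127.0.0.1:30001 --cluster-from 127.0.0.1:6379 --cluster-copy --cluster-replace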
So to circle back to the question at the top: is it just about DEBUG RELOAD? Is it intended just for testing and hacking, or are we looking for such a feature for when Redis reads an RDB file at startup?
Comment From: hpatro
Sadly, right now I'm unable to reproduce the exact scenario in which I managed to generate orphaned keys. I was playing around with slot migration for the PUBSUB V2 feature and observed keys being left behind on the old master. I will get back if I end up with orphaned keys again.
Comment From: yossigo
@oranagra Do you remember any arguments against loading RDB files at runtime as a feature? I see the value in supporting migrating an RDB into a cluster, but if we do that it should be a first-class feature and not something on top of DEBUG RELOAD.
Comment From: oranagra
@yossigo I don't recall. It could possibly cause some complications around AOF and replication, and maybe also some issues for modules, or clients getting an unexpected LOADING error.
Comment From: yossigo
@oranagra It's practically not very different than a replica that's receiving a full RDB, so I actually can't come up with a good reason not to support it.
Comment From: oranagra
It's a little different from a replica, since it does need to generate a new replication ID. I'd argue that it's no different from a server restart, but one difference is that there may already be blocked clients which the loading should wake up, so in that sense it's more similar to a replica. P.S. If we add that, we'll probably not want to support async-loading via swapdb though 8-).
Well, I can't think of any reason why it was never supported, but I have a feeling there's a reason. And I'd still argue that some clients may get surprised by the unexpected LOADING error, and that modules could maybe have serious issues (i.e. both clients and modules are something we can't predict).