Describe the bug

We received an update message that removed ownership of certain slots we still hold keys for. We need to remove all the keys from the slots we lost, but these key deletions are not replicated to the replica.

To reproduce

https://github.com/redis/redis/blob/unstable/src/cluster.c#L1953
https://github.com/redis/redis/blob/unstable/src/cluster.c#L6981

/* Remove all the keys in the specified hash slot.
 * The number of removed items is returned. */
unsigned int delKeysInSlot(unsigned int hashslot) {
    unsigned int j = 0;
    dictEntry *de = (*server.db->slots_to_keys).by_slot[hashslot].head;
    while (de != NULL) {
        sds sdskey = dictGetKey(de);
        de = dictEntryNextInSlot(de);
        robj *key = createStringObject(sdskey, sdslen(sdskey));
        dbDelete(&server.db[0], key);
        decrRefCount(key);
        j++;
    }
    return j;
}

After executing 'dbDelete(&server.db[0], key)', I think we should replicate 'del key' to the replica. Am I right? @oranagra @judeng

Expected behavior

/* Remove all the keys in the specified hash slot.
 * The number of removed items is returned. */
unsigned int delKeysInSlot(unsigned int hashslot) {
    unsigned int j = 0;
    dictEntry *de = (*server.db->slots_to_keys).by_slot[hashslot].head;
    while (de != NULL) {
        sds sdskey = dictGetKey(de);
        de = dictEntryNextInSlot(de);
        robj *key = createStringObject(sdskey, sdslen(sdskey));
        dbDelete(&server.db[0], key);

        /* Proposed change: propagate the deletion to the replicas. */
        propagateDeletion(&server.db[0], key, server.lazyfree_lazy_server_del);

        decrRefCount(key);
        j++;
    }
    return j;
}

Comment From: oranagra

@weim0000 have you really reproduced this? Can you share the reproduction scenario? Or is it just a theoretical issue from reviewing the code? @madolson please take a look.

Comment From: weim0000

@oranagra Thanks, I reproduced this:

I created a test cluster:

127.0.0.1:6001(master)<-127.0.0.1:6004(replica)
127.0.0.1:6002(master)<-127.0.0.1:6005(replica)
127.0.0.1:6003(master)<-127.0.0.1:6006(replica)

Slot 3498 is assigned to 127.0.0.1:6001:

127.0.0.1:6001> cluster countkeysinslot 3498
(integer) 7

Then I forcibly assigned slot 3498 to 127.0.0.1:6002:

127.0.0.1:6002> cluster setslot 3498 node 579ae52c8ca288cdd8545bb57c7ace0e97e8f4f5
OK
127.0.0.1:6002> cluster countkeysinslot 3498
(integer) 0

127.0.0.1:6001 then lost slot 3498 and deleted all the keys in it:

127.0.0.1:6001> cluster countkeysinslot 3498
(integer) 0

Then I connected to 127.0.0.1:6004, which is a replica of 127.0.0.1:6001, but it still has the keys in slot 3498:

127.0.0.1:6004> cluster countkeysinslot 3498
(integer) 7

So I think 127.0.0.1:6001 does not replicate these key deletions to its replica (127.0.0.1:6004).
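
To make the divergence above easier to reason about, here is a minimal toy model of it. It is not Redis code: toy_node, toy_add_key, toy_del_key, toy_count_keys_in_slot and toy_master_loses_slot are hypothetical names, keys are reduced to an integer id plus a slot number, and "propagation" is modeled as simply applying the same per-key delete on the replica.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_KEYS 64

/* Hypothetical toy model, not Redis internals: a key is an id plus its slot. */
typedef struct { int id; unsigned int slot; } toy_key;
typedef struct { toy_key keys[TOY_MAX_KEYS]; size_t nkeys; } toy_node;

void toy_add_key(toy_node *n, int id, unsigned int slot) {
    assert(n->nkeys < TOY_MAX_KEYS);
    n->keys[n->nkeys].id = id;
    n->keys[n->nkeys].slot = slot;
    n->nkeys++;
}

/* Delete one key by id; returns 1 if the key existed, 0 otherwise. */
int toy_del_key(toy_node *n, int id) {
    for (size_t i = 0; i < n->nkeys; i++) {
        if (n->keys[i].id == id) {
            n->keys[i] = n->keys[n->nkeys - 1]; /* swap-remove */
            n->nkeys--;
            return 1;
        }
    }
    return 0;
}

size_t toy_count_keys_in_slot(const toy_node *n, unsigned int slot) {
    size_t count = 0;
    for (size_t i = 0; i < n->nkeys; i++)
        if (n->keys[i].slot == slot) count++;
    return count;
}

/* The master drops every key in `slot`. When `propagate` is nonzero, each
 * deletion is also applied to the replica, standing in for a replicated DEL.
 * Returns the number of keys removed on the master. */
size_t toy_master_loses_slot(toy_node *master, toy_node *replica,
                             unsigned int slot, int propagate) {
    size_t removed = 0;
    size_t i = 0;
    while (i < master->nkeys) {
        if (master->keys[i].slot != slot) { i++; continue; }
        int id = master->keys[i].id;
        /* Swap-remove moves another key into index i, so i is not advanced. */
        toy_del_key(master, id);
        if (propagate) toy_del_key(replica, id);
        removed++;
    }
    return removed;
}
```

With seven keys in slot 3498 on both nodes, calling toy_master_loses_slot with propagate set to 0 leaves the replica's count for that slot at 7, matching the output above. With propagate set to 1 the replica's count drops to 0.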

Comment From: madolson

I'm not sure that the correct behavior is to replicate the deletes, but the replicas should be purging the keys from slots that are no longer owned by their masters. We should probably address some of this here: https://github.com/redis/redis/pull/10517.
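
As a rough illustration of the replica-side alternative, the sketch below has the replica itself drop keys in any slot its master no longer owns, rather than waiting for replicated deletes. This is a toy model only and is not taken from the linked PR: toy_replica, toy_set_slot and toy_purge_unowned_slots are hypothetical names, and each slot is reduced to an ownership flag plus a key count.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NUM_SLOTS 16384 /* Redis Cluster uses 16384 hash slots */

/* Hypothetical replica-side view: per-slot ownership and key counts. */
typedef struct {
    unsigned char master_owns[TOY_NUM_SLOTS];  /* 1 if this replica's master owns the slot */
    unsigned int keys_in_slot[TOY_NUM_SLOTS];  /* number of keys stored locally per slot */
} toy_replica;

/* Record the ownership flag and local key count for one slot. */
void toy_set_slot(toy_replica *r, size_t slot, int owned, unsigned int nkeys) {
    assert(slot < TOY_NUM_SLOTS);
    r->master_owns[slot] = owned ? 1 : 0;
    r->keys_in_slot[slot] = nkeys;
}

/* Drop the keys of every slot the master no longer owns.
 * Returns the total number of keys purged. */
unsigned int toy_purge_unowned_slots(toy_replica *r) {
    unsigned int purged = 0;
    for (size_t slot = 0; slot < TOY_NUM_SLOTS; slot++) {
        if (!r->master_owns[slot] && r->keys_in_slot[slot] > 0) {
            purged += r->keys_in_slot[slot];
            r->keys_in_slot[slot] = 0;
        }
    }
    return purged;
}
```

In this model, a replica that holds 7 keys in slot 3498 after its master lost that slot would purge exactly those 7 keys and leave slots the master still owns untouched. When and how a real replica should trigger such a purge is the open design question here.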