When a slave uses the mixed AOF-RDB format (AOF with an RDB preamble), an AOF rewrite does not save the repl-id and repl-offset in the RDB part. The relevant code is below: the last argument passed to rdbSaveRio is NULL, which means the repl-id and repl-offset are not saved.

aof.c

int rewriteAppendOnlyFile(char *filename) {

    if (server.aof_use_rdb_preamble) {
        int error;
        /* Last argument is NULL: no repl-id / repl-offset is written to the preamble. */
        if (rdbSaveRio(&aof,&error,RDB_SAVE_AOF_PREAMBLE,NULL) == C_ERR) {
            errno = error;
            goto werr;
        }
    }

}

Consider this situation: the slave crashes (not a normal shutdown) and then restarts. It has to do a FULLSYNC with the master instead of a partial resynchronization.

In my opinion, if we saved the repl-id and repl-offset in the rewritten AOF file, then on restart the slave could load only the RDB part of that file whenever the repl-id and repl-offset in the RDB part are present. That would make it possible for the slave to partially resynchronize with the master.
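
To make the idea concrete, here is a minimal sketch of the choice the slave would face on restart. It is not Redis code: `preamble_repl_info` and `build_sync_command` are hypothetical names made up for illustration. It assumes the handshake sends `PSYNC <repl-id> <offset+1>` to ask for a partial resync and `PSYNC ? -1` to ask for a full one.

```c
#include <stdio.h>
#include <string.h>

#define REPL_ID_LEN 40 /* assumed length of a replication ID, in hex characters */

/* Hypothetical stand-in for replication metadata read back from the RDB
 * preamble of the rewritten AOF file. */
typedef struct {
    int valid;                      /* 1 if the preamble carried repl-id and repl-offset */
    char repl_id[REPL_ID_LEN + 1];  /* NUL-terminated replication ID */
    long long repl_offset;          /* replication offset at the time of the rewrite */
} preamble_repl_info;

/* Write the sync request the slave would send into buf.
 * With usable metadata it asks to continue from the next byte after the
 * saved offset; without it, it can only ask for a full resync.
 * Returns the value returned by snprintf. */
int build_sync_command(const preamble_repl_info *info, char *buf, size_t buflen) {
    if (info != NULL && info->valid) {
        return snprintf(buf, buflen, "PSYNC %s %lld",
                        info->repl_id, info->repl_offset + 1);
    }
    return snprintf(buf, buflen, "PSYNC ? -1");
}
```

Today the preamble never carries this metadata (the NULL argument above), so a slave restarting from AOF always ends up in the second branch.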

I have read this article: https://gist.github.com/antirez/ae068f95c0d084891305 but I still don't understand.

If this has been discussed before, please give me the link. Thanks.

Comment From: yossigo

This has been addressed in the past in the context of additional replication improvements, but I don't see why it can't be fixed locally regardless of the bigger picture. Maybe as part of that fix we should also address other RDB saving inconsistencies, like scripts that may or may not be persisted. @oranagra WDYT?

Comment From: oranagra

So you propose that when Redis starts from AOF, if it sees a repl_id and offset, it stops after the RDB portion, avoids reading the rest of the AOF, and attempts a psync? If that fails and it ends up performing a full sync, that's not too bad (loading the RDB was a waste of time in this case, so at least we didn't read the AOF part too). But what if the master is dead and the replica is promoted? In that case we'd prefer to read the rest of the AOF file?

Comment From: yossigo

Theoretically we could read the entire AOF, update the offset and psync from there. Or try to be more efficient and just probe the repl_id and fail early. But yes, that's already driving us back to handle the bigger issues.
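
A rough sketch of the startup decision discussed above, for the "stop after the RDB portion" variant. This is only an illustration under assumptions: `aof_load_plan` and `choose_aof_load_plan` are hypothetical names, not existing Redis functions, and the real logic would also need to handle a psync that is rejected by the master.

```c
/* Hypothetical outcome of the decision taken before loading the AOF file. */
typedef enum {
    PLAN_LOAD_FULL_AOF,            /* load the RDB preamble and the AOF tail */
    PLAN_LOAD_PREAMBLE_THEN_PSYNC  /* stop after the RDB preamble, then try psync */
} aof_load_plan;

/* is_replica:             1 if this instance is still configured as a replica,
 *                         0 if it was promoted (e.g. the master is dead).
 * preamble_has_repl_info: 1 if the RDB preamble carried a repl_id and offset. */
aof_load_plan choose_aof_load_plan(int is_replica, int preamble_has_repl_info) {
    /* A promoted instance has nobody to psync with, so it wants all its data. */
    if (!is_replica) return PLAN_LOAD_FULL_AOF;

    /* Without a repl_id and offset there is nothing to psync from. */
    if (!preamble_has_repl_info) return PLAN_LOAD_FULL_AOF;

    /* Still a replica and the metadata is there: skip the AOF tail. */
    return PLAN_LOAD_PREAMBLE_THEN_PSYNC;
}
```

The alternative mentioned above (read the entire AOF, update the offset, and psync from there) would always load the full file and differ only in which offset is sent afterwards.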

Comment From: oranagra

folding this ticket into #9796