Hi, I have set up a Redis cluster with topology 3 masters - 6 slaves ( each master has 2 slaves ). Redis version is 6.2.2. In my configuration, I set 2 following configs:
min-slaves-to-write : 2
min-slaves-max-lag: 10
It means that there must be at least 2 slave nodes to keep the data synchronized with the master within 10 seconds, otherwise, the master will not accept write operations.
I make a test. I killed a slave node of a Redis master to its current connected slaves is less than min-slaves-to-write config.
X.X.X.X:7000> info replication
# Replication
role:master
connected_slaves:1
min_slaves_good_slaves:1
slave0:ip=X.X.X.X,port=7002,state=online,offset=4141971346,lag=0
master_failover_state:no-failover
master_replid:2006b4babb87cc2b41e7c96026a273da748bd88f
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:4141971346
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:4140922771
repl_backlog_histlen:1048576
And I did a write operation to this master node. It returns an error and rejects the write operation.
127.0.0.1:7000> set P value
(error) NOREPLICAS Not enough good replicas to write
But when I use the EVAL command to evaluate Lua script for the same command, it writes to master and synced to remaining slave successfully.
X.X.X.X:7000> eval "return redis.call('set','P','value')" 0
OK
X.X.X.X:7000> get P
"value"
# data synced successfully to remaining slave node
X.X.X.X.7002> get P
"value"
So, that config min-slaves-to-write is not worked with Lua scripts? Or is it a bug in Redis? I searched but did not find any related issues.
Comment From: oranagra
This looks like a bug, maybe we can fix it together with #9928
Comment From: anhldbk
@oranagra
Observations
I tried to debug the above case while running the command
X.X.X.X:7000> eval "return redis.call('set','P','value')" 0
I found following things in our code base:
- The flow
┌────────┐ ┌──────┐ ┌──────┐ ┌──────────┐
│server.c│ │eval.c│ │lapi.c│ │t_string.c│
└───┬────┘ └──┬───┘ └──┬───┘ └────┬─────┘
│────┐ │ │
│ │ 𝟏 processCommand() │ │
│<───┘ │ │
│ │ │ │
│────┐ │ │ │
│ │ 𝟐 call() │ │ │
│<───┘ │ │ │
│ │ │ │
│ 𝟑 evalCommand() │ │ │
│ ──────────────────────> │ │
│ │ │ │
│ │────┐ │
│ │ │ 𝟒 luaCallFunction() │
│ │<───┘ │
│ │ │ │
│ │ 𝟓 lua_pcall() │ │
│ │ ───────────────────────> │
│ │ │ │
│ 𝟔 call() │ │
│ <─────────────────────────────────────────────── │
│ │ │ │
│ │ 𝟕 setCommand() │ │
│ ───────────────────────────────────────────────────────────────────>
│ │ │ │
│ │ 𝟖 │ │
│ <─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
┌───┴────┐ ┌──┴───┐ ┌──┴───┐ ┌────┴─────┐
│server.c│ │eval.c│ │lapi.c│ │t_string.c│
└────────┘ └──────┘ └──────┘ └──────────┘
- processCommand does not validate replication requirements for
evalcommands
// processCommand(), in server.c
if (server.masterhost == NULL &&
server.repl_min_slaves_to_write &&
server.repl_min_slaves_max_lag &&
is_write_command && // always False for eval commands
server.repl_good_slaves_count < server.repl_min_slaves_to_write)
{
rejectCommand(c, shared.noreplicaserr);
return C_OK;
}
This results in the requirements are not validated.
- setCommand does not also
Questions
- Can you explain how issue https://github.com/redis/redis/pull/9928 relates to this one?
- Is it good to have replication checks while handling
server.c#call()?
Comment From: oranagra
i think it's as simple as just mirroring that check to scriptVerifyWriteCommandAllow.
the reason it's related to the other PR i referred to is that both are about mirroring checks that are done in processCommand, to places that use call() directly (i.e. scriptCall and execCommand).
ideally some of the checks in processCommand will be extracted to some common function that's used by both.
Comment From: anhldbk
@oranagra Thanks for your clarification. Let's my friend @NoRaDoMi make a PR for this bug.
Comment From: noradomi
@oranagra @anhldbk I created a PR for this bug at #10160. PTAL