Crash report
=== REDIS BUG REPORT START: Cut & paste starting from here ===
12758:M 27 Oct 2023 04:19:55.632 # === ASSERTION FAILED ===
12758:M 27 Oct 2023 04:19:55.632 # ==> cluster.c:5349 '(n->slot_info_pairs_count + 1) < (2 * n->numslots)' is not true
------ STACK TRACE ------
Backtrace:
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterReplyShards+0x40)[0x4fe880]
/usr/local/bin/redis-server *:6379 [cluster](call+0x14c)[0x46e3ac]
/usr/local/bin/redis-server *:6379 [cluster](processCommand+0x37c)[0x46ed3c]
/usr/local/bin/redis-server *:6379 [cluster](processInputBuffer+0xdc)[0x48dc3c]
/usr/local/bin/redis-server *:6379 [cluster](readQueryFromClient+0x2e8)[0x48e0e8]
/usr/local/bin/redis-server *:6379 [cluster][0x578388]
/usr/local/bin/redis-server *:6379 [cluster](aeMain+0x108)[0x46528c]
/usr/local/bin/redis-server *:6379 [cluster](main+0x3a0)[0x45a7c0]
/lib64/libc.so.6(+0x35a78)[0xffff97887a78]
/lib64/libc.so.6(__libc_start_main+0x9c)[0xffff97887b5c]
/usr/local/bin/redis-server *:6379 [cluster](_start+0x30)[0x45afb0]
------ INFO OUTPUT ------
12758:M 27 Oct 2023 04:19:55.633 # === ASSERTION FAILED ===
12758:M 27 Oct 2023 04:19:55.633 # ==> cluster.c:5349 '(n->slot_info_pairs_count + 1) < (2 * n->numslots)' is not true
------ STACK TRACE ------
Backtrace:
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterReplyShards+0x40)[0x4fe880]
/usr/local/bin/redis-server *:6379 [cluster](call+0x14c)[0x46e3ac]
/usr/local/bin/redis-server *:6379 [cluster](processCommand+0x37c)[0x46ed3c]
/usr/local/bin/redis-server *:6379 [cluster](processInputBuffer+0xdc)[0x48dc3c]
/usr/local/bin/redis-server *:6379 [cluster](readQueryFromClient+0x2e8)[0x48e0e8]
/usr/local/bin/redis-server *:6379 [cluster][0x578388]
/usr/local/bin/redis-server *:6379 [cluster](aeMain+0x108)[0x46528c]
/usr/local/bin/redis-server *:6379 [cluster](main+0x3a0)[0x45a7c0]
/lib64/libc.so.6(+0x35a78)[0xffff97887a78]
/lib64/libc.so.6(__libc_start_main+0x9c)[0xffff97887b5c]
/usr/local/bin/redis-server *:6379 [cluster](_start+0x30)[0x45afb0]
------ INFO OUTPUT ------
12758:M 27 Oct 2023 04:19:55.634 # === ASSERTION FAILED ===
12758:M 27 Oct 2023 04:19:55.634 # ==> cluster.c:5349 '(n->slot_info_pairs_count + 1) < (2 * n->numslots)' is not true
------ STACK TRACE ------
Backtrace:
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterReplyShards+0x40)[0x4fe880]
/usr/local/bin/redis-server *:6379 [cluster](call+0x14c)[0x46e3ac]
/usr/local/bin/redis-server *:6379 [cluster](processCommand+0x37c)[0x46ed3c]
/usr/local/bin/redis-server *:6379 [cluster](processInputBuffer+0xdc)[0x48dc3c]
/usr/local/bin/redis-server *:6379 [cluster](readQueryFromClient+0x2e8)[0x48e0e8]
/usr/local/bin/redis-server *:6379 [cluster][0x578388]
/usr/local/bin/redis-server *:6379 [cluster](aeMain+0x108)[0x46528c]
/usr/local/bin/redis-server *:6379 [cluster](main+0x3a0)[0x45a7c0]
/lib64/libc.so.6(+0x35a78)[0xffff97887a78]
/lib64/libc.so.6(__libc_start_main+0x9c)[0xffff97887b5c]
/usr/local/bin/redis-server *:6379 [cluster](_start+0x30)[0x45afb0]
------ INFO OUTPUT ------
12758:M 27 Oct 2023 04:19:55.635 # === ASSERTION FAILED ===
12758:M 27 Oct 2023 04:19:55.635 # ==> cluster.c:5349 '(n->slot_info_pairs_count + 1) < (2 * n->numslots)' is not true
------ STACK TRACE ------
Backtrace:
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterReplyShards+0x40)[0x4fe880]
/usr/local/bin/redis-server *:6379 [cluster](call+0x14c)[0x46e3ac]
/usr/local/bin/redis-server *:6379 [cluster](processCommand+0x37c)[0x46ed3c]
/usr/local/bin/redis-server *:6379 [cluster](processInputBuffer+0xdc)[0x48dc3c]
/usr/local/bin/redis-server *:6379 [cluster](readQueryFromClient+0x2e8)[0x48e0e8]
/usr/local/bin/redis-server *:6379 [cluster][0x578388]
/usr/local/bin/redis-server *:6379 [cluster](aeMain+0x108)[0x46528c]
/usr/local/bin/redis-server *:6379 [cluster](main+0x3a0)[0x45a7c0]
/lib64/libc.so.6(+0x35a78)[0xffff97887a78]
/lib64/libc.so.6(__libc_start_main+0x9c)[0xffff97887b5c]
/usr/local/bin/redis-server *:6379 [cluster](_start+0x30)[0x45afb0]
------ INFO OUTPUT ------
12758:M 27 Oct 2023 04:19:55.636 # === ASSERTION FAILED ===
12758:M 27 Oct 2023 04:19:55.636 # ==> cluster.c:5349 '(n->slot_info_pairs_count + 1) < (2 * n->numslots)' is not true
------ STACK TRACE ------
Backtrace:
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterReplyShards+0x40)[0x4fe880]
/usr/local/bin/redis-server *:6379 [cluster](call+0x14c)[0x46e3ac]
/usr/local/bin/redis-server *:6379 [cluster](processCommand+0x37c)[0x46ed3c]
/usr/local/bin/redis-server *:6379 [cluster](processInputBuffer+0xdc)[0x48dc3c]
/usr/local/bin/redis-server *:6379 [cluster](readQueryFromClient+0x2e8)[0x48e0e8]
/usr/local/bin/redis-server *:6379 [cluster][0x578388]
/usr/local/bin/redis-server *:6379 [cluster](aeMain+0x108)[0x46528c]
/usr/local/bin/redis-server *:6379 [cluster](main+0x3a0)[0x45a7c0]
/lib64/libc.so.6(+0x35a78)[0xffff97887a78]
/lib64/libc.so.6(__libc_start_main+0x9c)[0xffff97887b5c]
/usr/local/bin/redis-server *:6379 [cluster](_start+0x30)[0x45afb0]
------ INFO OUTPUT ------
12758:M 27 Oct 2023 04:19:55.637 # === ASSERTION FAILED ===
12758:M 27 Oct 2023 04:19:55.637 # ==> cluster.c:5349 '(n->slot_info_pairs_count + 1) < (2 * n->numslots)' is not true
------ STACK TRACE ------
Backtrace:
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterReplyShards+0x40)[0x4fe880]
/usr/local/bin/redis-server *:6379 [cluster](call+0x14c)[0x46e3ac]
/usr/local/bin/redis-server *:6379 [cluster](processCommand+0x37c)[0x46ed3c]
/usr/local/bin/redis-server *:6379 [cluster](processInputBuffer+0xdc)[0x48dc3c]
/usr/local/bin/redis-server *:6379 [cluster](readQueryFromClient+0x2e8)[0x48e0e8]
/usr/local/bin/redis-server *:6379 [cluster][0x578388]
/usr/local/bin/redis-server *:6379 [cluster](aeMain+0x108)[0x46528c]
/usr/local/bin/redis-server *:6379 [cluster](main+0x3a0)[0x45a7c0]
/lib64/libc.so.6(+0x35a78)[0xffff97887a78]
/lib64/libc.so.6(__libc_start_main+0x9c)[0xffff97887b5c]
/usr/local/bin/redis-server *:6379 [cluster](_start+0x30)[0x45afb0]
------ INFO OUTPUT ------
12758:M 27 Oct 2023 04:19:55.639 # === ASSERTION FAILED ===
12758:M 27 Oct 2023 04:19:55.639 # ==> cluster.c:5349 '(n->slot_info_pairs_count + 1) < (2 * n->numslots)' is not true
------ STACK TRACE ------
Backtrace:
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterReplyShards+0x40)[0x4fe880]
/usr/local/bin/redis-server *:6379 [cluster](call+0x14c)[0x46e3ac]
/usr/local/bin/redis-server *:6379 [cluster](processCommand+0x37c)[0x46ed3c]
/usr/local/bin/redis-server *:6379 [cluster](processInputBuffer+0xdc)[0x48dc3c]
/usr/local/bin/redis-server *:6379 [cluster](readQueryFromClient+0x2e8)[0x48e0e8]
/usr/local/bin/redis-server *:6379 [cluster][0x578388]
/usr/local/bin/redis-server *:6379 [cluster](aeMain+0x108)[0x46528c]
/usr/local/bin/redis-server *:6379 [cluster](main+0x3a0)[0x45a7c0]
/lib64/libc.so.6(+0x35a78)[0xffff97887a78]
/lib64/libc.so.6(__libc_start_main+0x9c)[0xffff97887b5c]
/usr/local/bin/redis-server *:6379 [cluster](_start+0x30)[0x45afb0]
------ INFO OUTPUT ------
12758:M 27 Oct 2023 04:19:55.640 # === ASSERTION FAILED ===
12758:M 27 Oct 2023 04:19:55.640 # ==> cluster.c:5349 '(n->slot_info_pairs_count + 1) < (2 * n->numslots)' is not true
------ STACK TRACE ------
Backtrace:
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterReplyShards+0x40)[0x4fe880]
/usr/local/bin/redis-server *:6379 [cluster](call+0x14c)[0x46e3ac]
/usr/local/bin/redis-server *:6379 [cluster](processCommand+0x37c)[0x46ed3c]
/usr/local/bin/redis-server *:6379 [cluster](processInputBuffer+0xdc)[0x48dc3c]
/usr/local/bin/redis-server *:6379 [cluster](readQueryFromClient+0x2e8)[0x48e0e8]
/usr/local/bin/redis-server *:6379 [cluster][0x578388]
/usr/local/bin/redis-server *:6379 [cluster](aeMain+0x108)[0x46528c]
/usr/local/bin/redis-server *:6379 [cluster](main+0x3a0)[0x45a7c0]
/lib64/libc.so.6(+0x35a78)[0xffff97887a78]
/lib64/libc.so.6(__libc_start_main+0x9c)[0xffff97887b5c]
/usr/local/bin/redis-server *:6379 [cluster](_start+0x30)[0x45afb0]
------ INFO OUTPUT ------
12758:M 27 Oct 2023 04:19:55.642 # === ASSERTION FAILED ===
12758:M 27 Oct 2023 04:19:55.642 # ==> cluster.c:5349 '(n->slot_info_pairs_count + 1) < (2 * n->numslots)' is not true
------ STACK TRACE ------
Backtrace:
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterReplyShards+0x40)[0x4fe880]
/usr/local/bin/redis-server *:6379 [cluster](call+0x14c)[0x46e3ac]
/usr/local/bin/redis-server *:6379 [cluster](processCommand+0x37c)[0x46ed3c]
/usr/local/bin/redis-server *:6379 [cluster](processInputBuffer+0xdc)[0x48dc3c]
/usr/local/bin/redis-server *:6379 [cluster](readQueryFromClient+0x2e8)[0x48e0e8]
/usr/local/bin/redis-server *:6379 [cluster][0x578388]
/usr/local/bin/redis-server *:6379 [cluster](aeMain+0x108)[0x46528c]
/usr/local/bin/redis-server *:6379 [cluster](main+0x3a0)[0x45a7c0]
/lib64/libc.so.6(+0x35a78)[0xffff97887a78]
/lib64/libc.so.6(__libc_start_main+0x9c)[0xffff97887b5c]
/usr/local/bin/redis-server *:6379 [cluster](_start+0x30)[0x45afb0]
------ INFO OUTPUT ------
12758:M 27 Oct 2023 04:19:55.644 # === ASSERTION FAILED ===
12758:M 27 Oct 2023 04:19:55.644 # ==> cluster.c:5349 '(n->slot_info_pairs_count + 1) < (2 * n->numslots)' is not true
------ STACK TRACE ------
Backtrace:
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterGenNodesDescription+0x58)[0x4fcff8]
/usr/local/bin/redis-server *:6379 [cluster](logServerInfo+0x260)[0x4ea180]
/usr/local/bin/redis-server *:6379 [cluster](printCrashReport+0x18)[0x4ea718]
/usr/local/bin/redis-server *:6379 [cluster](_serverAssert+0x154)[0x4ea954]
/usr/local/bin/redis-server *:6379 [cluster](slowlogInit+0x0)[0x4fcf40]
/usr/local/bin/redis-server *:6379 [cluster](clusterReplyShards+0x40)[0x4fe880]
/usr/local/bin/redis-server *:6379 [cluster](call+0x14c)[0x46e3ac]
/usr/local/bin/redis-server *:6379 [cluster](processCommand+0x37c)[0x46ed3c]
/usr/local/bin/redis-server *:6379 [cluster](processInputBuffer+0xdc)[0x48dc3c]
/usr/local/bin/redis-server *:6379 [cluster](readQueryFromClient+0x2e8)[0x48e0e8]
/usr/local/bin/redis-server *:6379 [cluster][0x578388]
/usr/local/bin/redis-server *:6379 [cluster](aeMain+0x108)[0x46528c]
/usr/local/bin/redis-server *:6379 [cluster](main+0x3a0)[0x45a7c0]
/lib64/libc.so.6(+0x35a78)[0xffff97887a78]
/lib64/libc.so.6(__libc_start_main+0x9c)[0xffff97887b5c]
/usr/local/bin/redis-server *:6379 [cluster](_start+0x30)[0x45afb0]
...
The "assertion failed" error keeps repeating for a long time.
Additional information
- OS distribution and version: Amazon Linux 2023, Redis version 7.2.2
- Steps to reproduce (if any)
- Running a rebalance command on a cluster with a mixture of 7.0.11 nodes and new 7.2.2 nodes:
redis-cli --cluster rebalance xxx:6379 --cluster-use-empty-masters --cluster-pipeline 1000 --cluster-weight d9a5864cf277e6f6cb21ea60a3cf0015ddf662a3=0
The rebalance command gets stuck, and on investigation this assertion failure was found.
Comment From: salarali
Any updates on this?
Comment From: enjoy-binbin
@salarali Thanks for the report, I am taking a look. Although it's a bit tortuous, I found a way to reproduce it.
Comment From: enjoy-binbin
@PingXie since you are here, can you also take a look?
This is somewhat like #12805: if the node is a master, we may need to add it to the shard list.
if (ext_shardid == NULL) clusterAddNodeToShard(sender->shard_id, sender);
The reason for the issue: suppose we have A (7.2) -> B (7.0), where B is A's master.
In node A's view, A does not know B's shard id, so here we are not able to clear B's slot_info.
void addShardReplyForClusterShards(client *c, list *nodes) {
    ...
    addReplyBulkCString(c, "nodes");
    addReplyArrayLen(c, listLength(nodes));
    listIter li;
    listRewind(nodes, &li);
    for (listNode *ln = listNext(&li); ln != NULL; ln = listNext(&li)) {
        clusterNode *n = listNodeValue(ln);
        addNodeDetailsToShardReply(c, n);
        clusterFreeNodesSlotsInfo(n);
    }
But here we keep adding to B's slot info, and eventually hit the assert:
void clusterGenNodesSlotsInfo(int filter) {
    ...
    /* Generate slots info when occur different node with start
     * or end of slot. */
    if (i == CLUSTER_SLOTS || n != server.cluster->slots[i]) {
        if (!(n->flags & filter)) {
            if (!n->slot_info_pairs) {
                n->slot_info_pairs = zmalloc(2 * n->numslots * sizeof(uint16_t));
            }
            serverAssert((n->slot_info_pairs_count + 1) < (2 * n->numslots));
            n->slot_info_pairs[n->slot_info_pairs_count++] = start;
            n->slot_info_pairs[n->slot_info_pairs_count++] = i-1;
        }
        if (i == CLUSTER_SLOTS) break;
        n = server.cluster->slots[i];
        start = i;
    }
}
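To make the failure mode concrete, here is a minimal, self-contained sketch of the counter growth described above. It is not Redis code: `toy_node`, `toy_gen_slot_info`, `toy_free_slot_info`, and `toy_successful_calls` are hypothetical names, and the model assumes a node that owns a single contiguous slot range and, in the buggy case, is never reached by the cleanup loop.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for the clusterNode fields used in the snippet above. */
typedef struct toy_node {
    int numslots;               /* number of slots owned by the node */
    uint16_t *slot_info_pairs;  /* flattened (start, end) pairs */
    int slot_info_pairs_count;  /* number of uint16_t entries in use */
} toy_node;

/* Append one (start, end) pair. Returns 1 on success, or 0 at the point
 * where the real code would fail its serverAssert. */
static int toy_gen_slot_info(toy_node *n, uint16_t start, uint16_t end) {
    assert(n->numslots > 0);
    if (!n->slot_info_pairs) {
        n->slot_info_pairs = malloc(2 * n->numslots * sizeof(uint16_t));
        assert(n->slot_info_pairs != NULL);
    }
    if (!((n->slot_info_pairs_count + 1) < (2 * n->numslots))) return 0;
    n->slot_info_pairs[n->slot_info_pairs_count++] = start;
    n->slot_info_pairs[n->slot_info_pairs_count++] = end;
    return 1;
}

/* Cleanup that, per the discussion, only runs for nodes found in a shard. */
static void toy_free_slot_info(toy_node *n) {
    free(n->slot_info_pairs);
    n->slot_info_pairs = NULL;
    n->slot_info_pairs_count = 0;
}

/* Count how many generate calls succeed (capped at max_calls) for a node
 * owning slots [0, numslots-1]. If cleanup_each_time is nonzero, the pairs
 * are freed after every call, as they are for nodes that belong to a shard. */
static int toy_successful_calls(int numslots, int cleanup_each_time, int max_calls) {
    toy_node n = { numslots, NULL, 0 };
    int ok = 0;
    while (ok < max_calls && toy_gen_slot_info(&n, 0, (uint16_t)(numslots - 1))) {
        ok++;
        if (cleanup_each_time) toy_free_slot_info(&n);
    }
    toy_free_slot_info(&n);
    return ok;
}
```

In this model, `toy_successful_calls(3, 0, 100)` returns 3: the buffer has room for `2 * numslots` entries, each call adds two, and the next call fails the check. With cleanup after every call the check never fails, which matches the observation that only nodes missing from the shard list are affected.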
Comment From: PingXie
@enjoy-binbin, it looks like your fix for #12805 might resolve this issue too. With that fix in place, every 7.2 node (like A in your example) should keep its shard view consistent. And if there's another replica in the mix, like A', in the same shard as A and B, it'll follow the same shard structure, though with a different ID. That's totally fine for a setup with different versions running. From what we talked about in #12805, once everyone's on 7.2, we should see the shard structures and IDs line up. That's when the shard IDs will really start to make sense.
Btw, is re-sharding required to trigger this bug? Generally, I'd lean towards updating all nodes to the same version before doing something as involved as re-sharding. Most folks update the whole cluster first, which is a good call – it keeps things straightforward and avoids the quirks you might run into with a mixed-version setup.
Comment From: enjoy-binbin
The fix in #12805 won't help, since we first check whether the sender's shard_id has changed, and only if it changed do we add it to the shard list. In this case, if the sender is a master, we won't change the shard_id, so we are not able to add it to the shard list.
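A rough sketch of that argument, under stated assumptions: `toy_peer`, `toy_update_shard_id`, and `toy_process_sender` are hypothetical names, shard-list membership is reduced to a single flag, and a sender without a shard id extension is assumed to be processed with its locally stored id passed back unchanged. The `register_when_missing` switch models the `if (ext_shardid == NULL) clusterAddNodeToShard(...)` idea mentioned earlier in the thread; it is not a confirmed fix.

```c
#include <assert.h>
#include <string.h>

#define TOY_NAMELEN 40

typedef struct toy_peer {
    char shard_id[TOY_NAMELEN]; /* fixed-size id, not NUL-terminated */
    int in_shard_list;          /* 1 once the peer is registered under its shard */
} toy_peer;

/* Same shape as the check discussed here: membership is only touched when
 * the given shard id differs from the stored one. */
static void toy_update_shard_id(toy_peer *p, const char *shard_id) {
    if (shard_id && memcmp(p->shard_id, shard_id, TOY_NAMELEN) != 0) {
        memcpy(p->shard_id, shard_id, TOY_NAMELEN);
        p->in_shard_list = 1;
    }
}

/* Process one sender. ext_shardid is NULL for a peer that does not send the
 * shard id extension (assumption: its stored id is then reused as-is). */
static void toy_process_sender(toy_peer *p, const char *ext_shardid,
                               int register_when_missing) {
    toy_update_shard_id(p, ext_shardid ? ext_shardid : p->shard_id);
    if (ext_shardid == NULL && register_when_missing) p->in_shard_list = 1;
}

/* Returns the membership flag after an old-version master is processed. */
static int toy_old_master_registered(int register_when_missing) {
    toy_peer b;
    memset(b.shard_id, 'b', TOY_NAMELEN); /* locally generated id */
    b.in_shard_list = 0;
    toy_process_sender(&b, NULL, register_when_missing);
    return b.in_shard_list;
}

/* Returns the membership flag after a peer announces a different shard id. */
static int toy_new_id_registered(void) {
    toy_peer a;
    char announced[TOY_NAMELEN];
    memset(a.shard_id, 'a', TOY_NAMELEN);
    memset(announced, 'z', TOY_NAMELEN);
    a.in_shard_list = 0;
    toy_process_sender(&a, announced, 0);
    return a.in_shard_list;
}
```

Here `toy_old_master_registered(0)` returns 0 because the comparison sees no change, while `toy_old_master_registered(1)` and `toy_new_id_registered()` both return 1.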
Comment From: PingXie
in node A view, A does not know the B's shard id, so in here, we are not able to clear the B's slot_info.
When a 7.2 node A replicates from a 7.0 (primary) node B, A should inherit B's shard id, even if it is randomly generated on node A. With the fix for #12805, I'd assume node B's shard ID remains stable on node A, hence these two will remain in the same shard. So the statement above does not match my understanding.
What are the repro steps? Or is it possible for you to share the core dump somehow? It is a bit hard to be certain just by looking at the source code.
Comment From: enjoy-binbin
My repro steps:
step A:
7.0 cluster
./utils/create-cluster/create-cluster stop && ./utils/create-cluster/create-cluster clean
./utils/create-cluster/create-cluster start && ./utils/create-cluster/create-cluster create
step B:
7.2 node
rm -rf nodes.conf && ./src/redis-server redis.conf --cluster-enabled yes --port 7000
step C:
./src/redis-cli -p 30001 cluster meet 127.0.0.1 7000
./src/redis-cli --cluster rebalance 127.0.0.1:30001 --cluster-use-empty-masters --cluster-pipeline 1000 --cluster-weight d1a8056b89c9790a6ad836320e7c6f7dfc9fd282=0
step D:
./src/redis-cli -p 7000 cluster shards
I repeat steps C and D, and CLUSTER SHARDS responds with this (we can see the slots section keeps expanding):
1) 1) "slots"
   2)  1) (integer) 0
       2) (integer) 5461
       3) (integer) 0
       4) (integer) 5461
       5) (integer) 0
       6) (integer) 5461
       7) (integer) 0
       8) (integer) 5461
       9) (integer) 0
      10) (integer) 5461
      11) (integer) 0
      12) (integer) 5461
   3) "nodes"
   4) 1)  1) "id"
          2) "a447baa52a5d5564fbf27974e93e4f0825746d00"
          3) "port"
          4) (integer) 30004
          5) "ip"
          6) "127.0.0.1"
          7) "endpoint"
          8) "127.0.0.1"
          9) "role"
         10) "replica"
         11) "replication-offset"
         12) (integer) 13566
         13) "health"
         14) "online"
The steps to reproduce are quite messy. I upgraded randomly locally.
Yeah, with the fix for #12805, they will remain in the same shard. However, the shard id of a certain master has not been added to the shard id list. Maybe there is something missing somewhere.
/* The code here checks memcmp, and since the sender's shard id has not changed,
 * we won't add it to the shard id list. */
static void updateShardId(clusterNode *node, const char *shard_id) {
    if (shard_id && memcmp(node->shard_id, shard_id, CLUSTER_NAMELEN) != 0) {
        clusterRemoveNodeFromShard(node);
        memcpy(node->shard_id, shard_id, CLUSTER_NAMELEN);
        clusterAddNodeToShard(shard_id, node);
        clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG);
    }
    if (shard_id && myself != node && myself->slaveof == node) {
        if (memcmp(myself->shard_id, shard_id, CLUSTER_NAMELEN) != 0) {
            /* shard-id can diverge right after a rolling upgrade
             * from pre-7.2 releases */
            clusterRemoveNodeFromShard(myself);
            memcpy(myself->shard_id, shard_id, CLUSTER_NAMELEN);
            clusterAddNodeToShard(shard_id, myself);
            clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG|CLUSTER_TODO_FSYNC_CONFIG);
        }
    }
}
That’s why I plan to add this:
if (ext_shardid == NULL && is_master) clusterAddNodeToShard(sender->shard_id, sender);
So the essential reason is that #12805 only adds the replica to the shard id list (that is, server.cluster->shards), but not the master, which causes this problem.
Comment From: PingXie
Got it, this sounds more like a case where a 7.2 node is just observing a 7.0 shard, not actually replicating from it. Makes me wonder if we even need to go through re-sharding to reproduce this bug?
with the fix for https://github.com/redis/redis/pull/12805, they will remain in the same shard. However, the shard id of a certain master has not been added to the shard id list. Maybe there is something missing somewhere.
Did you get a chance to try this out with your latest changes in #12805? The earlier commits had this issue, but I thought your last update would've fixed it for both v7.0 primary and replica nodes.
Comment From: enjoy-binbin
Makes me wonder if we even need to go through re-sharding to reproduce this bug?
I feel it's not needed. I just followed the idea in the issue, then ended up with an environment that reproduces it reliably and didn't want to break it.
Did you get a chance to try this out with your latest changes in https://github.com/redis/redis/pull/12805? The earlier commits had this issue, but I thought your last update would've fixed it for both v7.0 primary and replica nodes.
I did try it (even earlier), and it doesn't work. The new commit does call updateShardId, but the memcmp check prevents us from adding the node to the shard id list.
Comment From: PingXie
You are right. Now I see two potential issues with updateShardId
- Guarding the shard dictionary update on the shard id change. These should've been two orthogonal decisions.
- The duplication of the shard dictionary update logic for both the incoming node n and myself
Would you like to propose a change?
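The first point can be illustrated with a toy model. This is only a sketch under assumptions, not Redis code: toy_node, toy_shard_dict, ID_LEN and every toy_* function are hypothetical names, and the shard dictionary is reduced to a flat membership list. The guarded variant mirrors the memcmp check quoted above; the decoupled variant treats "did the id change?" and "is the node registered?" as separate questions.

```c
#include <assert.h>
#include <string.h>

#define ID_LEN 8          /* hypothetical, much shorter than CLUSTER_NAMELEN */
#define MAX_MEMBERS 16

typedef struct {
    char shard_id[ID_LEN];
} toy_node;

/* Flat stand-in for the shard dictionary: just a set of node pointers. */
typedef struct {
    const toy_node *members[MAX_MEMBERS];
    int count;
} toy_shard_dict;

static int toy_dict_contains(const toy_shard_dict *d, const toy_node *n) {
    for (int i = 0; i < d->count; i++)
        if (d->members[i] == n) return 1;
    return 0;
}

/* Idempotent add: a node is stored at most once. */
static void toy_dict_add(toy_shard_dict *d, const toy_node *n) {
    if (toy_dict_contains(d, n)) return;
    assert(d->count < MAX_MEMBERS);
    d->members[d->count++] = n;
}

static void toy_dict_remove(toy_shard_dict *d, const toy_node *n) {
    for (int i = 0; i < d->count; i++) {
        if (d->members[i] == n) {
            d->members[i] = d->members[--d->count];
            return;
        }
    }
}

/* Guarded: registration only happens as a side effect of an id change. */
static void toy_update_guarded(toy_shard_dict *d, toy_node *n, const char *shard_id) {
    if (shard_id && memcmp(n->shard_id, shard_id, ID_LEN) != 0) {
        toy_dict_remove(d, n);
        memcpy(n->shard_id, shard_id, ID_LEN);
        toy_dict_add(d, n);
    }
}

/* Decoupled: update the id if it changed, then ensure membership regardless. */
static void toy_update_decoupled(toy_shard_dict *d, toy_node *n, const char *shard_id) {
    if (!shard_id) return;
    if (memcmp(n->shard_id, shard_id, ID_LEN) != 0) {
        toy_dict_remove(d, n);
        memcpy(n->shard_id, shard_id, ID_LEN);
    }
    toy_dict_add(d, n);
}
```

In this model, a node whose id already equals the incoming id is never registered by toy_update_guarded, while toy_update_decoupled registers it exactly once no matter how often it is called.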
Comment From: enjoy-binbin
i am happy to make the change and test it, but a little confused. Do you have any ideas, or can you elaborate a bit more?
Comment From: PingXie
Thinking about this more, a better and (more) correct fix would be to update the shard topology when a 7.0 replica is connected to its 7.0 primary for the very first time. More specifically, we need to inject an updateShardId(sender, master->shard_id) call on cluster_legacy.c:2947. This should fix the issue.
Comment From: enjoy-binbin
we need to inject an updateShardId(sender, master->shard_id) call
I tried it, it didn't work.
Luckily I found the minimal steps to reproduce:
# 7.0 cluster
./utils/create-cluster/create-cluster stop && ./utils/create-cluster/create-cluster clean
./utils/create-cluster/create-cluster start && ./utils/create-cluster/create-cluster create
# 7.2 node
rm -rf nodes.conf && ./src/redis-server redis.conf --cluster-enabled yes --port 6379
# 7.2 node replicate with a 7.0 node
./src/redis-cli -p 30001 cluster meet 127.0.0.1 6379
./src/redis-cli -p 6379 cluster replicate d808332a59dc44d4cf8cd0f54c2ea18e34ded7fa
./src/redis-cli -p 6379 cluster shards && ./src/redis-cli -p 6379 cluster shards
So the reason is that a 7.2 node is a 7.0 node's slave and the shard id dict does not contain the 7.0 node, so when we issue CLUSTER SHARDS on the 7.2 node, it ends up here: https://github.com/redis/redis/issues/12695#issuecomment-1827036077
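A toy model of this failure mode, as a sketch under assumptions rather than the Redis implementation: it assumes, as the comments in this thread suggest, that each CLUSTER SHARDS call appends a node's slot ranges to slot_info_pairs and that the pairs are cleared only for nodes reachable through the shard dict. The model_node type, the in_shard_dict flag and all model_* functions are hypothetical names; only the inequality is taken from the assertion in the crash report.

```c
#include <assert.h>

#define MODEL_MAX_SLOTS 16   /* hypothetical cap, far below CLUSTER_SLOTS */

/* Simplified stand-in for clusterNode: only the fields the assertion touches. */
typedef struct {
    int numslots;                                /* slots owned by the node */
    int slot_info_pairs_count;                   /* ints stored so far */
    int slot_info_pairs[2 * MODEL_MAX_SLOTS];
    int in_shard_dict;                           /* hypothetical flag */
} model_node;

/* Append one contiguous range. Returns 0 where the serverAssert from the
 * report would fire, 1 otherwise. */
static int model_append_range(model_node *n, int start, int end) {
    assert(n->numslots <= MODEL_MAX_SLOTS);
    if (!((n->slot_info_pairs_count + 1) < (2 * n->numslots))) return 0;
    n->slot_info_pairs[n->slot_info_pairs_count++] = start;
    n->slot_info_pairs[n->slot_info_pairs_count++] = end;
    return 1;
}

/* One CLUSTER SHARDS call in the model: generate the range, then clear it
 * only if the node is registered in the shard dict. */
static int model_cluster_shards(model_node *n, int start, int end) {
    if (!model_append_range(n, start, end)) return 0;
    if (n->in_shard_dict) n->slot_info_pairs_count = 0;
    return 1;
}

/* Count how many calls succeed before the check fails, capped at limit. */
static int model_calls_before_failure(model_node *n, int start, int end, int limit) {
    int calls = 0;
    while (calls < limit && model_cluster_shards(n, start, end)) calls++;
    return calls;
}
```

With numslots = 3 and in_shard_dict = 0, model_calls_before_failure(&n, 0, 2, 100) returns 3: the pairs pile up and the fourth call trips the check. With in_shard_dict = 1 the count is reset after every call and the loop reaches the limit of 100.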
Comment From: PingXie
So the reason is that a 7.2 node is a 7.0 node's slave and the shard id dict does not contain the 7.0 node, so when we issue CLUSTER SHARDS on the 7.2 node, it ends up here: https://github.com/redis/redis/issues/12695#issuecomment-1827036077
This looks like a different issue than observed in your previous https://github.com/redis/redis/issues/12695#issuecomment-1827036077. I think we should still keep the change I proposed in https://github.com/redis/redis/issues/12695#issuecomment-1831207684.
We could continue with the fix proposed in https://github.com/redis/redis/issues/12695#issuecomment-1827240102 but an alternative could be fixing the underlying assumption of updateShardId, which is that the shard dict should be always in sync with the node's shard_id. In this sense, I wonder if we should consider a fix of calling clusterAddNodeToShard at https://github.com/redis/redis/blob/unstable/src/cluster_legacy.c#L2161 and https://github.com/redis/redis/blob/unstable/src/cluster_legacy.c#L1615, when the node in question is a primary and if its shard_id is not in the shard dict yet.
Comment From: enjoy-binbin
I wonder if we should consider a fix of calling clusterAddNodeToShard at https://github.com/redis/redis/blob/unstable/src/cluster_legacy.c#L2161 and https://github.com/redis/redis/blob/unstable/src/cluster_legacy.c#L1615, when the node in question is a primary and if its shard_id is not in the shard dict yet.
Wow, that's actually what I thought at first, shard dict should be always in sync with the node's shard_id. The first time I tried it, it didn't fix the issue, so I dropped it and considered fixing it with a smaller diff.
I actually agree with this idea. In some places we don't keep the shard dict in sync, which feels like a hidden danger.
I will try to open a new PR later to add all these changes we mentioned (for better review)
Comment From: enjoy-binbin
@PingXie thanks! I verified that adding it to clusterRenameNode fixes this issue: https://github.com/redis/redis/issues/12695#issuecomment-1831410106. Please take a look at the fix in #12832.