I am using Redis 4.0.14 in cluster mode. One of the masters went down yesterday, so its slave was promoted to master. When I tried to add a new slave for this master, the new slave crashed.
**Crash report**
=== REDIS BUG REPORT START: Cut & paste starting from here ===
2220:S 15 Sep 02:39:37.714 # === ASSERTION FAILED OBJECT CONTEXT ===
2220:S 15 Sep 02:39:37.714 # Object type: 0
2220:S 15 Sep 02:39:37.714 # Object encoding: 0
2220:S 15 Sep 02:39:37.714 # Object refcount: 1
2220:S 15 Sep 02:39:37.714 # Object raw string len: 28
2220:S 15 Sep 02:39:37.714 # Object raw string content: "s:u:r:a:b:k:8976737_11794735"
2220:S 15 Sep 02:39:37.714 # === ASSERTION FAILED ===
2220:S 15 Sep 02:39:37.714 # ==> db.c:171 'retval == DICT_OK' is not true
2220:S 15 Sep 02:39:37.714 # (forcing SIGSEGV to print the bug report.)
2220:S 15 Sep 02:39:37.714 # Redis 4.0.14 crashed by signal: 11
2220:S 15 Sep 02:39:37.714 # Crashed running the instruction at: 0x466b0a
2220:S 15 Sep 02:39:37.714 # Accessing address: 0xffffffffffffffff
2220:S 15 Sep 02:39:37.714 # Failed assertion: retval == DICT_OK (db.c:171)
------ STACK TRACE ------
EIP:
/usr/local/redis-cluster/src/redis-server 172.17.0.183:7001 [cluster](_serverAssert+0x6a)[0x466b0a]
Backtrace:
/usr/local/redis-cluster/src/redis-server 172.17.0.183:7001 [cluster](logStackTrace+0x29)[0x468a39]
/usr/local/redis-cluster/src/redis-server 172.17.0.183:7001 [cluster](sigsegvHandler+0xac)[0x4690dc]
/lib64/libpthread.so.0(+0xf630)[0x7f3da2939630]
/usr/local/redis-cluster/src/redis-server 172.17.0.183:7001 [cluster](_serverAssert+0x6a)[0x466b0a]
/usr/local/redis-cluster/src/redis-server 172.17.0.183:7001 [cluster](dbAdd+0x87)[0x440d97]
/usr/local/redis-cluster/src/redis-server 172.17.0.183:7001 [cluster](rdbLoadRio+0x1c8)[0x44b118]
/usr/local/redis-cluster/src/redis-server 172.17.0.183:7001 [cluster](rdbLoad+0x3b)[0x44b74b]
/usr/local/redis-cluster/src/redis-server 172.17.0.183:7001 [cluster](readSyncBulkPayload+0x2b7)[0x443b67]
/usr/local/redis-cluster/src/redis-server 172.17.0.183:7001 [cluster](aeProcessEvents+0x2a0)[0x4267a0]
/usr/local/redis-cluster/src/redis-server 172.17.0.183:7001 [cluster](aeMain+0x2b)[0x426a6b]
/usr/local/redis-cluster/src/redis-server 172.17.0.183:7001 [cluster](main+0x49f)[0x42386f]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f3da257e555]
/usr/local/redis-cluster/src/redis-server 172.17.0.183:7001 [cluster][0x423b62]
------ INFO OUTPUT ------
# Server
redis_version:4.0.14
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:1cfc14537bf00bcf
redis_mode:cluster
os:Linux 3.10.0-1160.76.1.el7.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:4.8.5
process_id:2220
run_id:275401344e3af8464e47239fb583a74a0003d658
tcp_port:7001
uptime_in_seconds:1186
uptime_in_days:0
hz:10
lru_clock:2282507
executable:/usr/local/redis-cluster/src/redis-server
config_file:/usr/local/redis-cluster/7001/redis.conf
# Clients
connected_clients:2
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
# Memory
used_memory:44134283536
used_memory_human:41.10G
used_memory_rss:4411392
used_memory_rss_human:4.21M
used_memory_peak:44134283536
used_memory_peak_human:41.10G
used_memory_peak_perc:100.01%
used_memory_overhead:20816645332
used_memory_startup:3866824
used_memory_dataset:23317638204
used_memory_dataset_perc:52.84%
total_system_memory:134908211200
total_system_memory_human:125.64G
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:103079215104
maxmemory_human:96.00G
maxmemory_policy:volatile-ttl
mem_fragmentation_ratio:0.00
mem_allocator:jemalloc-4.0.3
active_defrag_running:0
lazyfree_pending_objects:0
# Persistence
loading:1
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1663226391
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
loading_start_time:1663226891
loading_total_bytes:11150084948
loading_loaded_bytes:11012145151
loading_loaded_perc:98.76
loading_eta_seconds:8
# Stats
total_connections_received:757
total_commands_processed:1483
instantaneous_ops_per_sec:0
total_net_input_bytes:11150178591
total_net_output_bytes:463805
instantaneous_input_kbps:111150.88
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
# Replication
role:slave
master_host:172.17.0.182
master_port:7002
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:1
slave_repl_offset:1
master_sync_left_bytes:0
master_sync_last_io_seconds_ago:686
master_link_down_since_seconds:1663227577
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:dd2c0b9a66a09df8c74ebb0854c43313e289ad33
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:209715200
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
# CPU
used_cpu_sys:42.24
used_cpu_user:678.73
used_cpu_sys_children:0.00
used_cpu_user_children:0.00
# Commandstats
cmdstat_client:calls=320,usec=642,usec_per_call=2.01
cmdstat_cluster:calls=320,usec=139924,usec_per_call=437.26
cmdstat_info:calls=843,usec=10596,usec_per_call=12.57
# Cluster
cluster_enabled:1
# Keyspace
db0:keys=258097577,expires=258075369,avg_ttl=0
------ CLIENT LIST OUTPUT ------
id=16 addr=172.17.0.203:43064 fd=21 name= age=1170 idle=41 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=info
id=18 addr=172.17.0.203:43080 fd=23 name= age=1170 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=info
------ REGISTERS ------
2220:S 15 Sep 02:39:37.715 #
RAX:0000000000000000 RBX:00000000000000ab
RCX:00000000009d8260 RDX:0000000000012b60
RDI:00007f3da2923760 RSI:0000000000000000
RBP:00000000004fc52e RSP:00007ffd3aa949f0
R8 :0000000000000001 R9 :00007f3da29237b8
R10:00007f3da29237b8 R11:0000000000000206
R12:00000000004f9cee R13:0000000000000000
R14:0000000000000000 R15:00007ffd3aa94eb0
RIP:0000000000466b0a EFL:0000000000010202
CSGSFS:0000000000000033
2220:S 15 Sep 02:39:37.715 # (00007ffd3aa949ff) -> 0000018509907300
2220:S 15 Sep 02:39:37.715 # (00007ffd3aa949fe) -> 0000000000000000
2220:S 15 Sep 02:39:37.715 # (00007ffd3aa949fd) -> 0000000000000008
2220:S 15 Sep 02:39:37.715 # (00007ffd3aa949fc) -> 0000000000000013
2220:S 15 Sep 02:39:37.715 # (00007ffd3aa949fb) -> 0000000000000000
2220:S 15 Sep 02:39:37.715 # (00007ffd3aa949fa) -> 00007ffd3aa94f40
2220:S 15 Sep 02:39:37.715 # (00007ffd3aa949f9) -> 00000183400c4e6b
2220:S 15 Sep 02:39:37.715 # (00007ffd3aa949f8) -> 0000000000000000
2220:S 15 Sep 02:39:37.715 # (00007ffd3aa949f7) -> 000000000044b118
2220:S 15 Sep 02:39:37.715 # (00007ffd3aa949f6) -> 00007f3d9ba48000
2220:S 15 Sep 02:39:37.715 # (00007ffd3aa949f5) -> 00007f331b091cf0
2220:S 15 Sep 02:39:37.715 # (00007ffd3aa949f4) -> 00000185099073a5
2220:S 15 Sep 02:39:37.715 # (00007ffd3aa949f3) -> 0000000000440d97
2220:S 15 Sep 02:39:37.715 # (00007ffd3aa949f2) -> 00007f3d9ba48000
2220:S 15 Sep 02:39:37.715 # (00007ffd3aa949f1) -> 00007f331b091cf0
2220:S 15 Sep 02:39:37.715 # (00007ffd3aa949f0) -> 00007f33152a48c0
------ FAST MEMORY TEST ------
2220:S 15 Sep 02:39:37.715 # Bio thread for job type #0 terminated
2220:S 15 Sep 02:39:37.715 # Bio thread for job type #1 terminated
2220:S 15 Sep 02:39:37.715 # Bio thread for job type #2 terminated
*** Preparing to test memory region 745000 (98304 bytes)
*** Preparing to test memory region 9ca000 (135168 bytes)
*** Preparing to test memory region 7f3315200000 (45170556928 bytes)
*** Preparing to test memory region 7f3d999fe000 (8388608 bytes)
*** Preparing to test memory region 7f3d9a1ff000 (8388608 bytes)
*** Preparing to test memory region 7f3d9aa00000 (16777216 bytes)
*** Preparing to test memory region 7f3d9ba00000 (2097152 bytes)
*** Preparing to test memory region 7f3da2200000 (2097152 bytes)
*** Preparing to test memory region 7f3da2925000 (20480 bytes)
*** Preparing to test memory region 7f3da2b42000 (16384 bytes)
*** Preparing to test memory region 7f3da3262000 (16384 bytes)
*** Preparing to test memory region 7f3da326a000 (8192 bytes)
*** Preparing to test memory region 7f3da326c000 (4096 bytes)
*** Preparing to test memory region 7f3da326f000 (4096 bytes)
.O.O.O.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.
------ DUMPING CODE AROUND EIP ------
Symbol: _serverAssert (base: 0x466aa0)
Module: /usr/local/redis-cluster/src/redis-server 172.17.0.183:7001 [cluster] (base 0x400000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=0x466aa0 -D -b binary -m i386:x86-64 /tmp/dump.bin
------
2220:S 15 Sep 02:41:26.388 # dump of function (hexdump of 234 bytes):
41548b054c0a2e004989fc554889f585c05389d37505e8f5fdffffbebe155000bf0300000031c0e8e432fcff4d89e089d94889eabed7155000bf0300000031c0e8cb32fcffbe50065000bf0300000031c04c8925e8092e0048892de9092e00891deb092e00e8a632fcffc60425ffffffff785b5d415cc3660f1f84000000000041544989fc55534883c4808b15c3092e0085d20f84e700000031c0be80065000bf03000000e86632fcff418b94249800000031c0bef2155000bf0300000031ed31dbe84932fcff418b54240831c0be05165000bf03000000e83332fcff418b54243831c0be15165000bf
=== REDIS BUG REPORT END. Make sure to include from START to END. ===
Please report the crash by opening an issue on github:
http://github.com/antirez/redis/issues
Suspect RAM error? Use redis-server --test-memory to verify it.
**Additional information**
- OS is CentOS 7.9, 64-bit
- Steps to reproduce (if any)
Comment From: oranagra
It looks like the key `s:u:r:a:b:k:8976737_11794735` somehow exists twice in the RDB.
I don't know how that's possible other than through some memory corruption on the master.
If you can reproduce it from scratch (on a clean deployment), we can debug it. Otherwise, I suggest upgrading to a more recent version of Redis, in the hope that whatever caused it has been fixed (the version you're using is very old).
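
To illustrate the failure mode: the stack trace shows `rdbLoadRio` calling `dbAdd`, which asserts `retval == DICT_OK`, so a key that appears a second time in the payload aborts the load instead of overwriting the first copy. The snippet below is only a toy Python model of that behaviour, not Redis source code. `ToyDb`, `db_add`, `load_snapshot` and `find_duplicate_keys` are hypothetical names, and the values in the usage example are made up.

```python
class DuplicateKeyError(Exception):
    """Models the failed 'retval == DICT_OK' assertion from the report."""


class ToyDb:
    """Toy stand-in for a keyspace (hypothetical, not Redis code)."""

    def __init__(self):
        self._dict = {}

    def db_add(self, key, value):
        # Adding a key that is already present is treated as a fatal
        # error rather than an overwrite, mirroring the assertion.
        if key in self._dict:
            raise DuplicateKeyError(key)
        self._dict[key] = value


def load_snapshot(db, entries):
    """Load (key, value) pairs in order; stops at the first duplicate."""
    for key, value in entries:
        db.db_add(key, value)


def find_duplicate_keys(entries):
    """Return keys that occur more than once, in order of first repeat."""
    seen = set()
    duplicates = []
    for key, _value in entries:
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates


if __name__ == "__main__":
    # Made-up payload in which the key from the report appears twice.
    entries = [
        ("s:u:r:a:b:k:8976737_11794735", "first"),
        ("some:other:key", "x"),
        ("s:u:r:a:b:k:8976737_11794735", "second"),
    ]
    print(find_duplicate_keys(entries))
    try:
        load_snapshot(ToyDb(), entries)
    except DuplicateKeyError as err:
        print("load aborted on duplicate key:", err)
```

In this model the load fails as soon as the second copy of the key is reached, which is consistent with the report showing the crash partway through loading (`loading_loaded_perc:98.76`).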