I set up Redis Sentinel, but it stopped abnormally. The crash report logs are below: redisbug3.txt redisbug2.txt redisbug1.txt

Comment From: hwware

This doesn't look like a Sentinel crash. Pinging @oranagra: have we seen any crash related to ziplistCascadeUpdate before? Thanks.

Comment From: oranagra

16111:S 14 May 2021 05:06:05.797 # Redis 5.0.8 crashed by signal: 11
16111:S 14 May 2021 05:06:05.797 # Crashed running the instruction at: 0x7f6904cdab77
16111:S 14 May 2021 05:06:05.797 # Accessing address: 0x7f67139ffffc
16111:S 14 May 2021 05:06:05.797 # Failed assertion: <no assertion failed> (<no file>:0)

------ STACK TRACE ------
EIP:
/lib64/libc.so.6(+0x15bb77)[0x7f6904cdab77]

Backtrace:
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(logStackTrace+0x29)[0x471639]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(sigsegvHandler+0xac)[0x471cdc]
/lib64/libpthread.so.0(+0xf630)[0x7f6904f5c630]
/lib64/libc.so.6(+0x15bb77)[0x7f6904cdab77]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(__ziplistCascadeUpdate+0xf6)[0x43b706]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(__ziplistInsert+0x2db)[0x43bbfb]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(quicklistPushTail+0x39)[0x429279]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(listTypePush+0x56)[0x4543d6]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(pushGenericCommand+0x74)[0x4549a4]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(call+0x9b)[0x430c4b]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(processCommand+0x33f)[0x431eef]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(processInputBuffer+0x175)[0x440bf5]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(processInputBufferAndReplicate+0x1e)[0x440d2e]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(aeProcessEvents+0x2a0)[0x42b070]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(aeMain+0x2b)[0x42b33b]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(main+0x4c0)[0x4281e0]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f6904ba1555]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379[0x42843a]

i remember we recently optimized __ziplistCascadeUpdate in #6886, but i don't recall we fixed any bug in there. maybe @sundb or @neal-zhu remember something.

we did have some reports in the past about ziplists getting corrupted, but i think we were never able to find any bug and fix it. Salvatore always suspected HW issues.

@liangerll113 can you check if the hosts you're using have ECC memory chips? these failures happen on different machines, but on the same dataset, right? so in theory the master could have corrupted the data, and sent it to the replicas so later they all crash.

Comment From: sundb

/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(logStackTrace+0x29)[0x471639]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(sigsegvHandler+0xac)[0x471cdc]
/lib64/libpthread.so.0(+0xf630)[0x7feda496a630]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(_serverPanic+0x117)[0x46f967]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(zipIntSize+0x81)[0x43b0b1]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(zipEntry+0xe1)[0x43b531]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(ziplistGet+0x38)[0x43c028]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(quicklistNext+0x184)[0x429f74]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(lrangeCommand+0x133)[0x4552d3]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(call+0x9b)[0x430c4b]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(processCommand+0x33f)[0x431eef]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(processInputBuffer+0x175)[0x440bf5]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(aeProcessEvents+0x2a0)[0x42b070]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(aeMain+0x2b)[0x42b33b]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(main+0x4c0)[0x4281e0]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7feda45af555]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379[0x42843a]

This is another stack. The crash was probably not caused by __ziplistCascadeUpdate itself; more likely the entire ziplist data was corrupted. @liangerll113 can you back up the rdb file, then execute lrange card_userCardList_ssdp_SHENJIAN8 0 -1 again?
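To compare the two crash paths side by side, the symbol names can be pulled out of each backtrace. The following is a minimal sketch, assuming the frame format shown in the logs above (`binary(function+0xoffset)[0xaddress]`); the helper name `frame_functions` is made up for illustration and is not part of Redis.

```python
import re

# Matches frames such as:
#   /path/redis-server 0.0.0.0:6379(zipEntry+0xe1)[0x43b531]
# Frames without a symbol name, e.g. "(+0xf630)", do not match and are skipped.
FRAME_RE = re.compile(
    r"\((?P<func>[A-Za-z_]\w*)\+0x[0-9a-fA-F]+\)\[0x[0-9a-fA-F]+\]"
)


def frame_functions(backtrace):
    """Return the symbol names found in a backtrace, innermost frame first."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.search(line)
        if match:
            names.append(match.group("func"))
    return names


# A few frames copied from the second stack in this thread.
sample = """\
/lib64/libpthread.so.0(+0xf630)[0x7feda496a630]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(_serverPanic+0x117)[0x46f967]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(zipIntSize+0x81)[0x43b0b1]
/app/redis/redis-5.0.8/src/redis-server 0.0.0.0:6379(zipEntry+0xe1)[0x43b531]
"""
print(frame_functions(sample))
```

Run against both full traces, this makes it easy to see that one crash comes from a list push and the other from lrangeCommand, while both end up in ziplist code.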

Comment From: sundb

Hi @liangerll113, what are your list-max-ziplist-size and list-compress-depth settings? How many elements are stored in the list in question?
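One way to collect those three values in a single step is sketched below. It assumes a client object with redis-py style methods, `config_get(pattern)` returning a dict and `llen(name)` returning an integer; the function name `collect_list_diagnostics` is hypothetical.

```python
def collect_list_diagnostics(client, key):
    """Gather the list-related settings and the length of one list key.

    `client` is assumed to expose redis-py style methods:
    config_get(pattern) -> dict and llen(name) -> int.
    """
    settings = {}
    for name in ("list-max-ziplist-size", "list-compress-depth"):
        # CONFIG GET returns a mapping of matching setting names to values.
        settings.update(client.config_get(name))
    return {"key": key, "length": client.llen(key), "config": settings}
```

With redis-py installed, this could be called as `collect_list_diagnostics(redis.Redis(host="127.0.0.1", port=6379), "card_userCardList_ssdp_SHENJIAN8")`; the host and port here are placeholders.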

Comment From: liangerll113

Thank you all very much. I think the cause of this anomaly is that I copied the rdb file from redis-3.x to redis-5.x; my guess is that the rdb file is broken, or that rdb files from redis-3.x are incompatible with redis-5.x. I deleted the rdb file and restarted the redis server, and redis has worked normally since then.

Comment From: oranagra

rdb files are compatible. i.e. new versions are always able to load rdb files from old versions. maybe it did get corrupted in some way, or maybe the data was somehow corrupted in RAM in the redis that generated it, in a way that caused the new version to crash. note that it didn't crash while loading the rdb file, it crashed while executing a command (LPUSH and LRANGE).
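To see which RDB format a file was written in, the header can be inspected: an RDB file starts with the ASCII magic `REDIS` followed by a four-digit format version. The sketch below reads only those nine bytes; the function name `rdb_format_version` is made up for illustration, and a valid header says nothing about whether the rest of the file is intact.

```python
import re

# The first nine bytes of an RDB file: "REDIS" plus a four-digit version.
RDB_MAGIC_RE = re.compile(rb"REDIS(\d{4})")


def rdb_format_version(path):
    """Return the RDB format version from the file header, or None if absent."""
    with open(path, "rb") as fh:
        header = fh.read(9)
    match = RDB_MAGIC_RE.fullmatch(header)
    return int(match.group(1)) if match else None
```

For a check that goes beyond the header, the `redis-check-rdb` utility that ships with Redis is the more appropriate tool.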