Hi. Until now, I'd been running Redis 3.2.6 from Debian 9 (amd64) with no problems. Some time ago I upgraded to Debian 10 (Redis 5.0.3) and it began to crash with this message: Redis 5.0.3 crashed by signal: 11

So now I've tried upgrading to Debian 11 (testing), which comes with Redis 6.0.9, with the same result as Redis 5.0.3.

I don't suspect RAM problems, since I've run memtest86+ for several hours with no errors and I can reproduce the signal 11 crash on two different computers. Furthermore, Redis has always worked fine with version 3.2.6.

The database is managed by NodeBB.

This is the report:

=== REDIS BUG REPORT START: Cut &amp; paste starting from here ===
1660:M 05 Feb 2021 22:33:44.283 # Redis 6.0.9 crashed by signal: 11, si_code: 1
1660:M 05 Feb 2021 22:33:44.283 # Crashed running the instruction at: 0x559cbde51e40
1660:M 05 Feb 2021 22:33:44.283 # Accessing address: 0x7f9a7e800000
1660:M 05 Feb 2021 22:33:44.283 # Killed by PID: 2122317824, UID: 32666
1660:M 05 Feb 2021 22:33:44.283 # Failed assertion: (:0)

------ STACK TRACE ------
EIP:
/usr/bin/redis-server 127.0.0.1:6379(siphash+0x60)[0x559cbde51e40]

Backtrace:
/usr/bin/redis-server 127.0.0.1:6379(logStackTrace+0x4f)[0x559cbde13b2f]
/usr/bin/redis-server 127.0.0.1:6379(sigsegvHandler+0xd5)[0x559cbde14325]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14140)[0x7f9a87285140]
/usr/bin/redis-server 127.0.0.1:6379(siphash+0x60)[0x559cbde51e40]
/usr/bin/redis-server 127.0.0.1:6379(dictAddRaw+0x2a)[0x559cbddc27fa]
/usr/bin/redis-server 127.0.0.1:6379(dictAdd+0x11)[0x559cbddc29c1]
/usr/bin/redis-server 127.0.0.1:6379(+0x683e1)[0x559cbddf13e1]
/usr/bin/redis-server 127.0.0.1:6379(rdbLoadRio+0x282)[0x559cbddf28a2]
/usr/bin/redis-server 127.0.0.1:6379(rdbLoad+0xd2)[0x559cbddf3462]
/usr/bin/redis-server 127.0.0.1:6379(loadDataFromDisk+0x9e)[0x559cbddcbe7e]
/usr/bin/redis-server 127.0.0.1:6379(main+0x3bd)[0x559cbddba07d]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x7f9a870d2d0a]
/usr/bin/redis-server 127.0.0.1:6379(_start+0x2a)[0x559cbddba43a]

------ INFO OUTPUT ------

Server

redis_version:6.0.9 redis_git_sha1:00000000 redis_git_dirty:0 redis_build_id:74d695619ee2c9cb redis_mode:standalone os:Linux 4.19.0-14-amd64 x86_64 arch_bits:64 multiplexing_api:epoll atomicvar_api:atomic-builtin gcc_version:10.2.1 process_id:1660 run_id:82eb5b7f8ace6fe0e3a532c2dbe9ffc2b1ddbd50 tcp_port:6379 uptime_in_seconds:25 uptime_in_days:0 hz:10 configured_hz:10 lru_clock:1948057 executable:/usr/bin/redis-server config_file:/etc/redis/redis.conf io_threads_active:0

Clients

connected_clients:0 client_recent_max_input_buffer:0 client_recent_max_output_buffer:0 blocked_clients:0 tracking_clients:0 clients_in_timeout_table:0

Memory

used_memory:5487903672 used_memory_human:5.11G used_memory_rss:0 used_memory_rss_human:0B used_memory_peak:6427426232 used_memory_peak_human:5.99G used_memory_peak_perc:85.38% used_memory_overhead:119385272 used_memory_startup:809672 used_memory_dataset:5368518400 used_memory_dataset_perc:97.84% allocator_allocated:0 allocator_active:0 allocator_resident:0 total_system_memory:16506609664 total_system_memory_human:15.37G used_memory_lua:41984 used_memory_lua_human:41.00K used_memory_scripts:0 used_memory_scripts_human:0B number_of_cached_scripts:0 maxmemory:0 maxmemory_human:0B maxmemory_policy:noeviction allocator_frag_ratio:-nan allocator_frag_bytes:0 allocator_rss_ratio:-nan allocator_rss_bytes:0 rss_overhead_ratio:-nan rss_overhead_bytes:0 mem_fragmentation_ratio:-nan mem_fragmentation_bytes:0 mem_not_counted_for_evict:0 mem_replication_backlog:0 mem_clients_slaves:0 mem_clients_normal:0 mem_aof_buffer:0 mem_allocator:jemalloc-5.2.1 active_defrag_running:0 lazyfree_pending_objects:0

Persistence

loading:1 rdb_changes_since_last_save:0 rdb_bgsave_in_progress:0 rdb_last_save_time:1612560793 rdb_last_bgsave_status:ok rdb_last_bgsave_time_sec:-1 rdb_current_bgsave_time_sec:-1 rdb_last_cow_size:0 aof_enabled:0 aof_rewrite_in_progress:0 aof_rewrite_scheduled:0 aof_last_rewrite_time_sec:-1 aof_current_rewrite_time_sec:-1 aof_last_bgrewrite_status:ok aof_last_write_status:ok aof_last_cow_size:0 module_fork_in_progress:0 module_fork_last_cow_size:0 loading_start_time:1612560793 loading_total_bytes:1527837039 loading_loaded_bytes:1409850180 loading_loaded_perc:92.28 loading_eta_seconds:2

Stats

total_connections_received:0 total_commands_processed:0 instantaneous_ops_per_sec:0 total_net_input_bytes:0 total_net_output_bytes:0 instantaneous_input_kbps:0.00 instantaneous_output_kbps:0.00 rejected_connections:0 sync_full:0 sync_partial_ok:0 sync_partial_err:0 expired_keys:0 expired_stale_perc:0.00 expired_time_cap_reached_count:0 expire_cycle_cpu_milliseconds:0 evicted_keys:0 keyspace_hits:0 keyspace_misses:0 pubsub_channels:0 pubsub_patterns:0 latest_fork_usec:0 migrate_cached_sockets:0 slave_expires_tracked_keys:0 active_defrag_hits:0 active_defrag_misses:0 active_defrag_key_hits:0 active_defrag_key_misses:0 tracking_total_keys:0 tracking_total_items:0 tracking_total_prefixes:0 unexpected_error_replies:0 total_reads_processed:0 total_writes_processed:0 io_threaded_reads_processed:0 io_threaded_writes_processed:0

Replication

role:master connected_slaves:0 master_replid:ad06260eea9256ff5545b692a75f91e39ede3b55 master_replid2:0000000000000000000000000000000000000000 master_repl_offset:0 second_repl_offset:-1 repl_backlog_active:0 repl_backlog_size:1048576 repl_backlog_first_byte_offset:0 repl_backlog_histlen:0

CPU

used_cpu_sys:0.964259 used_cpu_user:30.028085 used_cpu_sys_children:0.000000 used_cpu_user_children:0.000000

Modules

Commandstats

Cluster

cluster_enabled:0

Keyspace

db0:keys=2116880,expires=8954,avg_ttl=0

------ CLIENT LIST OUTPUT ------

------ REGISTERS ------ 1660:M 05 Feb 2021 22:33:44.284 # RAX:7b141380806422bc RBX:00007f99c1499a40 RCX:cbe48136d6963abe RDX:067953052d48968a RDI:d5daaff3e5ee034b RSI:4000000000000000 RBP:0000000000000000 RSP:00007fffa0d6c7e8 R8 :00007f9a7e7ffff9 R9 :00007f984e174f89 R10:0000000000000000 R11:736f6d6570736575 R12:0000000000000000 R13:00007fffa0d6c8e0 R14:00007f99c1499a40 R15:00007f98bba00149 RIP:0000559cbde51e40 EFL:0000000000010287 CSGSFS:002b000000000033 1660:M 05 Feb 2021 22:33:44.284 # (00007fffa0d6c7f7) -> 00007f99c14c9a40 1660:M 05 Feb 2021 22:33:44.284 # (00007fffa0d6c7f6) -> 0000000000000003 1660:M 05 Feb 2021 22:33:44.284 # (00007fffa0d6c7f5) -> 0000000000000002 1660:M 05 Feb 2021 22:33:44.284 # (00007fffa0d6c7f4) -> 0000559cbddc29c1 1660:M 05 Feb 2021 22:33:44.284 # (00007fffa0d6c7f3) -> 00007fffa0d6ce60 1660:M 05 Feb 2021 22:33:44.284 # (00007fffa0d6c7f2) -> 00007f98bba00149 1660:M 05 Feb 2021 22:33:44.284 # (00007fffa0d6c7f1) -> 00007fffa0d6c8e0 1660:M 05 Feb 2021 22:33:44.284 # (00007fffa0d6c7f0) -> 0000000000000000 1660:M 05 Feb 2021 22:33:44.284 # (00007fffa0d6c7ef) -> 0000000000000003 1660:M 05 Feb 2021 22:33:44.284 # (00007fffa0d6c7ee) -> 00007f99c1499a40 1660:M 05 Feb 2021 22:33:44.284 # (00007fffa0d6c7ed) -> 00007f99c14c9a40 1660:M 05 Feb 2021 22:33:44.284 # (00007fffa0d6c7ec) -> 0000559cbddee4c8 1660:M 05 Feb 2021 22:33:44.284 # (00007fffa0d6c7eb) -> 0000000000000000 1660:M 05 Feb 2021 22:33:44.284 # (00007fffa0d6c7ea) -> 00007f99c14d3021 1660:M 05 Feb 2021 22:33:44.284 # (00007fffa0d6c7e9) -> 0000000000000004 1660:M 05 Feb 2021 22:33:44.284 # (00007fffa0d6c7e8) -> 0000559cbddc27fa

------ MODULES INFO OUTPUT ------

------ FAST MEMORY TEST ------ 1660:M 05 Feb 2021 22:33:44.285 # Bio thread for job type #0 terminated 1660:M 05 Feb 2021 22:33:44.285 # Bio thread for job type #1 terminated 1660:M 05 Feb 2021 22:33:44.285 # Bio thread for job type #2 terminated *** Preparing to test memory region 559cbdec5000 (118784 bytes) *** Preparing to test memory region 7f98bba00000 (7564427264 bytes) *** Preparing to test memory region 7f9a7e915000 (92274688 bytes) *** Preparing to test memory region 7f9a84116000 (8388608 bytes) *** Preparing to test memory region 7f9a84917000 (8388608 bytes) *** Preparing to test memory region 7f9a85118000 (8388608 bytes) *** Preparing to test memory region 7f9a85919000 (8388608 bytes) *** Preparing to test memory region 7f9a86400000 (8388608 bytes) *** Preparing to test memory region 7f9a86d2a000 (32768 bytes) *** Preparing to test memory region 7f9a86e9b000 (8192 bytes) *** Preparing to test memory region 7f9a870a9000 (12288 bytes) *** Preparing to test memory region 7f9a8726d000 (16384 bytes) *** Preparing to test memory region 7f9a8728f000 (24576 bytes) *** Preparing to test memory region 7f9a87585000 (16384 bytes) *** Preparing to test memory region 7f9a876bb000 (4096 bytes) *** Preparing to test memory region 7f9a878aa000 (2236416 bytes) *** Preparing to test memory region 7f9a87f16000 (4096 bytes) *** Preparing to test memory region 7f9a87f1d000 (8192 bytes) *** Preparing to test memory region 7f9a87f59000 (4096 bytes) .O.
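One detail worth pointing out in the report above (my own arithmetic, not something the crash dump states): the faulting address 0x7f9a7e800000 is exactly the first byte past the end of the largest tested mapping (start 0x7f98bba00000, size 7564427264 bytes), and R8 holds 0x7f9a7e7ffff9, just 7 bytes before that boundary. In other words, an 8-byte siphash read straddled the end of the heap mapping:

```shell
# End of the largest mapping from the fast memory test:
printf '0x%x\n' $(( 0x7f98bba00000 + 7564427264 ))    # 0x7f9a7e800000, the faulting address
# Distance from the read pointer in R8 to the faulting address:
printf '%d\n' $(( 0x7f9a7e800000 - 0x7f9a7e7ffff9 ))  # 7, so an 8-byte read crosses the boundary
```

That points toward an out-of-bounds read while hashing a key (for example, a bad key length read from the RDB) rather than toward broken RAM.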

Thanks.

Comment From: madolson

The stack trace you linked shows a database loading, not a steady state issue. Are you only seeing this issue while loading the RDB into a new server?

Comment From: oranagra

@alejandrocobo It may help if you post the crash logs from 3.2 and 5.0 too. I suspect your RDB file might have gotten corrupted. Is this the same RDB file that crashed the other versions, or a new one produced by Redis 6.0? Please try to load it into redis-check-rdb (from the different Redis versions, if that file is compatible) and tell us what you get.

Comment From: alejandrocobo

The stack trace you linked shows a database loading, not a steady state issue. Are you only seeing this issue while loading the RDB into a new server?

That's right. When I upgrade from Redis 3.2 to 5.0, the redis process takes 100% CPU and the log shows the crash over and over again. This happens before I launch NodeBB.

@alejandrocobo It may help if you post the crash logs from 3.2 and 5.0 too. I suspect your RDB file might have gotten corrupted. Is this the same RDB file that crashed the other versions, or a new one produced by Redis 6.0? Please try to load it into redis-check-rdb (from the different Redis versions, if that file is compatible) and tell us what you get.

Yes, I tried the same rdb on the three versions. This is the output of redis-check-rdb dump.rdb on Redis 3.2:

[offset 0] Checking RDB file dump.rdb
[offset 26] AUX FIELD redis-ver = '3.2.6'
[offset 40] AUX FIELD redis-bits = '64'
[offset 52] AUX FIELD ctime = '1612599948'
[offset 73] AUX FIELD used-mem = '6524808288'
[offset 75] Selecting DB ID 0
[offset 1527681388] Selecting DB ID 1
[offset 1528037819] Selecting DB ID 2
[offset 1528249311] Checksum OK
[offset 1528249311] \o/ RDB looks OK! \o/
[info] 2579282 keys read
[info] 13681 expires
[info] 14 already expired

Redis 3.2 works fine so there are no crash logs from it. This is Redis 5.0.3 crash log:

=== REDIS BUG REPORT START: Cut &amp; paste starting from here ===
1401:M 05 Feb 2021 21:54:53.464 # Redis 5.0.3 crashed by signal: 11
1401:M 05 Feb 2021 21:54:53.464 # Crashed running the instruction at: 0x5561c92f8350
1401:M 05 Feb 2021 21:54:53.464 # Accessing address: 0x7f7e86400000
1401:M 05 Feb 2021 21:54:53.464 # Failed assertion: (:0)

------ STACK TRACE ------
EIP:
/usr/bin/redis-server 127.0.0.1:6379(siphash+0x60)[0x5561c92f8350]

Backtrace:
/usr/bin/redis-server 127.0.0.1:6379(logStackTrace+0x45)[0x5561c92c4765]
/usr/bin/redis-server 127.0.0.1:6379(sigsegvHandler+0xb1)[0x5561c92c4f01]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7f7e9abcc730]
/usr/bin/redis-server 127.0.0.1:6379(siphash+0x60)[0x5561c92f8350]
/usr/bin/redis-server 127.0.0.1:6379(dictAddRaw+0x31)[0x5561c9279c41]
/usr/bin/redis-server 127.0.0.1:6379(dictAdd+0x11)[0x5561c9279e21]
/usr/bin/redis-server 127.0.0.1:6379(+0x51fef)[0x5561c92a0fef]
/usr/bin/redis-server 127.0.0.1:6379(rdbLoadRio+0x291)[0x5561c92a20f1]
/usr/bin/redis-server 127.0.0.1:6379(rdbLoad+0x52)[0x5561c92a2782]
/usr/bin/redis-server 127.0.0.1:6379(loadDataFromDisk+0x8c)[0x5561c928219c]
/usr/bin/redis-server 127.0.0.1:6379(main+0x434)[0x5561c9272364]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7f7e9aa1d09b]
/usr/bin/redis-server 127.0.0.1:6379(_start+0x2a)[0x5561c927264a]

------ INFO OUTPUT ------

Server

redis_version:5.0.3 redis_git_sha1:00000000 redis_git_dirty:0 redis_build_id:1b271fe49834c463 redis_mode:standalone os:Linux 4.19.0-14-amd64 x86_64 arch_bits:64 multiplexing_api:epoll atomicvar_api:atomic-builtin gcc_version:8.3.0 process_id:1401 run_id:035f3447d2061503b5120d84405d4f51b77d513a tcp_port:6379 uptime_in_seconds:27 uptime_in_days:0 hz:10 configured_hz:10 lru_clock:1945724 executable:/usr/bin/redis-server config_file:/etc/redis/redis.conf

Clients

connected_clients:0 client_recent_max_input_buffer:0 client_recent_max_output_buffer:0 blocked_clients:0

Memory

used_memory:5491251784 used_memory_human:5.11G used_memory_rss:0 used_memory_rss_human:0B used_memory_peak:6430774856 used_memory_peak_human:5.99G used_memory_peak_perc:85.39% used_memory_overhead:119372040 used_memory_startup:795736 used_memory_dataset:5371879744 used_memory_dataset_perc:97.84% allocator_allocated:0 allocator_active:0 allocator_resident:0 total_system_memory:16506609664 total_system_memory_human:15.37G used_memory_lua:41984 used_memory_lua_human:41.00K used_memory_scripts:0 used_memory_scripts_human:0B number_of_cached_scripts:0 maxmemory:0 maxmemory_human:0B maxmemory_policy:noeviction allocator_frag_ratio:-nan allocator_frag_bytes:0 allocator_rss_ratio:-nan allocator_rss_bytes:0 rss_overhead_ratio:-nan rss_overhead_bytes:0 mem_fragmentation_ratio:-nan mem_fragmentation_bytes:0 mem_not_counted_for_evict:0 mem_replication_backlog:0 mem_clients_slaves:0 mem_clients_normal:0 mem_aof_buffer:0 mem_allocator:jemalloc-5.1.0 active_defrag_running:0 lazyfree_pending_objects:0

Persistence

loading:1 rdb_changes_since_last_save:0 rdb_bgsave_in_progress:0 rdb_last_save_time:1612558460 rdb_last_bgsave_status:ok rdb_last_bgsave_time_sec:-1 rdb_current_bgsave_time_sec:-1 rdb_last_cow_size:0 aof_enabled:0 aof_rewrite_in_progress:0 aof_rewrite_scheduled:0 aof_last_rewrite_time_sec:-1 aof_current_rewrite_time_sec:-1 aof_last_bgrewrite_status:ok aof_last_write_status:ok aof_last_cow_size:0 loading_start_time:1612558460 loading_total_bytes:1527837039 loading_loaded_bytes:1409850180 loading_loaded_perc:92.28 loading_eta_seconds:2

Stats

total_connections_received:0 total_commands_processed:0 instantaneous_ops_per_sec:0 total_net_input_bytes:0 total_net_output_bytes:0 instantaneous_input_kbps:0.00 instantaneous_output_kbps:0.00 rejected_connections:0 sync_full:0 sync_partial_ok:0 sync_partial_err:0 expired_keys:0 expired_stale_perc:0.00 expired_time_cap_reached_count:0 evicted_keys:0 keyspace_hits:0 keyspace_misses:0 pubsub_channels:0 pubsub_patterns:0 latest_fork_usec:0 migrate_cached_sockets:0 slave_expires_tracked_keys:0 active_defrag_hits:0 active_defrag_misses:0 active_defrag_key_hits:0 active_defrag_key_misses:0

Replication

role:master connected_slaves:0 master_replid:43c5454630bf5dc30952cc4e0b176b8804045bb8 master_replid2:0000000000000000000000000000000000000000 master_repl_offset:0 second_repl_offset:-1 repl_backlog_active:0 repl_backlog_size:1048576 repl_backlog_first_byte_offset:0 repl_backlog_histlen:0

CPU

used_cpu_sys:0.944020 used_cpu_user:32.244711 used_cpu_sys_children:0.000000 used_cpu_user_children:0.000000

Commandstats

Cluster

cluster_enabled:0

Keyspace

db0:keys=2116891,expires=8965,avg_ttl=0

------ CLIENT LIST OUTPUT ------

------ REGISTERS ------ 1401:M 05 Feb 2021 21:54:53.464 # RAX:fdc0b8081fc1f263 RBX:784832e416bb3245 RCX:90509caa5cf81b8d RDX:e72257f7f1a57aae RDI:00007f7cd1200609 RSI:4000000000000000 RBP:0000000000000003 RSP:00007ffe7ca22f30 R8 :00007f7e863ffff9 R9 :0000000000000000 R10:00007f7c63975449 R11:0000000000000000 R12:0000000000000000 R13:0000000000000000 R14:00007f7dd5b85640 R15:00007ffe7ca23550 RIP:00005561c92f8350 EFL:0000000000010287 CSGSFS:002b000000000033 1401:M 05 Feb 2021 21:54:53.464 # (00007ffe7ca22f3f) -> 0000000000000003 1401:M 05 Feb 2021 21:54:53.464 # (00007ffe7ca22f3e) -> 0000000000000002 1401:M 05 Feb 2021 21:54:53.465 # (00007ffe7ca22f3d) -> 00005561c9279e21 1401:M 05 Feb 2021 21:54:53.465 # (00007ffe7ca22f3c) -> 00007ffe7ca23550 1401:M 05 Feb 2021 21:54:53.465 # (00007ffe7ca22f3b) -> 00007f7dd5bde140 1401:M 05 Feb 2021 21:54:53.465 # (00007ffe7ca22f3a) -> 00007f7cd1200609 1401:M 05 Feb 2021 21:54:53.465 # (00007ffe7ca22f39) -> 0000000000000000 1401:M 05 Feb 2021 21:54:53.465 # (00007ffe7ca22f38) -> 0000000000000003 1401:M 05 Feb 2021 21:54:53.465 # (00007ffe7ca22f37) -> 00007f7dd5b85640 1401:M 05 Feb 2021 21:54:53.465 # (00007ffe7ca22f36) -> 00007ffe7ca23550 1401:M 05 Feb 2021 21:54:53.465 # (00007ffe7ca22f35) -> 00005561c929def0 1401:M 05 Feb 2021 21:54:53.465 # (00007ffe7ca22f34) -> 0000000000000004 1401:M 05 Feb 2021 21:54:53.465 # (00007ffe7ca22f33) -> 00007f7dd5bde140 1401:M 05 Feb 2021 21:54:53.465 # (00007ffe7ca22f32) -> 0000000000000000 1401:M 05 Feb 2021 21:54:53.465 # (00007ffe7ca22f31) -> 00005561c9279c41 1401:M 05 Feb 2021 21:54:53.465 # (00007ffe7ca22f30) -> 00007f7cd1200609

------ FAST MEMORY TEST ------ 1401:M 05 Feb 2021 21:54:53.465 # Bio thread for job type #0 terminated 1401:M 05 Feb 2021 21:54:53.465 # Bio thread for job type #1 terminated 1401:M 05 Feb 2021 21:54:53.465 # Bio thread for job type #2 terminated *** Preparing to test memory region 5561c9342000 (94208 bytes) *** Preparing to test memory region 7f7ccfe00000 (7354712064 bytes) *** Preparing to test memory region 7f7e86518000 (301989888 bytes) *** Preparing to test memory region 7f7e98519000 (8388608 bytes) *** Preparing to test memory region 7f7e98d1a000 (8388608 bytes) *** Preparing to test memory region 7f7e9951b000 (8388608 bytes) *** Preparing to test memory region 7f7e9a000000 (8388608 bytes) *** Preparing to test memory region 7f7e9a851000 (32768 bytes) *** Preparing to test memory region 7f7e9a873000 (8192 bytes) *** Preparing to test memory region 7f7e9a9f5000 (16384 bytes) *** Preparing to test memory region 7f7e9abb6000 (16384 bytes) *** Preparing to test memory region 7f7e9abd7000 (16384 bytes) *** Preparing to test memory region 7f7e9ad7a000 (8192 bytes) *** Preparing to test memory region 7f7e9ae1b000 (2228224 bytes) *** Preparing to test memory region 7f7e9b67d000 (4096 bytes) *** Preparing to test memory region 7f7e9b683000 (8192 bytes) *** Preparing to test memory region 7f7e9b6be000 (4096 bytes) .O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.
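The 5.0.3 report shows the same boundary pattern (my own check, not part of the dump): the faulting address 0x7f7e86400000 is exactly the end of the largest tested mapping (start 0x7f7ccfe00000, size 7354712064 bytes), and R8 holds 0x7f7e863ffff9, again 7 bytes before that boundary:

```shell
printf '0x%x\n' $(( 0x7f7ccfe00000 + 7354712064 ))    # 0x7f7e86400000, the faulting address
printf '%d\n' $(( 0x7f7e86400000 - 0x7f7e863ffff9 ))  # 7 bytes from R8 to the boundary
```

So both versions fault by reading an 8-byte word that runs off the end of the heap mapping while hashing a key.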

------ DUMPING CODE AROUND EIP ------
Symbol: siphash (base: 0x5561c92f82f0)
Module: /usr/bin/redis-server 127.0.0.1:6379 (base 0x5561c924f000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=0x5561c92f82f0 -D -b binary -m i386:x86-64 /tmp/dump.bin


1401:M 05 Feb 2021 21:55:12.301 # dump of function (hexdump of 224 bytes): 534989f2488b1a49b86d6f646e61726f64488b52084983e2f84189f348c1e63848b873657479626465744901fa4183e30748b96172656e6567796c4831d04c31c24831d949b875657370656d6f734c31c34c39d7745c4989f80f1f80000000004d8b084801d348c1c20d4983c0084831da48c1c3204c31c84801c148c1c0104831c84801d148c1c2114801c348c1c0154831ca4831d848c1c1204c31cb4d39c275be4929fa4d8d42f849c1e8034e8d54c7084c8d05d77302004b633c984c01c7ffe7660f1f440000410fb67a0648c1e7304809fe410fb67a0548c1e7284809fe

=== REDIS BUG REPORT END. Make sure to include from START to END. ===

Thanks.

Comment From: alejandrocobo

@alejandrocobo It may help if you post the crash logs from 3.2 and 5.0 too. I suspect your RDB file might have gotten corrupted. Is this the same RDB file that crashed the other versions, or a new one produced by Redis 6.0? Please try to load it into redis-check-rdb (from the different Redis versions, if that file is compatible) and tell us what you get.

If I run redis-check-rdb from Redis 6.0.9, it reports errors. Sorry for the image:

IMG_20210206_094719

How can I fix this?

Thanks.

Comment From: oranagra

I see Killed by PID: 2122317824, UID: 32666. Please check your dmesg and other logs; maybe someone killed it for exceeding some limit (these usually kill with -9 rather than -11, but let's check).

It crashes in redis-check-rdb too, which doesn't usually consume a lot of memory, but maybe that key (seen in the above screenshot) is huge?

What's at 0x559cbddf13e1? gdb -s redis-server -ex "list *0x559cbddf13e1" -batch -q

Comment From: alejandrocobo

I see Killed by PID: 2122317824, UID: 32666. Please check your dmesg and other logs; maybe someone killed it for exceeding some limit (these usually kill with -9 rather than -11, but let's check).

It crashes in redis-check-rdb too, which doesn't usually consume a lot of memory, but maybe that key (seen in the above screenshot) is huge?

What's at 0x559cbddf13e1? gdb -s redis-server -ex "list *0x559cbddf13e1" -batch -q

dmesg shows nothing related to Redis. journalctl -xe says this:

-- A start job for unit redis-server.service has finished with a failure.

-- The job identifier is 792 and the job result is failed.
feb 08 09:47:17 servidor systemd[1]: redis-server.service: Service RestartSec=100ms expired, scheduling restart.
feb 08 09:47:17 servidor systemd[1]: redis-server.service: Scheduled restart job, restart counter is at 6.
-- Subject: Automatic restarting of a unit has been scheduled
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Automatic restarting of the unit redis-server.service has been scheduled, as the result for
-- the configured Restart= setting for the unit.
feb 08 09:47:17 servidor systemd[1]: Stopped Advanced key-value store.
-- Subject: A stop job for unit redis-server.service has finished
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A stop job for unit redis-server.service has finished.
--
-- The job identifier is 852 and the job result is done.
feb 08 09:47:17 servidor systemd[1]: Starting Advanced key-value store...
-- Subject: A start job for unit redis-server.service has begun execution
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit redis-server.service has begun execution.

And systemctl status redis-server.service shows this:

● redis-server.service - Advanced key-value store
   Loaded: loaded (/lib/systemd/system/redis-server.service; enabled; vendor preset: enabled)
   Active: activating (start) since Mon 2021-02-08 09:49:27 CET; 37s ago
     Docs: http://redis.io/documentation, man:redis-server(1)
 Main PID: 1829 (redis-server)
   Status: "Redis is loading..."
    Tasks: 5 (limit: 4915)
   Memory: 3.9G
   CGroup: /system.slice/redis-server.service
           └─1829 /usr/bin/redis-server 127.0.0.1:6379

feb 08 09:49:27 servidor systemd[1]: Starting Advanced key-value store...

About 0x559cbddf13e1: that address changes on each execution, so I don't know if its content is relevant.

Thanks.

Comment From: oranagra

@alejandrocobo I don't know how to proceed. Maybe that Killed by PID is false, or maybe we just haven't found the killer's log file yet. One way to try to uncover more info is to try redis-check-rdb from more Redis versions, specifically 4.0.14, but maybe also 3.2.13, 5.0.10 and 6.2-RC3.

P.S. When you tested redis-check-rdb from 3.2 and 6.0.9, did you do that on the same system or on different systems? Maybe the difference here is not the Redis version being used, but something in the OS or its configuration.

Comment From: alejandrocobo

So, I've been doing more testing. To rule out any platform-related issue, I've done all tests on Debian 11.1. Redis was compiled from source.

These are the results with the following versions:
· Redis 3.2.13: works fine.
· Redis 4.0.0: crashes.
· Redis 4.0.14: crashes.
· Redis 6.0.16 (the default version from Debian 11, installed from the official repositories): crashes.
· Redis 6.2.6: crashes.

To summarize past tests:
· Redis 3.2.6 (from the Debian 9 official repositories): works fine.
· Redis 5.0.3 (from the Debian 10 official repositories): crashes.
· Redis 6.0.9 (from the Debian 11 testing official repositories): crashes.

So, something changed between Redis 3.2.13 and 4.0.0 that makes it crash now. If this is useful: the dump.rdb file is 1.6 GB and Redis 3.2.x takes around 8 GB of RAM when running.

Thanks.

Comment From: oranagra

Re-reading my last post: I now know that when si_code is non-zero, it means the process died on its own (it wasn't killed by anyone), so we can ignore the Killed by PID part (new versions of Redis don't print that anymore).
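For reference (my note, not from the thread): signal 11 is SIGSEGV, and for SIGSEGV an si_code of 1 means SEGV_MAPERR ("address not mapped to object"), i.e. the kernel raised the fault for the process's own bad memory access, which matches the reading above:

```shell
# Resolve signal number 11 to its name; prints SEGV.
# si_code 1 for SIGSEGV is SEGV_MAPERR per sigaction(2).
kill -l 11
```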

You say that the problem started somewhere between 3.2.13 and 4.0.0, but you said 6.0.9 works fine. Can you explain that?

Is it the same dataset (RDB file) that's been used to reproduce all of these, or did you start fresh on each one?

Comment From: alejandrocobo

You say that the problem started somewhere between 3.2.13 and 4.0.0, but you said 6.0.9 works fine. Can you explain that?

I'm sorry, I made a mistake. 6.0.9 doesn't work. The only working versions are 3.2.6 and 3.2.13.

Thanks.

Comment From: alejandrocobo

Is it the same dataset (RDB file) that's been used to reproduce all of these, or did you start fresh on each one?

I've used the same dataset for all tests (the 1.6 GB file I mention above). In fact, for version 4.0.0 I used both the original dataset kept by version 3.2.6 (the one I tried in all tests) and the one dumped by version 3.2.13, with the same result.

Comment From: oranagra

I think the next thing to do is to run Redis under valgrind and see if it reports any memory corruption at an earlier stage (before the key that crashes it). Another alternative: maybe you can share the RDB file with me (via email) and I'll try to debug it myself.

Comment From: alejandrocobo

I think the next thing to do is to run Redis under valgrind and see if it reports any memory corruption at an earlier stage (before the key that crashes it). Another alternative: maybe you can share the RDB file with me (via email) and I'll try to debug it myself.

Sorry for the late answer. First of all, I've upgraded the memory, so the system RAM is new, and I still have the exact same issue. Here is the valgrind output. If you need me to run it with a specific flag, please tell me:

# valgrind redis-server
==2408== Memcheck, a memory error detector
==2408== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2408== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==2408== Command: redis-server
==2408== 
2408:C 01 Dec 2021 20:01:23.299 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
2408:C 01 Dec 2021 20:01:23.318 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=2408, just started
2408:C 01 Dec 2021 20:01:23.318 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
2408:M 01 Dec 2021 20:01:23.363 # You requested maxclients of 10000 requiring at least 10032 max file descriptors.
2408:M 01 Dec 2021 20:01:23.368 # Server can't set maximum open files to 10032 because of OS error: Operation not permitted.
2408:M 01 Dec 2021 20:01:23.368 # Current maximum open files is 1024. maxclients has been reduced to 992 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
2408:M 01 Dec 2021 20:01:23.368 * monotonic clock: POSIX clock_gettime
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 6.2.6 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                  
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 2408
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           https://redis.io       
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

2408:M 01 Dec 2021 20:01:23.574 # Server initialized
2408:M 01 Dec 2021 20:01:23.726 * Loading RDB produced by version 3.2.13
2408:M 01 Dec 2021 20:01:23.731 * RDB age 1915830 seconds
2408:M 01 Dec 2021 20:01:23.739 * RDB memory usage when created 6005.93 Mb
==2408== Warning: set address range perms: large range [0x69caa000, 0x7dcaa000) (defined)
==2408== Warning: set address range perms: large range [0x7dcaa000, 0x95caa000) (defined)
==2408== Warning: set address range perms: large range [0x95caa000, 0xd5caa000) (defined)
==2408== Warning: set address range perms: large range [0xd5caa000, 0x195caa000) (defined)
==2408== Invalid read of size 8
==2408==    at 0x1EB680: siphash (siphash.c:138)
==2408==    by 0x14F6F9: dictAddRaw (dict.c:329)
==2408==    by 0x14F9B0: dictAdd (dict.c:294)
==2408==    by 0x1828AA: rdbLoadObject (rdb.c:1626)
==2408==    by 0x1846CD: rdbLoadRio (rdb.c:2630)
==2408==    by 0x18540F: rdbLoad (rdb.c:2761)
==2408==    by 0x15AD3C: loadDataFromDisk (server.c:5934)
==2408==    by 0x149F60: main (server.c:6362)
==2408==  Address 0x195ca9ff9 is in a rw- anonymous segment
==2408== 


=== REDIS BUG REPORT START: Cut & paste starting from here ===
2408:M 01 Dec 2021 20:12:21.825 # Redis 6.2.6 crashed by signal: 11, si_code: 1
2408:M 01 Dec 2021 20:12:21.826 # Accessing address: 0x195caa000
2408:M 01 Dec 2021 20:12:21.827 # Crashed running the instruction at: 0x1eb680

------ STACK TRACE ------
EIP:
redis-server *:6379(siphash+0x60)[0x1eb680]

Backtrace:
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14140)[0x49ba140]
redis-server *:6379(siphash+0x60)[0x1eb680]
redis-server *:6379(dictAddRaw+0x2a)[0x14f6fa]
redis-server *:6379(dictAdd+0x11)[0x14f9b1]
redis-server *:6379(rdbLoadObject+0x36b)[0x1828ab]
redis-server *:6379(rdbLoadRio+0x27e)[0x1846ce]
redis-server *:6379(rdbLoad+0x80)[0x185410]
redis-server *:6379(loadDataFromDisk+0x8d)[0x15ad3d]
redis-server *:6379(main+0x2b1)[0x149f61]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x49eed0a]
redis-server *:6379(_start+0x2a)[0x14a49a]

------ REGISTERS ------
2408:M 01 Dec 2021 20:12:21.867 # 
RAX:ee850e3784b37546 RBX:0000000083bd5cc0
RCX:be904f405ec118fd RDX:5b1fab03c602c623
RDI:9cb98dbf0d3615eb RSI:4000000000000000
RBP:0000000000000000 RSP:0000001ffefff708
R8 :0000000195ca9ff9 R9 :000000006841f649
R10:0000000000000000 R11:736f6d6570736575
R12:0000000000000000 R13:0000001ffefff800
R14:0000000083bd5cc0 R15:00000000d5caa809
RIP:00000000001eb680 EFL:0000000000000085
CSGSFS:0000000000000000
2408:M 01 Dec 2021 20:12:21.867 # (0000001ffefff717) -> 0000000000000003
2408:M 01 Dec 2021 20:12:21.868 # (0000001ffefff716) -> 00000000d5caa809
2408:M 01 Dec 2021 20:12:21.868 # (0000001ffefff715) -> 0000000000000001
2408:M 01 Dec 2021 20:12:21.868 # (0000001ffefff714) -> 000000000014f9b1
2408:M 01 Dec 2021 20:12:21.868 # (0000001ffefff713) -> 0000000083b944a0
2408:M 01 Dec 2021 20:12:21.868 # (0000001ffefff712) -> 0000001ffeffff20
2408:M 01 Dec 2021 20:12:21.868 # (0000001ffefff711) -> 0000001ffefff800
2408:M 01 Dec 2021 20:12:21.868 # (0000001ffefff710) -> 0000000000000000
2408:M 01 Dec 2021 20:12:21.868 # (0000001ffefff70f) -> 00000000d5caa809
2408:M 01 Dec 2021 20:12:21.868 # (0000001ffefff70e) -> 0000000083bd5cc0
==2408== Use of uninitialised value of size 8
==2408==    at 0x4A18E9A: _itoa_word (_itoa.c:180)
==2408==    by 0x4A328B7: __vfprintf_internal (vfprintf-internal.c:1687)
==2408==    by 0x4A459C5: __vsnprintf_internal (vsnprintf.c:114)
==2408==    by 0x151A66: _serverLog (server.c:1173)
==2408==    by 0x1A638C: logStackContent (debug.c:1091)
==2408==    by 0x1A638C: logRegisters (debug.c:1266)
==2408==    by 0x1A945A: sigsegvHandler (debug.c:1828)
==2408==    by 0x49BA13F: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==2408==    by 0x1EB67F: siphash (siphash.c:136)
==2408==    by 0x14F6F9: dictAddRaw (dict.c:329)
==2408==    by 0x14F9B0: dictAdd (dict.c:294)
==2408==    by 0x1828AA: rdbLoadObject (rdb.c:1626)
==2408==    by 0x1846CD: rdbLoadRio (rdb.c:2630)
==2408== 
==2408== Conditional jump or move depends on uninitialised value(s)
==2408==    at 0x4A18EAC: _itoa_word (_itoa.c:180)
==2408==    by 0x4A328B7: __vfprintf_internal (vfprintf-internal.c:1687)
==2408==    by 0x4A459C5: __vsnprintf_internal (vsnprintf.c:114)
==2408==    by 0x151A66: _serverLog (server.c:1173)
==2408==    by 0x1A638C: logStackContent (debug.c:1091)
==2408==    by 0x1A638C: logRegisters (debug.c:1266)
==2408==    by 0x1A945A: sigsegvHandler (debug.c:1828)
==2408==    by 0x49BA13F: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==2408==    by 0x1EB67F: siphash (siphash.c:136)
==2408==    by 0x14F6F9: dictAddRaw (dict.c:329)
==2408==    by 0x14F9B0: dictAdd (dict.c:294)
==2408==    by 0x1828AA: rdbLoadObject (rdb.c:1626)
==2408==    by 0x1846CD: rdbLoadRio (rdb.c:2630)
==2408== 
==2408== Conditional jump or move depends on uninitialised value(s)
==2408==    at 0x4A33660: __vfprintf_internal (vfprintf-internal.c:1687)
==2408==    by 0x4A459C5: __vsnprintf_internal (vsnprintf.c:114)
==2408==    by 0x151A66: _serverLog (server.c:1173)
==2408==    by 0x1A638C: logStackContent (debug.c:1091)
==2408==    by 0x1A638C: logRegisters (debug.c:1266)
==2408==    by 0x1A945A: sigsegvHandler (debug.c:1828)
==2408==    by 0x49BA13F: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==2408==    by 0x1EB67F: siphash (siphash.c:136)
==2408==    by 0x14F6F9: dictAddRaw (dict.c:329)
==2408==    by 0x14F9B0: dictAdd (dict.c:294)
==2408==    by 0x1828AA: rdbLoadObject (rdb.c:1626)
==2408==    by 0x1846CD: rdbLoadRio (rdb.c:2630)
==2408==    by 0x18540F: rdbLoad (rdb.c:2761)
==2408== 
==2408== Conditional jump or move depends on uninitialised value(s)
==2408==    at 0x4A32A2E: __vfprintf_internal (vfprintf-internal.c:1687)
==2408==    by 0x4A459C5: __vsnprintf_internal (vsnprintf.c:114)
==2408==    by 0x151A66: _serverLog (server.c:1173)
==2408==    by 0x1A638C: logStackContent (debug.c:1091)
==2408==    by 0x1A638C: logRegisters (debug.c:1266)
==2408==    by 0x1A945A: sigsegvHandler (debug.c:1828)
==2408==    by 0x49BA13F: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==2408==    by 0x1EB67F: siphash (siphash.c:136)
==2408==    by 0x14F6F9: dictAddRaw (dict.c:329)
==2408==    by 0x14F9B0: dictAdd (dict.c:294)
==2408==    by 0x1828AA: rdbLoadObject (rdb.c:1626)
==2408==    by 0x1846CD: rdbLoadRio (rdb.c:2630)
==2408==    by 0x18540F: rdbLoad (rdb.c:2761)
==2408== 
2408:M 01 Dec 2021 20:12:21.868 # (0000001ffefff70d) -> 0000000000000000
2408:M 01 Dec 2021 20:12:21.868 # (0000001ffefff70c) -> 0000001ffeffff20
2408:M 01 Dec 2021 20:12:21.868 # (0000001ffefff70b) -> 0000000000000004
2408:M 01 Dec 2021 20:12:21.868 # (0000001ffefff70a) -> 0000000000000003
2408:M 01 Dec 2021 20:12:21.869 # (0000001ffefff709) -> 0000001ffeffff20
2408:M 01 Dec 2021 20:12:21.869 # (0000001ffefff708) -> 000000000014f6fa

------ INFO OUTPUT ------
# Server
redis_version:6.2.6
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:fc1c24e4f3e196ea
redis_mode:standalone
os:Linux 5.10.0-9-amd64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:10.2.1
process_id:2408
process_supervised:no
run_id:a14d10455d5e64342b1b3ed335f4ccfe81c9e715
tcp_port:6379
server_time_usec:1638385863148484
uptime_in_seconds:580
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:10995331
executable:/home/alex/redis-6.2.6/redis-server
config_file:
io_threads_active:0

# Clients
connected_clients:0
cluster_connections:0
maxclients:992
client_recent_max_input_buffer:0
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0

# Memory
used_memory:4643085720
used_memory_human:4.32G
used_memory_rss:3277111296
used_memory_rss_human:3.05G
used_memory_peak:5582608024
used_memory_peak_human:5.20G
used_memory_peak_perc:83.17%
used_memory_overhead:99698280
used_memory_startup:334712
used_memory_dataset:4543387440
used_memory_dataset_perc:97.86%
allocator_allocated:5582713416
allocator_active:5588140032
allocator_resident:5626908672
total_system_memory:33601101824
total_system_memory_human:31.29G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.00
allocator_frag_bytes:5426616
allocator_rss_ratio:1.01
allocator_rss_bytes:38768640
rss_overhead_ratio:0.58
rss_overhead_bytes:-2349797376
mem_fragmentation_ratio:0.59
mem_fragmentation_bytes:-2305496728
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0

# Persistence
loading:1
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1638385283
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
loading_start_time:1638385283
loading_total_bytes:1557154281
loading_rdb_used_mem:6297669704
loading_loaded_bytes:1323923888
loading_loaded_perc:85.02
loading_eta_seconds:115

# Stats
total_connections_received:0
total_commands_processed:0
instantaneous_ops_per_sec:0
total_net_input_bytes:0
total_net_output_bytes:0
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:0
dump_payload_sanitizations:0
total_reads_processed:0
total_writes_processed:0
io_threaded_reads_processed:0
io_threaded_writes_processed:0

# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:3ae51dff85b8a21a60550a2dcacdc78dc0059099
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:4.177800
used_cpu_user:634.589498
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
used_cpu_sys_main_thread:4.007690
used_cpu_user_main_thread:634.195000

# Modules

# Commandstats

# Errorstats

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=1619014,expires=0,avg_ttl=0

------ CLIENT LIST OUTPUT ------

------ MODULES INFO OUTPUT ------

------ FAST MEMORY TEST ------
2408:M 01 Dec 2021 20:12:21.920 # Bio thread for job type #0 terminated
2408:M 01 Dec 2021 20:12:21.920 # Bio thread for job type #1 terminated
2408:M 01 Dec 2021 20:12:21.921 # Bio thread for job type #2 terminated
*** Preparing to test memory region 315000 (2281472 bytes)
*** Preparing to test memory region 402c000 (4096 bytes)
*** Preparing to test memory region 402d000 (4096 bytes)
*** Preparing to test memory region 4832000 (8192 bytes)
*** Preparing to test memory region 49c4000 (16384 bytes)
*** Preparing to test memory region 4b89000 (40960 bytes)
*** Preparing to test memory region 4c00000 (8388608 bytes)
*** Preparing to test memory region 5400000 (4194304 bytes)
*** Preparing to test memory region 5ae8000 (8388608 bytes)
*** Preparing to test memory region 62e9000 (8388608 bytes)
*** Preparing to test memory region 6aea000 (8388608 bytes)
*** Preparing to test memory region 72eb000 (8388608 bytes)
*** Preparing to test memory region 7aeb000 (402653184 bytes)
*** Preparing to test memory region 1fc00000 (893386752 bytes)
*** Preparing to test memory region 582a6000 (27279360 bytes)
*** Preparing to test memory region 59caa000 (5301600256 bytes)
*** Preparing to test memory region 1002001000 (11235328 bytes)
*** Preparing to test memory region 1002ab9000 (786432 bytes)
*** Preparing to test memory region 1002b7c000 (5795840 bytes)
*** Preparing to test memory region 1003105000 (1048576 bytes)
*** Preparing to test memory region 1003237000 (229376 bytes)
*** Preparing to test memory region 1003271000 (9363456 bytes)
*** Preparing to test memory region 1003b61000 (5648384 bytes)
*** Preparing to test memory region 10040c6000 (1048576 bytes)
*** Preparing to test memory region 10041c8000 (1359872 bytes)
*** Preparing to test memory region 1004316000 (29528064 bytes)
*** Preparing to test memory region 1005f41000 (1048576 bytes)
*** Preparing to test memory region 1006043000 (1474560 bytes)
*** Preparing to test memory region 10061ad000 (1048576 bytes)
*** Preparing to test memory region 10062af000 (1114112 bytes)
*** Preparing to test memory region 10063c1000 (3080192 bytes)
*** Preparing to test memory region 10066b3000 (1048576 bytes)
*** Preparing to test memory region 10067b6000 (916590592 bytes)
*** Preparing to test memory region 1ffeffc000 (20480 bytes)
.O.O.O.O.O.O.O.==2408== Invalid read of size 8
==2408==    at 0x483CE1F: memcpy@GLIBC_2.2.5 (vg_replace_strmem.c:1032)
==2408==    by 0x1C0797: memtest_preserving_test (memtest.c:305)
==2408==    by 0x1A6A3C: memtest_test_linux_anonymous_maps (debug.c:1695)
==2408==    by 0x1A6AF8: doFastMemoryTest (debug.c:1736)
==2408==    by 0x1A946E: printCrashReport (debug.c:1852)
==2408==    by 0x1A946E: sigsegvHandler (debug.c:1831)
==2408==    by 0x49BA13F: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==2408==    by 0x1EB67F: siphash (siphash.c:136)
==2408==    by 0x14F6F9: dictAddRaw (dict.c:329)
==2408==    by 0x14F9B0: dictAdd (dict.c:294)
==2408==    by 0x1828AA: rdbLoadObject (rdb.c:1626)
==2408==    by 0x1846CD: rdbLoadRio (rdb.c:2630)
==2408==    by 0x18540F: rdbLoad (rdb.c:2761)
==2408==  Address 0x5400000 is in a rwx anonymous segment
==2408== 
==2408== Invalid read of size 8
==2408==    at 0x483CE37: memcpy@GLIBC_2.2.5 (vg_replace_strmem.c:1032)
==2408==    by 0x1C0797: memtest_preserving_test (memtest.c:305)
==2408==    by 0x1A6A3C: memtest_test_linux_anonymous_maps (debug.c:1695)
==2408==    by 0x1A6AF8: doFastMemoryTest (debug.c:1736)
==2408==    by 0x1A946E: printCrashReport (debug.c:1852)
==2408==    by 0x1A946E: sigsegvHandler (debug.c:1831)
==2408==    by 0x49BA13F: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==2408==    by 0x1EB67F: siphash (siphash.c:136)
==2408==    by 0x14F6F9: dictAddRaw (dict.c:329)
==2408==    by 0x14F9B0: dictAdd (dict.c:294)
==2408==    by 0x1828AA: rdbLoadObject (rdb.c:1626)
==2408==    by 0x1846CD: rdbLoadRio (rdb.c:2630)
==2408==    by 0x18540F: rdbLoad (rdb.c:2761)
==2408==  Address 0x5400008 is in a rwx anonymous segment
==2408== 
==2408== Invalid read of size 8
==2408==    at 0x483CE3F: memcpy@GLIBC_2.2.5 (vg_replace_strmem.c:1032)
==2408==    by 0x1C0797: memtest_preserving_test (memtest.c:305)
==2408==    by 0x1A6A3C: memtest_test_linux_anonymous_maps (debug.c:1695)
==2408==    by 0x1A6AF8: doFastMemoryTest (debug.c:1736)
==2408==    by 0x1A946E: printCrashReport (debug.c:1852)
==2408==    by 0x1A946E: sigsegvHandler (debug.c:1831)
==2408==    by 0x49BA13F: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==2408==    by 0x1EB67F: siphash (siphash.c:136)
==2408==    by 0x14F6F9: dictAddRaw (dict.c:329)
==2408==    by 0x14F9B0: dictAdd (dict.c:294)
==2408==    by 0x1828AA: rdbLoadObject (rdb.c:1626)
==2408==    by 0x1846CD: rdbLoadRio (rdb.c:2630)
==2408==    by 0x18540F: rdbLoad (rdb.c:2761)
==2408==  Address 0x5400010 is in a rwx anonymous segment
==2408== 
==2408== Invalid read of size 8
==2408==    at 0x483CE47: memcpy@GLIBC_2.2.5 (vg_replace_strmem.c:1032)
==2408==    by 0x1C0797: memtest_preserving_test (memtest.c:305)
==2408==    by 0x1A6A3C: memtest_test_linux_anonymous_maps (debug.c:1695)
==2408==    by 0x1A6AF8: doFastMemoryTest (debug.c:1736)
==2408==    by 0x1A946E: printCrashReport (debug.c:1852)
==2408==    by 0x1A946E: sigsegvHandler (debug.c:1831)
==2408==    by 0x49BA13F: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==2408==    by 0x1EB67F: siphash (siphash.c:136)
==2408==    by 0x14F6F9: dictAddRaw (dict.c:329)
==2408==    by 0x14F9B0: dictAdd (dict.c:294)
==2408==    by 0x1828AA: rdbLoadObject (rdb.c:1626)
==2408==    by 0x1846CD: rdbLoadRio (rdb.c:2630)
==2408==    by 0x18540F: rdbLoad (rdb.c:2761)
==2408==  Address 0x5400018 is in a rwx anonymous segment
==2408== 
==2408== Invalid read of size 8
==2408==    at 0x483CE2C: memcpy@GLIBC_2.2.5 (vg_replace_strmem.c:1032)
==2408==    by 0x1C0797: memtest_preserving_test (memtest.c:305)
==2408==    by 0x1A6A3C: memtest_test_linux_anonymous_maps (debug.c:1695)
==2408==    by 0x1A6AF8: doFastMemoryTest (debug.c:1736)
==2408==    by 0x1A946E: printCrashReport (debug.c:1852)
==2408==    by 0x1A946E: sigsegvHandler (debug.c:1831)
==2408==    by 0x49BA13F: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==2408==    by 0x1EB67F: siphash (siphash.c:136)
==2408==    by 0x14F6F9: dictAddRaw (dict.c:329)
==2408==    by 0x14F9B0: dictAdd (dict.c:294)
==2408==    by 0x1828AA: rdbLoadObject (rdb.c:1626)
==2408==    by 0x1846CD: rdbLoadRio (rdb.c:2630)
==2408==    by 0x18540F: rdbLoad (rdb.c:2761)
==2408==  Address 0x5400020 is 32 bytes before a block of size 16 in arena "client"
==2408== 
==2408== Invalid write of size 8
==2408==    at 0x1C07E0: memtest_addressing (memtest.c:100)
==2408==    by 0x1C07E0: memtest_preserving_test (memtest.c:308)
==2408==    by 0x1A6A3C: memtest_test_linux_anonymous_maps (debug.c:1695)
==2408==    by 0x1A6AF8: doFastMemoryTest (debug.c:1736)
==2408==    by 0x1A946E: printCrashReport (debug.c:1852)
==2408==    by 0x1A946E: sigsegvHandler (debug.c:1831)
==2408==    by 0x49BA13F: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==2408==    by 0x1EB67F: siphash (siphash.c:136)
==2408==    by 0x14F6F9: dictAddRaw (dict.c:329)
==2408==    by 0x14F9B0: dictAdd (dict.c:294)
==2408==    by 0x1828AA: rdbLoadObject (rdb.c:1626)
==2408==    by 0x1846CD: rdbLoadRio (rdb.c:2630)
==2408==    by 0x18540F: rdbLoad (rdb.c:2761)
==2408==    by 0x15AD3C: loadDataFromDisk (server.c:5934)
==2408==  Address 0x5400000 is in a rwx anonymous segment
==2408== 
==2408== Invalid write of size 8
==2408==    at 0x1C07EA: memtest_addressing (memtest.c:99)
==2408==    by 0x1C07EA: memtest_preserving_test (memtest.c:308)
==2408==    by 0x1A6A3C: memtest_test_linux_anonymous_maps (debug.c:1695)
==2408==    by 0x1A6AF8: doFastMemoryTest (debug.c:1736)
==2408==    by 0x1A946E: printCrashReport (debug.c:1852)
==2408==    by 0x1A946E: sigsegvHandler (debug.c:1831)
==2408==    by 0x49BA13F: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==2408==    by 0x1EB67F: siphash (siphash.c:136)
==2408==    by 0x14F6F9: dictAddRaw (dict.c:329)
==2408==    by 0x14F9B0: dictAdd (dict.c:294)
==2408==    by 0x1828AA: rdbLoadObject (rdb.c:1626)
==2408==    by 0x1846CD: rdbLoadRio (rdb.c:2630)
==2408==    by 0x18540F: rdbLoad (rdb.c:2761)
==2408==    by 0x15AD3C: loadDataFromDisk (server.c:5934)
==2408==  Address 0x5400010 is in a rwx anonymous segment
==2408== 
==2408== Invalid read of size 8
==2408==    at 0x1C07EF: memtest_addressing (memtest.c:99)
==2408==    by 0x1C07EF: memtest_preserving_test (memtest.c:308)
==2408==    by 0x1A6A3C: memtest_test_linux_anonymous_maps (debug.c:1695)
==2408==    by 0x1A6AF8: doFastMemoryTest (debug.c:1736)
==2408==    by 0x1A946E: printCrashReport (debug.c:1852)
==2408==    by 0x1A946E: sigsegvHandler (debug.c:1831)
==2408==    by 0x49BA13F: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==2408==    by 0x1EB67F: siphash (siphash.c:136)
==2408==    by 0x14F6F9: dictAddRaw (dict.c:329)
==2408==    by 0x14F9B0: dictAdd (dict.c:294)
==2408==    by 0x1828AA: rdbLoadObject (rdb.c:1626)
==2408==    by 0x1846CD: rdbLoadRio (rdb.c:2630)
==2408==    by 0x18540F: rdbLoad (rdb.c:2761)
==2408==    by 0x15AD3C: loadDataFromDisk (server.c:5934)
==2408==  Address 0x5400000 is in a rwx anonymous segment
==2408== 
==2408== Invalid read of size 8
==2408==    at 0x1C0801: memtest_addressing (memtest.c:108)
==2408==    by 0x1C0801: memtest_preserving_test (memtest.c:308)
==2408==    by 0x1A6A3C: memtest_test_linux_anonymous_maps (debug.c:1695)
==2408==    by 0x1A6AF8: doFastMemoryTest (debug.c:1736)
==2408==    by 0x1A946E: printCrashReport (debug.c:1852)
==2408==    by 0x1A946E: sigsegvHandler (debug.c:1831)
==2408==    by 0x49BA13F: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==2408==    by 0x1EB67F: siphash (siphash.c:136)
==2408==    by 0x14F6F9: dictAddRaw (dict.c:329)
==2408==    by 0x14F9B0: dictAdd (dict.c:294)
==2408==    by 0x1828AA: rdbLoadObject (rdb.c:1626)
==2408==    by 0x1846CD: rdbLoadRio (rdb.c:2630)
==2408==    by 0x18540F: rdbLoad (rdb.c:2761)
==2408==    by 0x15AD3C: loadDataFromDisk (server.c:5934)
==2408==  Address 0x5400008 is in a rwx anonymous segment
==2408== 
==2408== Conditional jump or move depends on uninitialised value(s)
==2408==    at 0x1C0804: memtest_addressing (memtest.c:108)
==2408==    by 0x1C0804: memtest_preserving_test (memtest.c:308)
==2408==    by 0x1A6A3C: memtest_test_linux_anonymous_maps (debug.c:1695)
==2408==    by 0x1A6AF8: doFastMemoryTest (debug.c:1736)
==2408==    by 0x1A946E: printCrashReport (debug.c:1852)
==2408==    by 0x1A946E: sigsegvHandler (debug.c:1831)
==2408==    by 0x49BA13F: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==2408==    by 0x1EB67F: siphash (siphash.c:136)
==2408==    by 0x14F6F9: dictAddRaw (dict.c:329)
==2408==    by 0x14F9B0: dictAdd (dict.c:294)
==2408==    by 0x1828AA: rdbLoadObject (rdb.c:1626)
==2408==    by 0x1846CD: rdbLoadRio (rdb.c:2630)
==2408==    by 0x18540F: rdbLoad (rdb.c:2761)
==2408==    by 0x15AD3C: loadDataFromDisk (server.c:5934)
==2408== 

valgrind: m_mallocfree.c:303 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed.
valgrind: Heap block lo/hi size mismatch: lo = 88080416, hi = 0.
This is probably caused by your program erroneously writing past the
end of a heap block and corrupting heap metadata.  If you fix any
invalid writes reported by Memcheck, this assertion failure will
probably go away.  Please try that before reporting this as a bug.


host stacktrace:
==2408==    at 0x58047B6A: show_sched_status_wrk (m_libcassert.c:406)
==2408==    by 0x58047C87: report_and_quit (m_libcassert.c:477)
==2408==    by 0x58047E17: vgPlain_assert_fail (m_libcassert.c:543)
==2408==    by 0x580522E4: get_bszB_as_is (m_mallocfree.c:303)
==2408==    by 0x580522E4: is_inuse_block (m_mallocfree.c:331)
==2408==    by 0x580522E4: vgPlain_describe_arena_addr (m_mallocfree.c:1592)
==2408==    by 0x5803ECEA: vgPlain_describe_addr (m_addrinfo.c:185)
==2408==    by 0x5803DB3F: vgMemCheck_update_Error_extra (mc_errors.c:1185)
==2408==    by 0x58042BF4: vgPlain_maybe_record_error (m_errormgr.c:822)
==2408==    by 0x5803CFA8: vgMemCheck_record_address_error (mc_errors.c:765)
==2408==    by 0x1004A4F6B4: ???
==2408==    by 0x1003204F2F: ???
==2408==    by 0x1003204F17: ???
==2408==    by 0x1003204F2F: ???
==2408==    by 0x1003204F3F: ???

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 2408)
==2408==    at 0x1BFEF8: memtest_fill_random (memtest.c:152)
==2408==    by 0x1C081B: memtest_preserving_test (memtest.c:309)
==2408==    by 0x1A6A3C: memtest_test_linux_anonymous_maps (debug.c:1695)
==2408==    by 0x1A6AF8: doFastMemoryTest (debug.c:1736)
==2408==    by 0x1A946E: printCrashReport (debug.c:1852)
==2408==    by 0x1A946E: sigsegvHandler (debug.c:1831)
==2408==    by 0x49BA13F: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==2408==    by 0x1EB67F: siphash (siphash.c:136)
==2408==    by 0x14F6F9: dictAddRaw (dict.c:329)
==2408==    by 0x14F9B0: dictAdd (dict.c:294)
==2408==    by 0x1828AA: rdbLoadObject (rdb.c:1626)
==2408==    by 0x1846CD: rdbLoadRio (rdb.c:2630)
==2408==    by 0x18540F: rdbLoad (rdb.c:2761)
==2408==    by 0x15AD3C: loadDataFromDisk (server.c:5934)
==2408==    by 0x149F60: main (server.c:6362)
client stack range: [0x1FFEEFD000 0x1FFF000FFF] client SP: 0x1FFEEFD640
valgrind stack range: [0x1003105000 0x1003204FFF] top usage: 18472 of 1048576

Thread 5: status = VgTs_WaitSys syscall 202 (lwpid 2412)
==2408==    at 0x49B5AD8: futex_abstimed_wait_cancelable (futex-internal.h:323)
==2408==    by 0x49B5AD8: __pthread_cond_wait_common (pthread_cond_wait.c:520)
==2408==    by 0x49B5AD8: pthread_cond_timedwait@@GLIBC_2.3.2 (pthread_cond_wait.c:656)
==2408==    by 0x2414D8: background_thread_sleep (background_thread.c:257)
==2408==    by 0x2414D8: background_work_sleep_once (background_thread.c:309)
==2408==    by 0x2414D8: background_thread0_work (background_thread.c:454)
==2408==    by 0x2414D8: background_work (background_thread.c:492)
==2408==    by 0x2414D8: background_thread_entry (background_thread.c:522)
==2408==    by 0x49AEEA6: start_thread (pthread_create.c:477)
==2408==    by 0x4AC5DEE: clone (clone.S:95)
client stack range: [0x72EB000 0x7AE8FFF] client SP: 0x7AE8420
valgrind stack range: [0x10066B3000 0x10067B2FFF] top usage: 3752 of 1048576

Comment From: oranagra

Hi, I suspect some corruption in the RDB file. In order to get more details, I made a patch (based on Redis 6.2.6) for you to try (assuming you're willing to build Redis and try to reproduce): debug_patch.diff.gz

Looking at the crash logs, I realize it's not always the same RDB file (I see a different key count in each of them), so maybe this keeps happening for you on some cluster. But if you can set one RDB file aside and try it on 6.2.6 with my patch, maybe we'll get a better understanding of what's going on here.
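To rule out on-disk corruption before loading, one option is to check the file offline with redis-check-rdb, which ships alongside redis-server. A sketch; /var/lib/redis/dump.rdb is a guess at the Debian default path, so adjust it to your "dir"/"dbfilename" settings in redis.conf:

```shell
# Sketch: offline integrity check of the dump before loading it.
# The path below is the Debian default and may differ on your system.
RDB="${RDB:-/var/lib/redis/dump.rdb}"

if [ -f "$RDB" ]; then
    # Work on a copy so a running server can't rewrite it mid-check.
    cp "$RDB" /tmp/dump-under-test.rdb

    # The first 9 bytes are the magic plus format version,
    # e.g. "REDIS0007" for files written by the 3.2 line.
    head -c 9 /tmp/dump-under-test.rdb; echo

    # Walk the whole file and report structural errors, if any.
    redis-check-rdb /tmp/dump-under-test.rdb
fi
```

If redis-check-rdb walks the file cleanly but the server still crashes while loading, that points at the load path rather than the bytes on disk.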

Comment From: alejandrocobo

> Hi, I suspect some corruption in the RDB file. In order to get more details, I made a patch (based on Redis 6.2.6) for you to try (assuming you're willing to build Redis and try to reproduce): debug_patch.diff.gz
>
> Looking at the crash logs, I realize it's not always the same RDB file (I see a different key count in each of them), so maybe this keeps happening for you on some cluster. But if you can set one RDB file aside and try it on 6.2.6 with my patch, maybe we'll get a better understanding of what's going on here.

This database is in production right now, so the file has changed since I reported the bug months ago. The tests are being performed on a clean installation of Debian 11.

I've applied your patch and this is the result:

# redis-server
7484:C 02 Dec 2021 19:42:27.676 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
7484:C 02 Dec 2021 19:42:27.677 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=7484, just started
7484:C 02 Dec 2021 19:42:27.677 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
7484:M 02 Dec 2021 19:42:27.677 * Increased maximum number of open files to 10032 (it was originally set to 1024).
7484:M 02 Dec 2021 19:42:27.677 * monotonic clock: POSIX clock_gettime
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 6.2.6 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                  
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 7484
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           https://redis.io       
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

7484:M 02 Dec 2021 19:42:27.677 # Server initialized
7484:M 02 Dec 2021 19:42:27.678 * Loading RDB produced by version 3.2.13
7484:M 02 Dec 2021 19:42:27.678 * RDB age 2001094 seconds
7484:M 02 Dec 2021 19:42:27.678 * RDB memory usage when created 6005.93 Mb


=== REDIS BUG REPORT START: Cut & paste starting from here ===
7484:M 02 Dec 2021 19:43:16.303 # Redis 6.2.6 crashed by signal: 11, si_code: 1
7484:M 02 Dec 2021 19:43:16.303 # Accessing address: 0x7ff992e00000
7484:M 02 Dec 2021 19:43:16.303 # Crashed running the instruction at: 0x560dee398780

------ STACK TRACE ------
EIP:
redis-server *:6379(siphash+0x60)[0x560dee398780]

Backtrace:
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14140)[0x7ff9adcba140]
redis-server *:6379(siphash+0x60)[0x560dee398780]
redis-server *:6379(dictAddRaw+0x2a)[0x560dee2fc6fa]
redis-server *:6379(dictAdd+0x11)[0x560dee2fc9b1]
redis-server *:6379(rdbLoadObject+0x36b)[0x560dee32f8cb]
redis-server *:6379(rdbLoadRio+0x2b4)[0x560dee3317c4]
redis-server *:6379(rdbLoad+0x80)[0x560dee332500]
redis-server *:6379(loadDataFromDisk+0x8d)[0x560dee307d3d]
redis-server *:6379(main+0x2b1)[0x560dee2f6f61]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x7ff9adb07d0a]
redis-server *:6379(_start+0x2a)[0x560dee2f749a]

------ REGISTERS ------
7484:M 02 Dec 2021 19:43:16.304 # 
RAX:e809a6ff606e8ec9 RBX:00007ff92858a640
RCX:c3d9003d46c2f178 RDX:ddbc821027c87d00
RDI:a71caf44533340ad RSI:4000000000000000
RBP:0000000000000000 RSP:00007ffc99c41998
R8 :00007ff992dffff9 R9 :00007ff7b4175549
R10:0000000000000000 R11:736f6d6570736575
R12:0000000000000000 R13:00007ffc99c41a90
R14:00007ff92858a640 R15:00007ff821a00709
RIP:0000560dee398780 EFL:0000000000010287
CSGSFS:002b000000000033
7484:M 02 Dec 2021 19:43:16.304 # (00007ffc99c419a7) -> 0000000000000003
7484:M 02 Dec 2021 19:43:16.304 # (00007ffc99c419a6) -> 00007ff821a00709
7484:M 02 Dec 2021 19:43:16.304 # (00007ffc99c419a5) -> 0000000000000001
7484:M 02 Dec 2021 19:43:16.304 # (00007ffc99c419a4) -> 0000560dee2fc9b1
7484:M 02 Dec 2021 19:43:16.304 # (00007ffc99c419a3) -> 00007ff9285c34a0
7484:M 02 Dec 2021 19:43:16.304 # (00007ffc99c419a2) -> 00007ffc99c421b0
7484:M 02 Dec 2021 19:43:16.304 # (00007ffc99c419a1) -> 00007ffc99c41a90
7484:M 02 Dec 2021 19:43:16.304 # (00007ffc99c419a0) -> 0000000000000000
7484:M 02 Dec 2021 19:43:16.304 # (00007ffc99c4199f) -> 00007ff821a00709
7484:M 02 Dec 2021 19:43:16.304 # (00007ffc99c4199e) -> 00007ff92858a640
7484:M 02 Dec 2021 19:43:16.304 # (00007ffc99c4199d) -> 0000000000000000
7484:M 02 Dec 2021 19:43:16.304 # (00007ffc99c4199c) -> 00007ffc99c421b0
7484:M 02 Dec 2021 19:43:16.304 # (00007ffc99c4199b) -> 0000000000000004
7484:M 02 Dec 2021 19:43:16.304 # (00007ffc99c4199a) -> 0000000000000003
7484:M 02 Dec 2021 19:43:16.304 # (00007ffc99c41999) -> 00007ffc99c421b0
7484:M 02 Dec 2021 19:43:16.304 # (00007ffc99c41998) -> 0000560dee2fc6fa

------ INFO OUTPUT ------
# Server
redis_version:6.2.6
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:48a9913a3ba65f39
redis_mode:standalone
os:Linux 5.10.0-9-amd64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:10.2.1
process_id:7484
process_supervised:no
run_id:9afa8dbefc4ff019c87e84e16c5b1e8b32215fe7
tcp_port:6379
server_time_usec:1638470590831718
uptime_in_seconds:43
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:11080595
executable:/home/alex/redis-6.2.6-p1/redis-server
config_file:
io_threads_active:0

# Clients
connected_clients:0
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:0
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0

# Memory
used_memory:4643551672
used_memory_human:4.32G
used_memory_rss:2893860864
used_memory_rss_human:2.70G
used_memory_peak:5583073976
used_memory_peak_human:5.20G
used_memory_peak_perc:83.17%
used_memory_overhead:100173416
used_memory_startup:809848
used_memory_dataset:4543378256
used_memory_dataset_perc:97.86%
allocator_allocated:5583194024
allocator_active:5588660224
allocator_resident:5625307136
total_system_memory:33601101824
total_system_memory_human:31.29G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.00
allocator_frag_bytes:5466200
allocator_rss_ratio:1.01
allocator_rss_bytes:36646912
rss_overhead_ratio:0.51
rss_overhead_bytes:-2731446272
mem_fragmentation_ratio:0.52
mem_fragmentation_bytes:-2689213112
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0

# Persistence
loading:1
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1638470547
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
loading_start_time:1638470547
loading_total_bytes:1557154281
loading_rdb_used_mem:6297669704
loading_loaded_bytes:1323923888
loading_loaded_perc:85.02
loading_eta_seconds:8

# Stats
total_connections_received:0
total_commands_processed:0
instantaneous_ops_per_sec:0
total_net_input_bytes:0
total_net_output_bytes:0
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:0
dump_payload_sanitizations:0
total_reads_processed:0
total_writes_processed:0
io_threaded_reads_processed:0
io_threaded_writes_processed:0

# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:da05fc9f64766aab828557874b3af5322a2e6522
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:2.662101
used_cpu_user:21.433535
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
used_cpu_sys_main_thread:2.606026
used_cpu_user_main_thread:21.435373

# Modules

# Commandstats

# Errorstats

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=1619014,expires=0,avg_ttl=0

------ CLIENT LIST OUTPUT ------

------ MODULES INFO OUTPUT ------

------ FAST MEMORY TEST ------
7484:M 02 Dec 2021 19:43:16.304 # Bio thread for job type #0 terminated
7484:M 02 Dec 2021 19:43:16.304 # Bio thread for job type #1 terminated
7484:M 02 Dec 2021 19:43:16.304 # Bio thread for job type #2 terminated
*** Preparing to test memory region 560dee4c2000 (2281472 bytes)
*** Preparing to test memory region 560defe09000 (135168 bytes)
*** Preparing to test memory region 7ff821a00000 (6194987008 bytes)
*** Preparing to test memory region 7ff992f15000 (402653184 bytes)
*** Preparing to test memory region 7ff9aaf16000 (8388608 bytes)
*** Preparing to test memory region 7ff9ab717000 (8388608 bytes)
*** Preparing to test memory region 7ff9abf18000 (8388608 bytes)
*** Preparing to test memory region 7ff9ac719000 (8388608 bytes)
*** Preparing to test memory region 7ff9ad200000 (8388608 bytes)
*** Preparing to test memory region 7ff9adadb000 (24576 bytes)
*** Preparing to test memory region 7ff9adca2000 (16384 bytes)
*** Preparing to test memory region 7ff9adcc4000 (16384 bytes)
*** Preparing to test memory region 7ff9ade12000 (8192 bytes)
*** Preparing to test memory region 7ff9ade55000 (4096 bytes)
.O.O.O.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.
7484:M 02 Dec 2021 19:43:31.680 # 

 ****** current key details ******
7484:M 02 Dec 2021 19:43:31.680 # cur key details: len: 34, clen: 317, enc: 0, type: 2
7484:M 02 Dec 2021 19:43:31.680 # cur key name: nodebbpostsearch:id:329783:indices

------ DUMPING CODE AROUND EIP ------
Symbol: siphash (base: 0x560dee398720)
Module: redis-server *:6379 (base 0x560dee2b5000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=0x560dee398720 -D -b binary -m i386:x86-64 /tmp/dump.bin
------
7484:M 02 Dec 2021 19:43:31.680 # dump of function (hexdump of 224 bytes):
4889d04989f1488b124189f2488b48084983e1f84989f84183e20749bb6d6f646e61726f644901f948c1e63848b8736574796264657448bf6172656e6567796c4831c84c31d949bb75657370656d6f734831d74c31da4d39c874450f1f4400004933004801ca48c1c10d4983c0084801c748c1c0104831d14831f848c1c2204801cf4801c248c1c0154831d048c1c111493350f84831f948c1c7204d39c175c04c8d1d11040d004f6304934d01d841ffe00f1f8000000000450fb6410649c1e0304c09c6450fb6410549c1e0284c09c6450fb6410449c1e0204c09c6450fb641

=== REDIS BUG REPORT END. Make sure to include from START to END. ===

       Please report the crash by opening an issue on github:

           http://github.com/redis/redis/issues

  Suspect RAM error? Use redis-server --test-memory to verify it.

Segmentation fault

Comment From: oranagra

@yoav-steinberg maybe you can put a second pair of eyes on this. it looks like siphash gets a segfault from inside dictAddRaw (computing the hash of the new key during RDB loading). but when i added some debug code to print more info in the crash log about the key that causes the crash, we see the debug code is able to print the entire key name without crashing:

 ****** current key details ******
7484:M 02 Dec 2021 19:43:31.680 # cur key details: len: 34, clen: 317, enc: 0, type: 2
7484:M 02 Dec 2021 19:43:31.680 # cur key name: nodebbpostsearch:id:329783:indices

maybe the next step would be to print the sdslen and address of that key?

Comment From: yoav-steinberg

@oranagra two questions: 1) If I'm reading the debug print correctly, the compressed data (clen) is larger than the uncompressed data (len). I don't see how rdbSaveLzfStringObject could ever have generated this. 2) Are we sure the problem happens when loading a key? It might be that we're hashing a hash field or a set item, in which case your debug prints aren't relevant.

@alejandrocobo is there any chance you can send over an rdb that causes this? Then I can take a look on my end.

Comment From: oranagra

@yoav-steinberg you are right, type is RDB_TYPE_SET and the crash is in rdbLoadObject, so my patch was irrelevant (need to make another one). p.s. the clen field is not relevant since enc is 0.

Comment From: yoav-steinberg

@alejandrocobo can you try the attached patch, it should (hopefully) print details about the problematic Redis Set elements in the crash report. debug_patch.diff.gz (with set prints)

Comment From: alejandrocobo

@alejandrocobo can you try the attached patch, it should (hopefully) print details about the problematic Redis Set elements in the crash report. debug_patch.diff.gz (with set prints)

I'm away till Wednesday. I'll try this patch as soon as I get back.

@yoav-steinberg I could send you the rdb file in case that makes the debugging process much easier. Otherwise, I'd prefer to keep running whatever tests you ask me for.

Comment From: sundb

@yoav-steinberg clen may not be the compressed length of nodebbpostsearch:id:329783:indices; I have verified that it is not successfully compressed by lzf_compress. clen may instead be the length of the last string that was compressed.

Comment From: yoav-steinberg

I'm away till Wednesday. I'll try this patch as soon as I get back.

Great, thanks.

@yoav-steinberg I could send you the rdb file in case that makes the debugging process much easier. Otherwise, I'd prefer to keep running whatever tests you ask me for.

No problem, lets continue with the current approach and see where it leads.

Comment From: alejandrocobo

@yoav-steinberg I get a "patch: **** malformed patch at line 95: @@ -2757,6 +2775,24 @@" error from your patch.

Comment From: yoav-steinberg

@yoav-steinberg I get a "patch: **** malformed patch at line 95: @@ -2757,6 +2775,24 @@" error from your patch.

I verified the patch works on the latest 6.2 branch (6.2.6). Is that what you're using? If so then I'll recreate it and verify again. If not then let me know what version you're using.

Comment From: alejandrocobo

I'm using 6.2.6 without any previous patch.

Comment From: yoav-steinberg

try this one: debug_patch.diff.gz

Comment From: alejandrocobo

I think this is the juicy part:

 ****** current key details ******
8496:M 08 Dec 2021 14:20:44.112 # cur key details: len: 34, clen: 317, enc: 0, type: 2
8496:M 08 Dec 2021 14:20:44.112 # cur key name: nodebbpostsearch:id:329783:indices
8496:M 08 Dec 2021 14:20:44.112 # Last set details - len: 3, enc: 2, element_len 2457292352
8496:M 08 Dec 2021 14:20:44.112 # Last set element: 


If you need the full report, just tell me.

Comment From: yoav-steinberg

Thanks, great. I think I got it to recreate locally. I'll update soon.

Comment From: yoav-steinberg

This recreates when adding an element larger than 2^31 bytes to a set. This is due to an integer overflow which I already fixed; I'll push a PR soon. @alejandrocobo I'm also interested in the use case here. Adding such a large value to a set requires Redis to check for uniqueness and also compute a hash value of the set element. This will be relatively slow. Are you really storing unique buffers of such a large size? Perhaps you could use a Redis list instead if uniqueness isn't that important or store a set of SHA values of your data instead of the data itself (or both)?

Comment From: alejandrocobo

Thank you very much, @yoav-steinberg

@alejandrocobo I'm also interested in the use case here.

This is a forum database managed by NodeBB, which means I have no idea what's stored there or why they do it that way. I've been using this platform for years and, as I said, Redis never gave me any issues with the 3.2.x branch.

Adding such a large value to a set requires Redis to check for uniqueness and also compute a hash value of the set element. This will be relatively slow. Are you really storing unique buffers of such a large size? Perhaps you could use a Redis list instead if uniqueness isn't that important or store a set of SHA values of your data instead of the data itself (or both)?

Unfortunately, I'm not part of the NodeBB development team, nor do I have any skills in databases, so I'm afraid I cannot be helpful in that regard. Since I'm using an old version of NodeBB (I want to upgrade it as soon as this issue is solved), I don't think I should report this (or maybe I should?).

Comment From: oranagra

AFAIR, in redis 3.2 there was no way to create such large set elements. it's before the proto-max-bulk-len config was introduced, so the limit was hard coded at 512mb. this leads me to suspect that there might be another bug here that corrupted the data.

Comment From: sundb

@oranagra It's strange that the same rdb works in 3.x but not after the upgrade. I guess the data may not be corrupted; instead the rdb may have been upgraded incorrectly. Maybe we should wait for @alejandrocobo's rdb file.

Comment From: yoav-steinberg

In redis 4 we moved to siphash; 3.x still used murmurhash. Looking at the code, the murmur hash was implemented directly inside dictGenHashFunction like this:

    while(len >= 4) {
        uint32_t k = *(uint32_t*)data;
    ...

This means that when an overflowed value was passed to the function's int len argument, we got a negative len, so the loop never ran at all and an invalid but consistent hash was generated. Since v4, the new siphash function is called from within dictGenHashFunction, where the negative int is passed to siphash as a huge sign-extended size_t, causing the invalid memory access.

Comment From: oranagra

thanks. so that explains why it didn't crash on 3.2, but we still don't know how they were able to generate such large elements (before proto-max-bulk-len was introduced, when the limit was hard coded at 512mb)

Comment From: yoav-steinberg

Is there any command, possibly an EVAL script, that populates a Redis set internally? Perhaps copying a Redis string into it?

Comment From: oranagra

copying a string (key) won't work, since these are also limited to 512mb. but maybe if the EVAL script creates a lua native string, it can pass it to the SADD command successfully (the check in networking.c won't be used, and i see no check in scripting.c or in t_set.c).

so that's a possible explanation. @alejandrocobo do you know if this deployment is using EVAL / EVALSHA? (check INFO commandstats)

Comment From: yoav-steinberg

@oranagra Please note that @alejandrocobo isn't using Redis directly; he's using it via a NodeBB deployment. If we want to figure out what's going on here, we need to dig into their usage of Redis or open an issue there.

Comment From: alejandrocobo

@oranagra here you have the output of INFO commandstats:

192.168.0.23:6379> INFO commandstats
# Commandstats
cmdstat_get:calls=637654,usec=9714428,usec_per_call=15.23
cmdstat_set:calls=213595,usec=5393358,usec_per_call=25.25
cmdstat_del:calls=1081,usec=16473,usec_per_call=15.24
cmdstat_exists:calls=314,usec=3432,usec_per_call=10.93
cmdstat_incr:calls=151,usec=2844,usec_per_call=18.83
cmdstat_lpush:calls=19,usec=559,usec_per_call=29.42
cmdstat_llen:calls=33,usec=342,usec_per_call=10.36
cmdstat_sadd:calls=514,usec=7843,usec_per_call=15.26
cmdstat_srem:calls=18487,usec=62124,usec_per_call=3.36
cmdstat_sismember:calls=55335,usec=284570,usec_per_call=5.14
cmdstat_scard:calls=978,usec=6106,usec_per_call=6.24
cmdstat_smembers:calls=3207531,usec=26082206,usec_per_call=8.13
cmdstat_zadd:calls=550515,usec=39760747,usec_per_call=72.22
cmdstat_zincrby:calls=2684810,usec=139186915,usec_per_call=51.84
cmdstat_zrem:calls=2262,usec=45774,usec_per_call=20.24
cmdstat_zremrangebyscore:calls=592,usec=9089,usec_per_call=15.35
cmdstat_zrange:calls=32778694,usec=219418376,usec_per_call=6.69
cmdstat_zrangebyscore:calls=2343,usec=70855,usec_per_call=30.24
cmdstat_zrevrangebyscore:calls=39286,usec=1400400,usec_per_call=35.65
cmdstat_zrangebylex:calls=29,usec=2129,usec_per_call=73.41
cmdstat_zcount:calls=78,usec=2775,usec_per_call=35.58
cmdstat_zrevrange:calls=1629155,usec=34917131,usec_per_call=21.43
cmdstat_zcard:calls=117882,usec=1933635,usec_per_call=16.40
cmdstat_zscore:calls=7205365,usec=259793044,usec_per_call=36.06
cmdstat_zrank:calls=1132520,usec=28152223,usec_per_call=24.86
cmdstat_hset:calls=340,usec=7174,usec_per_call=21.10
cmdstat_hget:calls=9921,usec=69208,usec_per_call=6.98
cmdstat_hmset:calls=1842,usec=56007,usec_per_call=30.41
cmdstat_hincrby:calls=137508,usec=3390841,usec_per_call=24.66
cmdstat_hdel:calls=78,usec=2695,usec_per_call=34.55
cmdstat_hlen:calls=6,usec=104,usec_per_call=17.33
cmdstat_hgetall:calls=42216909,usec=791107223,usec_per_call=18.74
cmdstat_hexists:calls=2568338,usec=63745026,usec_per_call=24.82
cmdstat_select:calls=5,usec=10,usec_per_call=2.00
cmdstat_expire:calls=568054,usec=9837219,usec_per_call=17.32
cmdstat_pexpire:calls=151,usec=2756,usec_per_call=18.25
cmdstat_scan:calls=243,usec=236079,usec_per_call=971.52
cmdstat_ping:calls=4,usec=11,usec_per_call=2.75
cmdstat_type:calls=121590,usec=212617,usec_per_call=1.75
cmdstat_multi:calls=476,usec=1890,usec_per_call=3.97
cmdstat_exec:calls=13050,usec=217533595,usec_per_call=16669.24
cmdstat_psync:calls=2,usec=99770,usec_per_call=49885.00
cmdstat_replconf:calls=4,usec=5,usec_per_call=1.25
cmdstat_info:calls=19,usec=1309,usec_per_call=68.89
cmdstat_ttl:calls=11,usec=89,usec_per_call=8.09
cmdstat_config:calls=2,usec=200,usec_per_call=100.00
cmdstat_subscribe:calls=1,usec=4,usec_per_call=4.00
cmdstat_psubscribe:calls=1,usec=3,usec_per_call=3.00
cmdstat_publish:calls=2634,usec=31760,usec_per_call=12.06
cmdstat_client:calls=6,usec=431,usec_per_call=71.83
cmdstat_slowlog:calls=1,usec=1142,usec_per_call=1142.00
cmdstat_command:calls=1,usec=593,usec_per_call=593.00

Comment From: sundb

If element_len were really 2457292352, the string would normally compress successfully, and clen should not be only 317 either.

Comment From: alejandrocobo

Hi. I've tested the fix #9916 and apparently it now works fine (I haven't tested connecting NodeBB to this patched Redis instance yet, but at least it doesn't crash anymore). I'd like to use 6.0.16 or 6.2.6 plus this patch. Is it safe for my data to apply this patch, or should I wait for you to run some additional tests?

Thanks.

Comment From: yoav-steinberg

@alejandrocobo It's safe to run the patch on 6.x. Note that the patch might fail to apply because it also changes the testing framework, but as long as you only apply the changes to the files under src/ it should be fine.