Redis replica in a simple Master-Slave Configuration keeps crashing randomly. Ubuntu 22.04 LTS

=== REDIS BUG REPORT START: Cut & paste starting from here === 32214:S 15 Apr 2024 15:08:17.765 # ------------------------------------------------ 32214:S 15 Apr 2024 15:08:17.765 # !!! Software Failure. Press left mouse button to continue 32214:S 15 Apr 2024 15:08:17.765 # Guru Meditation: Replica was unable to write command to disk. #server.c:4029

------ STACK TRACE ------

Backtrace: /opt/redis-stack/bin/redis-server :6379(+0x9b96d)[0x5bcd753ab96d] /opt/redis-stack/bin/redis-server :6379(processInputBuffer+0x107)[0x5bcd753d0387] /opt/redis-stack/bin/redis-server :6379(readQueryFromClient+0x368)[0x5bcd753d0908] /opt/redis-stack/bin/redis-server :6379(+0x1c9b9c)[0x5bcd754d9b9c] /opt/redis-stack/bin/redis-server :6379(aeMain+0xf9)[0x5bcd7539fdd9] /opt/redis-stack/bin/redis-server :6379(main+0x3df)[0x5bcd7539463f] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7d7613c29d90] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7d7613c29e40] /opt/redis-stack/bin/redis-server *:6379(_start+0x25)[0x5bcd75394e05]

------ INFO OUTPUT ------

Server

redis_version:7.2.4 redis_git_sha1:00000000 redis_git_dirty:0 redis_build_id:3a23a5ac67fe7008 redis_mode:standalone os:Linux 6.5.0-1018-azure x86_64 arch_bits:64 monotonic_clock:POSIX clock_gettime multiplexing_api:epoll atomicvar_api:c11-builtin gcc_version:11.4.0 process_id:32214 process_supervised:no run_id:18ec072b3d5b65ad8c0872f2aded3988402233b8 tcp_port:6379 server_time_usec:1713193697765465 uptime_in_seconds:38887 uptime_in_days:0 hz:10 configured_hz:10 lru_clock:1917665 executable:/opt/redis-stack/bin/redis-server config_file:/etc/redis/redis-stack-6379.conf io_threads_active:0 listener0:name=tcp,bind=,bind=-::,port=6379

Clients

connected_clients:2 cluster_connections:0 maxclients:10000 client_recent_max_input_buffer:20480 client_recent_max_output_buffer:0 blocked_clients:0 tracking_clients:0 clients_in_timeout_table:0 total_blocking_keys:0 total_blocking_keys_on_nokey:0

Memory

used_memory:12190828672 used_memory_human:11.35G used_memory_rss:12255432704 used_memory_rss_human:11.41G used_memory_peak:13477301528 used_memory_peak_human:12.55G used_memory_peak_perc:90.45% used_memory_overhead:161838492 used_memory_startup:1137208 used_memory_dataset:12028990180 used_memory_dataset_perc:98.68% allocator_allocated:12194414656 allocator_active:12205555712 allocator_resident:12293505024 total_system_memory:135005425664 total_system_memory_human:125.73G used_memory_lua:31744 used_memory_vm_eval:31744 used_memory_lua_human:31.00K used_memory_scripts_eval:0 number_of_cached_scripts:0 number_of_functions:0 number_of_libraries:0 used_memory_vm_functions:32768 used_memory_vm_total:64512 used_memory_vm_total_human:63.00K used_memory_functions:184 used_memory_scripts:184 used_memory_scripts_human:184B maxmemory:0 maxmemory_human:0B maxmemory_policy:noeviction allocator_frag_ratio:1.00 allocator_frag_bytes:11141056 allocator_rss_ratio:1.01 allocator_rss_bytes:87949312 rss_overhead_ratio:1.00 rss_overhead_bytes:-38072320 mem_fragmentation_ratio:1.01 mem_fragmentation_bytes:62722616 mem_not_counted_for_evict:8 mem_replication_backlog:164036 mem_total_replication_buffers:164032 mem_clients_slaves:0 mem_clients_normal:55048 mem_cluster_links:0 mem_aof_buffer:8 mem_allocator:jemalloc-5.3.0 active_defrag_running:0 lazyfree_pending_objects:0 lazyfreed_objects:5211

Persistence

loading:0 async_loading:0 current_cow_peak:0 current_cow_size:0 current_cow_size_age:0 current_fork_perc:0.00 current_save_keys_processed:0 current_save_keys_total:3171596 rdb_changes_since_last_save:0 rdb_bgsave_in_progress:0 rdb_last_save_time:1713186112 rdb_last_bgsave_status:err rdb_last_bgsave_time_sec:32 rdb_current_bgsave_time_sec:-1 rdb_saves:12 rdb_last_cow_size:9142272 rdb_last_load_keys_expired:0 rdb_last_load_keys_loaded:3171596 aof_enabled:1 aof_rewrite_in_progress:1 aof_rewrite_scheduled:0 aof_last_rewrite_time_sec:52 aof_current_rewrite_time_sec:0 aof_last_bgrewrite_status:err aof_rewrites:15 aof_rewrites_consecutive_failures:1 aof_last_write_status:ok aof_last_cow_size:9662464 module_fork_in_progress:0 module_fork_last_cow_size:0 aof_current_size:4554760198 aof_base_size:4554760198 aof_pending_rewrite:0 aof_buffer_length:0 aof_pending_bio_fsync:0 aof_delayed_fsync:0

Stats

total_connections_received:23970 total_commands_processed:27994205 instantaneous_ops_per_sec:0 total_net_input_bytes:203878587529 total_net_output_bytes:3754294534 total_net_repl_input_bytes:198573434482 total_net_repl_output_bytes:0 instantaneous_input_kbps:28145.24 instantaneous_output_kbps:0.04 instantaneous_input_repl_kbps:28145.24 instantaneous_output_repl_kbps:0.00 rejected_connections:0 sync_full:0 sync_partial_ok:0 sync_partial_err:0 expired_keys:208 expired_stale_perc:0.00 expired_time_cap_reached_count:0 expire_cycle_cpu_milliseconds:4520 evicted_keys:0 evicted_clients:0 total_eviction_exceeded_time:0 current_eviction_exceeded_time:0 keyspace_hits:42417236 keyspace_misses:127 pubsub_channels:0 pubsub_patterns:0 pubsubshard_channels:0 latest_fork_usec:272648 total_forks:29 migrate_cached_sockets:0 slave_expires_tracked_keys:0 active_defrag_hits:0 active_defrag_misses:0 active_defrag_key_hits:0 active_defrag_key_misses:0 total_active_defrag_time:0 current_active_defrag_time:0 tracking_total_keys:0 tracking_total_items:0 tracking_total_prefixes:0 unexpected_error_replies:0 total_error_replies:84626 dump_payload_sanitizations:0 total_reads_processed:4239795 total_writes_processed:10007911 io_threaded_reads_processed:0 io_threaded_writes_processed:0 reply_buffer_shrinks:16904 reply_buffer_expands:28292 eventloop_cycles:26297715 eventloop_duration_sum:2227529806 eventloop_duration_cmd_sum:51513414 instantaneous_eventloop_cycles_per_sec:3214 instantaneous_eventloop_duration_usec:32 acl_access_denied_auth:0 acl_access_denied_cmd:0 acl_access_denied_key:0 acl_access_denied_channel:0

Replication

role:slave master_host:x.x.x.x master_port:6379 master_link_status:up master_last_io_seconds_ago:0 master_sync_in_progress:0 slave_read_repl_offset:45067404516 slave_repl_offset:45067384753 slave_priority:100 slave_read_only:1 replica_announced:1 connected_slaves:0 master_failover_state:no-failover master_replid:PQRS master_replid2:0000000000000000000000000000000000000000 master_repl_offset:45067384753 second_repl_offset:-1 repl_backlog_active:1 repl_backlog_size:104857600 repl_backlog_first_byte_offset:45067226380 repl_backlog_histlen:158374

CPU

used_cpu_sys:957.055096 used_cpu_user:1849.819429 used_cpu_sys_children:216.995087 used_cpu_user_children:2547.199367 used_cpu_sys_main_thread:932.174448 used_cpu_user_main_thread:1814.793141

Modules

module:name=bf,ver=20612,api=1,filters=0,usedby=[],using=[],options=[] module:name=search,ver=20813,api=1,filters=0,usedby=[],using=[ReJSON],options=[handle-io-errors] module:name=ReJSON,ver=20610,api=1,filters=0,usedby=[search],using=[],options=[handle-io-errors] module:name=timeseries,ver=11012,api=1,filters=0,usedby=[],using=[],options=[]

Commandstats

cmdstat_scan:calls=10,usec=24723,usec_per_call=2472.30,rejected_calls=0,failed_calls=0 cmdstat_FT._LIST:calls=1,usec=119,usec_per_call=119.00,rejected_calls=0,failed_calls=0 cmdstat_del:calls=0,usec=0,usec_per_call=0.00,rejected_calls=1,failed_calls=0 cmdstat_publish:calls=27832351,usec=47708746,usec_per_call=1.71,rejected_calls=1,failed_calls=0 cmdstat_bgrewriteaof:calls=11,usec=15,usec_per_call=1.36,rejected_calls=0,failed_calls=0 cmdstat_module|list:calls=5,usec=29,usec_per_call=5.80,rejected_calls=0,failed_calls=0 cmdstat_info:calls=70725,usec=808012,usec_per_call=11.42,rejected_calls=0,failed_calls=0 cmdstat_replicaof:calls=2,usec=573,usec_per_call=286.50,rejected_calls=8,failed_calls=0 cmdstat_dbsize:calls=3,usec=3,usec_per_call=1.00,rejected_calls=0,failed_calls=0 cmdstat_client|setname:calls=21556,usec=34498,usec_per_call=1.60,rejected_calls=0,failed_calls=0 cmdstat_client|list:calls=5,usec=557,usec_per_call=111.40,rejected_calls=0,failed_calls=0 cmdstat_get:calls=128,usec=330,usec_per_call=2.58,rejected_calls=21043,failed_calls=0 cmdstat_psubscribe:calls=394,usec=1415,usec_per_call=3.59,rejected_calls=0,failed_calls=0 cmdstat_punsubscribe:calls=2,usec=35,usec_per_call=17.50,rejected_calls=0,failed_calls=0 cmdstat_command|docs:calls=16,usec=23126,usec_per_call=1445.38,rejected_calls=0,failed_calls=0 cmdstat_quit:calls=11,usec=11,usec_per_call=1.00,rejected_calls=0,failed_calls=0 cmdstat_select:calls=3,usec=4,usec_per_call=1.33,rejected_calls=0,failed_calls=0 cmdstat_config|get:calls=42359,usec=91124,usec_per_call=2.15,rejected_calls=0,failed_calls=0 cmdstat_type:calls=12,usec=16,usec_per_call=1.33,rejected_calls=0,failed_calls=0 cmdstat_cluster|nodes:calls=127,usec=124,usec_per_call=0.98,rejected_calls=21043,failed_calls=127 cmdstat_echo:calls=21526,usec=8250,usec_per_call=0.38,rejected_calls=0,failed_calls=0 cmdstat_bgsave:calls=11,usec=2800561,usec_per_call=254596.45,rejected_calls=0,failed_calls=0 cmdstat_ping:calls=4462,usec=9952,usec_per_call=2.23,rejected_calls=21231,failed_calls=0 cmdstat_memory|usage:calls=12,usec=104,usec_per_call=8.67,rejected_calls=0,failed_calls=0 cmdstat_ttl:calls=12,usec=8,usec_per_call=0.67,rejected_calls=0,failed_calls=0 cmdstat_subscribe:calls=383,usec=1124,usec_per_call=2.93,rejected_calls=0,failed_calls=0

Errorstats

errorstat_ERR:count=21300 errorstat_LOADING:count=63325 errorstat_READONLY:count=1

Latencystats

latency_percentiles_usec_scan:p50=2179.071,p99=4390.911,p99.9=4390.911 latency_percentiles_usec_FT._LIST:p50=119.295,p99=119.295,p99.9=119.295 latency_percentiles_usec_publish:p50=1.003,p99=18.047,p99.9=31.103 latency_percentiles_usec_bgrewriteaof:p50=1.003,p99=2.007,p99.9=2.007 latency_percentiles_usec_module|list:p50=5.023,p99=12.031,p99.9=12.031 latency_percentiles_usec_info:p50=5.023,p99=76.287,p99.9=173.055 latency_percentiles_usec_replicaof:p50=255.999,p99=319.487,p99.9=319.487 latency_percentiles_usec_dbsize:p50=1.003,p99=1.003,p99.9=1.003 latency_percentiles_usec_client|setname:p50=1.003,p99=20.095,p99.9=74.239 latency_percentiles_usec_client|list:p50=67.071,p99=214.015,p99.9=214.015 latency_percentiles_usec_get:p50=2.007,p99=19.071,p99.9=20.095 latency_percentiles_usec_psubscribe:p50=2.007,p99=26.111,p99.9=35.071 latency_percentiles_usec_punsubscribe:p50=4.015,p99=31.103,p99.9=31.103 latency_percentiles_usec_command|docs:p50=1392.639,p99=1826.815,p99.9=1826.815 latency_percentiles_usec_quit:p50=1.003,p99=2.007,p99.9=2.007 latency_percentiles_usec_select:p50=1.003,p99=2.007,p99.9=2.007 latency_percentiles_usec_config|get:p50=1.003,p99=22.015,p99.9=81.407 latency_percentiles_usec_type:p50=1.003,p99=2.007,p99.9=2.007 latency_percentiles_usec_cluster|nodes:p50=1.003,p99=2.007,p99.9=2.007 latency_percentiles_usec_echo:p50=0.001,p99=1.003,p99.9=21.119 latency_percentiles_usec_bgsave:p50=253755.391,p99=265289.727,p99.9=265289.727 latency_percentiles_usec_ping:p50=1.003,p99=22.015,p99.9=41.215 latency_percentiles_usec_memory|usage:p50=5.023,p99=36.095,p99.9=36.095 latency_percentiles_usec_ttl:p50=0.001,p99=2.007,p99.9=2.007 latency_percentiles_usec_subscribe:p50=2.007,p99=24.063,p99.9=70.143

Cluster

cluster_enabled:0

Keyspace

db0:keys=3171596,expires=1973,avg_ttl=0

------ CLIENT LIST OUTPUT ------

------ CURRENT CLIENT INFO ------ id=155259 addr=x.x.x.x:6379 laddr=10.0.5.8:59762 fd=12 name= age=0 idle=0 flags=M db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=40954 qbuf-free=0 argv-mem=20 multi-mem=0 rbs=16384 rbp=45 obl=0 oll=0 omem=0 tot-mem=58276 events=r cmd=del user=(superuser) redir=-1 resp=2 lib-name= lib-ver= argc: '2' argv[0]: '"DEL"' argv[1]: '"rl:PQRS"' 32214:S 15 Apr 2024 15:08:17.767 # key 'rl:PQRS' found in DB containing the following object: 32214:S 15 Apr 2024 15:08:17.767 # Object type: 0 32214:S 15 Apr 2024 15:08:17.767 # Object encoding: 1 32214:S 15 Apr 2024 15:08:17.767 # Object refcount: 2147483647

------ MODULES INFO OUTPUT ------

search_version

search_version:2.8.13 search_redis_version:7.2.4 - oss

search_index

search_number_of_indexes:153

search_fields_statistics

search_fields_text:Text=7 search_fields_numeric:Numeric=772,Sortable=310 search_fields_tag:Tag=1585,Sortable=5

search_dialect_statistics

search_dialect_1:0 search_dialect_2:0 search_dialect_3:0 search_dialect_4:0

search_runtime_configurations

search_concurrent_mode:OFF search_enableGC:ON search_minimal_term_prefix:2 search_maximal_prefix_expansions:200 search_query_timeout_ms:500 search_timeout_policy:return search_cursor_read_size:1000 search_cursor_max_idle_time:300000 search_max_doc_table_size:1000000 search_max_search_results:1000000 search_max_aggregate_results:-1 search_search_pool_size:20 search_index_pool_size:8 search_gc_scan_size:100 search_min_phonetic_term_length:3

ReJSON_trace

ReJSON_backtrace: 0: redis_module::basic_info_command_handler 1: rejson::__info_func 2: modulesCollectInfo at /__w/redis-stack/redis-stack/redis/src/module.c:10293:9 3: logModulesInfo at /__w/redis-stack/redis-stack/redis/src/debug.c:1900:22 printCrashReport at /__w/redis-stack/redis-stack/redis/src/debug.c:2183:5 4: _serverPanic at /__w/redis-stack/redis-stack/redis/src/debug.c:1158:9 5: processCommand at /__w/redis-stack/redis-stack/redis/src/server.c:4029:17 6: processCommandAndResetClient at /__w/redis-stack/redis-stack/redis/src/networking.c:2466:9 processInputBuffer at /__w/redis-stack/redis-stack/redis/src/networking.c:2574:17 7: readQueryFromClient at /__w/redis-stack/redis-stack/redis/src/networking.c:2713:9 8: callHandler at /__w/redis-stack/redis-stack/redis/src/connhelpers.h:79:18 connSocketEventHandler at /__w/redis-stack/redis-stack/redis/src/socket.c:298:14 9: aeProcessEvents at /__w/redis-stack/redis-stack/redis/src/ae.c:436:17 aeMain at /__w/redis-stack/redis-stack/redis/src/ae.c:496:9 10: main at /__w/redis-stack/redis-stack/redis/src/server.c:7360:5 11: 12: __libc_start_main 13: _start

------ CONFIG DEBUG OUTPUT ------ lazyfree-lazy-eviction no slave-read-only yes proto-max-bulk-len 512mb io-threads-do-reads no lazyfree-lazy-server-del no activedefrag no lazyfree-lazy-expire no lazyfree-lazy-user-del no replica-read-only yes io-threads 1 lazyfree-lazy-user-flush no client-query-buffer-limit 1gb sanitize-dump-payload no repl-diskless-load disabled list-compress-depth 0 repl-diskless-sync yes

------ FAST MEMORY TEST ------ 32214:S 15 Apr 2024 15:08:17.933 # Bio worker thread #0 terminated 32214:S 15 Apr 2024 15:08:17.933 # Bio worker thread #1 terminated 32214:S 15 Apr 2024 15:08:17.934 # Bio worker thread #2 terminated *** Preparing to test memory region 5bcd75694000 (704512 bytes) *** Preparing to test memory region 5bcd76348000 (454656 bytes) *** Preparing to test memory region 7d72a4000000 (135168 bytes) *** Preparing to test memory region 7d72ac000000 (135168 bytes) *** Preparing to test memory region 7d72b0600000 (4194304 bytes) *** Preparing to test memory region 7d72b0c00000 (8388608 bytes) *** Preparing to test memory region 7d72b1400000 (2097152 bytes) *** Preparing to test memory region 7d72b17f0000 (2097152 bytes) *** Preparing to test memory region 7d72b1a00000 (8388608 bytes) *** Preparing to test memory region 7d72b2200000 (2097152 bytes) *** Preparing to test memory region 7d72b24f0000 (9437184 bytes) *** Preparing to test memory region 7d72b2e00000 (8388608 bytes) *** Preparing to test memory region 7d72b3600000 (13960740864 bytes) *** Preparing to test memory region 7d75f3900000 (433061888 bytes) *** Preparing to test memory region 7d760d6f0000 (15728640 bytes) *** Preparing to test memory region 7d760e600000 (8388608 bytes) *** Preparing to test memory region 7d760ee00000 (4194304 bytes) *** Preparing to test memory region 7d760f3fe000 (8388608 bytes) *** Preparing to test memory region 7d760fbff000 (8388608 bytes) *** Preparing to test memory region 7d7610400000 (8388608 bytes) *** Preparing to test memory region 7d7611229000 (12288 bytes) *** Preparing to test memory region 7d76113ff000 (8388608 bytes) *** Preparing to test memory region 7d7611c00000 (8388608 bytes) *** Preparing to test memory region 7d7612a27000 (28672 bytes) *** Preparing to test memory region 7d7612f00000 (13631488 bytes) *** Preparing to test memory region 7d7613e1c000 (53248 bytes) *** Preparing to test memory region 7d7614441000 (12288 bytes) *** Preparing to test memory region 7d761451e000 (4096 bytes) *** Preparing to test memory region 7d7614546000 (20480 bytes) *** Preparing to test memory region 7d76146db000 (8192 bytes) .O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.

Comment From: sundb

you might gotta check why rdb saving and aof rewrite are failing, if we don't let it panic, it could lead to inconsistencies between master and replication. btw, if you realy wanna ignore this panic, you can disable config replica-ignore-disk-write-errors,

rdb_last_bgsave_status:err
aof_last_bgrewrite_status:err

Comment From: gopivalleru

@more456 Redis documentation says it only support until ubuntu 20.04. You are on ubuntu 22.04

https://redis.io/docs/latest/operate/rs/references/supported-platforms/

Comment From: sundb

@gopivalleru no, this document is for Redis enterprise. we always make Redis support all the different versions of operation systems as much as possible.

Comment From: gopivalleru

@gopivalleru no, this document is for Redis enterprise. we always make Redis support all the different versions of operation systems as much as possible.

@sundb Is there similar documentation available for the community version of Redis that outlines the supported OS? If not what are the general guidelines?

Comment From: sundb

@gopivalleru the document is in https://github.com/redis/redis#building-redis

Redis can be compiled and used on Linux, OSX, OpenBSD, NetBSD, FreeBSD. We support big endian and little endian architectures, and both 32 bit and 64 bit systems.

It may compile on Solaris derived systems (for instance SmartOS) but our support for this platform is best effort and Redis is not guaranteed to work as well as in Linux, OSX, and *BSD.

Comment From: gopivalleru

@gopivalleru the document is in https://github.com/redis/redis#building-redis

``` Redis can be compiled and used on Linux, OSX, OpenBSD, NetBSD, FreeBSD. We support big endian and little endian architectures, and both 32 bit and 64 bit systems.

It may compile on Solaris derived systems (for instance SmartOS) but our support for this platform is best effort and Redis is not guaranteed to work as well as in Linux, OSX, and *BSD. ```

@sundb So, if I download and compile the Redis source code, I can use Ubuntu 24.04 in production , correct?

Comment From: sundb

@gopivalleru the document is in redis/redis#building-redis ``` Redis can be compiled and used on Linux, OSX, OpenBSD, NetBSD, FreeBSD. We support big endian and little endian architectures, and both 32 bit and 64 bit systems.

It may compile on Solaris derived systems (for instance SmartOS) but our support for this platform is best effort and Redis is not guaranteed to work as well as in Linux, OSX, and *BSD. ```

@sundb So, if I download and compile the Redis source code, I can use Ubuntu 24.04 in production , correct?

yes.