We have a Sentinel-managed HA setup of 3 Redis instances: 1 master and 2 replicas. We demoted a master to replica via Sentinel in order to fail back after maintenance performed earlier.

At that point, the demoted master crashed. I've attached the crash report below.

At the time, it looks like a BGSAVE was already running before the full resync started, and it wasn't killed when the failover was issued.

There are complaints of not enough disk space. The disk (Kubernetes PVC) has 32GiB capacity, and the dataset hovers around 16GiB. Perhaps multiple BGSAVEs happening at once caused the disk to fill up. I am not sure if this is related to the crash.
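
To make the disk-space suspicion concrete, here is a rough back-of-the-envelope sketch. It assumes (this is not measured) that each RDB file on disk is about as large as the ~16GiB dataset, and that up to three files can coexist on the PVC: the last successful dump, the temp file of a running BGSAVE child, and the temp file of the RDB received from the master (`repl-diskless-load` is `disabled` here, so the transfer goes to disk). The function name and parameters are made up for illustration.

```python
GIB = 1024 ** 3


def peak_disk_usage(rdb_size, existing_dump=True, bgsave_running=True, sync_to_disk=True):
    """Worst-case bytes on the data volume if all RDB writers overlap."""
    files = 0
    if existing_dump:
        files += 1  # last successful dump already on disk
    if bgsave_running:
        files += 1  # temp file written by the BGSAVE child
    if sync_to_disk:
        files += 1  # temp file for the RDB received from the master
    return files * rdb_size


capacity = 32 * GIB
rdb_size = 16 * GIB  # assumption: on-disk RDB is roughly the size of the dataset
peak = peak_disk_usage(rdb_size)
print(f"peak {peak / GIB:.0f} GiB vs capacity {capacity / GIB:.0f} GiB, fits: {peak <= capacity}")
```

RDB files are often smaller than the in-memory dataset, so the real numbers may be lower, but under these assumptions the volume cannot hold a BGSAVE and a disk-based full resync at the same time.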

Crash report

1:M 31 Dec 2024 08:43:03.010 * 1 changes in 900 seconds. Saving...
--
1:M 31 Dec 2024 08:43:03.221 * Background saving started by pid 6535
1:M 31 Dec 2024 08:43:05.895 * Connection with replica 100.67.62.167:6379 lost.
1:M 31 Dec 2024 08:43:06.818 * Connection with replica 100.64.93.114:6379 lost.
1:S 31 Dec 2024 08:43:38.603 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:S 31 Dec 2024 08:43:38.603 * Connecting to MASTER 100.67.62.167:6379
1:S 31 Dec 2024 08:43:38.604 * MASTER <-> REPLICA sync started
1:S 31 Dec 2024 08:43:38.604 * REPLICAOF 100.67.62.167:6379 enabled (user request from 'id=310935 addr=100.115.192.8:59740 laddr=100.97.51.186:6379 fd=18 name=sentinel-1890936b-cmd age=15 idle=8 flags=x db=0 sub=0 psub=0 ssub=0 multi=4 qbuf=12542 qbuf-free=0 argv-mem=4 multi-mem=181 rbs=1024 rbp=49 obl=49 oll=0 omem=0 tot-mem=16561 events=r cmd=exec user=default redir=-1 resp=2 lib-name= lib-ver=')
1:S 31 Dec 2024 08:43:38.660 * CONFIG REWRITE executed with success.
1:S 31 Dec 2024 08:43:38.695 * Non blocking connect for SYNC fired the event.
1:S 31 Dec 2024 08:43:38.696 * Master replied to PING, replication can continue...
1:S 31 Dec 2024 08:43:38.698 * Trying a partial resynchronization (request 1453f2d34e001a851787b03493a0f2b2a9cc442e:1767009368017316).
1:S 31 Dec 2024 08:43:43.083 * Full resync from master: 1e8bf9ff1a813a3f4a030e248c4665773b08c003:1767009500262430
1:S 31 Dec 2024 08:43:43.495 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
1:S 31 Dec 2024 08:45:37.165 # Write error or short write writing to the DB dump file needed for MASTER <-> REPLICA synchronization: short write
1:S 31 Dec 2024 08:45:37.166 * Reconnecting to MASTER 100.67.62.167:6379 after failure
1:S 31 Dec 2024 08:45:37.166 * MASTER <-> REPLICA sync started
1:S 31 Dec 2024 08:45:37.167 * Non blocking connect for SYNC fired the event.
1:S 31 Dec 2024 08:45:37.168 * Master replied to PING, replication can continue...
1:S 31 Dec 2024 08:45:37.169 * Trying a partial resynchronization (request 1453f2d34e001a851787b03493a0f2b2a9cc442e:1767009368017316).
6535:C 31 Dec 2024 08:45:37.190 # Write error while saving DB to the disk(rdbSaveRio): No space left on device
1:S 31 Dec 2024 08:45:37.511 # Background saving error
1:S 31 Dec 2024 08:45:37.562 * 1 changes in 900 seconds. Saving...
1:S 31 Dec 2024 08:45:37.747 * Background saving started by pid 6710
1:S 31 Dec 2024 08:45:42.745 * Full resync from master: 1e8bf9ff1a813a3f4a030e248c4665773b08c003:1767010023526103
1:S 31 Dec 2024 08:45:43.149 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
6710:C 31 Dec 2024 08:48:11.486 # Write error while saving DB to the disk(rdbSaveRio): No space left on device
1:S 31 Dec 2024 08:48:11.734 # Background saving error
1:S 31 Dec 2024 08:48:11.798 * 1 changes in 900 seconds. Saving...
1:S 31 Dec 2024 08:48:11.978 * Background saving started by pid 6846
1:S 31 Dec 2024 08:48:36.172 * Replica is about to load the RDB file received from the master, but there is a pending RDB child running. Killing process 6846 and removing its temp file to avoid any race
1:S 31 Dec 2024 08:48:36.172 * Discarding previously cached master state.
1:S 31 Dec 2024 08:48:36.172 * MASTER <-> REPLICA sync: Flushing old data
1:S 31 Dec 2024 08:48:36.172 * MASTER <-> REPLICA sync: Loading DB in memory
1:S 31 Dec 2024 08:48:36.251 * Loading RDB produced by version 7.2.4
1:S 31 Dec 2024 08:48:36.251 * RDB age 174 seconds
1:S 31 Dec 2024 08:48:36.251 * RDB memory usage when created 14153.06 Mb
6846:signal-handler (1735634916) Received SIGUSR1 in child, exiting now.
1:S 31 Dec 2024 08:49:07.566 * Done loading RDB, keys loaded: 105294, keys expired: 0.
1:S 31 Dec 2024 08:49:07.566 * MASTER <-> REPLICA sync: Finished with success
1:S 31 Dec 2024 08:49:07.567 # Background saving terminated by signal 10
1:S 31 Dec 2024 08:49:07.567 # ------------------------------------------------
1:S 31 Dec 2024 08:49:07.567 # !!! Software Failure. Press left mouse button to continue
1:S 31 Dec 2024 08:49:07.567 # Guru Meditation: Replica was unable to write command to disk. #server.c:4029
1:S 31 Dec 2024 08:49:07.568 # key 'REDACTED' found in DB containing the following object:
1:S 31 Dec 2024 08:49:07.568 # Object type: 1
1:S 31 Dec 2024 08:49:07.568 # Object encoding: 9
1:S 31 Dec 2024 08:49:07.568 # Object refcount: 1
=== REDIS BUG REPORT START: Cut & paste starting from here ===
--
------ INFO OUTPUT ------
# Server
redis_version:7.2.4
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:98ad193cf411d642
redis_mode:standalone
os:Linux 5.15.113-flatcar x86_64
arch_bits:64
monotonic_clock:POSIX clock_gettime
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:13.2.1
process_id:1
process_supervised:no
run_id:534f9c3c655976c331b5710d38d73c3576809732
tcp_port:6379
server_time_usec:1735634947567641
uptime_in_seconds:7310
uptime_in_days:0
hz:20
configured_hz:20
lru_clock:7581699
executable:/data/redis-server
config_file:/data/conf/redis.conf
io_threads_active:0
listener0:name=tcp,bind=*,bind=-::*,port=6379
# Clients
connected_clients:19
cluster_connections:0
maxclients:61000
client_recent_max_input_buffer:20480
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
total_blocking_keys:0
total_blocking_keys_on_nokey:0
# Memory
used_memory:14765253504
used_memory_human:13.75G
used_memory_rss:14939267072
used_memory_rss_human:13.91G
used_memory_peak:16863724960
used_memory_peak_human:15.71G
used_memory_peak_perc:87.56%
used_memory_overhead:12687652
used_memory_startup:3733488
used_memory_dataset:14752565852
used_memory_dataset_perc:99.94%
allocator_allocated:14767501704
allocator_active:14768439296
allocator_resident:14960087040
total_system_memory:264794718208
total_system_memory_human:246.61G
used_memory_lua:371712
used_memory_vm_eval:371712
used_memory_lua_human:363.00K
used_memory_scripts_eval:21984
number_of_cached_scripts:20
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:32768
used_memory_vm_total:404480
used_memory_vm_total_human:395.00K
used_memory_functions:184
used_memory_scripts:22168
used_memory_scripts_human:21.65K
maxmemory:60129542144
maxmemory_human:56.00G
maxmemory_policy:noeviction
allocator_frag_ratio:1.00
allocator_frag_bytes:937592
allocator_rss_ratio:1.01
allocator_rss_bytes:191647744
rss_overhead_ratio:1.00
rss_overhead_bytes:-20819968
mem_fragmentation_ratio:1.01
mem_fragmentation_bytes:174063880
mem_not_counted_for_evict:0
mem_replication_backlog:20508
mem_total_replication_buffers:20504
mem_clients_slaves:0
mem_clients_normal:194920
mem_cluster_links:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.3.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:219885
# Persistence
loading:0
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:21482119
rdb_bgsave_in_progress:0
rdb_last_save_time:1735633682
rdb_last_bgsave_status:err
rdb_last_bgsave_time_sec:56
rdb_current_bgsave_time_sec:-1
rdb_saves:9
rdb_last_cow_size:596324352
rdb_last_load_keys_expired:0
rdb_last_load_keys_loaded:105294
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_rewrites:0
aof_rewrites_consecutive_failures:0
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
# Stats
total_connections_received:311112
total_commands_processed:629095274
instantaneous_ops_per_sec:5
total_net_input_bytes:207978780191
total_net_output_bytes:344770559593
total_net_repl_input_bytes:36620773841
total_net_repl_output_bytes:271896057659
instantaneous_input_kbps:71158.97
instantaneous_output_kbps:23.91
instantaneous_input_repl_kbps:71158.66
instantaneous_output_repl_kbps:0.00
rejected_connections:0
sync_full:3
sync_partial_ok:1
sync_partial_err:3
expired_keys:1070070
expired_stale_perc:3.19
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:6131
evicted_keys:0
evicted_clients:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:32457454
keyspace_misses:158117290
pubsub_channels:1
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:180170
total_forks:12
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
total_active_defrag_time:0
current_active_defrag_time:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:975
dump_payload_sanitizations:0
total_reads_processed:252026710
total_writes_processed:290909101
io_threaded_reads_processed:0
io_threaded_writes_processed:245039210
reply_buffer_shrinks:1740075
reply_buffer_expands:2306267
eventloop_cycles:33498135
eventloop_duration_sum:5993216897
eventloop_duration_cmd_sum:1734672695
instantaneous_eventloop_cycles_per_sec:4446
instantaneous_eventloop_duration_usec:235
acl_access_denied_auth:0
acl_access_denied_cmd:0
acl_access_denied_key:0
acl_access_denied_channel:0
# Replication
role:slave
master_host:100.67.62.167
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_read_repl_offset:1767010023565747
slave_repl_offset:1767010023526126
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:1e8bf9ff1a813a3f4a030e248c4665773b08c003
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1767010023526126
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1767010023526104
repl_backlog_histlen:23
# CPU
used_cpu_sys:3689.731854
used_cpu_user:9100.596824
used_cpu_sys_children:217.823770
used_cpu_user_children:774.121531
used_cpu_sys_main_thread:2280.990473
used_cpu_user_main_thread:2140.571517
# Modules
# Commandstats
cmdstat_auth:calls=310627,usec=771208,usec_per_call=2.48,rejected_calls=0,failed_calls=0
cmdstat_client\|unpause:calls=1,usec=7,usec_per_call=7.00,rejected_calls=0,failed_calls=0
cmdstat_client\|setinfo:calls=8,usec=2,usec_per_call=0.25,rejected_calls=0,failed_calls=0
cmdstat_client\|kill:calls=4,usec=35739,usec_per_call=8934.75,rejected_calls=0,failed_calls=0
cmdstat_client\|pause:calls=1,usec=2,usec_per_call=2.00,rejected_calls=0,failed_calls=0
cmdstat_client\|setname:calls=1502,usec=2114,usec_per_call=1.41,rejected_calls=0,failed_calls=0
cmdstat_zadd:calls=4194537,usec=56826512,usec_per_call=13.55,rejected_calls=0,failed_calls=0
cmdstat_zcard:calls=32,usec=92,usec_per_call=2.88,rejected_calls=6,failed_calls=0
cmdstat_select:calls=2,usec=0,usec_per_call=0.00,rejected_calls=0,failed_calls=0
cmdstat_hmset:calls=848894,usec=5294883,usec_per_call=6.24,rejected_calls=0,failed_calls=0
cmdstat_script\|load:calls=5283,usec=20334,usec_per_call=3.85,rejected_calls=0,failed_calls=0
cmdstat_get:calls=2914812,usec=3999156,usec_per_call=1.37,rejected_calls=3,failed_calls=0
cmdstat_srem:calls=77263658,usec=31706044,usec_per_call=0.41,rejected_calls=0,failed_calls=0
cmdstat_zscan:calls=55,usec=201,usec_per_call=3.65,rejected_calls=0,failed_calls=0
cmdstat_exists:calls=1959202,usec=2678955,usec_per_call=1.37,rejected_calls=0,failed_calls=0
cmdstat_type:calls=5287,usec=6495,usec_per_call=1.23,rejected_calls=0,failed_calls=0
cmdstat_hincrby:calls=1091298,usec=1577364,usec_per_call=1.45,rejected_calls=0,failed_calls=0
cmdstat_hsetnx:calls=1353,usec=4187,usec_per_call=3.09,rejected_calls=0,failed_calls=0
cmdstat_incrby:calls=4778826,usec=4162112,usec_per_call=0.87,rejected_calls=0,failed_calls=0
cmdstat_config\|get:calls=2372,usec=296071,usec_per_call=124.82,rejected_calls=0,failed_calls=0
cmdstat_config\|set:calls=19,usec=106,usec_per_call=5.58,rejected_calls=0,failed_calls=0
cmdstat_config\|rewrite:calls=3,usec=60750,usec_per_call=20250.00,rejected_calls=0,failed_calls=0
cmdstat_ttl:calls=9178185,usec=3960499,usec_per_call=0.43,rejected_calls=0,failed_calls=0
cmdstat_lpush:calls=25660051,usec=111945496,usec_per_call=4.36,rejected_calls=0,failed_calls=0
cmdstat_sadd:calls=24329744,usec=30839967,usec_per_call=1.27,rejected_calls=0,failed_calls=0
cmdstat_wait:calls=234,usec=568,usec_per_call=2.43,rejected_calls=0,failed_calls=0
cmdstat_llen:calls=165811442,usec=63758581,usec_per_call=0.38,rejected_calls=0,failed_calls=0
cmdstat_publish:calls=396190,usec=5174814,usec_per_call=13.06,rejected_calls=0,failed_calls=0
cmdstat_zcount:calls=117,usec=1764,usec_per_call=15.08,rejected_calls=27,failed_calls=0
cmdstat_hscan:calls=122318,usec=774287,usec_per_call=6.33,rejected_calls=0,failed_calls=0
cmdstat_pexpireat:calls=134770,usec=76581,usec_per_call=0.57,rejected_calls=0,failed_calls=0
cmdstat_hgetall:calls=4592,usec=30636,usec_per_call=6.67,rejected_calls=0,failed_calls=0
cmdstat_hdel:calls=1113254,usec=1758574,usec_per_call=1.58,rejected_calls=0,failed_calls=0
cmdstat_lindex:calls=22600,usec=75606,usec_per_call=3.35,rejected_calls=0,failed_calls=0
cmdstat_del:calls=199377,usec=339603,usec_per_call=1.70,rejected_calls=0,failed_calls=0
cmdstat_memory\|usage:calls=9,usec=9945931,usec_per_call=1105103.50,rejected_calls=0,failed_calls=0
cmdstat_role:calls=307589,usec=1043536,usec_per_call=3.39,rejected_calls=0,failed_calls=0
cmdstat_scan:calls=85569,usec=1221289,usec_per_call=14.27,rejected_calls=0,failed_calls=0
cmdstat_sscan:calls=553155,usec=71052488,usec_per_call=128.45,rejected_calls=2,failed_calls=0
cmdstat_subscribe:calls=23,usec=75,usec_per_call=3.26,rejected_calls=0,failed_calls=0
cmdstat_setex:calls=7941,usec=26866,usec_per_call=3.38,rejected_calls=0,failed_calls=0
cmdstat_slowlog\|reset:calls=731,usec=135,usec_per_call=0.18,rejected_calls=0,failed_calls=0
cmdstat_slowlog\|len:calls=1461,usec=1473,usec_per_call=1.01,rejected_calls=0,failed_calls=0
cmdstat_slowlog\|get:calls=2646,usec=1965,usec_per_call=0.74,rejected_calls=0,failed_calls=0
cmdstat_slaveof:calls=2,usec=701,usec_per_call=350.50,rejected_calls=0,failed_calls=0
cmdstat_expire:calls=7222242,usec=8407966,usec_per_call=1.16,rejected_calls=0,failed_calls=0
cmdstat_psubscribe:calls=5703,usec=15021,usec_per_call=2.63,rejected_calls=0,failed_calls=0
cmdstat_smembers:calls=26,usec=9840,usec_per_call=378.46,rejected_calls=3,failed_calls=0
cmdstat_evalsha:calls=89809483,usec=1162768150,usec_per_call=12.95,rejected_calls=0,failed_calls=11
cmdstat_scard:calls=2987275,usec=2207378,usec_per_call=0.74,rejected_calls=0,failed_calls=0
cmdstat_lrange:calls=989081,usec=1358623,usec_per_call=1.37,rejected_calls=0,failed_calls=0
cmdstat_hget:calls=4184593,usec=11085226,usec_per_call=2.65,rejected_calls=0,failed_calls=0
cmdstat_rpop:calls=3087636,usec=4386150,usec_per_call=1.42,rejected_calls=0,failed_calls=0
cmdstat_lpop:calls=4246,usec=39327,usec_per_call=9.26,rejected_calls=0,failed_calls=0
cmdstat_unlink:calls=534936,usec=1848768,usec_per_call=3.46,rejected_calls=0,failed_calls=0
cmdstat_ping:calls=21395,usec=13739,usec_per_call=0.64,rejected_calls=923,failed_calls=0
cmdstat_time:calls=8413470,usec=3211552,usec_per_call=0.38,rejected_calls=0,failed_calls=0
cmdstat_info:calls=219587,usec=6330242,usec_per_call=28.83,rejected_calls=0,failed_calls=0
cmdstat_zrem:calls=96090,usec=2861987,usec_per_call=29.78,rejected_calls=0,failed_calls=0
cmdstat_set:calls=28817193,usec=41455729,usec_per_call=1.44,rejected_calls=0,failed_calls=0
cmdstat_rpoplpush:calls=133527530,usec=223354081,usec_per_call=1.67,rejected_calls=0,failed_calls=0
cmdstat_hset:calls=746347,usec=1614992,usec_per_call=2.16,rejected_calls=0,failed_calls=0
cmdstat_hmget:calls=6179,usec=18803,usec_per_call=3.04,rejected_calls=0,failed_calls=0
cmdstat_eval:calls=10,usec=3320,usec_per_call=332.00,rejected_calls=0,failed_calls=0
cmdstat_lrem:calls=15833325,usec=83210046,usec_per_call=5.26,rejected_calls=0,failed_calls=0
cmdstat_spop:calls=157739,usec=23002391,usec_per_call=145.83,rejected_calls=0,failed_calls=0
cmdstat_exec:calls=4470957,usec=40417846,usec_per_call=9.04,rejected_calls=0,failed_calls=0
cmdstat_rpush:calls=830,usec=2192,usec_per_call=2.64,rejected_calls=0,failed_calls=0
cmdstat_hlen:calls=369611,usec=55839,usec_per_call=0.15,rejected_calls=0,failed_calls=0
cmdstat_brpop:calls=1016,usec=6925,usec_per_call=6.82,rejected_calls=0,failed_calls=0
cmdstat_setnx:calls=2718,usec=4614,usec_per_call=1.70,rejected_calls=0,failed_calls=0
cmdstat_replconf:calls=12506,usec=9426,usec_per_call=0.75,rejected_calls=0,failed_calls=0
cmdstat_psync:calls=4,usec=199,usec_per_call=49.75,rejected_calls=0,failed_calls=0
cmdstat_zremrangebyscore:calls=323850,usec=11396303,usec_per_call=35.19,rejected_calls=0,failed_calls=0
cmdstat_latency\|histogram:calls=1461,usec=1807141,usec_per_call=1236.92,rejected_calls=0,failed_calls=0
cmdstat_latency\|latest:calls=1461,usec=1748,usec_per_call=1.20,rejected_calls=0,failed_calls=0
cmdstat_zrangebyscore:calls=1465286,usec=15569690,usec_per_call=10.63,rejected_calls=0,failed_calls=0
cmdstat_multi:calls=4471712,usec=1849515,usec_per_call=0.41,rejected_calls=0,failed_calls=0
# Errorstats
errorstat_LOADING:count=361
errorstat_MISCONF:count=599
errorstat_NOAUTH:count=4
errorstat_NOSCRIPT:count=11
# Latencystats
latency_percentiles_usec_auth:p50=2.007,p99=9.023,p99.9=25.087
latency_percentiles_usec_client\|unpause:p50=7.007,p99=7.007,p99.9=7.007
latency_percentiles_usec_client\|setinfo:p50=0.001,p99=1.003,p99.9=1.003
latency_percentiles_usec_client\|kill:p50=261.119,p99=20840.447,p99.9=20840.447
latency_percentiles_usec_client\|pause:p50=2.007,p99=2.007,p99.9=2.007
latency_percentiles_usec_client\|setname:p50=1.003,p99=3.007,p99.9=14.015
latency_percentiles_usec_zadd:p50=12.031,p99=41.215,p99.9=71.167
latency_percentiles_usec_zcard:p50=1.003,p99=31.103,p99.9=31.103
latency_percentiles_usec_select:p50=0.001,p99=0.001,p99.9=0.001
latency_percentiles_usec_hmset:p50=5.023,p99=22.015,p99.9=42.239
latency_percentiles_usec_script\|load:p50=3.007,p99=13.055,p99.9=46.079
latency_percentiles_usec_get:p50=1.003,p99=5.023,p99.9=20.095
latency_percentiles_usec_srem:p50=0.001,p99=2.007,p99.9=4.015
latency_percentiles_usec_zscan:p50=1.003,p99=28.031,p99.9=28.031
latency_percentiles_usec_exists:p50=1.003,p99=5.023,p99.9=20.095
latency_percentiles_usec_type:p50=1.003,p99=3.007,p99.9=11.007
latency_percentiles_usec_hincrby:p50=1.003,p99=5.023,p99.9=20.095
latency_percentiles_usec_hsetnx:p50=3.007,p99=9.023,p99.9=29.055
latency_percentiles_usec_incrby:p50=1.003,p99=2.007,p99.9=16.063
latency_percentiles_usec_config\|get:p50=174.079,p99=288.767,p99.9=452.607
latency_percentiles_usec_config\|set:p50=3.007,p99=30.079,p99.9=30.079
latency_percentiles_usec_config\|rewrite:p50=2506.751,p99=56098.815,p99.9=56098.815
latency_percentiles_usec_ttl:p50=0.001,p99=1.003,p99.9=6.015
latency_percentiles_usec_lpush:p50=3.007,p99=25.087,p99.9=146.431
latency_percentiles_usec_sadd:p50=1.003,p99=6.015,p99.9=43.007
latency_percentiles_usec_wait:p50=2.007,p99=10.047,p99.9=14.015
latency_percentiles_usec_llen:p50=0.001,p99=2.007,p99.9=4.015
latency_percentiles_usec_publish:p50=1.003,p99=651.263,p99.9=1064.959
latency_percentiles_usec_zcount:p50=14.015,p99=44.031,p99.9=45.055
latency_percentiles_usec_hscan:p50=6.015,p99=21.119,p99.9=37.119
latency_percentiles_usec_pexpireat:p50=1.003,p99=2.007,p99.9=7.007
latency_percentiles_usec_hgetall:p50=7.007,p99=24.063,p99.9=52.223
latency_percentiles_usec_hdel:p50=2.007,p99=4.015,p99.9=21.119
latency_percentiles_usec_lindex:p50=2.007,p99=30.079,p99.9=74.239
latency_percentiles_usec_del:p50=2.007,p99=4.015,p99.9=21.119
latency_percentiles_usec_memory\|usage:p50=1002438.655,p99=1002438.655,p99.9=1002438.655
latency_percentiles_usec_role:p50=3.007,p99=16.063,p99.9=37.119
latency_percentiles_usec_scan:p50=11.007,p99=33.023,p99.9=692.223
latency_percentiles_usec_sscan:p50=8.031,p99=684.031,p99.9=856.063
latency_percentiles_usec_subscribe:p50=3.007,p99=8.031,p99.9=8.031
latency_percentiles_usec_setex:p50=3.007,p99=14.015,p99.9=31.103
latency_percentiles_usec_slowlog\|reset:p50=0.001,p99=2.007,p99.9=4.015
latency_percentiles_usec_slowlog\|len:p50=1.003,p99=3.007,p99.9=22.015
latency_percentiles_usec_slowlog\|get:p50=1.003,p99=5.023,p99.9=20.095
latency_percentiles_usec_slaveof:p50=171.007,p99=532.479,p99.9=532.479
latency_percentiles_usec_expire:p50=1.003,p99=3.007,p99.9=18.047
latency_percentiles_usec_psubscribe:p50=2.007,p99=6.015,p99.9=23.039
latency_percentiles_usec_smembers:p50=94.207,p99=2981.887,p99.9=2981.887
latency_percentiles_usec_evalsha:p50=4.015,p99=113.151,p99.9=1064.959
latency_percentiles_usec_scard:p50=1.003,p99=3.007,p99.9=15.039
latency_percentiles_usec_lrange:p50=1.003,p99=9.023,p99.9=32.127
latency_percentiles_usec_hget:p50=3.007,p99=8.031,p99.9=23.039
latency_percentiles_usec_rpop:p50=1.003,p99=6.015,p99.9=21.119
latency_percentiles_usec_lpop:p50=6.015,p99=44.031,p99.9=107.007
latency_percentiles_usec_unlink:p50=3.007,p99=22.015,p99.9=41.215
latency_percentiles_usec_ping:p50=1.003,p99=3.007,p99.9=11.007
latency_percentiles_usec_time:p50=0.001,p99=1.003,p99.9=1.003
latency_percentiles_usec_info:p50=22.015,p99=109.055,p99.9=684.031
latency_percentiles_usec_zrem:p50=4.015,p99=489.471,p99.9=823.295
latency_percentiles_usec_set:p50=1.003,p99=5.023,p99.9=20.095
latency_percentiles_usec_rpoplpush:p50=1.003,p99=15.039,p99.9=38.143
latency_percentiles_usec_hset:p50=2.007,p99=5.023,p99.9=22.015
latency_percentiles_usec_hmget:p50=3.007,p99=12.031,p99.9=32.127
latency_percentiles_usec_eval:p50=108.031,p99=2277.375,p99.9=2277.375
latency_percentiles_usec_lrem:p50=4.015,p99=30.079,p99.9=70.143
latency_percentiles_usec_spop:p50=107.007,p99=765.951,p99.9=2981.887
latency_percentiles_usec_exec:p50=7.007,p99=36.095,p99.9=119.295
latency_percentiles_usec_rpush:p50=2.007,p99=12.031,p99.9=28.031
latency_percentiles_usec_hlen:p50=0.001,p99=1.003,p99.9=2.007
latency_percentiles_usec_brpop:p50=6.015,p99=26.111,p99.9=39.167
latency_percentiles_usec_setnx:p50=2.007,p99=5.023,p99.9=17.023
latency_percentiles_usec_replconf:p50=1.003,p99=1.003,p99.9=15.039
latency_percentiles_usec_psync:p50=34.047,p99=94.207,p99.9=94.207
latency_percentiles_usec_zremrangebyscore:p50=22.015,p99=182.271,p99.9=387.071
latency_percentiles_usec_latency\|histogram:p50=1245.183,p99=1753.087,p99.9=2408.447
latency_percentiles_usec_latency\|latest:p50=1.003,p99=2.007,p99.9=6.015
latency_percentiles_usec_zrangebyscore:p50=2.007,p99=134.143,p99.9=228.351
latency_percentiles_usec_multi:p50=0.001,p99=2.007,p99.9=9.023
# Cluster
cluster_enabled:0
# Keyspace
db0:keys=105294,expires=100319,avg_ttl=0
------ CLIENT LIST OUTPUT ------
id=310995 addr=100.112.25.210:58432 laddr=100.97.51.186:6379 fd=25 name= age=258 idle=14 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=1024 rbp=49 obl=0 oll=0 omem=0 tot-mem=22400 events=r cmd=sscan user=default redir=-1 resp=2 lib-name= lib-ver=
id=311087 addr=[::1]:55108 laddr=[::1]:6379 fd=17 name= age=64 idle=64 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1928 events=r cmd=slowlog\|reset user=default redir=-1 resp=2 lib-name= lib-ver=
id=311083 addr=[::1]:35822 laddr=[::1]:6379 fd=20 name= age=74 idle=74 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1928 events=r cmd=slowlog\|reset user=default redir=-1 resp=2 lib-name= lib-ver=
id=311091 addr=[::1]:58588 laddr=[::1]:6379 fd=23 name= age=54 idle=54 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1928 events=r cmd=slowlog\|reset user=default redir=-1 resp=2 lib-name= lib-ver=
id=311101 addr=[::1]:58550 laddr=[::1]:6379 fd=24 name= age=34 idle=34 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=22400 events=r cmd=slowlog\|reset user=default redir=-1 resp=2 lib-name= lib-ver=
id=310935 addr=100.115.192.8:59740 laddr=100.97.51.186:6379 fd=18 name=sentinel-1890936b-cmd age=344 idle=1 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=1024 rbp=1024 obl=0 oll=0 omem=0 tot-mem=22400 events=r cmd=ping user=default redir=-1 resp=2 lib-name= lib-ver=
id=310953 addr=100.96.130.73:39860 laddr=100.97.51.186:6379 fd=8 name=sentinel-10543d49-cmd age=329 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=16384 rbp=6712 obl=0 oll=0 omem=0 tot-mem=37760 events=r cmd=publish user=default redir=-1 resp=2 lib-name= lib-ver=
id=310964 addr=172.28.27.23:60350 laddr=100.97.51.186:6379 fd=14 name= age=315 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=1024 rbp=343 obl=0 oll=0 omem=0 tot-mem=22400 events=r cmd=ping user=default redir=-1 resp=2 lib-name=redis-py lib-ver=5.0.7
id=310954 addr=100.115.192.8:44110 laddr=100.97.51.186:6379 fd=9 name=sentinel-1890936b-pubsub age=329 idle=0 flags=P db=0 sub=1 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=1024 rbp=148 obl=0 oll=0 omem=0 tot-mem=22456 events=r cmd=subscribe user=default redir=-1 resp=2 lib-name= lib-ver=
id=311077 addr=[::1]:60564 laddr=[::1]:6379 fd=15 name= age=84 idle=84 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1928 events=r cmd=slowlog\|reset user=default redir=-1 resp=2 lib-name= lib-ver=
id=311105 addr=[::1]:38928 laddr=[::1]:6379 fd=28 name= age=24 idle=24 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=37760 events=r cmd=slowlog\|reset user=default redir=-1 resp=2 lib-name= lib-ver=
id=311111 addr=[::1]:47376 laddr=[::1]:6379 fd=29 name= age=14 idle=14 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=37760 events=r cmd=slowlog\|reset user=default redir=-1 resp=2 lib-name= lib-ver=
id=311115 addr=[::1]:51040 laddr=[::1]:6379 fd=30 name= age=4 idle=4 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=37760 events=r cmd=slowlog\|reset user=default redir=-1 resp=2 lib-name= lib-ver=
id=311118 addr=100.67.62.167:6379 laddr=100.97.51.186:41822 fd=13 name= age=0 idle=0 flags=M db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=39644 qbuf-free=1310 argv-mem=6956 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=65228 events=r cmd=lrem user=(superuser) redir=-1 resp=2 lib-name= lib-ver=
id=310955 addr=172.28.27.23:1909 laddr=100.97.51.186:6379 fd=10 name=sentinel-811ab38d-cmd age=328 idle=1 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=8192 rbp=0 obl=0 oll=0 omem=0 tot-mem=29568 events=r cmd=publish user=default redir=-1 resp=2 lib-name= lib-ver=
id=310956 addr=172.28.27.23:48310 laddr=100.97.51.186:6379 fd=11 name=sentinel-811ab38d-pubsub age=328 idle=0 flags=P db=0 sub=1 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=1024 rbp=147 obl=0 oll=0 omem=0 tot-mem=22456 events=r cmd=subscribe user=default redir=-1 resp=2 lib-name= lib-ver=
id=310957 addr=100.96.130.73:39868 laddr=100.97.51.186:6379 fd=12 name=sentinel-10543d49-pubsub age=328 idle=0 flags=P db=0 sub=1 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=1024 rbp=147 obl=0 oll=0 omem=0 tot-mem=22456 events=r cmd=subscribe user=default redir=-1 resp=2 lib-name= lib-ver=
id=311097 addr=[::1]:52628 laddr=[::1]:6379 fd=26 name= age=44 idle=44 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1928 events=r cmd=slowlog\|reset user=default redir=-1 resp=2 lib-name= lib-ver=
id=310994 addr=100.112.25.210:58434 laddr=100.97.51.186:6379 fd=19 name= age=258 idle=14 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1928 events=r cmd=sscan user=default redir=-1 resp=2 lib-name= lib-ver=
------ CURRENT CLIENT INFO ------
id=311118 addr=100.67.62.167:6379 laddr=100.97.51.186:41822 fd=13 name= age=0 idle=0 flags=M db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=39644 qbuf-free=1310 argv-mem=6956 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=65228 events=r cmd=lrem user=(superuser) redir=-1 resp=2 lib-name= lib-ver=
argc: '4'
argv[0]: '"lrem"'
argv[1]: 'REDACTED'
argv[2]: '"-1"'
argv[3]: 'REDACTED'
------ MODULES INFO OUTPUT ------
------ CONFIG DEBUG OUTPUT ------
slave-read-only yes
lazyfree-lazy-user-flush no
lazyfree-lazy-eviction no
repl-diskless-sync yes
list-compress-depth 0
proto-max-bulk-len 512mb
activedefrag no
io-threads-do-reads no
sanitize-dump-payload no
io-threads 3
replica-read-only yes
lazyfree-lazy-expire yes
repl-diskless-load disabled
client-query-buffer-limit 1gb
lazyfree-lazy-user-del yes
lazyfree-lazy-server-del yes
------ FAST MEMORY TEST ------
1:S 31 Dec 2024 08:49:08.690 # Bio worker thread #0 terminated
1:S 31 Dec 2024 08:49:08.690 # Bio worker thread #1 terminated
1:S 31 Dec 2024 08:49:08.691 # Bio worker thread #2 terminated
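
To check whether the first BGSAVE and the first disk-based resync really overlapped, the timestamps can be pulled out of the log. This is a minimal sketch that only handles the log line layout seen in this report; the `LOG` excerpt is copied from the lines above, while the helper names (`parse`, `first`) are made up for illustration.

```python
import re
from datetime import datetime

# Four lines copied from the crash report: start/failure of BGSAVE child 6535
# and start/failure of the first RDB transfer from the new master.
LOG = """\
1:M 31 Dec 2024 08:43:03.221 * Background saving started by pid 6535
1:S 31 Dec 2024 08:43:43.495 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
1:S 31 Dec 2024 08:45:37.165 # Write error or short write writing to the DB dump file needed for MASTER <-> REPLICA synchronization: short write
6535:C 31 Dec 2024 08:45:37.190 # Write error while saving DB to the disk(rdbSaveRio): No space left on device
"""

# pid:role, timestamp, level marker, message
LINE_RE = re.compile(r"^(\d+):\w (\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3}) [*#] (.*)$")


def parse(log):
    """Return (timestamp, pid, message) tuples for lines matching LINE_RE."""
    events = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(2), "%d %b %Y %H:%M:%S.%f")
            events.append((ts, int(m.group(1)), m.group(3)))
    return events


def first(events, needle):
    """Timestamp of the first event whose message contains `needle`."""
    return next(ts for ts, _pid, msg in events if needle in msg)


events = parse(LOG)
bgsave_start = first(events, "Background saving started")
bgsave_end = first(events, "Write error while saving DB")
sync_start = first(events, "receiving streamed RDB")
sync_end = first(events, "Write error or short write")

# Interval during which both the BGSAVE child and the resync wrote to disk.
overlap = min(bgsave_end, sync_end) - max(bgsave_start, sync_start)
print(f"both writers active for {overlap.total_seconds():.1f}s")
```

Both writers were active for nearly two minutes and then failed within 25 ms of each other, which is consistent with the two of them filling the volume together.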

Additional information

  1. OS distribution and version: Linux 5.15.113-flatcar x86_64
  2. Steps to reproduce (if any): Unable to reproduce

Comment From: sundb

> There are complaints of not enough disk space. The disk (Kubernetes PVC) has 32GiB capacity, and the dataset hovers around 16GiB. Perhaps multiple BGSAVEs happening at once caused the disk to fill up. I am not sure if this is related to the crash.

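Yes, that is the reason. If you don't want the replica to panic, you can set the config `replica-ignore-disk-write-errors yes`. The default is `no`, which makes a replica panic when it cannot persist a write command received from its master.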
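
As a rough illustration of why the crash happened on a replicated `lrem` while ordinary clients only saw `MISCONF` errors, here is a simplified model of the decision. It is a paraphrase of the behaviour as I understand it, not the Redis source: `ServerState` and `handle_write` are hypothetical names, only write commands are modelled, and AOF errors are ignored.

```python
from dataclasses import dataclass


@dataclass
class ServerState:
    last_bgsave_ok: bool                            # rdb_last_bgsave_status == "ok"
    save_points_configured: bool = True             # at least one `save` rule is set
    stop_writes_on_bgsave_error: bool = True        # assumed default setting
    replica_ignore_disk_write_errors: bool = False  # default is `no`


def disk_error_active(s: ServerState) -> bool:
    """True when the last RDB save failed and the server treats that as fatal for writes."""
    return s.stop_writes_on_bgsave_error and s.save_points_configured and not s.last_bgsave_ok


def handle_write(s: ServerState, from_master: bool) -> str:
    """Outcome for one write command in this simplified model."""
    if not disk_error_active(s):
        return "execute"
    if from_master:
        # Commands from the master cannot be rejected without diverging from it.
        return "execute-with-warning" if s.replica_ignore_disk_write_errors else "panic"
    return "reject-MISCONF"


state = ServerState(last_bgsave_ok=False)  # matches rdb_last_bgsave_status:err in the report
print(handle_write(state, from_master=True))
print(handle_write(state, from_master=False))
```

In the report, `rdb_last_bgsave_status` was still `err` when the resync finished, so under this model the first write streamed from the master hits the panic branch.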

Comment From: jdheyburn

Thanks @sundb, would it make sense to abort the BGSAVEs on failover, because they would be dumping expired data? That seems to be the cause of the disk filling up.

Comment From: sundb

> Thanks @sundb, would it make sense to abort the BGSAVEs on failover, because they would be dumping expired data? That seems to be the cause of the disk filling up.

No, we skip the expired data.