Hi,
I'm running the Redis Docker container on Ubuntu 20.04, and a few days ago it crashed. Since then, every time I start it again, it stops a few hours later.
So I checked the logs, and here is what I got:
=== REDIS BUG REPORT START: Cut & paste starting from here ===
1:S 02 Feb 2023 08:28:06.402 # === ASSERTION FAILED ===
1:S 02 Feb 2023 08:28:06.402 # ==> blocked.c:624 'server.also_propagate.numops == 0' is not true
------ INFO OUTPUT ------
# Server
redis_version:7.0.4
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:940e2f477fae3a39
redis_mode:standalone
os:Linux 5.4.0-65-generic x86_64
arch_bits:64
monotonic_clock:POSIX clock_gettime
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:11.2.1
process_id:1
process_supervised:no
run_id:41253934f5ea04f6bfca306e9608d382601053d9
tcp_port:6379
server_time_usec:1675326486402356
uptime_in_seconds:23831
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:14382102
executable:/data/redis-server
config_file:
io_threads_active:0
# Clients
connected_clients:2
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:20480
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
# Memory
used_memory:1271432
used_memory_human:1.21M
used_memory_rss:7733248
used_memory_rss_human:7.38M
used_memory_peak:1421000
used_memory_peak_human:1.36M
used_memory_peak_perc:89.47%
used_memory_overhead:908536
used_memory_startup:862888
used_memory_dataset:362896
used_memory_dataset_perc:88.83%
allocator_allocated:1534472
allocator_active:1953792
allocator_resident:5414912
total_system_memory:16394264576
total_system_memory_human:15.27G
used_memory_lua:36864
used_memory_vm_eval:36864
used_memory_lua_human:36.00K
used_memory_scripts_eval:376
number_of_cached_scripts:1
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:32768
used_memory_vm_total:69632
used_memory_vm_total_human:68.00K
used_memory_functions:184
used_memory_scripts:560
used_memory_scripts_human:560B
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.27
allocator_frag_bytes:419320
allocator_rss_ratio:2.77
allocator_rss_bytes:3461120
rss_overhead_ratio:1.43
rss_overhead_bytes:2318336
mem_fragmentation_ratio:6.09
mem_fragmentation_bytes:6463936
mem_not_counted_for_evict:8
mem_replication_backlog:0
mem_total_replication_buffers:0
mem_clients_slaves:0
mem_clients_normal:44544
mem_cluster_links:0
mem_aof_buffer:8
mem_allocator:jemalloc-5.2.1
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
# Persistence
loading:0
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:9
rdb_bgsave_in_progress:0
rdb_last_save_time:1675326382
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
rdb_saves:22
rdb_last_cow_size:565248
rdb_last_load_keys_expired:0
rdb_last_load_keys_loaded:4
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_rewrites:0
aof_rewrites_consecutive_failures:0
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
aof_current_size:4353966
aof_base_size:427
aof_pending_rewrite:0
aof_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0
# Stats
total_connections_received:107
total_commands_processed:48187
instantaneous_ops_per_sec:1
total_net_input_bytes:5536668
total_net_output_bytes:1471894
total_net_repl_input_bytes:0
total_net_repl_output_bytes:0
instantaneous_input_kbps:0.20
instantaneous_output_kbps:0.01
instantaneous_input_repl_kbps:0.00
instantaneous_output_repl_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:13
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:445
evicted_keys:0
evicted_clients:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:170
keyspace_misses:47762
pubsub_channels:0
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:684
total_forks:2
migrate_cached_sockets:0
slave_expires_tracked_keys:2
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
total_active_defrag_time:0
current_active_defrag_time:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:36
dump_payload_sanitizations:0
total_reads_processed:48295
total_writes_processed:48192
io_threaded_reads_processed:0
io_threaded_writes_processed:0
reply_buffer_shrinks:17
reply_buffer_expands:0
# Replication
role:slave
master_host:194.40.243.206
master_port:8886
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_read_repl_offset:0
slave_repl_offset:0
master_link_down_since_seconds:-1
slave_priority:100
slave_read_only:0
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:3bd042a5e9bf4f1c7c10282bcc58e33ad5551bac
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
# CPU
used_cpu_sys:32.260467
used_cpu_user:42.243077
used_cpu_sys_children:0.003878
used_cpu_user_children:0.005598
used_cpu_sys_main_thread:32.241295
used_cpu_user_main_thread:42.237038
# Modules
# Commandstats
cmdstat_get:calls=47932,usec=278274,usec_per_call=5.81,rejected_calls=0,failed_calls=0
cmdstat_setex:calls=18,usec=289,usec_per_call=16.06,rejected_calls=0,failed_calls=0
cmdstat_set:calls=54,usec=492,usec_per_call=9.11,rejected_calls=0,failed_calls=0
cmdstat_flushdb:calls=6,usec=153,usec_per_call=25.50,rejected_calls=0,failed_calls=0
cmdstat_del:calls=6,usec=23,usec_per_call=3.83,rejected_calls=0,failed_calls=0
cmdstat_select:calls=94,usec=298,usec_per_call=3.17,rejected_calls=0,failed_calls=0
cmdstat_command:calls=4,usec=15139,usec_per_call=3784.75,rejected_calls=0,failed_calls=0
cmdstat_flushall:calls=8,usec=26561,usec_per_call=3320.12,rejected_calls=0,failed_calls=0
cmdstat_info:calls=5,usec=1387,usec_per_call=277.40,rejected_calls=0,failed_calls=0
cmdstat_eval:calls=1,usec=209,usec_per_call=209.00,rejected_calls=0,failed_calls=1
cmdstat_config|set:calls=37,usec=734,usec_per_call=19.84,rejected_calls=0,failed_calls=34
cmdstat_slaveof:calls=2,usec=495,usec_per_call=247.50,rejected_calls=0,failed_calls=0
cmdstat_save:calls=20,usec=65566,usec_per_call=3278.30,rejected_calls=0,failed_calls=0
# Errorstats
errorstat_ERR:count=36
# Latencystats
latency_percentiles_usec_get:p50=5.023,p99=22.015,p99.9=57.087
latency_percentiles_usec_setex:p50=13.055,p99=32.127,p99.9=32.127
latency_percentiles_usec_set:p50=8.031,p99=16.063,p99.9=31.103
latency_percentiles_usec_flushdb:p50=26.111,p99=31.103,p99.9=31.103
latency_percentiles_usec_del:p50=3.007,p99=7.007,p99.9=7.007
latency_percentiles_usec_select:p50=3.007,p99=14.015,p99.9=19.071
latency_percentiles_usec_command:p50=4095.999,p99=4587.519,p99.9=4587.519
latency_percentiles_usec_flushall:p50=3325.951,p99=4390.911,p99.9=4390.911
latency_percentiles_usec_info:p50=250.879,p99=385.023,p99.9=385.023
latency_percentiles_usec_eval:p50=209.919,p99=209.919,p99.9=209.919
latency_percentiles_usec_config|set:p50=17.023,p99=100.351,p99.9=100.351
latency_percentiles_usec_slaveof:p50=8.031,p99=487.423,p99.9=487.423
latency_percentiles_usec_save:p50=3047.423,p99=5406.719,p99.9=5406.719
# Cluster
cluster_enabled:0
# Keyspace
db0:keys=4,expires=0,avg_ttl=0
db1:keys=5,expires=2,avg_ttl=0
------ CLIENT LIST OUTPUT ------
id=5 addr=10.5.0.2:36302 laddr=10.5.0.3:6379 fd=9 name= age=23830 idle=1 flags=N db=1 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=0 oll=0 omem=0 tot-mem=22272 events=r cmd=get user=default redir=-1 resp=2
id=6 addr=10.5.0.2:36304 laddr=10.5.0.3:6379 fd=10 name= age=23830 idle=1 flags=N db=1 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=0 oll=0 omem=0 tot-mem=22272 events=r cmd=get user=default redir=-1 resp=2
------ MODULES INFO OUTPUT ------
------ CONFIG DEBUG OUTPUT ------
lazyfree-lazy-expire no
activedefrag no
repl-diskless-sync yes
proto-max-bulk-len 512mb
lazyfree-lazy-user-del no
replica-read-only no
client-query-buffer-limit 1gb
lazyfree-lazy-user-flush no
lazyfree-lazy-server-del no
sanitize-dump-payload no
lazyfree-lazy-eviction no
io-threads 1
slave-read-only no
io-threads-do-reads no
repl-diskless-load disabled
list-compress-depth 0
------ FAST MEMORY TEST ------
1:S 02 Feb 2023 08:28:06.403 # Bio thread for job type #0 terminated
1:S 02 Feb 2023 08:28:06.403 # Bio thread for job type #1 terminated
1:S 02 Feb 2023 08:28:06.404 # Bio thread for job type #2 terminated
*** Preparing to test memory region 561a1894f000 (2306048 bytes)
*** Preparing to test memory region 561a1a670000 (12288 bytes)
*** Preparing to test memory region 7f6bc8800000 (2097152 bytes)
*** Preparing to test memory region 7f6bc8b74000 (2621440 bytes)
*** Preparing to test memory region 7f6bc9a00000 (8388608 bytes)
*** Preparing to test memory region 7f6bca2bb000 (4096 bytes)
*** Preparing to test memory region 7f6bca2be000 (139264 bytes)
*** Preparing to test memory region 7f6bca2e2000 (368640 bytes)
*** Preparing to test memory region 7f6bca5ba000 (16384 bytes)
*** Preparing to test memory region 7f6bca6d4000 (12288 bytes)
.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.
=== REDIS BUG REPORT END. Make sure to include from START to END. ===
And here are the lines right before the crash (these 3 lines were repeated many times):
1:S 02 Feb 2023 08:28:06.301 * Connecting to MASTER 194.40.243.206:8886
1:S 02 Feb 2023 08:28:06.302 * MASTER <-> REPLICA sync started
1:S 02 Feb 2023 08:28:06.343 # Error condition on socket for SYNC: Connection refused
Thanks for your time!
Comment From: hpatro
@DarkHiito
- Do you understand why the replication link is broken with the master?
- Would you be able to share the set of commands leading to this crash?
Comment From: DarkHiito
Hi,
No, I don't really understand; the only thing I did was pull the Redis Docker image and run it. I don't need replication, I just want Redis to be a fast cache.
I didn't run any commands to get this crash, I just started the container and that's it.
Comment From: hpatro
@DarkHiito Not sure how the Redis server was initialized; I would suggest running SLAVEOF NO ONE to make the server a MASTER, and setting a password on the default user to prevent unauthorized access.
There is some data stored on the Redis server, based on the keyspace metrics, so either some commands were run or the data was loaded from an RDB file:
db0:keys=4,expires=0,avg_ttl=0
db1:keys=5,expires=2,avg_ttl=0
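For reference, a minimal sketch of those two steps with redis-cli (the password value is only a placeholder, pick your own):
redis-cli SLAVEOF NO ONE
redis-cli CONFIG SET requirepass "change-me-to-a-strong-password"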
Comment From: judeng
This crash only happens on a writable replica that also has AOF enabled. It has been fixed in #11615; please update Redis to the latest version. @guybe7 Hi, do you think we need to add a test case for this edge case?
Comment From: guybe7
Thanks, @judeng. It is always a good idea to add more tests, but unfortunately I don't have any spare time at the moment - do you want to add a test?
If so, we should also update the comment in expireSlaveKeys:
/* Propagate the DEL (writable replicas do not propagate anything to other replicas, but they might propagate to AOF) and trigger module hooks. */
Comment From: guybe7
@DarkHiito please use 7.0.8 and you should be good
closing the issue
Comment From: DarkHiito
@DarkHiito please use 7.0.8 and you should be good
closing the issue
@guybe7 Hi, thanks for your answer. I upgraded to 7.0.8 and I got the exact same error :/
Comment From: judeng
@guybe7 Hi, thanks for your answer. I upgraded to 7.0.8 and I got the exact same error :/
I'm sorry for not verifying carefully; #11615 is indeed not included in 7.0.8 :-( It is still waiting to be backported to 7.0. Can you modify the source code in 7.0.8 and compile it yourself?
Comment From: guybe7
@oranagra should we backport?
Comment From: oranagra
@guybe7 what does #11615 have to do with this? It's a fix for a PR that wasn't cherry-picked either.
Comment From: guybe7
#11615 isn't a fix to the exec-unit PR (perhaps I should have been clearer in the commit comment).
The effect is that after expireSlaveKeys the also_propagate array is empty.
For 7.0 the fix would be to add propagatePendingCommands at the end of expireSlaveKeys.
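To illustrate the idea (a minimal sketch only, not the actual patch; the real expireSlaveKeys in expire.c contains the full expiration loop):

/* expire.c, 7.0 backport sketch: a writable replica expires its own keys here
 * and queues the resulting DEL commands into server.also_propagate via
 * alsoPropagate(), e.g. so they reach the AOF. */
void expireSlaveKeys(void) {
    /* ... existing expiration logic for replica-written keys ... */

    /* Proposed addition at the very end: flush the queued commands so the
     * 'server.also_propagate.numops == 0' assertion in blocked.c holds. */
    propagatePendingCommands();
}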
Comment From: oranagra
Ohh, OK, I'll note it there and handle it in the next release.