- Issue first introduced with https://github.com/redis/redis/pull/10126
- Removing CMD_STALE from EVAL doesn't allow crash to happen
Crash report
70824:S 02 Apr 2022 00:24:52.961 * MASTER <-> REPLICA sync started
70824:S 02 Apr 2022 00:24:52.961 # Error condition on socket for SYNC: Connection refused
70824:S 02 Apr 2022 00:24:53.965 * Connecting to MASTER localhost:6379
70824:S 02 Apr 2022 00:24:53.966 * MASTER <-> REPLICA sync started
70824:S 02 Apr 2022 00:24:53.966 # Error condition on socket for SYNC: Connection refused
70824:S 02 Apr 2022 00:24:54.969 * Connecting to MASTER localhost:6379
70824:S 02 Apr 2022 00:24:54.969 * MASTER <-> REPLICA sync started
70824:S 02 Apr 2022 00:24:54.969 * Non blocking connect for SYNC fired the event.
70824:S 02 Apr 2022 00:24:54.998 # Error reply to PING from master: '-LOADING Redis is loading the dataset in memory'
70824:S 02 Apr 2022 00:24:55.972 * Connecting to MASTER localhost:6379
70824:S 02 Apr 2022 00:24:55.972 * MASTER <-> REPLICA sync started
70824:S 02 Apr 2022 00:24:55.972 * Non blocking connect for SYNC fired the event.
70824:S 02 Apr 2022 00:24:56.003 # Error reply to PING from master: '-LOADING Redis is loading the dataset in memory'
70824:S 02 Apr 2022 00:24:56.975 * Connecting to MASTER localhost:6379
70824:S 02 Apr 2022 00:24:56.975 * MASTER <-> REPLICA sync started
70824:S 02 Apr 2022 00:24:56.975 * Non blocking connect for SYNC fired the event.
70824:S 02 Apr 2022 00:24:56.998 # Error reply to PING from master: '-LOADING Redis is loading the dataset in memory'
70824:S 02 Apr 2022 00:24:57.978 * Connecting to MASTER localhost:6379
70824:S 02 Apr 2022 00:24:57.978 * MASTER <-> REPLICA sync started
70824:S 02 Apr 2022 00:24:57.978 * Non blocking connect for SYNC fired the event.
70824:S 02 Apr 2022 00:24:58.005 # Error reply to PING from master: '-LOADING Redis is loading the dataset in memory'
70824:S 02 Apr 2022 00:24:58.981 * Connecting to MASTER localhost:6379
70824:S 02 Apr 2022 00:24:58.981 * MASTER <-> REPLICA sync started
70824:S 02 Apr 2022 00:24:58.981 * Non blocking connect for SYNC fired the event.
70824:S 02 Apr 2022 00:24:58.983 # Error reply to PING from master: '-LOADING Redis is loading the dataset in memory'
70824:S 02 Apr 2022 00:24:59.985 * Connecting to MASTER localhost:6379
=== REDIS BUG REPORT START: Cut & paste starting from here ===
70824:S 02 Apr 2022 00:24:59.985 # Redis 255.255.255 crashed by signal: 11, si_code: 128
70824:S 02 Apr 2022 00:24:59.985 # Accessing address: (nil)
70824:S 02 Apr 2022 00:24:59.985 # Crashed running the instruction at: 0x7f5988415c84
------ STACK TRACE ------
EIP:
/lib/x86_64-linux-gnu/libc.so.6(+0x109c84)[0x7f5988415c84]
Backtrace:
/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f598834e520]
/lib/x86_64-linux-gnu/libc.so.6(+0x109c84)[0x7f5988415c84]
/lib/x86_64-linux-gnu/libc.so.6(getaddrinfo+0x129)[0x7f5988417779]
./redis-server 127.0.0.1:6378(+0x700ee)[0x5626c2aa10ee]
./redis-server 127.0.0.1:6378(+0x147087)[0x5626c2b78087]
./redis-server 127.0.0.1:6378(connectWithMaster+0x42)[0x5626c2ad6da2]
./redis-server 127.0.0.1:6378(replicationCron+0x367)[0x5626c2adb487]
./redis-server 127.0.0.1:6378(serverCron+0x2d8)[0x5626c2aaae98]
./redis-server 127.0.0.1:6378(aeProcessEvents+0x2f1)[0x5626c2aa0801]
./redis-server 127.0.0.1:6378(aeMain+0x1d)[0x5626c2aa0abd]
./redis-server 127.0.0.1:6378(main+0x308)[0x5626c2a9c268]
/lib/x86_64-linux-gnu/libc.so.6(+0x29fd0)[0x7f5988335fd0]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x7d)[0x7f598833607d]
./redis-server 127.0.0.1:6378(_start+0x25)[0x5626c2a9c765]
------ REGISTERS ------
70824:S 02 Apr 2022 00:24:59.988 #
RAX:f9dba8288a9ee8dc RBX:00007ffee90290a0
RCX:0000000000000400 RDX:00007ffee90290b0
RDI:00007f5987e9a0d1 RSI:00007ffee9028da0
RBP:00007ffee9028f20 RSP:00007ffee9028ca0
R8 :00007f59883086a8 R9 :00007f598830870c
R10:0000000000000000 R11:0000000000000000
R12:ffffffffffffe74c R13:0000000000000000
R14:00007ffee9029530 R15:0000000000000000
RIP:00007f5988415c84 EFL:0000000000010212
CSGSFS:002b000000000033
70824:S 02 Apr 2022 00:24:59.988 # (00007ffee9028caf) -> 00007f5900000000
70824:S 02 Apr 2022 00:24:59.988 # (00007ffee9028cae) -> 00007ffee9028fa8
70824:S 02 Apr 2022 00:24:59.988 # (00007ffee9028cad) -> 0000000000000000
70824:S 02 Apr 2022 00:24:59.988 # (00007ffee9028cac) -> 0000000000000000
70824:S 02 Apr 2022 00:24:59.988 # (00007ffee9028cab) -> 0000000000000000
70824:S 02 Apr 2022 00:24:59.988 # (00007ffee9028caa) -> 00007ffe0000eb18
70824:S 02 Apr 2022 00:24:59.988 # (00007ffee9028ca9) -> 0000000600000001
70824:S 02 Apr 2022 00:24:59.988 # (00007ffee9028ca8) -> 0000000000000000
70824:S 02 Apr 2022 00:24:59.988 # (00007ffee9028ca7) -> 0000000000000000
70824:S 02 Apr 2022 00:24:59.988 # (00007ffee9028ca6) -> 0000000000000000
70824:S 02 Apr 2022 00:24:59.988 # (00007ffee9028ca5) -> 0000000000000008
70824:S 02 Apr 2022 00:24:59.988 # (00007ffee9028ca4) -> 00007ffe00000000
70824:S 02 Apr 2022 00:24:59.988 # (00007ffee9028ca3) -> 00007f59885226c0
70824:S 02 Apr 2022 00:24:59.988 # (00007ffee9028ca2) -> 0000000000000000
70824:S 02 Apr 2022 00:24:59.988 # (00007ffee9028ca1) -> 00007f5988415c49
70824:S 02 Apr 2022 00:24:59.988 # (00007ffee9028ca0) -> 0000000000000000
------ INFO OUTPUT ------
# Server
redis_version:255.255.255
redis_git_sha1:b8eb2a73
redis_git_dirty:1
redis_build_id:14c87aa0476c61e5
redis_mode:standalone
os:Linux 5.13.0-35-generic x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:11.2.0
process_id:70824
process_supervised:no
run_id:b7e83585082a581b39ac8c945baed84e7562c21d
tcp_port:6378
server_time_usec:1648851899985449
uptime_in_seconds:132
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:4684731
executable:/home/eduardo/repos/redis/src/./redis-server
config_file:/home/eduardo/repos/redis/redis.conf
io_threads_active:0
# Clients
connected_clients:0
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:0
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
# Memory
used_memory:927190800
used_memory_human:884.24M
used_memory_rss:945872896
used_memory_rss_human:902.05M
used_memory_peak:927227152
used_memory_peak_human:884.27M
used_memory_peak_perc:100.00%
used_memory_overhead:535098844
used_memory_startup:858392
used_memory_dataset:392091956
used_memory_dataset_perc:42.33%
allocator_allocated:927299800
allocator_active:927592448
allocator_resident:943992832
total_system_memory:16619466752
total_system_memory_human:15.48G
used_memory_lua:32768
used_memory_vm_eval:32768
used_memory_lua_human:32.00K
used_memory_scripts_eval:144
number_of_cached_scripts:1
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:37888
used_memory_vm_total:70656
used_memory_vm_total_human:69.00K
used_memory_functions:184
used_memory_scripts:328
used_memory_scripts_human:328B
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.00
allocator_frag_bytes:292648
allocator_rss_ratio:1.02
allocator_rss_bytes:16400384
rss_overhead_ratio:1.00
rss_overhead_bytes:1880064
mem_fragmentation_ratio:1.02
mem_fragmentation_bytes:18684064
mem_not_counted_for_evict:0
mem_replication_backlog:20508
mem_total_replication_buffers:20504
mem_clients_slaves:0
mem_clients_normal:1816
mem_cluster_links:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.2.1
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
# Persistence
loading:0
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1648851767
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_saves:0
rdb_last_cow_size:0
rdb_last_load_keys_expired:0
rdb_last_load_keys_loaded:10000001
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_rewrites:0
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
# Stats
total_connections_received:121
total_commands_processed:114
instantaneous_ops_per_sec:1
total_net_input_bytes:535562547
total_net_output_bytes:7751
instantaneous_input_kbps:0.06
instantaneous_output_kbps:0.10
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
evicted_clients:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
total_active_defrag_time:0
current_active_defrag_time:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:71
dump_payload_sanitizations:0
total_reads_processed:251
total_writes_processed:172
io_threaded_reads_processed:0
io_threaded_writes_processed:0
reply_buffer_shrinks:2
reply_buffer_expands:0
# Replication
role:slave
master_host:localhost
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_read_repl_offset:70
slave_repl_offset:70
master_link_down_since_seconds:19
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:d27a75a2f2955ee5070bfa45f1a1a659562d71ab
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:70
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:70
# CPU
used_cpu_sys:2.102113
used_cpu_user:26.324652
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
used_cpu_sys_main_thread:2.029005
used_cpu_user_main_thread:26.320263
# Modules
# Commandstats
cmdstat_eval:calls=106,usec=2188,usec_per_call=20.64,rejected_calls=15,failed_calls=56
cmdstat_ping:calls=8,usec=3,usec_per_call=0.38,rejected_calls=0,failed_calls=0
# Errorstats
errorstat_LOADING:count=15
errorstat_MASTERDOWN:count=56
# Latencystats
latency_percentiles_usec_eval:p50=18.047,p99=59.135,p99.9=108.031
latency_percentiles_usec_ping:p50=0.001,p99=1.003,p99.9=1.003
# Cluster
cluster_enabled:0
# Keyspace
db0:keys=10000001,expires=0,avg_ttl=0
------ CLIENT LIST OUTPUT ------
------ MODULES INFO OUTPUT ------
------ CONFIG DEBUG OUTPUT ------
list-compress-depth 0
repl-diskless-load disabled
proto-max-bulk-len 512mb
lazyfree-lazy-server-del no
client-query-buffer-limit 1gb
activedefrag no
sanitize-dump-payload no
lazyfree-lazy-user-flush no
lazyfree-lazy-expire no
repl-diskless-sync yes
lazyfree-lazy-user-del no
slave-read-only yes
io-threads-do-reads no
io-threads 1
replica-read-only yes
lazyfree-lazy-eviction no
------ FAST MEMORY TEST ------
70824:S 02 Apr 2022 00:24:59.990 # Bio thread for job type #0 terminated
70824:S 02 Apr 2022 00:24:59.990 # Bio thread for job type #1 terminated
70824:S 02 Apr 2022 00:24:59.991 # Bio thread for job type #2 terminated
*** Preparing to test memory region 5626c2d06000 (2297856 bytes)
*** Preparing to test memory region 5626c3718000 (135168 bytes)
*** Preparing to test memory region 7f5940000000 (135168 bytes)
*** Preparing to test memory region 7f5946600000 (8388608 bytes)
*** Preparing to test memory region 7f5946e00000 (262144000 bytes)
*** Preparing to test memory region 7f5956900000 (783286272 bytes)
*** Preparing to test memory region 7f598548c000 (8388608 bytes)
*** Preparing to test memory region 7f5985c8d000 (8388608 bytes)
*** Preparing to test memory region 7f598648e000 (8388608 bytes)
*** Preparing to test memory region 7f5986c8f000 (8388608 bytes)
*** Preparing to test memory region 7f5987a00000 (8388608 bytes)
*** Preparing to test memory region 7f5988308000 (16384 bytes)
*** Preparing to test memory region 7f5988527000 (53248 bytes)
*** Preparing to test memory region 7f5988618000 (8192 bytes)
.O.O.O.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.
------ DUMPING CODE AROUND EIP ------
Symbol: (null) (base: (nil))
Module: /lib/x86_64-linux-gnu/libc.so.6 (base 0x7f598830c000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=(nil) -D -b binary -m i386:x86-64 /tmp/dump.bin
------
=== REDIS BUG REPORT END.
Sometimes crash is different:
25343:S 02 Apr 2022 13:01:18.234 * Non blocking connect for SYNC fired the event.
25343:S 02 Apr 2022 13:01:18.270 # Error reply to PING from master: '-LOADING Redis is loading the dataset in memory'
25343:S 02 Apr 2022 13:01:19.238 * Connecting to MASTER localhost:6379
25343:S 02 Apr 2022 13:01:19.238 * MASTER <-> REPLICA sync started
25343:S 02 Apr 2022 13:01:19.238 * Non blocking connect for SYNC fired the event.
25343:S 02 Apr 2022 13:01:19.277 # Error reply to PING from master: '-LOADING Redis is loading the dataset in memory'
25343:S 02 Apr 2022 13:01:20.242 * Connecting to MASTER localhost:6379
25343:S 02 Apr 2022 13:01:20.242 * MASTER <-> REPLICA sync started
25343:S 02 Apr 2022 13:01:20.242 * Non blocking connect for SYNC fired the event.
25343:S 02 Apr 2022 13:01:20.242 * Master replied to PING, replication can continue...
25343:S 02 Apr 2022 13:01:20.243 * Trying a partial resynchronization (request 1dafe6a101d709ff596de1f1ac4b8548acc0a337:141).
25343:S 02 Apr 2022 13:01:25.270 * Full resync from master: a577f6677ae5f45c8c97fdcd00a8787148c8648d:14
25343:S 02 Apr 2022 13:01:25.291 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
25343:S 02 Apr 2022 13:01:29.932 * Discarding previously cached master state.
25343:S 02 Apr 2022 13:01:29.932 * MASTER <-> REPLICA sync: Flushing old data
25343:S 02 Apr 2022 13:01:36.091 * MASTER <-> REPLICA sync: Loading DB in memory
25343:S 02 Apr 2022 13:01:36.104 * Loading RDB produced by version 255.255.255
25343:S 02 Apr 2022 13:01:36.104 * RDB age 11 seconds
25343:S 02 Apr 2022 13:01:36.104 * RDB memory usage when created 884.22 Mb
25343:S 02 Apr 2022 13:01:40.597 * Done loading RDB, keys loaded: 10000000, keys expired: 0.
25343:S 02 Apr 2022 13:01:40.597 * MASTER <-> REPLICA sync: Finished with success
realloc(): invalid next size
=== REDIS BUG REPORT START: Cut & paste starting from here ===
25343:S 02 Apr 2022 13:01:41.304 # Redis 255.255.255 crashed by signal: 6, si_code: 0
25343:S 02 Apr 2022 13:01:41.304 # Killed by PID: 25343, UID: 1000
25343:S 02 Apr 2022 13:01:41.304 # Crashed running the instruction at: 0x7fafb3821828
(nothing else)
Additional information
Ubuntu 21.10 x86_64 Kernel: 5.13.0-35-generic
- Steps to reproduce Preparation:
Using latest from unstable (last commit https://github.com/redis/redis/commit/b8eb2a73408fa3b8845760857dd6fcccb62107fe)
Run:
Master:
./redis-server ../redis.conf --dir master --appendonly yes
Replica:
./redis-server ../redis.conf --port 6378 --replicaof localhost 6379 --dir replica0
redis.conf is standard with the following changes:
replica-serve-stale-data no
enable-debug-command yes
save ""
Populate master
./redis-cli -p 6379 DEBUG POPULATE 10000000
./redis-cli -p 6379 BGREWRITEAOF
Keep running EVAL calls a few times per second (the more frequent, higher the chances of crashing):
for i in {1..1000}; do ./redis-cli -p 6378 EVAL "return ARGV[1]" 0 hello; date; sleep 0.2; done
Causing crash: By doing this I get crashes on replica about 1/2 of times if running EVAL 1 per second. And virtually every time if calling 5 times per second: 1. Shutdown master and wait a few seconds 2. Start master again with same command mentioned above 3. Wait for full replication 4. All of this while EVAL commands are being issued
Comment From: oranagra
Thanks for reporting.. so i assume you got to #10126 by trial and error, right?
Comment From: eduardobr
@oranagra Yes, bisecting commits