Although this crash involves a module, we reproduced it with a minimal module to confirm that the issue is not specific to our own module.
Crash report
1:S 02 Feb 2024 22:11:28.598 * Successful partial resynchronization with master.
1:S 02 Feb 2024 22:11:28.598 * Master replication ID changed to 2fe1b84f88ef01b46170de4eb12023a46a3c718c
1:S 02 Feb 2024 22:11:28.598 * <example> [EventHandler] Event Handler EVENT 7 SUBEVENT 0
1:S 02 Feb 2024 22:11:28.598 * <example> [EventHandler] Master Replication Link Up
1:S 02 Feb 2024 22:11:28.598 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
1:S 02 Feb 2024 22:11:28.598 * <example> [EventHandler] Event Handler EVENT 7 SUBEVENT 1
1:S 02 Feb 2024 22:11:28.598 * <example> [EventHandler] Master Replication Link Down
1:M 02 Feb 2024 22:11:28.598 * Connection with master lost.
1:M 02 Feb 2024 22:11:28.598 * Caching the disconnected master state.
1:M 02 Feb 2024 22:11:28.599 * <example> [EventHandler] Event Handler EVENT 7 SUBEVENT 1
1:M 02 Feb 2024 22:11:28.599 * <example> [EventHandler] Master Replication Link Down
1:M 02 Feb 2024 22:11:28.599 * Discarding previously cached master state.
1:M 02 Feb 2024 22:11:28.599 * Setting secondary replication ID to 2fe1b84f88ef01b46170de4eb12023a46a3c718c, valid up to offset: 49540. New replication ID is 802c95cd352f2914af3d36592c2e761ca5d323de
1:M 02 Feb 2024 22:11:28.599 * <example> [EventHandler] Event Handler EVENT 0 SUBEVENT 0
1:M 02 Feb 2024 22:11:28.599 * <example> [EventHandler] Became Master
1:M 02 Feb 2024 22:11:28.599 * <example> [KAU] AAAAA master set
=== REDIS BUG REPORT START: Cut & paste starting from here ===
1:M 02 Feb 2024 22:11:28.599 # Redis 7.2.4 crashed by signal: 11, si_code: 1
1:M 02 Feb 2024 22:11:28.599 # Accessing address: 0xffffffffffffffff
1:M 02 Feb 2024 22:11:28.599 # Crashed running the instruction at: 0x55bf9930d41b
------ STACK TRACE ------
EIP:
redis-server *:6379(getClientMemoryUsage+0x4b)[0x55bf9930d41b]
Backtrace:
/lib/x86_64-linux-gnu/libpthread.so.0(+0x13140)[0x7fa834e60140]
redis-server *:6379(getClientMemoryUsage+0x4b)[0x55bf9930d41b]
redis-server *:6379(catClientInfoString+0x16e)[0x55bf9931168e]
redis-server *:6379(replicaofCommand+0x17c)[0x55bf9932b06c]
redis-server *:6379(call+0x170)[0x55bf992ed740]
redis-server *:6379(processCommand+0xb69)[0x55bf992ee9e9]
redis-server *:6379(processInputBuffer+0xf7)[0x55bf99312b37]
redis-server *:6379(readQueryFromClient+0x350)[0x55bf993130a0]
redis-server *:6379(+0x1ae77c)[0x55bf9940477c]
redis-server *:6379(+0x1b4562)[0x55bf9940a562]
redis-server *:6379(aeMain+0xf9)[0x55bf992e3d09]
redis-server *:6379(main+0x3cd)[0x55bf992d8ecd]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x7fa834c9cd0a]
redis-server *:6379(_start+0x2a)[0x55bf992d964a]
------ REGISTERS ------
1:M 02 Feb 2024 22:11:28.600 #
RAX:0000000000000000 RBX:00007fa83492c880
RCX:00000000000000ff RDX:0000000000000000
RDI:0000000000000000 RSI:00007ffdf2a8de58
RBP:00007fa83492c880 RSP:00007ffdf2a8dd70
R8 :00007fa834858253 R9 :0000000000000000
R10:0000000000e17908 R11:00007fa834c751c0
R12:0000000000000000 R13:00007ffdf2a8de60
R14:000055bf994103e0 R15:00007ffdf2a8df50
RIP:000055bf9930d41b EFL:0000000000010202
CSGSFS:002b000000000033
1:M 02 Feb 2024 22:11:28.600 # (00007ffdf2a8dd7f) -> 000000407794b4af
1:M 02 Feb 2024 22:11:28.600 # (00007ffdf2a8dd7e) -> 0000000000000000
1:M 02 Feb 2024 22:11:28.600 # (00007ffdf2a8dd7d) -> 0000000000000000
1:M 02 Feb 2024 22:11:28.600 # (00007ffdf2a8dd7c) -> 0000000000000000
1:M 02 Feb 2024 22:11:28.600 # (00007ffdf2a8dd7b) -> 0000000000000000
1:M 02 Feb 2024 22:11:28.600 # (00007ffdf2a8dd7a) -> 0000000000000000
1:M 02 Feb 2024 22:11:28.600 # (00007ffdf2a8dd79) -> 0000000000000000
1:M 02 Feb 2024 22:11:28.600 # (00007ffdf2a8dd78) -> 0000000000000040
1:M 02 Feb 2024 22:11:28.600 # (00007ffdf2a8dd77) -> 0000000000000000
1:M 02 Feb 2024 22:11:28.600 # (00007ffdf2a8dd76) -> 0000000000000000
1:M 02 Feb 2024 22:11:28.600 # (00007ffdf2a8dd75) -> 000055bf9931168e
1:M 02 Feb 2024 22:11:28.600 # (00007ffdf2a8dd74) -> 000055bf994103e0
1:M 02 Feb 2024 22:11:28.600 # (00007ffdf2a8dd73) -> 00007ffdf2a8de60
1:M 02 Feb 2024 22:11:28.600 # (00007ffdf2a8dd72) -> 00007fa834863828
1:M 02 Feb 2024 22:11:28.600 # (00007ffdf2a8dd71) -> 00007fa83492c880
1:M 02 Feb 2024 22:11:28.600 # (00007ffdf2a8dd70) -> 00007fa83492c880
------ INFO OUTPUT ------
# Server
redis_version:7.2.4
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:5480667114e70c31
redis_mode:standalone
os:Linux 4.18.0-477.15.1.el8_8.x86_64 x86_64
arch_bits:64
monotonic_clock:POSIX clock_gettime
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:10.2.1
process_id:1
process_supervised:no
run_id:f4ef00a857c392c56b11d06e766baa679420c353
tcp_port:6379
server_time_usec:1706911888598894
uptime_in_seconds:154
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:12413072
executable:/redis-server
config_file:
io_threads_active:0
listener2:name=tls,bind=*,bind=-::*,port=6379
# Clients
connected_clients:1
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:20480
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
total_blocking_keys:0
total_blocking_keys_on_nokey:0
# Memory
used_memory:1337704
used_memory_human:1.28M
used_memory_rss:15814656
used_memory_rss_human:15.08M
used_memory_peak:1401080
used_memory_peak_human:1.34M
used_memory_peak_perc:95.48%
used_memory_overhead:971108
used_memory_startup:883824
used_memory_dataset:366596
used_memory_dataset_perc:80.77%
allocator_allocated:1460864
allocator_active:1789952
allocator_resident:4861952
total_system_memory:8070426624
total_system_memory_human:7.52G
used_memory_lua:31744
used_memory_vm_eval:31744
used_memory_lua_human:31.00K
used_memory_scripts_eval:0
number_of_cached_scripts:0
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:32768
used_memory_vm_total:64512
used_memory_vm_total_human:63.00K
used_memory_functions:184
used_memory_scripts:184
used_memory_scripts_human:184B
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.23
allocator_frag_bytes:329088
allocator_rss_ratio:2.72
allocator_rss_bytes:3072000
rss_overhead_ratio:3.25
rss_overhead_bytes:10952704
mem_fragmentation_ratio:12.35
mem_fragmentation_bytes:14533648
mem_not_counted_for_evict:0
mem_replication_backlog:61516
mem_total_replication_buffers:61512
mem_clients_slaves:0
mem_clients_normal:25472
mem_cluster_links:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.3.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
# Persistence
loading:0
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:2
rdb_bgsave_in_progress:0
rdb_last_save_time:1706911734
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_saves:0
rdb_last_cow_size:0
rdb_last_load_keys_expired:0
rdb_last_load_keys_loaded:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_rewrites:0
aof_rewrites_consecutive_failures:0
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
# Stats
total_connections_received:70
total_commands_processed:1020
instantaneous_ops_per_sec:7
total_net_input_bytes:108741
total_net_output_bytes:585012
total_net_repl_input_bytes:49057
total_net_repl_output_bytes:0
instantaneous_input_kbps:0.40
instantaneous_output_kbps:11.02
instantaneous_input_repl_kbps:0.00
instantaneous_output_repl_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
evicted_clients:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
total_active_defrag_time:0
current_active_defrag_time:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:1
dump_payload_sanitizations:0
total_reads_processed:1047
total_writes_processed:2016
io_threaded_reads_processed:0
io_threaded_writes_processed:0
reply_buffer_shrinks:48
reply_buffer_expands:48
eventloop_cycles:2767
eventloop_duration_sum:420291
eventloop_duration_cmd_sum:7374
instantaneous_eventloop_cycles_per_sec:20
instantaneous_eventloop_duration_usec:170
acl_access_denied_auth:0
acl_access_denied_cmd:0
acl_access_denied_key:0
acl_access_denied_channel:0
# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:802c95cd352f2914af3d36592c2e761ca5d323de
master_replid2:2fe1b84f88ef01b46170de4eb12023a46a3c718c
master_repl_offset:49539
second_repl_offset:49540
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1268
repl_backlog_histlen:48272
# CPU
used_cpu_sys:0.263239
used_cpu_user:0.100963
used_cpu_sys_children:0.024350
used_cpu_user_children:0.012801
used_cpu_sys_main_thread:0.261811
used_cpu_user_main_thread:0.100837
# Modules
module:name=example,ver=1,api=1,filters=0,usedby=[],using=[],options=[]
# Commandstats
cmdstat_publish:calls=398,usec=2190,usec_per_call=5.50,rejected_calls=0,failed_calls=0
cmdstat_select:calls=3,usec=3,usec_per_call=1.00,rejected_calls=0,failed_calls=0
cmdstat_exec:calls=1,usec=1152,usec_per_call=1152.00,rejected_calls=0,failed_calls=0
cmdstat_set:calls=2,usec=9,usec_per_call=4.50,rejected_calls=0,failed_calls=0
cmdstat_config|rewrite:calls=1,usec=3,usec_per_call=3.00,rejected_calls=0,failed_calls=1
cmdstat_subscribe:calls=3,usec=9,usec_per_call=3.00,rejected_calls=0,failed_calls=0
cmdstat_client|setname:calls=6,usec=7,usec_per_call=1.17,rejected_calls=0,failed_calls=0
cmdstat_client|kill:calls=2,usec=294,usec_per_call=147.00,rejected_calls=0,failed_calls=0
cmdstat_slaveof:calls=1,usec=831,usec_per_call=831.00,rejected_calls=0,failed_calls=0
cmdstat_multi:calls=1,usec=10,usec_per_call=10.00,rejected_calls=0,failed_calls=0
cmdstat_ping:calls=470,usec=563,usec_per_call=1.20,rejected_calls=0,failed_calls=0
cmdstat_auth:calls=70,usec=308,usec_per_call=4.40,rejected_calls=0,failed_calls=0
cmdstat_info:calls=61,usec=3128,usec_per_call=51.28,rejected_calls=0,failed_calls=0
cmdstat_replconf:calls=1,usec=4,usec_per_call=4.00,rejected_calls=0,failed_calls=0
# Errorstats
errorstat_ERR:count=1
# Latencystats
latency_percentiles_usec_publish:p50=5.023,p99=23.039,p99.9=89.087
latency_percentiles_usec_select:p50=1.003,p99=1.003,p99.9=1.003
latency_percentiles_usec_exec:p50=1155.071,p99=1155.071,p99.9=1155.071
latency_percentiles_usec_set:p50=4.015,p99=5.023,p99.9=5.023
latency_percentiles_usec_config|rewrite:p50=3.007,p99=3.007,p99.9=3.007
latency_percentiles_usec_subscribe:p50=3.007,p99=4.015,p99.9=4.015
latency_percentiles_usec_client|setname:p50=1.003,p99=2.007,p99.9=2.007
latency_percentiles_usec_client|kill:p50=106.495,p99=188.415,p99.9=188.415
latency_percentiles_usec_slaveof:p50=831.487,p99=831.487,p99.9=831.487
latency_percentiles_usec_multi:p50=10.047,p99=10.047,p99.9=10.047
latency_percentiles_usec_ping:p50=1.003,p99=2.007,p99.9=25.087
latency_percentiles_usec_auth:p50=4.015,p99=11.007,p99.9=22.015
latency_percentiles_usec_info:p50=51.199,p99=104.447,p99.9=117.247
latency_percentiles_usec_replconf:p50=4.015,p99=4.015,p99.9=4.015
# Cluster
cluster_enabled:0
# Keyspace
db0:keys=2,expires=0,avg_ttl=0
------ CLIENT LIST OUTPUT ------
id=20 addr=10.42.2.161:38250 laddr=10.42.0.127:6379 fd=14 name=sentinel-065e2884-cmd age=111 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=4096 rbp=2048 obl=0 oll=0 omem=0 tot-mem=25472 events=r cmd=publish user=default redir=-1 resp=2 lib-name= lib-ver=
------ EXECUTING CLIENT INFO ------
- Debian (docker.io/bitnami/redis:7.2.4-debian-11-r2)
- Deploy a 3-node Redis setup (each node with a Sentinel sidecar)
- Once all nodes are up, kill the master
The module was taken from here. Compiled with
gcc -I/usr/include -Wall -g -fPIC -lc -lm -std=gnu99 -c -o module.o module.c
ld -o module.so module.o -shared -Bsymbolic -lc
module.c has been simplified so that it does essentially nothing. With only the log messages, the failover worked perfectly. Once we added the SET calls, it crashed every time.
#include <redismodule.h>
#include <inttypes.h>

int ParseCommand(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    return REDISMODULE_OK;
}

void EventHandler(RedisModuleCtx *ctx, RedisModuleEvent eid, uint64_t subevent, void *data)
{
    RedisModule_Log(ctx, REDISMODULE_LOGLEVEL_NOTICE, "[EventHandler] Event Handler EVENT %"PRIu64" SUBEVENT %"PRIu64, eid.id, subevent);
    switch (eid.id)
    {
    case REDISMODULE_EVENT_MASTER_LINK_CHANGE:
        if (subevent == REDISMODULE_SUBEVENT_MASTER_LINK_DOWN){
            RedisModule_Log(ctx, REDISMODULE_LOGLEVEL_NOTICE, "[EventHandler] Master Replication Link Down");
        }
        if (subevent == REDISMODULE_SUBEVENT_MASTER_LINK_UP){
            RedisModule_Log(ctx, REDISMODULE_LOGLEVEL_NOTICE, "[EventHandler] Master Replication Link Up");
        }
        break;
    case REDISMODULE_EVENT_REPLICATION_ROLE_CHANGED:
        switch (subevent)
        {
        case REDISMODULE_EVENT_REPLROLECHANGED_NOW_MASTER:
        {
            RedisModule_Log(ctx, REDISMODULE_LOGLEVEL_NOTICE, "[EventHandler] Became Master");
            RedisModule_Log(ctx, REDISMODULE_LOGLEVEL_NOTICE, "[KAU] AAAAA master set");
            RedisModule_Call(ctx, "SET", "cc", "kau-master", "1");
            break;
        }
        case REDISMODULE_EVENT_REPLROLECHANGED_NOW_REPLICA:
            RedisModule_Log(ctx, REDISMODULE_LOGLEVEL_NOTICE, "[EventHandler] Became Replica");
            RedisModule_Log(ctx, REDISMODULE_LOGLEVEL_NOTICE, "[KAU] AAAAA replica set");
            RedisModule_Call(ctx, "SET", "cc", "kau-replica", "1");
            break;
        default:
            break;
        }
        break;
    default:
        break;
    }
}

int RedisModule_OnLoad(RedisModuleCtx *ctx) {
    // Register the module itself
    if (RedisModule_Init(ctx, "example", 1, REDISMODULE_APIVER_1) == REDISMODULE_ERR) {
        return REDISMODULE_ERR;
    }
    // register example.parse - the default registration syntax
    if (RedisModule_CreateCommand(ctx, "example.parse", ParseCommand, "readonly", 1, 1, 1) == REDISMODULE_ERR) {
        return REDISMODULE_ERR;
    }
    RedisModule_SubscribeToServerEvent(ctx, RedisModuleEvent_ReplicationRoleChanged, EventHandler);
    /* Example on how to check if a server sub event is supported */
    if (RedisModule_IsSubEventSupported(RedisModuleEvent_MasterLinkChange, REDISMODULE_SUBEVENT_MASTER_LINK_UP)) {
        RedisModule_SubscribeToServerEvent(ctx, RedisModuleEvent_MasterLinkChange, EventHandler);
    }
    return REDISMODULE_OK;
}
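To make the numeric EVENT/SUBEVENT pairs in the log easier to read, here is a minimal standalone sketch that maps them to names. The constants are local stand-ins that mirror only what this report shows (event 7 with subevents 0 and 1 logged as link up and link down, event 0 with subevent 0 logged as became master). The value 1 for "now replica" is an assumption, and describe_event is a hypothetical helper, not part of the Redis module API.

```c
#include <stdint.h>

/* Local stand-ins for the values seen in the log; redismodule.h holds the
 * authoritative definitions. */
enum { EV_ROLE_CHANGED = 0, EV_MASTER_LINK_CHANGE = 7 };

/* Hypothetical helper: readable name for an (event, subevent) pair. */
const char *describe_event(uint64_t event, uint64_t subevent) {
    if (event == EV_ROLE_CHANGED) {
        if (subevent == 0) return "role changed: now master";
        if (subevent == 1) return "role changed: now replica"; /* assumed value */
    } else if (event == EV_MASTER_LINK_CHANGE) {
        if (subevent == 0) return "master link: up";
        if (subevent == 1) return "master link: down";
    }
    return "unknown";
}
```

Applied to the log in the crash report, the sequence 7/0, 7/1, 7/1, 0/0 reads as link up, link down, link down, now master, immediately before the crash.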
Comment From: enjoy-binbin
Can you provide more specific steps? I tried it but couldn't reproduce it. I'm worried that my steps are wrong:
69592:S 04 Feb 2024 16:06:07.092 * Master replied to PING, replication can continue...
69592:S 04 Feb 2024 16:06:07.092 * Trying a partial resynchronization (request 0a3f50861333bbbbe59f4c90679f00a2b45fb939:1).
69592:S 04 Feb 2024 16:06:07.093 * Full resync from master: 94e431343b5df4d1c60ac31bd223dfd4cac5eb15:0
69592:S 04 Feb 2024 16:06:07.094 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
69592:S 04 Feb 2024 16:06:07.094 * Discarding previously cached master state.
69592:S 04 Feb 2024 16:06:07.094 * MASTER <-> REPLICA sync: Flushing old data
69592:S 04 Feb 2024 16:06:07.094 * MASTER <-> REPLICA sync: Loading DB in memory
69592:S 04 Feb 2024 16:06:07.116 * Loading RDB produced by version 7.2.4
69592:S 04 Feb 2024 16:06:07.116 * RDB age 0 seconds
69592:S 04 Feb 2024 16:06:07.116 * RDB memory usage when created 1.21 Mb
69592:S 04 Feb 2024 16:06:07.116 * Done loading RDB, keys loaded: 0, keys expired: 0.
69592:S 04 Feb 2024 16:06:07.117 * <example> [EventHandler] Event Handler EVENT 7 SUBEVENT 0
69592:S 04 Feb 2024 16:06:07.117 * <example> [EventHandler] Master Replication Link Up
69592:S 04 Feb 2024 16:06:07.117 * MASTER <-> REPLICA sync: Finished with success
69592:S 04 Feb 2024 16:06:09.103 - Client closed connection id=8 addr=127.0.0.1:21114 laddr=127.0.0.1:52974 fd=13 name= age=2 idle=2 flags=M db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=45050 argv-mem=0 multi-mem=0 rbs=1024 rbp=34 obl=0 oll=0 omem=0 tot-mem=46864 events=r cmd=NULL user=(superuser) redir=-1 resp=2 lib-name= lib-ver=
69592:S 04 Feb 2024 16:06:09.103 * Connection with master lost.
69592:S 04 Feb 2024 16:06:09.103 * Caching the disconnected master state.
69592:S 04 Feb 2024 16:06:09.103 * <example> [EventHandler] Event Handler EVENT 7 SUBEVENT 1
69592:S 04 Feb 2024 16:06:09.103 * <example> [EventHandler] Master Replication Link Down
69592:S 04 Feb 2024 16:06:09.103 * Reconnecting to MASTER 127.0.0.1:21114
69592:S 04 Feb 2024 16:06:09.103 * MASTER <-> REPLICA sync started
69592:S 04 Feb 2024 16:06:09.103 # Error condition on socket for SYNC: Connection refused
69592:S 04 Feb 2024 16:06:09.533 * Connecting to MASTER 127.0.0.1:21114
69592:S 04 Feb 2024 16:06:09.533 * MASTER <-> REPLICA sync started
69592:S 04 Feb 2024 16:06:09.533 # Error condition on socket for SYNC: Connection refused
69592:M 04 Feb 2024 16:06:10.106 * Discarding previously cached master state.
69592:M 04 Feb 2024 16:06:10.106 * Setting secondary replication ID to 94e431343b5df4d1c60ac31bd223dfd4cac5eb15, valid up to offset: 1. New replication ID is ce61250a200d94bf02b34822981b1564329d889d
69592:M 04 Feb 2024 16:06:10.106 * <example> [EventHandler] Event Handler EVENT 0 SUBEVENT 0
69592:M 04 Feb 2024 16:06:10.106 * <example> [EventHandler] Became Master
69592:M 04 Feb 2024 16:06:10.106 * <example> [KAU] AAAAA master set
69592:M 04 Feb 2024 16:06:10.107 * MASTER MODE enabled (user request from 'id=4 addr=127.0.0.1:52965 laddr=127.0.0.1:21111 fd=12 name= age=4 idle=0 flags=N db=9 sub=0 psub=0 ssub=0 multi=-1 qbuf=34 qbuf-free=16856 argv-mem=12 multi-mem=0 rbs=1024 rbp=5 obl=0 oll=0 omem=0 tot-mem=18740 events=r cmd=slaveof user=default redir=-1 resp=2 lib-name= lib-ver=')
69592:M 04 Feb 2024 16:06:11.543 - DB 0: 1 keys (0 volatile) in 4 slots HT.
69592:M 04 Feb 2024 16:06:12.235 - Client closed connection id=4 addr=127.0.0.1:52965 laddr=127.0.0.1:21111 fd=12 name= age=6 idle=2 flags=N db=9 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=16890 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=18704 events=r cmd=slaveof user=default redir=-1 resp=2 lib-name= lib-ver=
69592:signal-handler (1707033972) Received SIGTERM scheduling shutdown...
69592:M 04 Feb 2024 16:06:12.249 * User requested shutdown...
Comment From: wgnathanael
Hmm... I'm not sure. We're using the bitnami provided containers... Perhaps the issue lies there.
How were you testing your setup so we can give that a try as well?
In our setup we had 3 nodes running in k8s, each Redis instance with a Sentinel sidecar, deployed via a Helm chart. I'll see if I can manually set up some Redis containers to check whether it still happens in a different environment. I'll also look at posting a step-by-step build and deploy scenario that reproduces this, maybe just via podman/docker, so it's easier to test.
Comment From: wgnathanael
Ok so upon further testing, this seems to only crash when using the bitnami container. I'll close this ticket and see if we can narrow down the issue. Thanks for checking and spending time on this.
Comment From: wgnathanael
So although this doesn't happen in the regular Redis container, I would appreciate any insights into what could be going wrong. I've run Redis under gdb in their container, and the crash comes down to this line in getClientMemoryUsage:
size_t getClientMemoryUsage(client *c, size_t *output_buffer_mem_usage) {
    size_t mem = getClientOutputBufferMemoryUsage(c);
    if (output_buffer_mem_usage != NULL)
        *output_buffer_mem_usage = mem;
    mem += sdsZmallocSize(c->querybuf); /* <--- crash point */
    mem += zmalloc_size(c);
    ...
Any thoughts on what would have to happen for that to crash? The backtrace doesn't show sdsZmallocSize being called, so is c->querybuf NULL, or is c NULL? It doesn't seem like c is NULL based on the backtrace:
#0 0x00005583d656a41b in getClientMemoryUsage (c=0x7fc797f0ec00, output_buffer_mem_usage=0x7ffcb46d1eb8) at networking.c:3787
Anyway, I imagine you're all busy, but any pointers (no pun intended) would be helpful.
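One possible reading, offered as a sketch rather than a diagnosis: the crash report lists the faulting address as 0xffffffffffffffff, which is the address one byte before NULL. An sds string keeps its header immediately before the character pointer, with the type flags in the byte at s[-1], so a size helper has to read that byte first. The standalone model below only computes which address would be read for a given pointer. flags_byte_address is a hypothetical name, not a Redis function, and the idea that sdsZmallocSize was inlined into getClientMemoryUsage is an unverified assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: address of the sds "flags" byte, which sits one byte
 * before the string pointer (s[-1]). Integer arithmetic is used so nothing
 * is dereferenced, and the unsigned wrap-around for NULL is well defined. */
uintptr_t flags_byte_address(const char *s) {
    return (uintptr_t)s - 1;
}
```

On a 64-bit build, flags_byte_address(NULL) evaluates to 0xffffffffffffffff, which matches the "Accessing address" line and would be consistent with c->querybuf being NULL while c itself is valid.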