Describe the bug
Redis Sentinel is showing 100% CPU usage. We are using the redis-ha helm chart version 4.12.15 with the default configuration, except that we have significantly increased resource requests and limits. The Redis version we use is 6.0.7.
To reproduce
I'm not sure whether anyone else can reproduce it. Even after a Redis restart, one or two sentinels reach 100% CPU utilization. Other Redis metrics look fine.
Expected behavior
Sentinel should not be taking so much CPU.
Additional information
I ran strace on one of the sentinels and found this:
1 09:20:08 recvfrom(11, 0x7ffd55fab4a0, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
1 09:20:08 recvfrom(10, 0x7ffd55fab4a0, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
1 09:20:08 epoll_pwait(5, [{events=EPOLLERR, data={u32=11, u64=11}}, {events=EPOLLERR, data={u32=10, u64=10}}], 10128, 8, NULL, 8) = 2
This seems to be a repeating pattern: Sentinel tries to read from two sockets, fails continuously, and this is followed by an epoll_wait. It happens more than 100 times per second.
Sockets 11 and 10 seem to point to connections with the Redis process for this Sentinel. To be exact, they point to the ClusterIP of the redis-cluster-announce for the pod on which this Sentinel was running, on port 6379.
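For reference, a trace like this can be captured roughly as follows (a sketch, not necessarily the exact invocation used above; it assumes the Sentinel runs as PID 1 in its container and that strace is available there, which usually means installing it or attaching a debug container):
# Attach to the sentinel process and trace only the syscalls seen in the loop;
# -f follows forks, -tt adds microsecond timestamps.
strace -f -tt -p 1 -e trace=recvfrom,epoll_pwait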
Redis INFO output:
# Server
redis_version:6.0.7
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:4b6e03cc4dd13d7e
redis_mode:standalone
os:Linux 5.10.68-62.173.amzn2.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:9.3.0
process_id:1
run_id:9e5a59931ada7e4b1198b6a5beb98defb5834cc0
tcp_port:6379
uptime_in_seconds:66843
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:12347300
executable:/data/redis-server
config_file:/data/conf/redis.conf
io_threads_active:0
# Clients
connected_clients:8
client_recent_max_input_buffer:2
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
# Memory
used_memory:2727256
used_memory_human:2.60M
used_memory_rss:8065024
used_memory_rss_human:7.69M
used_memory_peak:2981024
used_memory_peak_human:2.84M
used_memory_peak_perc:91.49%
used_memory_overhead:2137792
used_memory_startup:803016
used_memory_dataset:589464
used_memory_dataset_perc:30.63%
allocator_allocated:2827152
allocator_active:3174400
allocator_resident:6668288
total_system_memory:66245685248
total_system_memory_human:61.70G
used_memory_lua:48128
used_memory_lua_human:47.00K
used_memory_scripts:3072
used_memory_scripts_human:3.00K
number_of_cached_scripts:4
maxmemory:0
maxmemory_human:0B
maxmemory_policy:volatile-lru
allocator_frag_ratio:1.12
allocator_frag_bytes:347248
allocator_rss_ratio:2.10
allocator_rss_bytes:3493888
rss_overhead_ratio:1.21
rss_overhead_bytes:1396736
mem_fragmentation_ratio:2.96
mem_fragmentation_bytes:5339328
mem_not_counted_for_evict:0
mem_replication_backlog:1048576
mem_clients_slaves:0
mem_clients_normal:135888
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
# Persistence
loading:0
rdb_changes_since_last_save:23
rdb_bgsave_in_progress:0
rdb_last_save_time:1639737083
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:524288
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
# Stats
total_connections_received:20058
total_commands_processed:491483
instantaneous_ops_per_sec:5
total_net_input_bytes:33351575
total_net_output_bytes:275568645
instantaneous_input_kbps:0.54
instantaneous_output_kbps:1.59
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:1
pubsub_patterns:0
latest_fork_usec:331
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_reads_processed:501438
total_writes_processed:1017489
io_threaded_reads_processed:0
io_threaded_writes_processed:0
# Replication
role:slave
master_host:172.20.61.168
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:226855520
slave_priority:100
slave_read_only:1
connected_slaves:0
min_slaves_good_slaves:0
master_replid:6d50a55ecc1d386809fc186708bcc56db6f92765
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:226855520
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:225806945
repl_backlog_histlen:1048576
# CPU
used_cpu_sys:81.727130
used_cpu_user:54.375754
used_cpu_sys_children:0.204695
used_cpu_user_children:0.501349
# Modules
# Cluster
cluster_enabled:0
# Keyspace
db0:keys=2861,expires=0,avg_ttl=0
Sentinel INFO output:
127.0.0.1:26379> info
# Server
redis_version:6.0.7
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:4b6e03cc4dd13d7e
redis_mode:sentinel
os:Linux 5.10.68-62.173.amzn2.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:9.3.0
process_id:1
run_id:0919ce29eb1b7f9dce36cea0a2eb73bf1740939c
tcp_port:26379
uptime_in_seconds:66926
uptime_in_days:0
hz:11
configured_hz:10
lru_clock:12347383
executable:/data/redis-sentinel
config_file:/data/conf/sentinel.conf
io_threads_active:0
# Clients
connected_clients:4
client_recent_max_input_buffer:2
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
# CPU
used_cpu_sys:30052.725830
used_cpu_user:9940.032902
used_cpu_sys_children:0.000000
used_cpu_user_children:0.005161
# Stats
total_connections_received:8925
total_commands_processed:200173
instantaneous_ops_per_sec:5
total_net_input_bytes:11111722
total_net_output_bytes:1205818
instantaneous_input_kbps:0.35
instantaneous_output_kbps:0.03
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:772
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:1
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_reads_processed:204104
total_writes_processed:195182
io_threaded_reads_processed:0
io_threaded_writes_processed:0
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=172.20.61.168:6379,slaves=2,sentinels=3
Any tips for debugging this?
Comment From: betermieux
Same problem here. After a few weeks of usage the slave sentinels spike to 100% CPU while the master sentinel is idle. I am using the redis-ha chart from https://dandydeveloper.github.io/charts. Let me know if I can provide more information for this issue.
Comment From: oranagra
can you run pstack or gdb -batch -ex "bt" -p <pid> a few times so we know which code path runs in a loop?
anything you can tell us about the configuration so we can reproduce it? which version are you using?
@moticless PTAL.
Comment From: moticless
@sedflix, according to INFO you are using a rather old version (6.0.7). Can you try a newer version?
@oranagra made a fix that might be related to your issue: Fix busy loop in ae.c when timer event is about to fire (https://github.com/redis/redis/pull/8764)
Comment From: betermieux
Due to a reboot the sentinel was restarted and is currently not using 100% CPU any more. I will have to wait until it happens again.
I am using the Docker image redis:7.0.4-alpine3.16. Since the containers are running on a locked-down Fedora CoreOS machine, I will have to debug them with a parallel container running gdb, but this should pose no problem.
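For example (a sketch only; it assumes ephemeral containers are enabled in the cluster and that the pod and container names follow the chart defaults), a parallel debug container sharing the Sentinel's process namespace can be started with kubectl debug:
# Start a debug container that shares the sentinel container's PID namespace,
# then install gdb and attach to the sentinel (PID 1 in that namespace).
kubectl debug -it redis-ha-server-1 --image=debian:bullseye --target=sentinel -- bash
apt-get update && apt-get install -y gdb
gdb -batch -ex "bt" -p 1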
Comment From: yevhen-harmonizehr
Faced the same issue today on a 3-node cluster. Restarting a single pod didn't help; only scaling to 0 and back to 3 nodes solved the issue.
Comment From: oranagra
@yevhen-harmonizehr please specify which version you're using.
Comment From: yevhen-harmonizehr
@yevhen-harmonizehr please specify which version you're using.
redis:7.0.5-alpine
Comment From: oranagra
so it can't be the above mentioned fix, which was part of 6.2. if anyone reproduces this again (regardless of the version), we'd love to get some info. see the instructions i posted here https://github.com/redis/redis/issues/9956#issuecomment-1304445937
Comment From: kfirfer
It reproduces with this helm chart:
https://artifacthub.io/packages/helm/dandydev-charts/redis-ha
Tried multiple versions of Redis (6.x and 7.x).
It doesn't reproduce every time; it reproduces roughly 1 out of every 3 times.
Some info:
kubectl top po --containers=true | grep redis
redis-ha-haproxy-679fcbf55b-6bgxj haproxy 4m 69Mi
redis-ha-haproxy-679fcbf55b-hjrf4 haproxy 3m 69Mi
redis-ha-haproxy-679fcbf55b-jcgk6 haproxy 3m 69Mi
redis-ha-server-0 redis 5m 2Mi
redis-ha-server-0 sentinel 7m 2Mi
redis-ha-server-0 split-brain-fix 0m 0Mi
redis-ha-server-1 redis 5m 2Mi
redis-ha-server-1 sentinel 8m 2Mi
redis-ha-server-1 split-brain-fix 0m 0Mi
redis-ha-server-2 redis 2m 3Mi
redis-ha-server-2 sentinel 955m 3Mi
redis-ha-server-2 split-brain-fix 0m 0Mi
Inside the sentinel container:
Mem: 21717868K used, 1839616K free, 441888K shrd, 635924K buff, 9427004K cached
CPU: 25% usr 23% sys 0% nic 50% idle 0% io 0% irq 0% sirq
Load average: 2.40 6.70 9.04 5/4816 741
PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND
1 0 1000 R 33452 0% 1 24% redis-sentinel *:26379 [sentinel]
456 0 1000 S 1784 0% 1 0% sh
464 456 1000 R 1692 0% 2 0% top
It shows 24% because the VM has 4 cores (so it's 100% of 1 core).
lsof output from the sentinel container:
/data $ lsof
1 /usr/local/bin/redis-server 0 /dev/null
1 /usr/local/bin/redis-server 1 pipe:[29328910]
1 /usr/local/bin/redis-server 2 pipe:[29328911]
1 /usr/local/bin/redis-server 3 pipe:[29328991]
1 /usr/local/bin/redis-server 4 pipe:[29328991]
1 /usr/local/bin/redis-server 5 anon_inode:[eventpoll]
1 /usr/local/bin/redis-server 6 socket:[29328992]
1 /usr/local/bin/redis-server 7 socket:[29328993]
1 /usr/local/bin/redis-server 8 socket:[29328998]
1 /usr/local/bin/redis-server 9 socket:[29328999]
1 /usr/local/bin/redis-server 10 socket:[29327138]
1 /usr/local/bin/redis-server 11 socket:[29327139]
1 /usr/local/bin/redis-server 12 socket:[29327140]
1 /usr/local/bin/redis-server 13 socket:[29327141]
1 /usr/local/bin/redis-server 14 socket:[29327142]
1 /usr/local/bin/redis-server 17 socket:[29330538]
1 /usr/local/bin/redis-server 18 socket:[29596831]
1 /usr/local/bin/redis-server 19 socket:[29596847]
1 /usr/local/bin/redis-server 20 socket:[29330576]
1 /usr/local/bin/redis-server 21 socket:[29330578]
456 /bin/busybox 0 /dev/pts/0
456 /bin/busybox 1 /dev/pts/0
456 /bin/busybox 2 /dev/pts/0
456 /bin/busybox 10 /dev/tty
Comment From: oranagra
i'd like a stack trace to see what it's doing.
you can maybe do that with the instructions here https://github.com/redis/redis/issues/9956#issuecomment-1304445937
or if you're willing to kill the process, then kill -11 <pid> will cause it to print the trace to the log file.
the first method is preferred since you can take several samples.
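For illustration, a small loop like the following (a sketch; it assumes gdb is available in the container and that the sentinel is PID 1 there) collects several samples in one go:
# Take ten back-traces, one second apart.
for i in $(seq 1 10); do gdb -batch -ex "bt" -p 1; sleep 1; done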
Comment From: kfirfer
@oranagra
/tmp # gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
0x00007fc15ceb44a3 in ?? () from /lib/ld-musl-x86_64.so.1
#0 0x00007fc15ceb44a3 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x00007fc15ceb16fa in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x00007fc15cb5b914 in ?? ()
#3 0x0000000000000008 in ?? ()
#4 0x0000000012b82ba2 in ?? ()
#5 0x0000000000000000 in ?? ()
[Inferior 1 (process 1) detached]
Running as 'root', and it doesn't show the file and line numbers; I don't know why.
Comment From: kfirfer
@oranagra
Tried with the bullseye Docker image and it gives more info:
1)
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=10) at ../sysdeps/unix/sysv/linux/recv.c:28
28 ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0 __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=10) at ../sysdeps/unix/sysv/linux/recv.c:28
#1 __libc_recv (fd=10, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2 0x00005569536c44a7 in redisNetRead (c=0x556953d19500, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3 0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d19500) at hiredis.c:950
#4 0x00005569536c3959 in redisAsyncRead (ac=0x556953d19500) at async.c:640
#5 0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
2)
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=100) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
#0 0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=100) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 0x00005569535ea866 in aeApiPoll (tvp=<optimized out>, eventLoop=0x7f71fd82b1e0) at ae_epoll.c:113
#2 aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:404
#3 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#4 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
3)
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=68) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
#0 0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=68) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 0x00005569535ea866 in aeApiPoll (tvp=<optimized out>, eventLoop=0x7f71fd82b1e0) at ae_epoll.c:113
#2 aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:404
#3 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#4 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
4)
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
28 ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0 __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
#1 __libc_recv (fd=14, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2 0x00005569536c44a7 in redisNetRead (c=0x556953d1f220, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3 0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d1f220) at hiredis.c:950
#4 0x00005569536c3959 in redisAsyncRead (ac=0x556953d1f220) at async.c:640
#5 0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
5)
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
28 ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0 __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
#1 __libc_recv (fd=14, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2 0x00005569536c44a7 in redisNetRead (c=0x556953d1f220, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3 0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d1f220) at hiredis.c:950
#4 0x00005569536c3959 in redisAsyncRead (ac=0x556953d1f220) at async.c:640
#5 0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
The stack traces above were taken on the Sentinel that uses high CPU for no reason.
Comment From: kfirfer
Running multiple times:
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
gdb -batch -ex "bt" -p 1__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
28 ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0 __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
#1 __libc_recv (fd=14, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2 0x00005569536c44a7 in redisNetRead (c=0x556953d1f220, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3 0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d1f220) at hiredis.c:950
#4 0x00005569536c3959 in redisAsyncRead (ac=0x556953d1f220) at async.c:640
#5 0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
#
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:362
362 ae.c: No such file or directory.
#0 aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:362
#1 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#2 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=59) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
#0 0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=59) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 0x00005569535ea866 in aeApiPoll (tvp=<optimized out>, eventLoop=0x7f71fd82b1e0) at ae_epoll.c:113
#2 aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:404
#3 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#4 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00005569536b9cd1 in getMonotonicUs_posix () at monotonic.c:141
141 monotonic.c: No such file or directory.
#0 0x00005569536b9cd1 in getMonotonicUs_posix () at monotonic.c:141
#1 0x00005569535eab33 in usUntilEarliestTimer (eventLoop=0x7f71fd82b1e0, eventLoop=0x7f71fd82b1e0) at ae.c:271
#2 aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:375
#3 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#4 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
28 ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0 __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
#1 __libc_recv (fd=14, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2 0x00005569536c44a7 in redisNetRead (c=0x556953d1f220, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3 0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d1f220) at hiredis.c:950
#4 0x00005569536c3959 in redisAsyncRead (ac=0x556953d1f220) at async.c:640
#5 0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=69) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
#0 0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=69) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 0x00005569535ea866 in aeApiPoll (tvp=<optimized out>, eventLoop=0x7f71fd82b1e0) at ae_epoll.c:113
#2 aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:404
#3 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#4 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=10) at ../sysdeps/unix/sysv/linux/recv.c:28
28 ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0 __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=10) at ../sysdeps/unix/sysv/linux/recv.c:28
#1 __libc_recv (fd=10, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2 0x00005569536c44a7 in redisNetRead (c=0x556953d19500, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3 0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d19500) at hiredis.c:950
#4 0x00005569536c3959 in redisAsyncRead (ac=0x556953d19500) at async.c:640
#5 0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=99) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
#0 0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=99) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 0x00005569535ea866 in aeApiPoll (tvp=<optimized out>, eventLoop=0x7f71fd82b1e0) at ae_epoll.c:113
#2 aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:404
#3 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#4 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
28 ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0 __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
#1 __libc_recv (fd=14, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2 0x00005569536c44a7 in redisNetRead (c=0x556953d1f220, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3 0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d1f220) at hiredis.c:950
#4 0x00005569536c3959 in redisAsyncRead (ac=0x556953d1f220) at async.c:640
#5 0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=40) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
#0 0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=40) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 0x00005569535ea866 in aeApiPoll (tvp=<optimized out>, eventLoop=0x7f71fd82b1e0) at ae_epoll.c:113
#2 aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:404
#3 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#4 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
28 ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0 __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
#1 __libc_recv (fd=14, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2 0x00005569536c44a7 in redisNetRead (c=0x556953d1f220, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3 0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d1f220) at hiredis.c:950
#4 0x00005569536c3959 in redisAsyncRead (ac=0x556953d1f220) at async.c:640
#5 0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=10) at ../sysdeps/unix/sysv/linux/recv.c:28
28 ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0 __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=10) at ../sysdeps/unix/sysv/linux/recv.c:28
#1 __libc_recv (fd=10, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2 0x00005569536c44a7 in redisNetRead (c=0x556953d19500, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3 0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d19500) at hiredis.c:950
#4 0x00005569536c3959 in redisAsyncRead (ac=0x556953d19500) at async.c:640
#5 0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=69) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
#0 0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=69) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 0x00005569535ea866 in aeApiPoll (tvp=<optimized out>, eventLoop=0x7f71fd82b1e0) at ae_epoll.c:113
#2 aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:404
#3 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#4 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
28 ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0 __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
#1 __libc_recv (fd=14, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2 0x00005569536c44a7 in redisNetRead (c=0x556953d1f220, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3 0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d1f220) at hiredis.c:950
#4 0x00005569536c3959 in redisAsyncRead (ac=0x556953d1f220) at async.c:640
#5 0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
28 ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0 __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
#1 __libc_recv (fd=14, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2 0x00005569536c44a7 in redisNetRead (c=0x556953d1f220, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3 0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d1f220) at hiredis.c:950
#4 0x00005569536c3959 in redisAsyncRead (ac=0x556953d1f220) at async.c:640
#5 0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=10) at ../sysdeps/unix/sysv/linux/recv.c:28
28 ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0 __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=10) at ../sysdeps/unix/sysv/linux/recv.c:28
#1 __libc_recv (fd=10, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2 0x00005569536c44a7 in redisNetRead (c=0x556953d19500, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3 0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d19500) at hiredis.c:950
#4 0x00005569536c3959 in redisAsyncRead (ac=0x556953d19500) at async.c:640
#5 0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
28 ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0 __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
#1 __libc_recv (fd=14, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2 0x00005569536c44a7 in redisNetRead (c=0x556953d1f220, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3 0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d1f220) at hiredis.c:950
#4 0x00005569536c3959 in redisAsyncRead (ac=0x556953d1f220) at async.c:640
#5 0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=49) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
#0 0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=49) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 0x00005569535ea866 in aeApiPoll (tvp=<optimized out>, eventLoop=0x7f71fd82b1e0) at ae_epoll.c:113
#2 aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:404
#3 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#4 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
28 ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0 __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
#1 __libc_recv (fd=14, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2 0x00005569536c44a7 in redisNetRead (c=0x556953d1f220, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3 0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d1f220) at hiredis.c:950
#4 0x00005569536c3959 in redisAsyncRead (ac=0x556953d1f220) at async.c:640
#5 0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=89) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
#0 0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=89) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 0x00005569535ea866 in aeApiPoll (tvp=<optimized out>, eventLoop=0x7f71fd82b1e0) at ae_epoll.c:113
#2 aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:404
#3 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#4 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=69) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
#0 0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=69) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 0x00005569535ea866 in aeApiPoll (tvp=<optimized out>, eventLoop=0x7f71fd82b1e0) at ae_epoll.c:113
#2 aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:404
#3 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#4 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=10) at ../sysdeps/unix/sysv/linux/recv.c:28
28 ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0 __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=10) at ../sysdeps/unix/sysv/linux/recv.c:28
#1 __libc_recv (fd=10, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2 0x00005569536c44a7 in redisNetRead (c=0x556953d19500, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3 0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d19500) at hiredis.c:950
#4 0x00005569536c3959 in redisAsyncRead (ac=0x556953d19500) at async.c:640
#5 0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
28 ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0 __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
#1 __libc_recv (fd=14, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2 0x00005569536c44a7 in redisNetRead (c=0x556953d1f220, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3 0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d1f220) at hiredis.c:950
#4 0x00005569536c3959 in redisAsyncRead (ac=0x556953d1f220) at async.c:640
#5 0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6 0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7 0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
Comment From: oranagra
thanks. this is useful.
@michael-grunder can you take a look? any reason why redisNetRead will eat CPU, or do a blocking read (following an epoll indication that it's readable)?
Comment From: tillkruss
@michael-grunder can you take a look? any reason why redisNetRead will eat CPU, or do a blocking read (following an epoll indication that it's readable)?
No, he’s busy fixing macOS bugs.
Comment From: michael-grunder
any reason why redisNetRead will eat CPU, or do a blocking read (following an epoll indication that it's readable)
From the call stack, it seems like maybe it's getting into a loop where recv immediately fails with EAGAIN, so it just spins eating CPU.
I'll take a closer look tomorrow.
Edit: It would also be interesting to see if there was anything of note in the sentinel logs.
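One way to confirm such a spin is to let strace count syscalls for a few seconds (a sketch; it assumes strace is available and that the sentinel is PID 1 in its container); a tight EAGAIN loop shows up as thousands of recvfrom/epoll_pwait calls with near-zero time per call:
# Count the relevant syscalls for ~5 seconds, then print the summary.
timeout -s INT 5 strace -c -p 1 -e trace=recvfrom,epoll_pwait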
Comment From: kfirfer
@michael-grunder
Logs of the sentinel that is using 100% CPU:
$ kubectl logs redis-ha-server-2 sentinel
1:X 29 Nov 2022 19:38:35.509 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 29 Nov 2022 19:38:35.509 # Redis version=7.0.5, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 29 Nov 2022 19:38:35.509 # Configuration loaded
1:X 29 Nov 2022 19:38:35.510 * monotonic clock: POSIX clock_gettime
1:X 29 Nov 2022 19:38:35.511 * Running mode=sentinel, port=26379.
1:X 29 Nov 2022 19:38:35.511 # Sentinel ID is ce2a8c84e9e96e3cf079fafb2ad20bcde4a7b533
1:X 29 Nov 2022 19:38:35.511 # +monitor master mymaster 10.48.8.79 6379 quorum 2
1:X 29 Nov 2022 19:38:35.513 * +slave slave 10.48.6.161:6379 10.48.6.161 6379 @ mymaster 10.48.8.79 6379
1:X 29 Nov 2022 19:38:35.521 * Sentinel new configuration saved on disk
1:X 29 Nov 2022 19:38:35.522 * +slave slave 10.48.12.117:6379 10.48.12.117 6379 @ mymaster 10.48.8.79 6379
1:X 29 Nov 2022 19:38:35.527 * Sentinel new configuration saved on disk
1:X 29 Nov 2022 19:38:35.854 * +sentinel sentinel a0ddf4367a72f8e768e2546287bf0b303985c19f 10.48.6.161 26379 @ mymaster 10.48.8.79 6379
1:X 29 Nov 2022 19:38:35.860 * Sentinel new configuration saved on disk
1:X 29 Nov 2022 19:38:35.866 * Sentinel new configuration saved on disk
1:X 29 Nov 2022 19:38:35.866 # +new-epoch 10
1:X 29 Nov 2022 19:38:36.637 * +sentinel sentinel bc16ddc9a19ce8f3ff5ec8f31a2d6b94436af8a3 10.48.8.79 26379 @ mymaster 10.48.8.79 6379
1:X 29 Nov 2022 19:38:36.645 * Sentinel new configuration saved on disk
1:X 29 Nov 2022 19:39:52.343 * Sentinel new configuration saved on disk
1:X 29 Nov 2022 19:39:52.343 # +new-epoch 11
1:X 29 Nov 2022 19:39:53.281 # +config-update-from sentinel bc16ddc9a19ce8f3ff5ec8f31a2d6b94436af8a3 10.48.8.79 26379 @ mymaster 10.48.8.79 6379
1:X 29 Nov 2022 19:39:53.281 # +switch-master mymaster 10.48.8.79 6379 10.48.12.117 6379
1:X 29 Nov 2022 19:39:53.337 * +slave slave 10.48.6.161:6379 10.48.6.161 6379 @ mymaster 10.48.12.117 6379
1:X 29 Nov 2022 19:39:53.337 * +slave slave 10.48.8.79:6379 10.48.8.79 6379 @ mymaster 10.48.12.117 6379
1:X 29 Nov 2022 19:39:53.343 * Sentinel new configuration saved on disk
1:X 29 Nov 2022 19:40:07.245 # +sdown sentinel bc16ddc9a19ce8f3ff5ec8f31a2d6b94436af8a3 10.48.8.79 26379 @ mymaster 10.48.12.117 6379
1:X 29 Nov 2022 19:40:14.554 # +sdown slave 10.48.8.79:6379 10.48.8.79 6379 @ mymaster 10.48.12.117 6379
1:X 29 Nov 2022 19:41:08.813 * +reboot slave 10.48.8.79:6379 10.48.8.79 6379 @ mymaster 10.48.12.117 6379
1:X 29 Nov 2022 19:41:08.911 # -sdown slave 10.48.8.79:6379 10.48.8.79 6379 @ mymaster 10.48.12.117 6379
1:X 29 Nov 2022 19:41:11.977 # -sdown sentinel bc16ddc9a19ce8f3ff5ec8f31a2d6b94436af8a3 10.48.8.79 26379 @ mymaster 10.48.12.117 6379
1:X 29 Nov 2022 19:42:19.449 # +sdown slave 10.48.6.161:6379 10.48.6.161 6379 @ mymaster 10.48.12.117 6379
1:X 29 Nov 2022 19:42:24.029 # +sdown sentinel a0ddf4367a72f8e768e2546287bf0b303985c19f 10.48.6.161 26379 @ mymaster 10.48.12.117 6379
1:X 29 Nov 2022 19:43:24.637 # -sdown sentinel a0ddf4367a72f8e768e2546287bf0b303985c19f 10.48.6.161 26379 @ mymaster 10.48.12.117 6379
1:X 29 Nov 2022 19:43:25.051 * +reboot slave 10.48.6.161:6379 10.48.6.161 6379 @ mymaster 10.48.12.117 6379
Comment From: yevhen-harmonizehr
I can see the same lines in gdb:
#9 0x000056375716c6fc in beforeSleep (eventLoop=<optimized out>) at server.c:1648
#10 0x00005637571680c1 in aeApiPoll (tvp=<optimized out>, eventLoop=0x7f689822b1e0) at ae.c:113
#11 aeProcessEvents (eventLoop=eventLoop@entry=0x7f689822b1e0, flags=flags@entry=27) at ae.c:404
#12 0x0000563757168c5d in aeProcessEvents (flags=27, eventLoop=0x7f689822b1e0) at ae.c:496
#13 aeMain (eventLoop=0x7f689822b1e0) at ae.c:496
#14 0x000056375716450a in main (argc=<optimized out>, argv=0x7ffdab146bf8) at server.c:7075
And the sentinel logs:
k logs -f argo-cd-redis-ha-server-1 -c sentinel
1:X 06 Dec 2022 20:14:13.642 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 06 Dec 2022 20:14:13.642 # Redis version=7.0.5, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 06 Dec 2022 20:14:13.642 # Configuration loaded
1:X 06 Dec 2022 20:14:13.643 * monotonic clock: POSIX clock_gettime
1:X 06 Dec 2022 20:14:13.643 * Running mode=sentinel, port=26379.
1:X 06 Dec 2022 20:14:13.644 # Sentinel ID is c524e76c625cc68c94f25c99e20efce31df6f4c2
1:X 06 Dec 2022 20:14:13.644 # +monitor master argocd 172.20.173.199 6379 quorum 2
1:X 06 Dec 2022 20:14:13.646 * +slave slave 172.20.112.236:6379 172.20.112.236 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:14:13.651 * Sentinel new configuration saved on disk
1:X 06 Dec 2022 20:14:13.651 * +slave slave 172.20.150.209:6379 172.20.150.209 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:14:13.656 * Sentinel new configuration saved on disk
1:X 06 Dec 2022 20:14:13.939 * +sentinel sentinel 2f679f20284b54ed430b97819302f9dc0d48cb09 172.20.112.236 26379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:14:13.945 * Sentinel new configuration saved on disk
1:X 06 Dec 2022 20:14:14.099 * +sentinel sentinel 4794d0beec6abdd90a3821f7d9bf66804f290941 172.20.173.199 26379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:14:14.104 * Sentinel new configuration saved on disk
1:X 06 Dec 2022 20:18:07.337 # +sdown sentinel 2f679f20284b54ed430b97819302f9dc0d48cb09 172.20.112.236 26379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:18:07.798 # +sdown slave 172.20.112.236:6379 172.20.112.236 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:19:04.549 * +reboot slave 172.20.112.236:6379 172.20.112.236 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:19:04.593 # -sdown slave 172.20.112.236:6379 172.20.112.236 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:19:05.762 # -sdown sentinel 2f679f20284b54ed430b97819302f9dc0d48cb09 172.20.112.236 26379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:26.004 # +sdown sentinel 4794d0beec6abdd90a3821f7d9bf66804f290941 172.20.173.199 26379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:28.601 # +sdown master argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:28.659 # +odown master argocd 172.20.173.199 6379 #quorum 2/2
1:X 06 Dec 2022 20:24:28.659 # +new-epoch 1
1:X 06 Dec 2022 20:24:28.659 # +try-failover master argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:28.763 * Sentinel new configuration saved on disk
1:X 06 Dec 2022 20:24:28.764 # +vote-for-leader c524e76c625cc68c94f25c99e20efce31df6f4c2 1
1:X 06 Dec 2022 20:24:28.776 # 2f679f20284b54ed430b97819302f9dc0d48cb09 voted for c524e76c625cc68c94f25c99e20efce31df6f4c2 1
1:X 06 Dec 2022 20:24:28.822 # +elected-leader master argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:28.822 # +failover-state-select-slave master argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:28.884 # +selected-slave slave 172.20.150.209:6379 172.20.150.209 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:28.884 * +failover-state-send-slaveof-noone slave 172.20.150.209:6379 172.20.150.209 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:28.955 * +failover-state-wait-promotion slave 172.20.150.209:6379 172.20.150.209 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:29.940 * Sentinel new configuration saved on disk
1:X 06 Dec 2022 20:24:29.940 # +promoted-slave slave 172.20.150.209:6379 172.20.150.209 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:29.940 # +failover-state-reconf-slaves master argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:29.993 * +slave-reconf-sent slave 172.20.112.236:6379 172.20.112.236 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:30.420 * +slave-reconf-inprog slave 172.20.112.236:6379 172.20.112.236 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:30.420 * +slave-reconf-done slave 172.20.112.236:6379 172.20.112.236 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:30.505 # +failover-end master argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:30.505 # +switch-master argocd 172.20.173.199 6379 172.20.150.209 6379
1:X 06 Dec 2022 20:24:30.505 * +slave slave 172.20.112.236:6379 172.20.112.236 6379 @ argocd 172.20.150.209 6379
1:X 06 Dec 2022 20:24:30.505 * +slave slave 172.20.173.199:6379 172.20.173.199 6379 @ argocd 172.20.150.209 6379
1:X 06 Dec 2022 20:24:30.511 * Sentinel new configuration saved on disk
1:X 06 Dec 2022 20:24:40.539 # +sdown slave 172.20.173.199:6379 172.20.173.199 6379 @ argocd 172.20.150.209 6379
1:X 06 Dec 2022 20:25:45.791 # -sdown slave 172.20.173.199:6379 172.20.173.199 6379 @ argocd 172.20.150.209 6379
1:X 06 Dec 2022 20:25:49.641 # -sdown sentinel 4794d0beec6abdd90a3821f7d9bf66804f290941 172.20.173.199 26379 @ argocd 172.20.150.209 6379
Comment From: kfirfer
Is there some workaround?
Edit: Does it happen only on Docker? Maybe an incompatibility with the OS?
Comment From: silvpol
This also happens on GKE using containerd, so it is not Docker-specific. We try to manage this by setting up a liveness check on Sentinel CPU usage so that it gets restarted automatically if it uses too much CPU.
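A minimal sketch of such a check (assuming redis-cli is available in the sentinel container; the 5-second window and the 0.8 threshold are arbitrary): sample the sentinel's own CPU counters from INFO twice and fail the probe if the delta approaches a full core.
# Exit 1 (probe failure) if the sentinel burned more than ~0.8 CPU-seconds
# per second of wall clock during a 5-second window.
cpu() { redis-cli -p 26379 INFO | tr -d '\r' | awk -F: '/^used_cpu_(sys|user):/ {s+=$2} END {print s}'; }
a=$(cpu); sleep 5; b=$(cpu)
awk -v a="$a" -v b="$b" 'BEGIN { exit ((b - a) / 5 > 0.8) ? 1 : 0 }'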
Comment From: Zebradil
We observe this issue on GKE as well.
Comment From: davidln777
@oranagra - any update on this issue? We keep getting it, and it only goes away after a pod restart.
Comment From: oranagra
@michael-grunder can you please take a closer look?
Comment From: Sven1410
Is there any workaround or hint on how to prevent/solve this? We have the same problem after upgrading redis-ha from 7.0.0 to 7.0.13 (inside argocd). I think the problem was there before, but now it occurs after nearly every restart.
What additionally confuses me:
kubectl top pods shows that the (sentinel) container runs at its limit (CPU 1005m),
but when I execute top inside the container the result is always a CPU usage of 12% (?).
thanks in advance!
Edit: it looks like the 12% CPU (busybox top command) inside the container is measured against the capacity of the node: the limit is 1000m, which is 1/8 of the node's CPUs = 12%. For a liveness check, the "load" would be a better indicator (also available via the uptime command).
Comment From: Sven1410
Maybe a Redis expert can see something here. This is the "info" command output of the "bad" sentinel container with 100% CPU consumption ("sentinel pending-scripts" returns nothing, and failover works fine):
redis-cli -p 26379
info
# Server
redis_version:7.0.11
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:183dd15edc8d4d6a
redis_mode:sentinel
os:Linux 5.15.135-gardenlinux-cloud-amd64 x86_64
arch_bits:64
monotonic_clock:POSIX clock_gettime
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:12.2.1
process_id:1
process_supervised:no
run_id:8986c630a7fd774f791223f327bea3154f794675
tcp_port:26379
server_time_usec:1701163193604078
uptime_in_seconds:66249
uptime_in_days:0
hz:13
configured_hz:10
lru_clock:6664377
executable:/data/redis-sentinel
config_file:/data/conf/sentinel.conf
io_threads_active:0
# Clients
connected_clients:6
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:20480
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
# Stats
total_connections_received:340692
total_commands_processed:863856
instantaneous_ops_per_sec:8
total_net_input_bytes:27922297
total_net_output_bytes:23261548
total_net_repl_input_bytes:0
total_net_repl_output_bytes:0
instantaneous_input_kbps:0.29
instantaneous_output_kbps:0.22
instantaneous_input_repl_kbps:0.00
instantaneous_output_repl_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:574
evicted_keys:0
evicted_clients:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
total_active_defrag_time:0
current_active_defrag_time:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:198318
dump_payload_sanitizations:0
total_reads_processed:1345332
total_writes_processed:1057540
io_threaded_reads_processed:0
io_threaded_writes_processed:0
reply_buffer_shrinks:132427
reply_buffer_expands:6
# CPU
used_cpu_sys:40470.884147
used_cpu_user:25442.201681
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
used_cpu_sys_main_thread:40470.870627
used_cpu_user_main_thread:25442.177889
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=argocd,status=ok,address=100.xx.xxx.38:6379,slaves=2,sentinels=3
sentinel pending-scripts
<--empty line returned-->
sentinel CKQUORUM argocd
OK 3 usable Sentinels. Quorum and failover authorization can be reached
Comment From: Sven1410
I found a workaround (for docker/k8s):
I played a bit with the sentinel CLI and found out that a "sentinel reset *" always solves the problem, and CPU consumption drops back to normal. https://lzone.de/cheat-sheet/Redis%20Sentinel https://redis.io/docs/management/sentinel/#sentinel-commands
So far the problem occurs only after a rolling update or container restart, so I added a delayed reset to all the sentinel containers via helm chart values:
argo-cd:
redis-ha:
sentinel:
lifecycle:
postStart:
exec:
command: ["/bin/sh", "-c", "sleep 30; redis-cli -p 26379 sentinel reset argocd "]
Comment From: jortkoopmans
Thank you for the proposed workaround @Sven1410. Did the issue return for you in the meantime over the last few days (as it is intermittent)? What I've done based on your post:
- Run the manual sentinel reset command. This indeed worked for me to bring the utilization back to normal.
- Implement the lifecycle hook, but while I used a 60s sleep in the command, I had at least one occasion where the hook did not prevent the utilization spike. It is possible that timing matters (30s vs 60s, in relation to the entrypoint or other containers in the pod); alternatively it could be reducing the likelihood of running into this condition (which probably is a race condition?) without preventing it. More/longer testing should give clearer results, specifically on this workaround. I'll report back if I obtain robust conclusions.
A root cause analysis is (still) preferred though. :blush:
Comment From: Sven1410
So far the problem has not come back. I would also think a longer delay would be better, but if the delay is too long our alert already fires. Another option is to do this reset periodically (once a day maybe?). My versions: redis-ha 7.0.13-alpine, haproxy 2.8.3-alpine.
Thanks for testing + sharing the results!
Comment From: jortkoopmans
Unfortunately, the workaround does not (consistently) work on my deployments. I have two sets of versions and configurations, and I still observe the CPU drain, at least on the 6.2.5 version:
* redis 7.0.7-alpine, ha-proxy 2.6.4
* redis 6.2.5-alpine, no ha-proxy
Note that I do not observe the sentinel pending-scripts line, described here
Comment From: Sven1410
"sentinel pending-scripts" was already the next command I typed in - it returns nothing, only an empty line...
Comment From: jortkoopmans
"sentinel pending-scripts" was already the next command I typed in - it returns nothing, only an empty line...
Ah OK, my bad, I should have checked the sentinel CLI directives. I'll verify this the next time I observe the issue. Under normal conditions I get this response:
(empty array)
Comment From: silvpol
We have experienced this in prod today and sentinel pending-scripts did return an empty array.
Comment From: jortkoopmans
Following up after observing this for a while (on a number of independent redis-ha deployments), findings:
- The workaround to reset the sentinel after pod start (on k8s) does not work in my case. I don't have long-term statistics on the bug likelihood for specific configuration combinations, but in none of my cases does it alleviate the issue enough to rely on it on its own in production (i.e. other mitigation is required).
- sentinel pending-scripts still returns an empty array (same as under normal conditions, identical to the finding by @silvpol).
Now I'll stop posting behavior patterns for this issue, in the hope of a root cause analysis. :pray:
Comment From: Sven1410
After debugging with gdb I got exactly the same stack traces as others have posted here. I think it must be this epoll_wait() call, but the timeout (-1) looks correct. If you google "epoll_wait" "100% cpu" you will find dozens of nearly identical problems. Maybe we have a problem similar to the one described here: https://github.com/joyent/libuv/issues/1099 The fact that we restart the redis process(es) on the same (linux) node and reuse the same sockets - especially the "announce" sockets - could also be relevant.
Is there an option to fall back (for linux docker images) to "ae_kqueue.c"? (#define HAVE_EPOLL 0)?
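As far as I can tell from the Redis sources, kqueue only exists on BSD/macOS, so a Linux build without HAVE_EPOLL would fall back to ae_select.c rather than ae_kqueue.c. A rough, untested sketch of building such a binary to test this theory (the sed line assumes the define still lives in src/config.h):
# build redis with the epoll backend disabled so the event loop falls back to select()
git clone https://github.com/redis/redis.git
cd redis
sed -i 's/#define HAVE_EPOLL 1//' src/config.h   # remove the Linux epoll define (assumed location)
make
# after starting the server/sentinel, INFO server should report multiplexing_api:select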
Comment From: posh-raj-kuni
We are seeing the same issue in our environment. Our environment is a GKE cluster running on k8s version 1.26.
The perplexing thing is, this CPU issue is only happening in one namespace in the cluster. If we try deploying redis-sentinel in another namespace, we are not seeing the issue!
When I read it may be a networking issue, I thought maybe the network policies in the namespace where the CPU ramp up was observed were the issue. So I added a couple policies that allowed all egress/ingress traffic. But that didn't resolve the issue either.
So still stumped as to what's causing this issue and why it's only happening in one namespace!
Comment From: posh-raj-kuni
We figured out what the issue was on our end.
When I monitored the processes on the redis and sentinel containers, I noticed that the processes that get created for the health checks (liveness, readiness probes) were taking up a lot of CPU (around 40% - 50%). This didn't seem right. In the namespaces where the overall CPU usage was low, these health check processes took up a negligible amount of CPU.
One key difference between the two namespaces was that the pod in the namespace where we saw high CPU usage had a very large number of environment variables set by k8s service links.
When we disabled the service links, we no longer observed the high cpu usage! It looks like the large number of environment variables causes the liveness/readiness probes to take up a lot of cpu.
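In case anyone wants to try the same on an existing deployment, this is roughly what we did. enableServiceLinks is a standard pod spec field, but the statefulset/container names below follow the chart defaults used earlier in this thread and may differ from yours:
# count the injected env vars in the sentinel container before/after
kubectl exec redis-ha-server-0 -c sentinel -- env | wc -l
# disable service-link env var injection for the pods of the statefulset
kubectl patch statefulset redis-ha-server --type merge \
  -p '{"spec":{"template":{"spec":{"enableServiceLinks":false}}}}'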
But then again, I'm not sure if the high number of environment variables is the root cause of the CPU usage, because we have other Bitnami Redis deployments running in standalone mode where the same health checks run, and there we don't see the health checks using up CPU like in the redis-sentinel pods.
Perhaps a combination of having a large number of environment variables and multiple containers in a pod is causing the health check processes to take up a lot of CPU?
I also found an article discussing how exec probes can result in high CPU usage, but I'm not sure if it is related to the health check problems in redis-sentinel.
And another github thread discussing the same high CPU issue for exec probes.
Comment From: denniskorbginski
I encountered the same issue today with redis 7.2.4-alpine. Anything we can do to help find the root cause?
Comment From: michael-grunder
A container-based reproducer (e.g. docker-compose, docker, etc.) would work :smile: I don't personally use sentinel anywhere, so I have never encountered this state.
The fix is likely simple. The hard part is reproducing the issue.
Comment From: denniskorbginski
I'll try 😅. While this happens frequently in my production workloads, I have not yet been able to find a clue to the underlying event that triggers this behavior. Reproducing it in a sandbox environment has failed so far as well, but I'll keep trying.
Comment From: enisn
I use it in Coolify and I'm seeing the same issue.
Comment From: Sven1410
Here is a way to reproduce the problem - it worked locally on my Windows notebook (and should also work on Linux & macOS). (I spent roughly an hour to reproduce the issue.)
-install "minikube" https://minikube.sigs.k8s.io/docs/ -start k8s:
minikube start
-install metrics (this enables the "kubectl top pods" command):
minikube addons enable metrics-server
-install "kubectl" and connect it to minikube or do it directly with "minikube" command https://kubernetes.io/docs/tasks/tools/#kubectl , https://kubernetes.io/docs/reference/kubectl/ or https://minikube.sigs.k8s.io/docs/handbook/kubectl/
-check that "kubectl" works and is connected to the minikube:
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-565d847f94-grcq9 1/1 Running 0 38m
kube-system etcd-minikube 1/1 Running 0 38m
kube-system kube-apiserver-minikube 1/1 Running 0 38m ...
-install "helm" command https://helm.sh/docs/intro/install/
-install/deploy redis-ha helm chart:
helm repo add dandydev https://dandydeveloper.github.io/charts
helm install redis-ha dandydev/redis-ha --set hardAntiAffinity=false
-after this you will find 3 pods running redis and sentinel containers
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default redis-ha-server-0 3/3 Running 0 9m41s
default redis-ha-server-1 3/3 Running 0 8m24s
default redis-ha-server-2 3/3 Running 0 7m8s
kube-system coredns-565d847f94-grcq9 1/1 Running 0 38m
kube-system etcd-minikube 1/1 Running 0 38m
kube-system kube-apiserver-minikube 1/1 Running 0 38m
...
$ kubectl top pods --containers
POD NAME CPU(cores) MEMORY(bytes)
redis-ha-server-0 redis 21m 2Mi
redis-ha-server-0 sentinel 21m 2Mi
redis-ha-server-0 split-brain-fix 1m 0Mi
redis-ha-server-1 redis 21m 2Mi
redis-ha-server-1 sentinel 22m 2Mi
redis-ha-server-1 split-brain-fix 1m 0Mi
redis-ha-server-2 redis 21m 2Mi
redis-ha-server-2 sentinel 21m 2Mi
redis-ha-server-2 split-brain-fix 1m 0Mi
-now do 1 or more restarts of the 3 pods (in my case 1 restart was sufficient to reproduce the problem)
$ kubectl rollout restart sts/redis-ha-server
statefulset.apps/redis-ha-server restarted
-wait until the restart is done (check with "kubectl get pods -A")
-check with "kubectl top pods --containers" whether CPU consumption is too high (close to 1 core)
$ kubectl top pods --containers
POD NAME CPU(cores) MEMORY(bytes)
redis-ha-server-0 redis 10m 3Mi
redis-ha-server-0 sentinel 13m 2Mi
redis-ha-server-0 split-brain-fix 3m 0Mi
redis-ha-server-1 redis 13m 2Mi
redis-ha-server-1 sentinel 24m 2Mi
redis-ha-server-1 split-brain-fix 1m 0Mi
redis-ha-server-2 redis 21m 3Mi
redis-ha-server-2 sentinel 985m 2Mi <----- here is the problem container, see "CPU"
redis-ha-server-2 split-brain-fix 1m 0Mi
-you can now connect to the container and run Linux commands such as "top" or "ps", e.g.:
$ kubectl exec -it redis-ha-server-2 -c sentinel -- sh
>top
Mem: 2229800K used, 3695920K free, 720612K shrd, 30620K buff, 1389952K cached
CPU: 25% usr 25% sys 0% nic 35% idle 15% io 0% irq 0% sirq
Load average: 2.51 1.59 0.78 4/602 581
PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND
1 0 1000 R 32576 1% 1 50% redis-sentinel *:26379 [sentinel] <----- here is the problem process
574 0 1000 S 1628 0% 0 0% sh
581 574 1000 R 1624 0% 0 0% top
-at this point my laptop's fan also confirmed the 100% CPU usage...
Comment From: outime
Any update on this?
Comment From: denniskorbginski
I never had any success reproducing this, but I am also not seeing this problem in my setup anymore. I can't explain why my changes helped, or if they are related at all, but here's what I did. Hope this helps you guys as well.
I'm also running the dandydev/redis-ha helm chart and I noticed a problem with failover. When a pod was terminated, the preStop hook in the redis container triggered the failover. The sentinel running in the same pod was elected as leader for this failover, but it was killed by k8s as well and never had the chance to actually finish the failover procedure. I then added a preStop hook to the sentinel container with essentially a sleep 20. The failover issues were resolved, and somehow, I have never seen the excessive CPU usage for any redis deployment in my clusters again. ¯\_(ツ)_/¯
Comment From: bootc
I just tried your workaround @denniskorbginski and I'm afraid it doesn't work for me.
I added the following in my redis-ha config:
sentinel:
  lifecycle:
    preStop:
      exec:
        command: [/bin/sh, -c, sleep 20]
I verified that the sentinel containers indeed have the preStop hook, and I'm still seeing my sentinels regularly spin at 100% CPU.
Has anyone else tried this? What were your results?
Comment From: Sven1410
I've tested it with sleep 25 and sleep 120 in minikube --> still 100% CPU usage after the first rolling restart.
Comment From: kfirfer
As a temporary ugly workaround, I have set a CPU resource limit of 200m for the sentinel container. This prevents the entire node(s) from being on fire.
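For reference, with the dandydev chart this can be done via values (assuming the chart exposes sentinel.resources; release name "redis-ha" as in the repro above):
helm upgrade redis-ha dandydev/redis-ha --reuse-values \
  --set sentinel.resources.limits.cpu=200m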
Comment From: hronix
Experiencing the same problem with redis:7.2.4-alpine.
Comment From: tstollin
We are experiencing the same issue with our redis-ha helm charts. Neither the postStart hook executing the sentinel reset (tested with different sleeps from 30 to 300 s) nor the preStop hook has fixed the issue for us.
The only solution that worked so far is manually executing the sentinel reset in each affected container. But obviously this is not viable in a production environment.
My guess is that the rolling deployments of the StatefulSet cause the issue for the other, already running containers. Hence, it would require executing the reset after all containers are ready. This is why the postStart hook won't work: a container is marked ready AFTER the postStart hook has executed, and the rolling deployment only continues with the next pod once the previous pod is ready. Hence, the sleep will only delay the rolling deployment, and the following container starts may recreate the issue for the already-reset container.
Since there are no "after deployment" hooks in k8s, maybe a CronJob executing the reset using kubectl could be a temporary fix.
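A sketch of what such a CronJob could look like, talking to each sentinel through its announce service instead of exec'ing into the pods (so no extra RBAC is needed). The service names, master name, replica count and schedule are assumptions based on the default naming used in this thread, and the CronJob has to run in the same namespace as the release:
kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: sentinel-reset
spec:
  schedule: "0 3 * * *"   # once a day
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: reset
              image: redis:7-alpine
              command: ["/bin/sh", "-c"]
              args:
                - |
                  # reset every sentinel, assuming 3 replicas and announce services exposing port 26379
                  for i in 0 1 2; do
                    redis-cli -h redis-ha-announce-$i -p 26379 sentinel reset argocd
                  done
EOF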
For now, we are following the idea proposed by @kfirfer and have set a CPU limit.
Comment From: harry91
We encountered the same problem and got the same gdb output as @kfirfer pasted above.
Comment From: pfaelzerchen
I also encountered the same problem, but only with the redis-ha instance that was deployed by the argo-cd helm chart (which uses dandydev/redis-ha). My deployments made with the bitnami redis chart do not encounter this problem.
Maybe this is of some help. I am a newbie and do not really know where to look when comparing the two deployments.
Comment From: nomarek
From my observations, this problem only occurs on dandydev/redis-ha, but not on bitnami/redis. The dandydev chart uses a Redis distribution based on Alpine Linux, while Bitnami uses Debian. I suspect that this problem is strictly related to musl libc on Alpine. The comments in this issue seem to confirm this, as practically everyone here mentions Alpine.
Comment From: bootc
Sorry @nomarek, but this definitely also occurs with the Debian-based Redis images. I had this theory a while back as well and switched to Debian-based images using the chart, only to have the problem continue to occur. It also happens when using Valkey instead of Redis.