Our application uses Lua EVAL/EVALSHA to do most heavy lifting of logic that requires atomic database changes. During normal operation, Redis has a seemingly random chance to hang during the (only) SRANDMEMBER call of our Lua code when performing a background save (using either RDB or AOF) where it doesn't recover from even after the background save finishes. We're unable to determine a solid reproduction case at this point since we can't pinpoint exactly what's causing it yet but across multiple GDB backtraces (below) we believe that we were able to figure out that when Redis is hanging during the command call, it's always stuck in this while loop, although we're not 100% sure about that.

The version of Redis we're using is 7.0.11 running inside Docker with this configuration:

activedefrag yes
active-defrag-ignore-bytes 100mb
active-defrag-threshold-lower 10
active-defrag-threshold-upper 50
active-defrag-cycle-min 1
active-defrag-cycle-max 25
active-defrag-max-scan-fields 10000
lazyfree-lazy-eviction yes
lazyfree-lazy-expire yes
lazyfree-lazy-server-del yes
replica-lazy-flush yes
lazyfree-lazy-user-del yes
appendonly yes
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 1gb
client-output-buffer-limit slave 10737418240 1073741824 300
bind 0.0.0.0
port ${PORT}
requirepass ${PASS}
save ''
lua-time-limit 5000
busy-reply-threshold 5000

Note that while we have AOF configured here with RDB being disabled, we also tried running this Redis server with AOF disabled and RDB set to snapshot every 5 minutes after 1 change. We also tried explicitly setting lua-time-limit and busy-reply-threshold in the startup configuration as well as setting CONFIG SET watchdog-period 1000 during runtime but none of those has produced any usable logs at the time Redis became unresponsive.

Logs indicate that the hangs always happen during a background save:

AOF rewrite:

1:M 18 May 2023 12:46:34.929 * Starting automatic rewriting of AOF on 100% growth                                                                                          
1:M 18 May 2023 12:46:34.930 * Creating AOF incr file appendonly.aof.4377.incr.aof on background rewrite
1:M 18 May 2023 12:46:35.164 * Background append only file rewriting started by pid 59
<Redis became unresponsive at 12:47:23>
59:C 18 May 2023 12:47:37.119 * Successfully created the temporary AOF base file temp-rewriteaof-bg-59.aof
59:C 18 May 2023 12:47:37.260 * Fork CoW for AOF rewrite: current 1572 MB, peak 1572 MB, average 818 MB

AOF rewrite:

1:M 19 May 2023 02:08:38.075 * Starting automatic rewriting of AOF on 100% growth
1:M 19 May 2023 02:08:38.076 * Creating AOF incr file appendonly.aof.4385.incr.aof on background rewrite
1:M 19 May 2023 02:08:38.345 * Background append only file rewriting started by pid 25
<Redis became unresponsive at 02:08:50>
25:C 19 May 2023 02:10:04.373 * Successfully created the temporary AOF base file temp-rewriteaof-bg-25.aof
25:C 19 May 2023 02:10:04.591 * Fork CoW for AOF rewrite: current 605 MB, peak 605 MB, average 329 MB

RDB save:

1:M 19 May 2023 04:25:55.055 * 1 changes in 300 seconds. Saving...
1:M 19 May 2023 04:25:55.374 * Background saving started by pid 28
<Redis became unresponsive at 04:25:58>
28:C 19 May 2023 04:27:20.861 * DB saved on disk
28:C 19 May 2023 04:27:21.083 * Fork CoW for RDB: current 142 MB, peak 142 MB, average 96 MB
2023-05-18T13:02:50+00:00
#0  _dictKeyIndex (existing=0x0, hash=4486456289374595091, key=0x7f56906063c1, d=0x7f549517e318) at dict.c:1054
#1  dictAddRaw (d=0x7f549517e318, key=0x7f56906063c1, existing=0x0) at dict.c:337
#2  0x0000558d682d2b01 in dictAdd (d=d@entry=0x7f549517e318, key=key@entry=0x7f56906063c1, val=val@entry=0x0) at dict.c:302
#3  0x0000558d6831885e in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:789
#4  0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#5  0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#6  0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#7  0x0000558d683b9e88 in luaD_precall ()
#8  0x0000558d683c4110 in luaV_execute ()
#9  0x0000558d683ba525 in luaD_call ()
#10 0x0000558d683b9858 in luaD_rawrunprotected ()
#11 0x0000558d683ba6d0 in luaD_pcall ()
#12 0x0000558d683b79f8 in lua_pcall ()
#13 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#14 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#15 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#16 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#17 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#18 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#19 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#20 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#21 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#22 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#23 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#24 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168

2023-05-18T13:02:56+00:00
#0  0x00007f56c792531f in random () from target:/lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f56c7925799 in rand () from target:/lib/x86_64-linux-gnu/libc.so.6
#2  0x0000558d682d3764 in dictGetFairRandomKey (d=0x7f552a142cf8) at dict.c:797
#3  0x0000558d683179b9 in setTypeRandomElement (setobj=0x7f563c36d7c0, sdsele=0x7ffc86daab80, llele=0x7ffc86daab88) at t_set.c:213
#4  0x0000558d68318889 in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:780
#5  0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#6  0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#7  0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#8  0x0000558d683b9e88 in luaD_precall ()
#9  0x0000558d683c4110 in luaV_execute ()
#10 0x0000558d683ba525 in luaD_call ()
#11 0x0000558d683b9858 in luaD_rawrunprotected ()
#12 0x0000558d683ba6d0 in luaD_pcall ()
#13 0x0000558d683b79f8 in lua_pcall ()
#14 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#15 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#16 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#17 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#18 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#19 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#20 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#21 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#22 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#23 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#24 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#25 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168

2023-05-18T13:03:00+00:00
#0  0x0000558d682d4abe in dictSdsKeyCompare (d=<optimized out>, key1=<optimized out>, key2=<optimized out>) at server.c:244
#1  0x0000558d682d2904 in _dictKeyIndex (existing=0x0, hash=15826992378078849043, key=0x7f56906063c1, d=0x7f549517e318) at dict.c:1055
#2  dictAddRaw (d=0x7f549517e318, key=0x7f56906063c1, existing=0x0) at dict.c:337
#3  0x0000558d682d2b01 in dictAdd (d=d@entry=0x7f549517e318, key=key@entry=0x7f56906063c1, val=val@entry=0x0) at dict.c:302
#4  0x0000558d6831885e in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:789
#5  0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#6  0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#7  0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#8  0x0000558d683b9e88 in luaD_precall ()
#9  0x0000558d683c4110 in luaV_execute ()
#10 0x0000558d683ba525 in luaD_call ()
#11 0x0000558d683b9858 in luaD_rawrunprotected ()
#12 0x0000558d683ba6d0 in luaD_pcall ()
#13 0x0000558d683b79f8 in lua_pcall ()
#14 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#15 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#16 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#17 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#18 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#19 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#20 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#21 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#22 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#23 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#24 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#25 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168

2023-05-18T13:03:05+00:00
#0  0x0000558d682e30b9 in sdsfree (s=<optimized out>) at sds.c:196
#1  0x0000558d68318938 in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:792
#2  0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#3  0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#4  0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#5  0x0000558d683b9e88 in luaD_precall ()
#6  0x0000558d683c4110 in luaV_execute ()
#7  0x0000558d683ba525 in luaD_call ()
#8  0x0000558d683b9858 in luaD_rawrunprotected ()
#9  0x0000558d683ba6d0 in luaD_pcall ()
#10 0x0000558d683b79f8 in lua_pcall ()
#11 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#12 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#13 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#14 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#15 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#16 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#17 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#18 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#19 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#20 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#21 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#22 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168

2023-05-18T13:03:10+00:00
#0  dictGetSomeKeys (d=d@entry=0x7f552a142cf8, des=0x7ffc86daaac0, des@entry=0x7ffc86daaaa0, count=<optimized out>, count@entry=15) at dict.c:764
#1  0x0000558d682d3759 in dictGetFairRandomKey (d=0x7f552a142cf8) at dict.c:791
#2  0x0000558d683179b9 in setTypeRandomElement (setobj=0x7f563c36d7c0, sdsele=0x7ffc86daab80, llele=0x7ffc86daab88) at t_set.c:213
#3  0x0000558d68318889 in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:780
#4  0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#5  0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#6  0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#7  0x0000558d683b9e88 in luaD_precall ()
#8  0x0000558d683c4110 in luaV_execute ()
#9  0x0000558d683ba525 in luaD_call ()
#10 0x0000558d683b9858 in luaD_rawrunprotected ()
#11 0x0000558d683ba6d0 in luaD_pcall ()
#12 0x0000558d683b79f8 in lua_pcall ()
#13 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#14 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#15 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#16 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#17 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#18 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#19 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#20 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#21 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#22 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#23 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#24 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168

2023-05-18T13:03:16+00:00
#0  sz_index2size_lookup (index=1) at include/jemalloc/internal/sz.h:199
#1  sz_index2size (index=1) at include/jemalloc/internal/sz.h:207
#2  arena_salloc (ptr=<optimized out>, tsdn=<optimized out>) at include/jemalloc/internal/arena_inlines_b.h:195
#3  isalloc (ptr=<optimized out>, tsdn=<optimized out>) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:38
#4  je_malloc_usable_size (ptr=ptr@entry=0x7f56906063c0) at src/jemalloc.c:3740
#5  0x0000558d682e69fe in zfree (ptr=0x7f56906063c0) at zmalloc.c:372
#6  zfree (ptr=0x7f56906063c0) at zmalloc.c:364
#7  0x0000558d68318938 in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:792
#8  0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#9  0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#10 0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#11 0x0000558d683b9e88 in luaD_precall ()
#12 0x0000558d683c4110 in luaV_execute ()
#13 0x0000558d683ba525 in luaD_call ()
#14 0x0000558d683b9858 in luaD_rawrunprotected ()
#15 0x0000558d683ba6d0 in luaD_pcall ()
#16 0x0000558d683b79f8 in lua_pcall ()
#17 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#18 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#19 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#20 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#21 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#22 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#23 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#24 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#25 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#26 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#27 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#28 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168

2023-05-18T13:03:20+00:00
#0  0x00007f56c7a42a31 in ?? () from target:/lib/x86_64-linux-gnu/libc.so.6
#1  0x0000558d682d4ab2 in dictSdsKeyCompare (d=<optimized out>, key1=<optimized out>, key2=<optimized out>) at server.c:243
#2  0x0000558d682d2904 in _dictKeyIndex (existing=0x0, hash=16039903783278520339, key=0x7f56906063c1, d=0x7f549517e318) at dict.c:1055
#3  dictAddRaw (d=0x7f549517e318, key=0x7f56906063c1, existing=0x0) at dict.c:337
#4  0x0000558d682d2b01 in dictAdd (d=d@entry=0x7f549517e318, key=key@entry=0x7f56906063c1, val=val@entry=0x0) at dict.c:302
#5  0x0000558d6831885e in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:789
#6  0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#7  0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#8  0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#9  0x0000558d683b9e88 in luaD_precall ()
#10 0x0000558d683c4110 in luaV_execute ()
#11 0x0000558d683ba525 in luaD_call ()
#12 0x0000558d683b9858 in luaD_rawrunprotected ()
#13 0x0000558d683ba6d0 in luaD_pcall ()
#14 0x0000558d683b79f8 in lua_pcall ()
#15 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#16 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#17 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#18 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#19 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#20 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#21 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#22 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#23 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#24 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#25 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#26 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168

2023-05-18T13:03:24+00:00
#0  dictGetSomeKeys (d=d@entry=0x7f552a142cf8, des=des@entry=0x7ffc86daaaa0, count=<optimized out>, count@entry=15) at dict.c:738
#1  0x0000558d682d3759 in dictGetFairRandomKey (d=0x7f552a142cf8) at dict.c:791
#2  0x0000558d683179b9 in setTypeRandomElement (setobj=0x7f563c36d7c0, sdsele=0x7ffc86daab80, llele=0x7ffc86daab88) at t_set.c:213
#3  0x0000558d68318889 in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:780
#4  0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#5  0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#6  0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#7  0x0000558d683b9e88 in luaD_precall ()
#8  0x0000558d683c4110 in luaV_execute ()
#9  0x0000558d683ba525 in luaD_call ()
#10 0x0000558d683b9858 in luaD_rawrunprotected ()
#11 0x0000558d683ba6d0 in luaD_pcall ()
#12 0x0000558d683b79f8 in lua_pcall ()
#13 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#14 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#15 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#16 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#17 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#18 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#19 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#20 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#21 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#22 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#23 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#24 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168

2023-05-18T13:03:29+00:00
#0  dictGetSomeKeys (d=d@entry=0x7f552a142cf8, des=0x7ffc86daab00, des@entry=0x7ffc86daaaa0, count=<optimized out>, count@entry=15) at dict.c:733
#1  0x0000558d682d3759 in dictGetFairRandomKey (d=0x7f552a142cf8) at dict.c:791
#2  0x0000558d683179b9 in setTypeRandomElement (setobj=0x7f563c36d7c0, sdsele=0x7ffc86daab80, llele=0x7ffc86daab88) at t_set.c:213
#3  0x0000558d68318889 in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:780
#4  0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#5  0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#6  0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#7  0x0000558d683b9e88 in luaD_precall ()
#8  0x0000558d683c4110 in luaV_execute ()
#9  0x0000558d683ba525 in luaD_call ()
#10 0x0000558d683b9858 in luaD_rawrunprotected ()
#11 0x0000558d683ba6d0 in luaD_pcall ()
#12 0x0000558d683b79f8 in lua_pcall ()
#13 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#14 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#15 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#16 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#17 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#18 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#19 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#20 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#21 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#22 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#23 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#24 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168

2023-05-18T13:03:34+00:00
#0  0x0000558d682d27be in dictTypeExpandAllowed (d=0x7f549517e318) at dict.c:989
#1  _dictExpandIfNeeded (d=0x7f549517e318) at dict.c:1008
#2  _dictKeyIndex (existing=0x0, hash=4334487920869140499, key=0x7f56906063c1, d=0x7f549517e318) at dict.c:1048
#3  dictAddRaw (d=0x7f549517e318, key=0x7f56906063c1, existing=0x0) at dict.c:337
#4  0x0000558d682d2b01 in dictAdd (d=d@entry=0x7f549517e318, key=key@entry=0x7f56906063c1, val=val@entry=0x0) at dict.c:302
#5  0x0000558d6831885e in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:789
#6  0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#7  0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#8  0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#9  0x0000558d683b9e88 in luaD_precall ()
#10 0x0000558d683c4110 in luaV_execute ()
#11 0x0000558d683ba525 in luaD_call ()
#12 0x0000558d683b9858 in luaD_rawrunprotected ()
#13 0x0000558d683ba6d0 in luaD_pcall ()
#14 0x0000558d683b79f8 in lua_pcall ()
#15 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#16 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#17 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#18 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#19 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#20 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#21 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#22 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#23 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#24 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#25 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#26 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168

2023-05-18T13:03:38+00:00
#0  0x0000558d682d4a7d in sdslen (s=0x7f56909d2311 "i:blZ7htj") at sds.h:89
#1  dictSdsKeyCompare (d=<optimized out>, key1=0x7f56906063c1, key2=0x7f56909d2311) at server.c:241
#2  0x0000558d682d2904 in _dictKeyIndex (existing=0x0, hash=2834940632465860627, key=0x7f56906063c1, d=0x7f549517e318) at dict.c:1055
#3  dictAddRaw (d=0x7f549517e318, key=0x7f56906063c1, existing=0x0) at dict.c:337
#4  0x0000558d682d2b01 in dictAdd (d=d@entry=0x7f549517e318, key=key@entry=0x7f56906063c1, val=val@entry=0x0) at dict.c:302
#5  0x0000558d6831885e in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:789
#6  0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#7  0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#8  0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#9  0x0000558d683b9e88 in luaD_precall ()
#10 0x0000558d683c4110 in luaV_execute ()
#11 0x0000558d683ba525 in luaD_call ()
#12 0x0000558d683b9858 in luaD_rawrunprotected ()
#13 0x0000558d683ba6d0 in luaD_pcall ()
#14 0x0000558d683b79f8 in lua_pcall ()
#15 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#16 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#17 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#18 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#19 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#20 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#21 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#22 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#23 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#24 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#25 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#26 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168

2023-05-18T13:03:44+00:00
#0  dictGetSomeKeys (d=d@entry=0x7f552a142cf8, des=0x7ffc86daaac0, des@entry=0x7ffc86daaaa0, count=<optimized out>, count@entry=15) at dict.c:768
#1  0x0000558d682d3759 in dictGetFairRandomKey (d=0x7f552a142cf8) at dict.c:791
#2  0x0000558d683179b9 in setTypeRandomElement (setobj=0x7f563c36d7c0, sdsele=0x7ffc86daab80, llele=0x7ffc86daab88) at t_set.c:213
#3  0x0000558d68318889 in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:780
#4  0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#5  0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#6  0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#7  0x0000558d683b9e88 in luaD_precall ()
#8  0x0000558d683c4110 in luaV_execute ()
#9  0x0000558d683ba525 in luaD_call ()
#10 0x0000558d683b9858 in luaD_rawrunprotected ()
#11 0x0000558d683ba6d0 in luaD_pcall ()
#12 0x0000558d683b79f8 in lua_pcall ()
#13 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#14 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#15 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#16 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#17 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#18 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#19 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#20 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#21 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#22 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#23 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#24 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168

2023-05-18T13:03:51+00:00
#0  0x00007f56c7a42a28 in ?? () from target:/lib/x86_64-linux-gnu/libc.so.6
#1  0x0000558d682d4ab2 in dictSdsKeyCompare (d=<optimized out>, key1=<optimized out>, key2=<optimized out>) at server.c:243
#2  0x0000558d682d2904 in _dictKeyIndex (existing=0x0, hash=1090945667569254371, key=0x7f56906063c1, d=0x7f549517e318) at dict.c:1055
#3  dictAddRaw (d=0x7f549517e318, key=0x7f56906063c1, existing=0x0) at dict.c:337
#4  0x0000558d682d2b01 in dictAdd (d=d@entry=0x7f549517e318, key=key@entry=0x7f56906063c1, val=val@entry=0x0) at dict.c:302
#5  0x0000558d6831885e in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:789
#6  0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#7  0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#8  0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#9  0x0000558d683b9e88 in luaD_precall ()
#10 0x0000558d683c4110 in luaV_execute ()
#11 0x0000558d683ba525 in luaD_call ()
#12 0x0000558d683b9858 in luaD_rawrunprotected ()
#13 0x0000558d683ba6d0 in luaD_pcall ()
#14 0x0000558d683b79f8 in lua_pcall ()
#15 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#16 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#17 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#18 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#19 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#20 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#21 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#22 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#23 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#24 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#25 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#26 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168

2023-05-18T13:04:13+00:00
#0  zfree (ptr=0x7f56906063c0) at zmalloc.c:373
#1  zfree (ptr=0x7f56906063c0) at zmalloc.c:364
#2  0x0000558d68318938 in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:792
#3  0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#4  0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#5  0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#6  0x0000558d683b9e88 in luaD_precall ()
#7  0x0000558d683c4110 in luaV_execute ()
#8  0x0000558d683ba525 in luaD_call ()
#9  0x0000558d683b9858 in luaD_rawrunprotected ()
#10 0x0000558d683ba6d0 in luaD_pcall ()
#11 0x0000558d683b79f8 in lua_pcall ()
#12 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#13 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#14 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#15 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#16 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#17 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#18 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#19 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#20 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#21 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#22 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#23 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168

2023-05-18T13:04:17+00:00
#0  0x0000558d682d35d9 in dictGetSomeKeys (d=d@entry=0x7f552a142cf8, des=des@entry=0x7ffc86daaaa0, count=<optimized out>, count@entry=15) at dict.c:755
#1  0x0000558d682d3759 in dictGetFairRandomKey (d=0x7f552a142cf8) at dict.c:791
#2  0x0000558d683179b9 in setTypeRandomElement (setobj=0x7f563c36d7c0, sdsele=0x7ffc86daab80, llele=0x7ffc86daab88) at t_set.c:213
#3  0x0000558d68318889 in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:780
#4  0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#5  0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#6  0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#7  0x0000558d683b9e88 in luaD_precall ()
#8  0x0000558d683c4110 in luaV_execute ()
#9  0x0000558d683ba525 in luaD_call ()
#10 0x0000558d683b9858 in luaD_rawrunprotected ()
#11 0x0000558d683ba6d0 in luaD_pcall ()
#12 0x0000558d683b79f8 in lua_pcall ()
#13 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#14 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#15 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#16 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#17 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#18 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#19 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#20 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#21 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#22 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#23 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#24 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168

2023-05-18T13:04:23+00:00
#0  0x00007f56c7a42a24 in ?? () from target:/lib/x86_64-linux-gnu/libc.so.6
#1  0x0000558d682d4ab2 in dictSdsKeyCompare (d=<optimized out>, key1=<optimized out>, key2=<optimized out>) at server.c:243
#2  0x0000558d682d2904 in _dictKeyIndex (existing=0x0, hash=17057246290930548707, key=0x7f56906063c1, d=0x7f549517e318) at dict.c:1055
#3  dictAddRaw (d=0x7f549517e318, key=0x7f56906063c1, existing=0x0) at dict.c:337
#4  0x0000558d682d2b01 in dictAdd (d=d@entry=0x7f549517e318, key=key@entry=0x7f56906063c1, val=val@entry=0x0) at dict.c:302
#5  0x0000558d6831885e in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:789
#6  0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#7  0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#8  0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#9  0x0000558d683b9e88 in luaD_precall ()
#10 0x0000558d683c4110 in luaV_execute ()
#11 0x0000558d683ba525 in luaD_call ()
#12 0x0000558d683b9858 in luaD_rawrunprotected ()
#13 0x0000558d683ba6d0 in luaD_pcall ()
#14 0x0000558d683b79f8 in lua_pcall ()
#15 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#16 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#17 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#18 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#19 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#20 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#21 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#22 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#23 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#24 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#25 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#26 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168

2023-05-18T13:04:27+00:00
#0  0x00007f56c7925330 in random () from target:/lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f56c7925799 in rand () from target:/lib/x86_64-linux-gnu/libc.so.6
#2  0x0000558d682d3764 in dictGetFairRandomKey (d=0x7f552a142cf8) at dict.c:797
#3  0x0000558d683179b9 in setTypeRandomElement (setobj=0x7f563c36d7c0, sdsele=0x7ffc86daab80, llele=0x7ffc86daab88) at t_set.c:213
#4  0x0000558d68318889 in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:780
#5  0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#6  0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#7  0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#8  0x0000558d683b9e88 in luaD_precall ()
#9  0x0000558d683c4110 in luaV_execute ()
#10 0x0000558d683ba525 in luaD_call ()
#11 0x0000558d683b9858 in luaD_rawrunprotected ()
#12 0x0000558d683ba6d0 in luaD_pcall ()
#13 0x0000558d683b79f8 in lua_pcall ()
#14 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#15 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#16 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#17 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#18 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#19 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#20 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#21 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#22 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#23 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#24 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#25 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168

2023-05-18T13:04:32+00:00
#0  0x00007f56c7925712 in random_r () from target:/lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f56c79252dd in random () from target:/lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f56c7925799 in rand () from target:/lib/x86_64-linux-gnu/libc.so.6
#3  0x0000558d682d3764 in dictGetFairRandomKey (d=0x7f552a142cf8) at dict.c:797
#4  0x0000558d683179b9 in setTypeRandomElement (setobj=0x7f563c36d7c0, sdsele=0x7ffc86daab80, llele=0x7ffc86daab88) at t_set.c:213
#5  0x0000558d68318889 in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:780
#6  0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#7  0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#8  0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#9  0x0000558d683b9e88 in luaD_precall ()
#10 0x0000558d683c4110 in luaV_execute ()
#11 0x0000558d683ba525 in luaD_call ()
#12 0x0000558d683b9858 in luaD_rawrunprotected ()
#13 0x0000558d683ba6d0 in luaD_pcall ()
#14 0x0000558d683b79f8 in lua_pcall ()
#15 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#16 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#17 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#18 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#19 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#20 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#21 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#22 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#23 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#24 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#25 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#26 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168

2023-05-18T13:21:03+00:00
#0  0x00007f56c7a42a28 in ?? () from target:/lib/x86_64-linux-gnu/libc.so.6
#1  0x0000558d682d4ab2 in dictSdsKeyCompare (d=<optimized out>, key1=<optimized out>, key2=<optimized out>) at server.c:243
#2  0x0000558d682d2904 in _dictKeyIndex (existing=0x0, hash=18240379238795239395, key=0x7f56906063c1, d=0x7f549517e318) at dict.c:1055
#3  dictAddRaw (d=0x7f549517e318, key=0x7f56906063c1, existing=0x0) at dict.c:337
#4  0x0000558d682d2b01 in dictAdd (d=d@entry=0x7f549517e318, key=key@entry=0x7f56906063c1, val=val@entry=0x0) at dict.c:302
#5  0x0000558d6831885e in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:789
#6  0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#7  0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#8  0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#9  0x0000558d683b9e88 in luaD_precall ()
#10 0x0000558d683c4110 in luaV_execute ()
#11 0x0000558d683ba525 in luaD_call ()
#12 0x0000558d683b9858 in luaD_rawrunprotected ()
#13 0x0000558d683ba6d0 in luaD_pcall ()
#14 0x0000558d683b79f8 in lua_pcall ()
#15 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#16 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#17 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#18 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#19 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#20 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#21 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#22 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#23 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#24 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#25 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#26 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168

2023-05-18T13:32:52+00:00
#0  extend_to_usable (ptr=ptr@entry=0x7f56906063c0, size=size@entry=16) at zmalloc.c:107
#1  0x0000558d682e6320 in zmalloc_usable (size=size@entry=11, usable=usable@entry=0x7ffc86daab18) at zmalloc.c:162
#2  0x0000558d682e2dc3 in _sdsnewlen (init=0x7f56004947b1, initlen=9, trymalloc=<optimized out>) at sds.c:117
#3  0x0000558d6831884a in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:784
#4  0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#5  0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#6  0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#7  0x0000558d683b9e88 in luaD_precall ()
#8  0x0000558d683c4110 in luaV_execute ()
#9  0x0000558d683ba525 in luaD_call ()
#10 0x0000558d683b9858 in luaD_rawrunprotected ()
#11 0x0000558d683ba6d0 in luaD_pcall ()
#12 0x0000558d683b79f8 in lua_pcall ()
#13 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#14 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#15 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#16 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#17 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#18 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#19 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#20 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#21 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#22 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#23 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#24 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168

Here is also the output of ltrace -ffs256 -p<pid> of the hanging Redis process.

Comment From: sundb

@Fusl Can you provide the parameters of the srandmember command and the length of this key? My guess is that the dictAdd() is abnormally slow because the rehash of the dictionary is suspended during the rdb saving or aof rewrite.

Comment From: Fusl

I can't exactly tell the parameters of the srandmember call since most of the decision logic happens during runtime in the Lua script but from a quick monitor I can see that we're limiting the call to a max count of 30 at this time, however, the count can be lower than that as well:

1684496276.975048 [0 lua] "srandmember" "imgur:todo" "30"
1684496276.980532 [0 lua] "srandmember" "imgur:todo" "10"
1684496276.981337 [0 lua] "srandmember" "imgur:todo" "27"
1684496276.982097 [0 lua] "srandmember" "imgur:todo" "3" 
1684496276.995408 [0 lua] "srandmember" "imgur:todo" "15"
1684496276.996122 [0 lua] "srandmember" "imgur:todo" "7" 
1684496276.996658 [0 lua] "srandmember" "imgur:todo" "11"
1684496276.997268 [0 lua] "srandmember" "imgur:todo" "2" 
1684496277.008314 [0 lua] "srandmember" "imgur:todo" "27"
1684496277.012712 [0 lua] "srandmember" "imgur:todo" "13"
1684496277.036401 [0 lua] "srandmember" "imgur:todo" "30"
1684496277.065458 [0 lua] "srandmember" "imgur:todo" "30"
1684496277.070976 [0 lua] "srandmember" "imgur:todo" "30"
1684496277.071830 [0 lua] "srandmember" "imgur:todo" "30"
1684496277.073522 [0 lua] "srandmember" "imgur:todo" "22"
1684496277.081004 [0 lua] "srandmember" "imgur:todo" "30"
1684496277.082138 [0 lua] "srandmember" "imgur:todo" "1" 
1684496277.093768 [0 lua] "srandmember" "imgur:todo" "29"
1684496277.103692 [0 lua] "srandmember" "imgur:todo" "25"
1684496277.118868 [0 lua] "srandmember" "imgur:todo" "30"
1684496277.125603 [0 lua] "srandmember" "imgur:todo" "18"
1684496277.134662 [0 lua] "srandmember" "imgur:todo" "28"
1684496277.145458 [0 lua] "srandmember" "imgur:todo" "27"
1684496277.149010 [0 lua] "srandmember" "imgur:todo" "12"
1684496277.156520 [0 lua] "srandmember" "imgur:todo" "13"
1684496277.177948 [0 lua] "srandmember" "imgur:todo" "30"

The current scard of this key is 5747349. Do note though that the Lua script that executes the srandmember also immediately afterwards does a srem after making some further decisions on the set member string, so it's quite possible that this set was a little larger than reporting this bug initially, but still well below 10M entries.

Comment From: sundb

@Fusl Thx, Can you also provide the output of INFO ALL and SLOWLOG GET? Maybe also include lua scripts if possible.

Comment From: Fusl

info all slowlog get script.lua

Comment From: Fusl

Note that we're currently running Redis without activedefrag yes to see if that's maybe causing issues for us and it's been chugging along since 6 hours again. However, that also doesn't mean a lot since Redis has been happy for a few days before suddenly not being happy anymore and hanging multiple times within an hour after restarting the container.

Comment From: oranagra

@sundb your theory is interesting. It means that srandmember isn't hand, it just becomes slower during fork, and then when run repeatedly it looks like it's hand. This rehashing prevention is something that was changed in one of the recent versions. Anyway, if that's that, it should be easy to fix, by pre-expanding the dict. @Fusl let us know if you'll be able to test a patch to prove it.

Comment From: Fusl

I can test a patch if given in form of a Docker image.

Comment From: enjoy-binbin

I see we did call dictExpand first, this is an empty dict, so it will skip the rehash and then finish the expand. the count is 30, so this dictionary should also be small

https://github.com/redis/redis/blob/7.0.11/src/t_set.c#L774-L794

    /* CASE 4: We have a big set compared to the requested number of elements.
     * In this case we can simply get random elements from the set and add
     * to the temporary set, trying to eventually get enough unique elements
     * to reach the specified count. */
    else {
        unsigned long added = 0;
        sds sdsele;

        dictExpand(d, count);
        while (added < count) {
            encoding = setTypeRandomElement(set,&ele,&llele);
            if (encoding == OBJ_ENCODING_INTSET) {
                sdsele = sdsfromlonglong(llele);
            } else {
                sdsele = sdsdup(ele);
            }
            /* Try to add the object to the dictionary. If it already exists
             * free it, otherwise increment the number of objects we have
             * in the result dictionary. */
            if (dictAdd(d,sdsele,NULL) == DICT_OK)
                added++;
            else
                sdsfree(sdsele);
        }
    }

Comment From: Fusl

From looking through the logs.txt ltrace file attached in my initial bug report post it looks like it's getting stuck in this while loop because it keeps missing the if (dictAdd(d,sdsele,NULL) == DICT_OK) condition because setTypeRandomElement keeps returning the same entries repeatedly, thus ending up in an infinite loop condition:

~/Downloads > cat logs.txt | fgrep memcpy | cut -d'"' -f2 | head
i:D8ZgAFd
i:kub6c1U
i:8FOvrOb
i:JOy6XF6
i:kub6c1U
i:3EoJJ9A
i:2j8F8yR
i:8FOvrOb
i:Ca0vivs
i:cjfhC8r
~/Downloads > cat logs.txt | fgrep memcpy | cut -d'"' -f2 | wc -l
4021
~/Downloads > cat logs.txt | fgrep memcpy | cut -d'"' -f2 | sort -u | wc -l
29
~/Downloads > cat logs.txt | fgrep memcpy | cut -d'"' -f2 | sort | uniq -c | sort -nk1,1
     99 i:cjfhC8r
    114 i:0MbpsVS
    118 i:Y50L9kv
    122 i:JOy6XF6
    123 i:iLsWxYn
    123 i:vLH6ekx
    124 i:47TNxEw
    125 i:D8ZgAFd
    126 i:0KJoCZD
    126 i:2j8F8yR
    126 i:THCgaW5
    132 i:TujMofJ
    132 i:dUZXiJy
    133 i:bON8Faw
    136 i:Ca0vivs
    136 i:Spz2BLF
    141 i:7rwtfe7
    141 i:Dtb2VA7
    141 i:KzxS7Zx
    141 i:S99DlMj
    144 i:Bd0HtRi
    144 i:sloc6wM
    150 i:iOLgwqS
    151 i:3EoJJ9A
    151 i:kub6c1U
    152 i:YrguwbM
    154 i:g7rKNjj
    163 i:O1ezsVh
    253 i:8FOvrOb

Comment From: oranagra

The only reason i can think of that setTypeRandomElement will return the same elements many times if if the dict's hash table is seriously too small (maybe if there are constantly forks, and _dictExpandIfNeeded never happens).

it would be nice if we could print the size of the hash table, but sadly, it looks like the only way to do that is by DEBUG HTSTAT-KEY, but that one is also printing bucket statistics, which will take too long to gather on this key (could cause redis to hang).

maybe we can patch redis to have a light version of HTSTAT, one that just prints the sizes of the hash tables. but it would mean restarting the server and hoping to still be able to reproduce this.

Comment From: oranagra

i've run a test to see how long would DEBUG HTSTAT-KEY take. on 5m entries it took me roughly 100ms, and on 15m it took some 400ms. so maybe that's acceptable for you to try.

Comment From: sundb

@Fusl Did you do a lot of delete operations(srem or spop) on this set?

Comment From: Fusl

Yes, this set is in constant movement with millions of items getting added and removed every few minutes. At this very moment, the set is empty because of the project behind it having run out of data to process, so I can't easily reproduce the original issue.

Comment From: sundb

#define HASHTABLE_MIN_FILL        10      /* Minimal hash table fill 10% */
int dictNeedsResize(dict *dict) {
    long long size, used;

    size = dictSlots(dict);
    used = dictSize(dict);
    return (size > DICT_HT_INITIAL_SIZE &&
            (used*100/size < HASHTABLE_MIN_FILL));
}

void test() {
    dict *dict_ret = dictCreate(&BenchmarkDictType);
    dict *dict = dictCreate(&BenchmarkDictType);
    long count = 5000000;
    char buf[4096];
    init_genrand64(1684911460);

    /* Create a dict with 5,000,000 elements */
    dictSetResizeEnabled(DICT_RESIZE_AVOID);
    for (int j = 0; j < count; j++) {
        int retval = dictAdd(dict,stringFromLongLong(j),(void*)j);
        assert(retval == DICT_OK);
    }
    assert((long)dictSize(dict) == count);

    /* Wait for rehashing. */
    while (dictIsRehashing(dict)) {
        dictRehashMilliseconds(dict,100);
    }

    /* Delete to 10,000 elements left. */
    int del = 4990000;
    while (del--) {
        dictEntry *de = dictGetRandomKey(dict);
        char *str = dictGetKey(de);
        assert((dictDelete(dict, str) == DICT_OK));
    }

    /* Restart rehash */
    dictSetResizeEnabled(DICT_RESIZE_ENABLE);
    assert(dictNeedsResize(dict));
    dictResize(dict);
    dictRehashNum(dict, 93900);

    /* Print ht stats */
    dictGetStats(buf,sizeof(buf),dict,1);
    printf("%s\n", buf);
    printf("dict->rehashidx: %d\n", dict->rehashidx);

    /* Disable rehash and get random keys from this dict */
    dictSetResizeEnabled(DICT_RESIZE_AVOID);
    for (int i = 0; i < 1000; i++) {
        dictEntry *de = dictGetFairRandomKey(dict);
        char *str = dictGetKey(de);
        dictAdd(dict_ret, zstrdup(str), NULL);
        // printf("random key: %s\n", str);
    }
    printf("unique dict size: %ld\n", dictSize(dict_ret));

    dictRelease(dict); 
    dictRelease(dict_ret);
}

Output:

Hash table 0 stats (main hash table):
 table size: 1048576
 number of elements: 1066
 different slots: 538
 max chain length: 7
 avg chain length (counted): 1.98
 avg chain length (computed): 1.98
 Chain length distribution:
   0: 1048038 (99.95%)
   1: 249 (0.02%)
   2: 151 (0.01%)
   3: 75 (0.01%)
   4: 37 (0.00%)
   5: 17 (0.00%)
   6: 6 (0.00%)
   7: 3 (0.00%)
Hash table 1 stats (rehashing target):
 table size: 16384
 number of elements: 8934
 different slots: 4172
 max chain length: 15
 avg chain length (counted): 2.14
 avg chain length (computed): 2.14
 Chain length distribution:
   0: 12212 (74.54%)
   1: 1865 (11.38%)
   2: 1108 (6.76%)
   3: 552 (3.37%)
   4: 335 (2.04%)
   5: 163 (0.99%)
   6: 78 (0.48%)
   7: 36 (0.22%)
   8: 14 (0.09%)
   9: 12 (0.07%)
   10: 5 (0.03%)
   11: 1 (0.01%)
   13: 2 (0.01%)
   15: 1 (0.01%)

dict->rehashidx: 943760
unique dict size: 71

Steps:

1) Disable rehash (set dict_can_resize to DICT_RESIZE_AVOID) 2) Create a dict with 5,000,000 elements and rehash it. 3) Delete the elements of this dictionary to 100,000. 4) Enable rehash (set dict_can_resize to DICT_RESIZE_ENABLE) 5) Rehash the dict so that rehashidx goes as far as possible to the end of dict[0]. In the above example, let the rehashidx reach 943760, the maximum is 1048576. 6) Get 1000 random keys from this dict.

From the above logs, we can find that we get 1000 random keys from the dict, but only 71 unique keys in the end.

https://github.com/redis/redis/blob/d71478a889935382f3bf8b9a39ab073eba0a856a/src/dict.c#L1079:L1080 The reason is that when the dict shrinks and the rehashidx is very close to the size of dict[0], the code will most likely enter the branch above.

https://github.com/redis/redis/blob/d71478a889935382f3bf8b9a39ab073eba0a856a/src/dict.c#L298:L302 This issue is probably introduced by #11692, when dict_can_resize is set to DICT_RESIZE_AVOID, the above code only considers the expansion case and still allows rehash, but ignores the shrinking case.

Try using the following patch to fix and the example will be fine.

-    if (dict_can_resize == DICT_RESIZE_FORBID || !dictIsRehashing(d)) return 0;
+    unsigned long s0 = DICTHT_SIZE(d->ht_size_exp[0]);
+    unsigned long s1 = DICTHT_SIZE(d->ht_size_exp[1]);
     if (dict_can_resize == DICT_RESIZE_AVOID && 
-        (DICTHT_SIZE(d->ht_size_exp[1]) / DICTHT_SIZE(d->ht_size_exp[0]) < dict_force
_resize_ratio))
+        ((s1 > s0 && s1 / s0 < dict_force_resize_ratio) ||
+         (s1 < s0 && s0 / s1 < dict_force_resize_ratio)))

BTW: I still can't simulate the above result using commands.

Comment From: Fusl

Some more background information: The configuration of our application has a high and low watermark where the application tries to keep a set filled with items to satisfy the boundaries of the high and low watermark. That high/low is currently configured as 30M/1M so if the amount of items in the set falls below 1M it will start filling that set with items until it reaches 30M items at which point it will stop filling it again. I wonder if that behavior of our application is something that's just accelerating some kind of race condition (maybe not race condition but I lack the in-depth understanding of what's going on to find a better word for it) to surface as the bug we're seeing.

Comment From: sundb

An accidental reappearance: I use srandmember command to let the rehashidx of the hash as close to 131072 as possible. It means that the speculation in the comment https://github.com/redis/redis/issues/12200#issuecomment-1560754386 is reasonable.

the status of this set key: current hash rehashidx: 130296.

127.0.0.1:6379> debug HTSTATS-KEY s full
Hash table 0 stats (main hash table).
 table size: 131072
 number of elements: 9
 different slots: 5
 max chain length: 3
 avg chain length (counted): 1.80
 avg chain length (computed): 1.80
 Chain length distribution.
   0: 131067 (100.00%)
   1: 2 (0.00%)
   2: 2 (0.00%)
   3: 1 (0.00%)
Hash table 1 stats (rehashing target).
 table size: 1024
 number of elements: 970
 different slots: 11
 max chain length: 117
 avg chain length (counted): 88.18
 avg chain length (computed): 88.18
 Chain length distribution.
   0: 1013 (98.93%)
   25: 1 (0.10%)

Command and output:

127.0.0.1:6379> srandmember s 110
  1) "m:81564"
  2) "m:55787"
  3) "m:63276"
  4) "m:86639"
  5) "m:74482"
  6) "m:41890"
  7) "m:54130"
  8) "m:90807"
  ...
  (2.98s)     <- too long

Comment From: sundb

@Fusl Can you use this patch to verify if it will be reproduced again?

diff --git a/src/dict.c b/src/dict.c
index ebd431d4..c2e471b7 100644
--- a/src/dict.c
+++ b/src/dict.c
@@ -294,9 +294,11 @@ int dictTryExpand(dict *d, unsigned long size) {
  * work it does would be unbound and the function may block for a long time. */
 int dictRehash(dict *d, int n) {
     int empty_visits = n*10; /* Max number of empty buckets to visit. */
-    if (dict_can_resize == DICT_RESIZE_FORBID || !dictIsRehashing(d)) return 0;
+    unsigned long s0 = DICTHT_SIZE(d->ht_size_exp[0]);
+    unsigned long s1 = DICTHT_SIZE(d->ht_size_exp[1]);
     if (dict_can_resize == DICT_RESIZE_AVOID && 
-        (DICTHT_SIZE(d->ht_size_exp[1]) / DICTHT_SIZE(d->ht_size_exp[0]) < dict_force_resize_ratio))
+        ((s1 > s0 && s1 / s0 < dict_force_resize_ratio) ||
+         (s1 < s0 && s0 / s1 < dict_force_resize_ratio)))
     {
         return 0;
     }

Comment From: Fusl

I can use this patch but I can't get a reproduction case together that will cause this behavior to happen again.

Comment From: oranagra

@sundb first, good job reproducing it. i suppose you did it while there is a fork child? secondly, how exactly did you produce that table? i see it's down sizing, and the original (bigger) one is nearly empty (was probably already nearly empty before rehashing), but what i don't understand is how come the new (smaller) one has such a long chain. did you use SSCAN to delete members?

regarding the fix, maybe it would be possible to improve dictGetSomeKeys, but i suppose long chains would still mean we never sample from the end of the chain.

regarding the fix you're proposing, 1. we should still respect FORBID, we don't want the fork child to do any re-hashing. 2. downsizing happens at x/10, so checking dict_force_resize_ratio (5) seems pointless.

Comment From: sundb

@sundb first, good job reproducing it. i suppose you did it while there is a fork child? secondly, how exactly did you produce that table? i see it's down sizing, and the original (bigger) one is nearly empty (was probably already nearly empty before rehashing), but what i don't understand is how come the new (smaller) one has such a long chain. did you use SSCAN to delete members?

Yes, using SSCAN to delete members. I did this because I wanted the members of dict[0] to be concentrated at the end as much as possible.

Config:

enable-debug-command yes
save ""
set-max-listpack-entries 0
rdb-key-save-delay 2147483647

Steps: 1) Use python to create a dict just in time to start shrinking and start rehashing.

import redis
import os

def is_rehashing(r):
    resp = r.execute_command("debug", "HTSTATS-KEY", "s", "full")
    return resp.find("rehashing target") != -1

r = redis.Redis(host='localhost', port=6379, decode_responses=True)
r.flushdb()

pipeline = r.pipeline()
for num in range(0,100000):
    pipeline.sadd("s", "m:" + str(num))
pipeline.execute()

while is_rehashing(r):
    r.srandmember("s", 100)

r.bgsave()

print(is_rehashing(r))
pipeline = r.pipeline()
count = 0
cursor = 0
while count < 99000:
    resp = r.sscan("s", cursor)
    cursor = resp[0]
    for m in resp[1]:
        count += 1
        pipeline.srem("s", m)
pipeline.execute()

os.system("pgrep -f redis-rdb-bgsave | xargs kill -9")

while is_rehashing(r) == False:
    r.spop("s")

2) Manually execute srandmember s <count> to rehash this dict to let its rehashidx close to dict[0]'s maxsizemask(131071). 3) BGSAVE to stop rehash.

If at this time, the elements of dict[0] are concentrated after rehashidx, srandmember will have a probability of 0.0078(1024/131071) to randomly find duplicate members all the time. Theoretically, if there are GETFAIR_NUM_ENTRIES(15) members exactly after the rehashidx, srandmember will probably die loop.

Comment From: sundb

  1. downsizing happens at x/10, so checking dict_force_resize_ratio (5) seems pointless.

The resize occurs when (used/size) /10 of dict[0], but dict_force_resize_ratio(5) is between dict_size_0 and dict_size_1, they don't conflict with each other. In the above example: before rehash

table used: 979
table size: 131072

(131072 / 979) /10 = 13.38, start rehash.

start rehash

table 1 size: 131072
table 2 size: 1024

(131072 / 1024) = 138, I think this dict still can rehash.

Comment From: oranagra

Ok. SCAN+DEL is a recipe for trouble (not just for rehashing, but also SCAN, RAND, eviction, etc). But is that really what @Fusl did?

So before #11692 both the fork child and the parent could keep incremental rehashing during fork, and besides blocking the fork child from doing it, I also prevented the parent from doing it unless we need an extreme upscale, but we should have also allowed down scale. Right? The FORBID part should be kept, and this fix should be taken to several versions.

Comment From: sundb

Ok. SCAN+DEL is a recipe for trouble (not just for rehashing, but also SCAN, RAND, eviction, etc). But is that really what @Fusl did?

Because i saw a lot of SSCAN commands in INFO ALL posted in https://github.com/redis/redis/issues/12200#issuecomment-1554476660

cmdstat_srem:calls=1914081,usec=56042921,usec_per_call=29.28,rejected_calls=0,failed_calls=0
cmdstat_sscan:calls=804510,usec=4211523144,usec_per_call=5234.89,rejected_calls=0,failed_calls=0
cmdstat_spop:calls=76303,usec=93211447,usec_per_call=1221.60,rejected_calls=0,failed_calls=0

So before #11692 both the fork child and the parent could keep incremental rehashing during fork, and besides blocking the fork child from doing it, I also prevented the parent from doing it unless we need an extreme upscale, but we should have also allowed down scale. Right?

Yes, This is the only possibility I can think of for dictGetSomeKeys() to return duplicate data. Another way to fix it is to optimize dictGetSomeKeys(), so that it randomly the slot from [rehashidx, dict_size_2 + dict_size_1 - rehashidx] instead of [0, maxsizemask] (unsigned long i = randomULong() & maxsizemask;), but I'm not sure why antirez didn't do this.

Comment From: Fusl

SSCAN is part of how we maintain items in a set, yes. We do an SSCAN, move that data elsewhere (to offload huge amounts of data to disk) and then SREM those entries. Now that I think about it, we only started seeing the behavior of Redis hanging when we enabled said feature and this one is the only one doing a combination of SSCAN+SREM while the main system is the one running SRANDMEMBER as the primary way to get items from it.

Comment From: oranagra

Another way to fix it is to optimize dictGetSomeKeys(), so that it randomly the slot from [rehashidx, dict_size_2 + dict_size_1 - rehashidx] instead of [0, maxsizemask] (unsigned long i = randomULong() & maxsizemask;), but I'm not sure why antirez didn't do this.

i think that when we have such a long chain in a bucket, we have no chance to ever pick the items at the end, so this pathological issue isn't gonna be solved by any change to the bucket selection.

SSCAN is part of how we maintain items in a set, yes. We do an SSCAN, move that data elsewhere (to offload huge amounts of data to disk) and then SREM those entries. Now that I think about it, we only started seeing the behavior of Redis hanging when we enabled said feature and this one is the only one doing a combination of SSCAN+SREM while the main system is the one running SRANDMEMBER as the primary way to get items from it.

@Fusl so did you use this pattern on older versions (before 7.0.8)? or just on the on later ones? in any case, i advise to avoid this pattern, redis really doesn't like it.

Comment From: Fusl

i advise to avoid this pattern, redis really doesn't like it.

I'm aware of that now but we have to do it this way due to how we store data in sets. Even if it is advised to avoid it, it shouldn't cause Redis to hang forever. A possible workaround for us would be to use negative numbers in srandmember to make it not try to find unique entries and allow duplicates to be returned (we'd be handing duplicates within our application then) but I don't know what the downsides are here internally inside Redis.

did you use this pattern on older versions

Yes, we used the same pattern before on older versions as well without Redis hanging, although never at the scale that we've done so recently.