Our application uses Lua EVAL/EVALSHA to do most heavy lifting of logic that requires atomic database changes.
During normal operation, Redis has a seemingly random chance to hang during the (only) SRANDMEMBER call of our Lua code when performing a background save (using either RDB or AOF) where it doesn't recover from even after the background save finishes.
We're unable to determine a solid reproduction case at this point since we can't pinpoint exactly what's causing it yet but across multiple GDB backtraces (below) we believe that we were able to figure out that when Redis is hanging during the command call, it's always stuck in this while loop, although we're not 100% sure about that.
The version of Redis we're using is 7.0.11 running inside Docker with this configuration:
activedefrag yes
active-defrag-ignore-bytes 100mb
active-defrag-threshold-lower 10
active-defrag-threshold-upper 50
active-defrag-cycle-min 1
active-defrag-cycle-max 25
active-defrag-max-scan-fields 10000
lazyfree-lazy-eviction yes
lazyfree-lazy-expire yes
lazyfree-lazy-server-del yes
replica-lazy-flush yes
lazyfree-lazy-user-del yes
appendonly yes
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 1gb
client-output-buffer-limit slave 10737418240 1073741824 300
bind 0.0.0.0
port ${PORT}
requirepass ${PASS}
save ''
lua-time-limit 5000
busy-reply-threshold 5000
Note that while we have AOF configured here with RDB being disabled, we also tried running this Redis server with AOF disabled and RDB set to snapshot every 5 minutes after 1 change. We also tried explicitly setting lua-time-limit and busy-reply-threshold in the startup configuration as well as setting CONFIG SET watchdog-period 1000 during runtime but none of those has produced any usable logs at the time Redis became unresponsive.
Logs indicate that the hangs always happen during a background save:
AOF rewrite:
1:M 18 May 2023 12:46:34.929 * Starting automatic rewriting of AOF on 100% growth
1:M 18 May 2023 12:46:34.930 * Creating AOF incr file appendonly.aof.4377.incr.aof on background rewrite
1:M 18 May 2023 12:46:35.164 * Background append only file rewriting started by pid 59
<Redis became unresponsive at 12:47:23>
59:C 18 May 2023 12:47:37.119 * Successfully created the temporary AOF base file temp-rewriteaof-bg-59.aof
59:C 18 May 2023 12:47:37.260 * Fork CoW for AOF rewrite: current 1572 MB, peak 1572 MB, average 818 MB
AOF rewrite:
1:M 19 May 2023 02:08:38.075 * Starting automatic rewriting of AOF on 100% growth
1:M 19 May 2023 02:08:38.076 * Creating AOF incr file appendonly.aof.4385.incr.aof on background rewrite
1:M 19 May 2023 02:08:38.345 * Background append only file rewriting started by pid 25
<Redis became unresponsive at 02:08:50>
25:C 19 May 2023 02:10:04.373 * Successfully created the temporary AOF base file temp-rewriteaof-bg-25.aof
25:C 19 May 2023 02:10:04.591 * Fork CoW for AOF rewrite: current 605 MB, peak 605 MB, average 329 MB
RDB save:
1:M 19 May 2023 04:25:55.055 * 1 changes in 300 seconds. Saving...
1:M 19 May 2023 04:25:55.374 * Background saving started by pid 28
<Redis became unresponsive at 04:25:58>
28:C 19 May 2023 04:27:20.861 * DB saved on disk
28:C 19 May 2023 04:27:21.083 * Fork CoW for RDB: current 142 MB, peak 142 MB, average 96 MB
2023-05-18T13:02:50+00:00
#0 _dictKeyIndex (existing=0x0, hash=4486456289374595091, key=0x7f56906063c1, d=0x7f549517e318) at dict.c:1054
#1 dictAddRaw (d=0x7f549517e318, key=0x7f56906063c1, existing=0x0) at dict.c:337
#2 0x0000558d682d2b01 in dictAdd (d=d@entry=0x7f549517e318, key=key@entry=0x7f56906063c1, val=val@entry=0x0) at dict.c:302
#3 0x0000558d6831885e in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:789
#4 0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#5 0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#6 0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#7 0x0000558d683b9e88 in luaD_precall ()
#8 0x0000558d683c4110 in luaV_execute ()
#9 0x0000558d683ba525 in luaD_call ()
#10 0x0000558d683b9858 in luaD_rawrunprotected ()
#11 0x0000558d683ba6d0 in luaD_pcall ()
#12 0x0000558d683b79f8 in lua_pcall ()
#13 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#14 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#15 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#16 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#17 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#18 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#19 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#20 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#21 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#22 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#23 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#24 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168
2023-05-18T13:02:56+00:00
#0 0x00007f56c792531f in random () from target:/lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f56c7925799 in rand () from target:/lib/x86_64-linux-gnu/libc.so.6
#2 0x0000558d682d3764 in dictGetFairRandomKey (d=0x7f552a142cf8) at dict.c:797
#3 0x0000558d683179b9 in setTypeRandomElement (setobj=0x7f563c36d7c0, sdsele=0x7ffc86daab80, llele=0x7ffc86daab88) at t_set.c:213
#4 0x0000558d68318889 in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:780
#5 0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#6 0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#7 0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#8 0x0000558d683b9e88 in luaD_precall ()
#9 0x0000558d683c4110 in luaV_execute ()
#10 0x0000558d683ba525 in luaD_call ()
#11 0x0000558d683b9858 in luaD_rawrunprotected ()
#12 0x0000558d683ba6d0 in luaD_pcall ()
#13 0x0000558d683b79f8 in lua_pcall ()
#14 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#15 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#16 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#17 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#18 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#19 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#20 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#21 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#22 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#23 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#24 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#25 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168
2023-05-18T13:03:00+00:00
#0 0x0000558d682d4abe in dictSdsKeyCompare (d=<optimized out>, key1=<optimized out>, key2=<optimized out>) at server.c:244
#1 0x0000558d682d2904 in _dictKeyIndex (existing=0x0, hash=15826992378078849043, key=0x7f56906063c1, d=0x7f549517e318) at dict.c:1055
#2 dictAddRaw (d=0x7f549517e318, key=0x7f56906063c1, existing=0x0) at dict.c:337
#3 0x0000558d682d2b01 in dictAdd (d=d@entry=0x7f549517e318, key=key@entry=0x7f56906063c1, val=val@entry=0x0) at dict.c:302
#4 0x0000558d6831885e in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:789
#5 0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#6 0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#7 0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#8 0x0000558d683b9e88 in luaD_precall ()
#9 0x0000558d683c4110 in luaV_execute ()
#10 0x0000558d683ba525 in luaD_call ()
#11 0x0000558d683b9858 in luaD_rawrunprotected ()
#12 0x0000558d683ba6d0 in luaD_pcall ()
#13 0x0000558d683b79f8 in lua_pcall ()
#14 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#15 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#16 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#17 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#18 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#19 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#20 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#21 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#22 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#23 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#24 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#25 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168
2023-05-18T13:03:05+00:00
#0 0x0000558d682e30b9 in sdsfree (s=<optimized out>) at sds.c:196
#1 0x0000558d68318938 in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:792
#2 0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#3 0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#4 0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#5 0x0000558d683b9e88 in luaD_precall ()
#6 0x0000558d683c4110 in luaV_execute ()
#7 0x0000558d683ba525 in luaD_call ()
#8 0x0000558d683b9858 in luaD_rawrunprotected ()
#9 0x0000558d683ba6d0 in luaD_pcall ()
#10 0x0000558d683b79f8 in lua_pcall ()
#11 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#12 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#13 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#14 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#15 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#16 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#17 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#18 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#19 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#20 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#21 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#22 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168
2023-05-18T13:03:10+00:00
#0 dictGetSomeKeys (d=d@entry=0x7f552a142cf8, des=0x7ffc86daaac0, des@entry=0x7ffc86daaaa0, count=<optimized out>, count@entry=15) at dict.c:764
#1 0x0000558d682d3759 in dictGetFairRandomKey (d=0x7f552a142cf8) at dict.c:791
#2 0x0000558d683179b9 in setTypeRandomElement (setobj=0x7f563c36d7c0, sdsele=0x7ffc86daab80, llele=0x7ffc86daab88) at t_set.c:213
#3 0x0000558d68318889 in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:780
#4 0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#5 0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#6 0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#7 0x0000558d683b9e88 in luaD_precall ()
#8 0x0000558d683c4110 in luaV_execute ()
#9 0x0000558d683ba525 in luaD_call ()
#10 0x0000558d683b9858 in luaD_rawrunprotected ()
#11 0x0000558d683ba6d0 in luaD_pcall ()
#12 0x0000558d683b79f8 in lua_pcall ()
#13 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#14 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#15 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#16 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#17 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#18 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#19 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#20 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#21 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#22 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#23 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#24 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168
2023-05-18T13:03:16+00:00
#0 sz_index2size_lookup (index=1) at include/jemalloc/internal/sz.h:199
#1 sz_index2size (index=1) at include/jemalloc/internal/sz.h:207
#2 arena_salloc (ptr=<optimized out>, tsdn=<optimized out>) at include/jemalloc/internal/arena_inlines_b.h:195
#3 isalloc (ptr=<optimized out>, tsdn=<optimized out>) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:38
#4 je_malloc_usable_size (ptr=ptr@entry=0x7f56906063c0) at src/jemalloc.c:3740
#5 0x0000558d682e69fe in zfree (ptr=0x7f56906063c0) at zmalloc.c:372
#6 zfree (ptr=0x7f56906063c0) at zmalloc.c:364
#7 0x0000558d68318938 in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:792
#8 0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#9 0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#10 0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#11 0x0000558d683b9e88 in luaD_precall ()
#12 0x0000558d683c4110 in luaV_execute ()
#13 0x0000558d683ba525 in luaD_call ()
#14 0x0000558d683b9858 in luaD_rawrunprotected ()
#15 0x0000558d683ba6d0 in luaD_pcall ()
#16 0x0000558d683b79f8 in lua_pcall ()
#17 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#18 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#19 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#20 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#21 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#22 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#23 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#24 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#25 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#26 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#27 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#28 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168
2023-05-18T13:03:20+00:00
#0 0x00007f56c7a42a31 in ?? () from target:/lib/x86_64-linux-gnu/libc.so.6
#1 0x0000558d682d4ab2 in dictSdsKeyCompare (d=<optimized out>, key1=<optimized out>, key2=<optimized out>) at server.c:243
#2 0x0000558d682d2904 in _dictKeyIndex (existing=0x0, hash=16039903783278520339, key=0x7f56906063c1, d=0x7f549517e318) at dict.c:1055
#3 dictAddRaw (d=0x7f549517e318, key=0x7f56906063c1, existing=0x0) at dict.c:337
#4 0x0000558d682d2b01 in dictAdd (d=d@entry=0x7f549517e318, key=key@entry=0x7f56906063c1, val=val@entry=0x0) at dict.c:302
#5 0x0000558d6831885e in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:789
#6 0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#7 0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#8 0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#9 0x0000558d683b9e88 in luaD_precall ()
#10 0x0000558d683c4110 in luaV_execute ()
#11 0x0000558d683ba525 in luaD_call ()
#12 0x0000558d683b9858 in luaD_rawrunprotected ()
#13 0x0000558d683ba6d0 in luaD_pcall ()
#14 0x0000558d683b79f8 in lua_pcall ()
#15 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#16 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#17 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#18 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#19 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#20 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#21 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#22 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#23 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#24 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#25 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#26 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168
2023-05-18T13:03:24+00:00
#0 dictGetSomeKeys (d=d@entry=0x7f552a142cf8, des=des@entry=0x7ffc86daaaa0, count=<optimized out>, count@entry=15) at dict.c:738
#1 0x0000558d682d3759 in dictGetFairRandomKey (d=0x7f552a142cf8) at dict.c:791
#2 0x0000558d683179b9 in setTypeRandomElement (setobj=0x7f563c36d7c0, sdsele=0x7ffc86daab80, llele=0x7ffc86daab88) at t_set.c:213
#3 0x0000558d68318889 in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:780
#4 0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#5 0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#6 0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#7 0x0000558d683b9e88 in luaD_precall ()
#8 0x0000558d683c4110 in luaV_execute ()
#9 0x0000558d683ba525 in luaD_call ()
#10 0x0000558d683b9858 in luaD_rawrunprotected ()
#11 0x0000558d683ba6d0 in luaD_pcall ()
#12 0x0000558d683b79f8 in lua_pcall ()
#13 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#14 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#15 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#16 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#17 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#18 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#19 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#20 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#21 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#22 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#23 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#24 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168
2023-05-18T13:03:29+00:00
#0 dictGetSomeKeys (d=d@entry=0x7f552a142cf8, des=0x7ffc86daab00, des@entry=0x7ffc86daaaa0, count=<optimized out>, count@entry=15) at dict.c:733
#1 0x0000558d682d3759 in dictGetFairRandomKey (d=0x7f552a142cf8) at dict.c:791
#2 0x0000558d683179b9 in setTypeRandomElement (setobj=0x7f563c36d7c0, sdsele=0x7ffc86daab80, llele=0x7ffc86daab88) at t_set.c:213
#3 0x0000558d68318889 in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:780
#4 0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#5 0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#6 0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#7 0x0000558d683b9e88 in luaD_precall ()
#8 0x0000558d683c4110 in luaV_execute ()
#9 0x0000558d683ba525 in luaD_call ()
#10 0x0000558d683b9858 in luaD_rawrunprotected ()
#11 0x0000558d683ba6d0 in luaD_pcall ()
#12 0x0000558d683b79f8 in lua_pcall ()
#13 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#14 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#15 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#16 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#17 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#18 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#19 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#20 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#21 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#22 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#23 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#24 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168
2023-05-18T13:03:34+00:00
#0 0x0000558d682d27be in dictTypeExpandAllowed (d=0x7f549517e318) at dict.c:989
#1 _dictExpandIfNeeded (d=0x7f549517e318) at dict.c:1008
#2 _dictKeyIndex (existing=0x0, hash=4334487920869140499, key=0x7f56906063c1, d=0x7f549517e318) at dict.c:1048
#3 dictAddRaw (d=0x7f549517e318, key=0x7f56906063c1, existing=0x0) at dict.c:337
#4 0x0000558d682d2b01 in dictAdd (d=d@entry=0x7f549517e318, key=key@entry=0x7f56906063c1, val=val@entry=0x0) at dict.c:302
#5 0x0000558d6831885e in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:789
#6 0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#7 0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#8 0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#9 0x0000558d683b9e88 in luaD_precall ()
#10 0x0000558d683c4110 in luaV_execute ()
#11 0x0000558d683ba525 in luaD_call ()
#12 0x0000558d683b9858 in luaD_rawrunprotected ()
#13 0x0000558d683ba6d0 in luaD_pcall ()
#14 0x0000558d683b79f8 in lua_pcall ()
#15 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#16 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#17 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#18 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#19 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#20 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#21 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#22 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#23 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#24 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#25 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#26 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168
2023-05-18T13:03:38+00:00
#0 0x0000558d682d4a7d in sdslen (s=0x7f56909d2311 "i:blZ7htj") at sds.h:89
#1 dictSdsKeyCompare (d=<optimized out>, key1=0x7f56906063c1, key2=0x7f56909d2311) at server.c:241
#2 0x0000558d682d2904 in _dictKeyIndex (existing=0x0, hash=2834940632465860627, key=0x7f56906063c1, d=0x7f549517e318) at dict.c:1055
#3 dictAddRaw (d=0x7f549517e318, key=0x7f56906063c1, existing=0x0) at dict.c:337
#4 0x0000558d682d2b01 in dictAdd (d=d@entry=0x7f549517e318, key=key@entry=0x7f56906063c1, val=val@entry=0x0) at dict.c:302
#5 0x0000558d6831885e in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:789
#6 0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#7 0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#8 0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#9 0x0000558d683b9e88 in luaD_precall ()
#10 0x0000558d683c4110 in luaV_execute ()
#11 0x0000558d683ba525 in luaD_call ()
#12 0x0000558d683b9858 in luaD_rawrunprotected ()
#13 0x0000558d683ba6d0 in luaD_pcall ()
#14 0x0000558d683b79f8 in lua_pcall ()
#15 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#16 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#17 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#18 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#19 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#20 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#21 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#22 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#23 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#24 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#25 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#26 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168
2023-05-18T13:03:44+00:00
#0 dictGetSomeKeys (d=d@entry=0x7f552a142cf8, des=0x7ffc86daaac0, des@entry=0x7ffc86daaaa0, count=<optimized out>, count@entry=15) at dict.c:768
#1 0x0000558d682d3759 in dictGetFairRandomKey (d=0x7f552a142cf8) at dict.c:791
#2 0x0000558d683179b9 in setTypeRandomElement (setobj=0x7f563c36d7c0, sdsele=0x7ffc86daab80, llele=0x7ffc86daab88) at t_set.c:213
#3 0x0000558d68318889 in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:780
#4 0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#5 0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#6 0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#7 0x0000558d683b9e88 in luaD_precall ()
#8 0x0000558d683c4110 in luaV_execute ()
#9 0x0000558d683ba525 in luaD_call ()
#10 0x0000558d683b9858 in luaD_rawrunprotected ()
#11 0x0000558d683ba6d0 in luaD_pcall ()
#12 0x0000558d683b79f8 in lua_pcall ()
#13 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#14 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#15 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#16 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#17 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#18 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#19 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#20 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#21 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#22 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#23 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#24 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168
2023-05-18T13:03:51+00:00
#0 0x00007f56c7a42a28 in ?? () from target:/lib/x86_64-linux-gnu/libc.so.6
#1 0x0000558d682d4ab2 in dictSdsKeyCompare (d=<optimized out>, key1=<optimized out>, key2=<optimized out>) at server.c:243
#2 0x0000558d682d2904 in _dictKeyIndex (existing=0x0, hash=1090945667569254371, key=0x7f56906063c1, d=0x7f549517e318) at dict.c:1055
#3 dictAddRaw (d=0x7f549517e318, key=0x7f56906063c1, existing=0x0) at dict.c:337
#4 0x0000558d682d2b01 in dictAdd (d=d@entry=0x7f549517e318, key=key@entry=0x7f56906063c1, val=val@entry=0x0) at dict.c:302
#5 0x0000558d6831885e in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:789
#6 0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#7 0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#8 0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#9 0x0000558d683b9e88 in luaD_precall ()
#10 0x0000558d683c4110 in luaV_execute ()
#11 0x0000558d683ba525 in luaD_call ()
#12 0x0000558d683b9858 in luaD_rawrunprotected ()
#13 0x0000558d683ba6d0 in luaD_pcall ()
#14 0x0000558d683b79f8 in lua_pcall ()
#15 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#16 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#17 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#18 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#19 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#20 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#21 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#22 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#23 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#24 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#25 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#26 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168
2023-05-18T13:04:13+00:00
#0 zfree (ptr=0x7f56906063c0) at zmalloc.c:373
#1 zfree (ptr=0x7f56906063c0) at zmalloc.c:364
#2 0x0000558d68318938 in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:792
#3 0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#4 0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#5 0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#6 0x0000558d683b9e88 in luaD_precall ()
#7 0x0000558d683c4110 in luaV_execute ()
#8 0x0000558d683ba525 in luaD_call ()
#9 0x0000558d683b9858 in luaD_rawrunprotected ()
#10 0x0000558d683ba6d0 in luaD_pcall ()
#11 0x0000558d683b79f8 in lua_pcall ()
#12 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#13 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#14 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#15 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#16 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#17 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#18 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#19 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#20 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#21 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#22 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#23 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168
2023-05-18T13:04:17+00:00
#0 0x0000558d682d35d9 in dictGetSomeKeys (d=d@entry=0x7f552a142cf8, des=des@entry=0x7ffc86daaaa0, count=<optimized out>, count@entry=15) at dict.c:755
#1 0x0000558d682d3759 in dictGetFairRandomKey (d=0x7f552a142cf8) at dict.c:791
#2 0x0000558d683179b9 in setTypeRandomElement (setobj=0x7f563c36d7c0, sdsele=0x7ffc86daab80, llele=0x7ffc86daab88) at t_set.c:213
#3 0x0000558d68318889 in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:780
#4 0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#5 0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#6 0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#7 0x0000558d683b9e88 in luaD_precall ()
#8 0x0000558d683c4110 in luaV_execute ()
#9 0x0000558d683ba525 in luaD_call ()
#10 0x0000558d683b9858 in luaD_rawrunprotected ()
#11 0x0000558d683ba6d0 in luaD_pcall ()
#12 0x0000558d683b79f8 in lua_pcall ()
#13 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#14 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#15 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#16 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#17 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#18 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#19 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#20 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#21 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#22 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#23 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#24 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168
2023-05-18T13:04:23+00:00
#0 0x00007f56c7a42a24 in ?? () from target:/lib/x86_64-linux-gnu/libc.so.6
#1 0x0000558d682d4ab2 in dictSdsKeyCompare (d=<optimized out>, key1=<optimized out>, key2=<optimized out>) at server.c:243
#2 0x0000558d682d2904 in _dictKeyIndex (existing=0x0, hash=17057246290930548707, key=0x7f56906063c1, d=0x7f549517e318) at dict.c:1055
#3 dictAddRaw (d=0x7f549517e318, key=0x7f56906063c1, existing=0x0) at dict.c:337
#4 0x0000558d682d2b01 in dictAdd (d=d@entry=0x7f549517e318, key=key@entry=0x7f56906063c1, val=val@entry=0x0) at dict.c:302
#5 0x0000558d6831885e in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:789
#6 0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#7 0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#8 0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#9 0x0000558d683b9e88 in luaD_precall ()
#10 0x0000558d683c4110 in luaV_execute ()
#11 0x0000558d683ba525 in luaD_call ()
#12 0x0000558d683b9858 in luaD_rawrunprotected ()
#13 0x0000558d683ba6d0 in luaD_pcall ()
#14 0x0000558d683b79f8 in lua_pcall ()
#15 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#16 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#17 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#18 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#19 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#20 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#21 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#22 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#23 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#24 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#25 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#26 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168
2023-05-18T13:04:27+00:00
#0 0x00007f56c7925330 in random () from target:/lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f56c7925799 in rand () from target:/lib/x86_64-linux-gnu/libc.so.6
#2 0x0000558d682d3764 in dictGetFairRandomKey (d=0x7f552a142cf8) at dict.c:797
#3 0x0000558d683179b9 in setTypeRandomElement (setobj=0x7f563c36d7c0, sdsele=0x7ffc86daab80, llele=0x7ffc86daab88) at t_set.c:213
#4 0x0000558d68318889 in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:780
#5 0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#6 0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#7 0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#8 0x0000558d683b9e88 in luaD_precall ()
#9 0x0000558d683c4110 in luaV_execute ()
#10 0x0000558d683ba525 in luaD_call ()
#11 0x0000558d683b9858 in luaD_rawrunprotected ()
#12 0x0000558d683ba6d0 in luaD_pcall ()
#13 0x0000558d683b79f8 in lua_pcall ()
#14 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#15 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#16 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#17 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#18 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#19 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#20 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#21 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#22 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#23 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#24 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#25 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168
2023-05-18T13:04:32+00:00
#0 0x00007f56c7925712 in random_r () from target:/lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f56c79252dd in random () from target:/lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f56c7925799 in rand () from target:/lib/x86_64-linux-gnu/libc.so.6
#3 0x0000558d682d3764 in dictGetFairRandomKey (d=0x7f552a142cf8) at dict.c:797
#4 0x0000558d683179b9 in setTypeRandomElement (setobj=0x7f563c36d7c0, sdsele=0x7ffc86daab80, llele=0x7ffc86daab88) at t_set.c:213
#5 0x0000558d68318889 in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:780
#6 0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#7 0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#8 0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#9 0x0000558d683b9e88 in luaD_precall ()
#10 0x0000558d683c4110 in luaV_execute ()
#11 0x0000558d683ba525 in luaD_call ()
#12 0x0000558d683b9858 in luaD_rawrunprotected ()
#13 0x0000558d683ba6d0 in luaD_pcall ()
#14 0x0000558d683b79f8 in lua_pcall ()
#15 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#16 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#17 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#18 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#19 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#20 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#21 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#22 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#23 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#24 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#25 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#26 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168
2023-05-18T13:21:03+00:00
#0 0x00007f56c7a42a28 in ?? () from target:/lib/x86_64-linux-gnu/libc.so.6
#1 0x0000558d682d4ab2 in dictSdsKeyCompare (d=<optimized out>, key1=<optimized out>, key2=<optimized out>) at server.c:243
#2 0x0000558d682d2904 in _dictKeyIndex (existing=0x0, hash=18240379238795239395, key=0x7f56906063c1, d=0x7f549517e318) at dict.c:1055
#3 dictAddRaw (d=0x7f549517e318, key=0x7f56906063c1, existing=0x0) at dict.c:337
#4 0x0000558d682d2b01 in dictAdd (d=d@entry=0x7f549517e318, key=key@entry=0x7f56906063c1, val=val@entry=0x0) at dict.c:302
#5 0x0000558d6831885e in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:789
#6 0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#7 0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#8 0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#9 0x0000558d683b9e88 in luaD_precall ()
#10 0x0000558d683c4110 in luaV_execute ()
#11 0x0000558d683ba525 in luaD_call ()
#12 0x0000558d683b9858 in luaD_rawrunprotected ()
#13 0x0000558d683ba6d0 in luaD_pcall ()
#14 0x0000558d683b79f8 in lua_pcall ()
#15 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#16 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#17 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#18 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#19 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#20 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#21 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#22 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#23 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#24 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#25 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#26 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168
2023-05-18T13:32:52+00:00
#0 extend_to_usable (ptr=ptr@entry=0x7f56906063c0, size=size@entry=16) at zmalloc.c:107
#1 0x0000558d682e6320 in zmalloc_usable (size=size@entry=11, usable=usable@entry=0x7ffc86daab18) at zmalloc.c:162
#2 0x0000558d682e2dc3 in _sdsnewlen (init=0x7f56004947b1, initlen=9, trymalloc=<optimized out>) at sds.c:117
#3 0x0000558d6831884a in srandmemberWithCountCommand (c=0x7f56c7525000) at t_set.c:784
#4 0x0000558d682d94db in call (c=c@entry=0x7f56c7525000, flags=15) at server.c:3385
#5 0x0000558d683a68fd in scriptCall (run_ctx=run_ctx@entry=0x7ffc86dab0f0, err=err@entry=0x7ffc86daaca8) at script.c:570
#6 0x0000558d683a4756 in luaRedisGenericCommand (lua=0x558d69f537f0, raise_error=1) at script_lua.c:936
#7 0x0000558d683b9e88 in luaD_precall ()
#8 0x0000558d683c4110 in luaV_execute ()
#9 0x0000558d683ba525 in luaD_call ()
#10 0x0000558d683b9858 in luaD_rawrunprotected ()
#11 0x0000558d683ba6d0 in luaD_pcall ()
#12 0x0000558d683b79f8 in lua_pcall ()
#13 0x0000558d683a5ddb in luaCallFunction (run_ctx=0x7ffc86dab0f0, lua=0x558d69f537f0, keys=0x7f56c7434668, nkeys=0, args=0x7f56c7434668, nargs=10, debug_enabled=0) at script_lua.c:1678
#14 0x0000558d6834c503 in evalGenericCommand (c=0x7f56aff39a00, evalsha=1) at eval.c:553
#15 0x0000558d682d94db in call (c=c@entry=0x7f56aff39a00, flags=flags@entry=15) at server.c:3385
#16 0x0000558d682dbca5 in processCommand (c=c@entry=0x7f56aff39a00) at server.c:4019
#17 0x0000558d682f36ee in processCommandAndResetClient (c=0x7f56aff39a00) at networking.c:2473
#18 processInputBuffer (c=0x7f56aff39a00) at networking.c:2577
#19 0x0000558d682f65e0 in readQueryFromClient (conn=<optimized out>) at networking.c:2713
#20 0x0000558d6839ebc8 in callHandler (handler=<optimized out>, conn=0x7f552bb06380) at connhelpers.h:79
#21 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7f552bb06380, mask=<optimized out>) at connection.c:310
#22 0x0000558d682d099a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f56c742b1e0, flags=flags@entry=27) at ae.c:436
#23 0x0000558d682d0d3d in aeMain (eventLoop=0x7f56c742b1e0) at ae.c:496
#24 0x0000558d682cc5f1 in main (argc=<optimized out>, argv=0x7ffc86dab498) at server.c:7168
Here is also the output of ltrace -ffs256 -p<pid> of the hanging Redis process.
Comment From: sundb
@Fusl Can you provide the parameters of the srandmember command and the length of this key?
My guess is that the dictAdd() is abnormally slow because the rehash of the dictionary is suspended during the rdb saving or aof rewrite.
Comment From: Fusl
I can't exactly tell the parameters of the srandmember call since most of the decision logic happens during runtime in the Lua script but from a quick monitor I can see that we're limiting the call to a max count of 30 at this time, however, the count can be lower than that as well:
1684496276.975048 [0 lua] "srandmember" "imgur:todo" "30"
1684496276.980532 [0 lua] "srandmember" "imgur:todo" "10"
1684496276.981337 [0 lua] "srandmember" "imgur:todo" "27"
1684496276.982097 [0 lua] "srandmember" "imgur:todo" "3"
1684496276.995408 [0 lua] "srandmember" "imgur:todo" "15"
1684496276.996122 [0 lua] "srandmember" "imgur:todo" "7"
1684496276.996658 [0 lua] "srandmember" "imgur:todo" "11"
1684496276.997268 [0 lua] "srandmember" "imgur:todo" "2"
1684496277.008314 [0 lua] "srandmember" "imgur:todo" "27"
1684496277.012712 [0 lua] "srandmember" "imgur:todo" "13"
1684496277.036401 [0 lua] "srandmember" "imgur:todo" "30"
1684496277.065458 [0 lua] "srandmember" "imgur:todo" "30"
1684496277.070976 [0 lua] "srandmember" "imgur:todo" "30"
1684496277.071830 [0 lua] "srandmember" "imgur:todo" "30"
1684496277.073522 [0 lua] "srandmember" "imgur:todo" "22"
1684496277.081004 [0 lua] "srandmember" "imgur:todo" "30"
1684496277.082138 [0 lua] "srandmember" "imgur:todo" "1"
1684496277.093768 [0 lua] "srandmember" "imgur:todo" "29"
1684496277.103692 [0 lua] "srandmember" "imgur:todo" "25"
1684496277.118868 [0 lua] "srandmember" "imgur:todo" "30"
1684496277.125603 [0 lua] "srandmember" "imgur:todo" "18"
1684496277.134662 [0 lua] "srandmember" "imgur:todo" "28"
1684496277.145458 [0 lua] "srandmember" "imgur:todo" "27"
1684496277.149010 [0 lua] "srandmember" "imgur:todo" "12"
1684496277.156520 [0 lua] "srandmember" "imgur:todo" "13"
1684496277.177948 [0 lua] "srandmember" "imgur:todo" "30"
The current scard of this key is 5747349. Do note though that the Lua script that executes the srandmember also immediately afterwards does a srem after making some further decisions on the set member string, so it's quite possible that this set was a little larger than reporting this bug initially, but still well below 10M entries.
Comment From: sundb
@Fusl Thx, Can you also provide the output of INFO ALL and SLOWLOG GET?
Maybe also include lua scripts if possible.
Comment From: Fusl
info all
slowlog get
script.lua
Comment From: Fusl
Note that we're currently running Redis without activedefrag yes to see if that's maybe causing issues for us and it's been chugging along since 6 hours again. However, that also doesn't mean a lot since Redis has been happy for a few days before suddenly not being happy anymore and hanging multiple times within an hour after restarting the container.
Comment From: oranagra
@sundb your theory is interesting. It means that srandmember isn't hand, it just becomes slower during fork, and then when run repeatedly it looks like it's hand. This rehashing prevention is something that was changed in one of the recent versions. Anyway, if that's that, it should be easy to fix, by pre-expanding the dict. @Fusl let us know if you'll be able to test a patch to prove it.
Comment From: Fusl
I can test a patch if given in form of a Docker image.
Comment From: enjoy-binbin
I see we did call dictExpand first, this is an empty dict, so it will skip the rehash and then finish the expand. the count is 30, so this dictionary should also be small
https://github.com/redis/redis/blob/7.0.11/src/t_set.c#L774-L794
/* CASE 4: We have a big set compared to the requested number of elements.
* In this case we can simply get random elements from the set and add
* to the temporary set, trying to eventually get enough unique elements
* to reach the specified count. */
else {
unsigned long added = 0;
sds sdsele;
dictExpand(d, count);
while (added < count) {
encoding = setTypeRandomElement(set,&ele,&llele);
if (encoding == OBJ_ENCODING_INTSET) {
sdsele = sdsfromlonglong(llele);
} else {
sdsele = sdsdup(ele);
}
/* Try to add the object to the dictionary. If it already exists
* free it, otherwise increment the number of objects we have
* in the result dictionary. */
if (dictAdd(d,sdsele,NULL) == DICT_OK)
added++;
else
sdsfree(sdsele);
}
}
Comment From: Fusl
From looking through the logs.txt ltrace file attached in my initial bug report post it looks like it's getting stuck in this while loop because it keeps missing the if (dictAdd(d,sdsele,NULL) == DICT_OK) condition because setTypeRandomElement keeps returning the same entries repeatedly, thus ending up in an infinite loop condition:
~/Downloads > cat logs.txt | fgrep memcpy | cut -d'"' -f2 | head
i:D8ZgAFd
i:kub6c1U
i:8FOvrOb
i:JOy6XF6
i:kub6c1U
i:3EoJJ9A
i:2j8F8yR
i:8FOvrOb
i:Ca0vivs
i:cjfhC8r
~/Downloads > cat logs.txt | fgrep memcpy | cut -d'"' -f2 | wc -l
4021
~/Downloads > cat logs.txt | fgrep memcpy | cut -d'"' -f2 | sort -u | wc -l
29
~/Downloads > cat logs.txt | fgrep memcpy | cut -d'"' -f2 | sort | uniq -c | sort -nk1,1
99 i:cjfhC8r
114 i:0MbpsVS
118 i:Y50L9kv
122 i:JOy6XF6
123 i:iLsWxYn
123 i:vLH6ekx
124 i:47TNxEw
125 i:D8ZgAFd
126 i:0KJoCZD
126 i:2j8F8yR
126 i:THCgaW5
132 i:TujMofJ
132 i:dUZXiJy
133 i:bON8Faw
136 i:Ca0vivs
136 i:Spz2BLF
141 i:7rwtfe7
141 i:Dtb2VA7
141 i:KzxS7Zx
141 i:S99DlMj
144 i:Bd0HtRi
144 i:sloc6wM
150 i:iOLgwqS
151 i:3EoJJ9A
151 i:kub6c1U
152 i:YrguwbM
154 i:g7rKNjj
163 i:O1ezsVh
253 i:8FOvrOb
Comment From: oranagra
The only reason i can think of that setTypeRandomElement will return the same elements many times if if the dict's hash table is seriously too small (maybe if there are constantly forks, and _dictExpandIfNeeded never happens).
it would be nice if we could print the size of the hash table, but sadly, it looks like the only way to do that is by DEBUG HTSTAT-KEY, but that one is also printing bucket statistics, which will take too long to gather on this key (could cause redis to hang).
maybe we can patch redis to have a light version of HTSTAT, one that just prints the sizes of the hash tables. but it would mean restarting the server and hoping to still be able to reproduce this.
Comment From: oranagra
i've run a test to see how long would DEBUG HTSTAT-KEY take.
on 5m entries it took me roughly 100ms, and on 15m it took some 400ms.
so maybe that's acceptable for you to try.
Comment From: sundb
@Fusl Did you do a lot of delete operations(srem or spop) on this set?
Comment From: Fusl
Yes, this set is in constant movement with millions of items getting added and removed every few minutes. At this very moment, the set is empty because of the project behind it having run out of data to process, so I can't easily reproduce the original issue.
Comment From: sundb
#define HASHTABLE_MIN_FILL 10 /* Minimal hash table fill 10% */
int dictNeedsResize(dict *dict) {
long long size, used;
size = dictSlots(dict);
used = dictSize(dict);
return (size > DICT_HT_INITIAL_SIZE &&
(used*100/size < HASHTABLE_MIN_FILL));
}
void test() {
dict *dict_ret = dictCreate(&BenchmarkDictType);
dict *dict = dictCreate(&BenchmarkDictType);
long count = 5000000;
char buf[4096];
init_genrand64(1684911460);
/* Create a dict with 5,000,000 elements */
dictSetResizeEnabled(DICT_RESIZE_AVOID);
for (int j = 0; j < count; j++) {
int retval = dictAdd(dict,stringFromLongLong(j),(void*)j);
assert(retval == DICT_OK);
}
assert((long)dictSize(dict) == count);
/* Wait for rehashing. */
while (dictIsRehashing(dict)) {
dictRehashMilliseconds(dict,100);
}
/* Delete to 10,000 elements left. */
int del = 4990000;
while (del--) {
dictEntry *de = dictGetRandomKey(dict);
char *str = dictGetKey(de);
assert((dictDelete(dict, str) == DICT_OK));
}
/* Restart rehash */
dictSetResizeEnabled(DICT_RESIZE_ENABLE);
assert(dictNeedsResize(dict));
dictResize(dict);
dictRehashNum(dict, 93900);
/* Print ht stats */
dictGetStats(buf,sizeof(buf),dict,1);
printf("%s\n", buf);
printf("dict->rehashidx: %d\n", dict->rehashidx);
/* Disable rehash and get random keys from this dict */
dictSetResizeEnabled(DICT_RESIZE_AVOID);
for (int i = 0; i < 1000; i++) {
dictEntry *de = dictGetFairRandomKey(dict);
char *str = dictGetKey(de);
dictAdd(dict_ret, zstrdup(str), NULL);
// printf("random key: %s\n", str);
}
printf("unique dict size: %ld\n", dictSize(dict_ret));
dictRelease(dict);
dictRelease(dict_ret);
}
Output:
Hash table 0 stats (main hash table):
table size: 1048576
number of elements: 1066
different slots: 538
max chain length: 7
avg chain length (counted): 1.98
avg chain length (computed): 1.98
Chain length distribution:
0: 1048038 (99.95%)
1: 249 (0.02%)
2: 151 (0.01%)
3: 75 (0.01%)
4: 37 (0.00%)
5: 17 (0.00%)
6: 6 (0.00%)
7: 3 (0.00%)
Hash table 1 stats (rehashing target):
table size: 16384
number of elements: 8934
different slots: 4172
max chain length: 15
avg chain length (counted): 2.14
avg chain length (computed): 2.14
Chain length distribution:
0: 12212 (74.54%)
1: 1865 (11.38%)
2: 1108 (6.76%)
3: 552 (3.37%)
4: 335 (2.04%)
5: 163 (0.99%)
6: 78 (0.48%)
7: 36 (0.22%)
8: 14 (0.09%)
9: 12 (0.07%)
10: 5 (0.03%)
11: 1 (0.01%)
13: 2 (0.01%)
15: 1 (0.01%)
dict->rehashidx: 943760
unique dict size: 71
Steps:
1) Disable rehash (set dict_can_resize to DICT_RESIZE_AVOID)
2) Create a dict with 5,000,000 elements and rehash it.
3) Delete the elements of this dictionary to 100,000.
4) Enable rehash (set dict_can_resize to DICT_RESIZE_ENABLE)
5) Rehash the dict so that rehashidx goes as far as possible to the end of dict[0].
In the above example, let the rehashidx reach 943760, the maximum is 1048576.
6) Get 1000 random keys from this dict.
From the above logs, we can find that we get 1000 random keys from the dict, but only 71 unique keys in the end.
https://github.com/redis/redis/blob/d71478a889935382f3bf8b9a39ab073eba0a856a/src/dict.c#L1079:L1080
The reason is that when the dict shrinks and the rehashidx is very close to the size of dict[0], the code will most likely enter the branch above.
https://github.com/redis/redis/blob/d71478a889935382f3bf8b9a39ab073eba0a856a/src/dict.c#L298:L302
This issue is probably introduced by #11692, when dict_can_resize is set to DICT_RESIZE_AVOID, the above code only considers the expansion case and still allows rehash, but ignores the shrinking case.
Try using the following patch to fix and the example will be fine.
- if (dict_can_resize == DICT_RESIZE_FORBID || !dictIsRehashing(d)) return 0;
+ unsigned long s0 = DICTHT_SIZE(d->ht_size_exp[0]);
+ unsigned long s1 = DICTHT_SIZE(d->ht_size_exp[1]);
if (dict_can_resize == DICT_RESIZE_AVOID &&
- (DICTHT_SIZE(d->ht_size_exp[1]) / DICTHT_SIZE(d->ht_size_exp[0]) < dict_force
_resize_ratio))
+ ((s1 > s0 && s1 / s0 < dict_force_resize_ratio) ||
+ (s1 < s0 && s0 / s1 < dict_force_resize_ratio)))
BTW: I still can't simulate the above result using commands.
Comment From: Fusl
Some more background information: The configuration of our application has a high and low watermark where the application tries to keep a set filled with items to satisfy the boundaries of the high and low watermark. That high/low is currently configured as 30M/1M so if the amount of items in the set falls below 1M it will start filling that set with items until it reaches 30M items at which point it will stop filling it again. I wonder if that behavior of our application is something that's just accelerating some kind of race condition (maybe not race condition but I lack the in-depth understanding of what's going on to find a better word for it) to surface as the bug we're seeing.
Comment From: sundb
An accidental reappearance:
I use srandmember command to let the rehashidx of the hash as close to 131072 as possible.
It means that the speculation in the comment https://github.com/redis/redis/issues/12200#issuecomment-1560754386 is reasonable.
the status of this set key:
current hash rehashidx: 130296.
127.0.0.1:6379> debug HTSTATS-KEY s full
Hash table 0 stats (main hash table).
table size: 131072
number of elements: 9
different slots: 5
max chain length: 3
avg chain length (counted): 1.80
avg chain length (computed): 1.80
Chain length distribution.
0: 131067 (100.00%)
1: 2 (0.00%)
2: 2 (0.00%)
3: 1 (0.00%)
Hash table 1 stats (rehashing target).
table size: 1024
number of elements: 970
different slots: 11
max chain length: 117
avg chain length (counted): 88.18
avg chain length (computed): 88.18
Chain length distribution.
0: 1013 (98.93%)
25: 1 (0.10%)
Command and output:
127.0.0.1:6379> srandmember s 110
1) "m:81564"
2) "m:55787"
3) "m:63276"
4) "m:86639"
5) "m:74482"
6) "m:41890"
7) "m:54130"
8) "m:90807"
...
(2.98s) <- too long
Comment From: sundb
@Fusl Can you use this patch to verify if it will be reproduced again?
diff --git a/src/dict.c b/src/dict.c
index ebd431d4..c2e471b7 100644
--- a/src/dict.c
+++ b/src/dict.c
@@ -294,9 +294,11 @@ int dictTryExpand(dict *d, unsigned long size) {
* work it does would be unbound and the function may block for a long time. */
int dictRehash(dict *d, int n) {
int empty_visits = n*10; /* Max number of empty buckets to visit. */
- if (dict_can_resize == DICT_RESIZE_FORBID || !dictIsRehashing(d)) return 0;
+ unsigned long s0 = DICTHT_SIZE(d->ht_size_exp[0]);
+ unsigned long s1 = DICTHT_SIZE(d->ht_size_exp[1]);
if (dict_can_resize == DICT_RESIZE_AVOID &&
- (DICTHT_SIZE(d->ht_size_exp[1]) / DICTHT_SIZE(d->ht_size_exp[0]) < dict_force_resize_ratio))
+ ((s1 > s0 && s1 / s0 < dict_force_resize_ratio) ||
+ (s1 < s0 && s0 / s1 < dict_force_resize_ratio)))
{
return 0;
}
Comment From: Fusl
I can use this patch but I can't get a reproduction case together that will cause this behavior to happen again.
Comment From: oranagra
@sundb first, good job reproducing it. i suppose you did it while there is a fork child? secondly, how exactly did you produce that table? i see it's down sizing, and the original (bigger) one is nearly empty (was probably already nearly empty before rehashing), but what i don't understand is how come the new (smaller) one has such a long chain. did you use SSCAN to delete members?
regarding the fix, maybe it would be possible to improve dictGetSomeKeys, but i suppose long chains would still mean we never sample from the end of the chain.
regarding the fix you're proposing,
1. we should still respect FORBID, we don't want the fork child to do any re-hashing.
2. downsizing happens at x/10, so checking dict_force_resize_ratio (5) seems pointless.
Comment From: sundb
@sundb first, good job reproducing it. i suppose you did it while there is a fork child? secondly, how exactly did you produce that table? i see it's down sizing, and the original (bigger) one is nearly empty (was probably already nearly empty before rehashing), but what i don't understand is how come the new (smaller) one has such a long chain. did you use SSCAN to delete members?
Yes, using SSCAN to delete members.
I did this because I wanted the members of dict[0] to be concentrated at the end as much as possible.
Config:
enable-debug-command yes
save ""
set-max-listpack-entries 0
rdb-key-save-delay 2147483647
Steps: 1) Use python to create a dict just in time to start shrinking and start rehashing.
import redis
import os
def is_rehashing(r):
resp = r.execute_command("debug", "HTSTATS-KEY", "s", "full")
return resp.find("rehashing target") != -1
r = redis.Redis(host='localhost', port=6379, decode_responses=True)
r.flushdb()
pipeline = r.pipeline()
for num in range(0,100000):
pipeline.sadd("s", "m:" + str(num))
pipeline.execute()
while is_rehashing(r):
r.srandmember("s", 100)
r.bgsave()
print(is_rehashing(r))
pipeline = r.pipeline()
count = 0
cursor = 0
while count < 99000:
resp = r.sscan("s", cursor)
cursor = resp[0]
for m in resp[1]:
count += 1
pipeline.srem("s", m)
pipeline.execute()
os.system("pgrep -f redis-rdb-bgsave | xargs kill -9")
while is_rehashing(r) == False:
r.spop("s")
2) Manually execute srandmember s <count> to rehash this dict
to let its rehashidx close to dict[0]'s maxsizemask(131071).
3) BGSAVE to stop rehash.
If at this time, the elements of dict[0] are concentrated after rehashidx, srandmember will have a probability of 0.0078(1024/131071) to randomly find duplicate members all the time.
Theoretically, if there are GETFAIR_NUM_ENTRIES(15) members exactly after the rehashidx, srandmember will probably die loop.
Comment From: sundb
- downsizing happens at x/10, so checking
dict_force_resize_ratio(5) seems pointless.
The resize occurs when (used/size) /10 of dict[0], but dict_force_resize_ratio(5) is between dict_size_0 and dict_size_1, they don't conflict with each other. In the above example: before rehash
table used: 979
table size: 131072
(131072 / 979) /10 = 13.38, start rehash.
start rehash
table 1 size: 131072
table 2 size: 1024
(131072 / 1024) = 138, I think this dict still can rehash.
Comment From: oranagra
Ok. SCAN+DEL is a recipe for trouble (not just for rehashing, but also SCAN, RAND, eviction, etc). But is that really what @Fusl did?
So before #11692 both the fork child and the parent could keep incremental rehashing during fork, and besides blocking the fork child from doing it, I also prevented the parent from doing it unless we need an extreme upscale, but we should have also allowed down scale. Right? The FORBID part should be kept, and this fix should be taken to several versions.
Comment From: sundb
Ok. SCAN+DEL is a recipe for trouble (not just for rehashing, but also SCAN, RAND, eviction, etc). But is that really what @Fusl did?
Because i saw a lot of SSCAN commands in INFO ALL posted in https://github.com/redis/redis/issues/12200#issuecomment-1554476660
cmdstat_srem:calls=1914081,usec=56042921,usec_per_call=29.28,rejected_calls=0,failed_calls=0
cmdstat_sscan:calls=804510,usec=4211523144,usec_per_call=5234.89,rejected_calls=0,failed_calls=0
cmdstat_spop:calls=76303,usec=93211447,usec_per_call=1221.60,rejected_calls=0,failed_calls=0
So before #11692 both the fork child and the parent could keep incremental rehashing during fork, and besides blocking the fork child from doing it, I also prevented the parent from doing it unless we need an extreme upscale, but we should have also allowed down scale. Right?
Yes, This is the only possibility I can think of for dictGetSomeKeys() to return duplicate data.
Another way to fix it is to optimize dictGetSomeKeys(), so that it randomly the slot from [rehashidx, dict_size_2 + dict_size_1 - rehashidx] instead of [0, maxsizemask] (unsigned long i = randomULong() & maxsizemask;), but I'm not sure why antirez didn't do this.
Comment From: Fusl
SSCAN is part of how we maintain items in a set, yes. We do an SSCAN, move that data elsewhere (to offload huge amounts of data to disk) and then SREM those entries. Now that I think about it, we only started seeing the behavior of Redis hanging when we enabled said feature and this one is the only one doing a combination of SSCAN+SREM while the main system is the one running SRANDMEMBER as the primary way to get items from it.
Comment From: oranagra
Another way to fix it is to optimize dictGetSomeKeys(), so that it randomly the slot from [rehashidx, dict_size_2 + dict_size_1 - rehashidx] instead of [0, maxsizemask] (
unsigned long i = randomULong() & maxsizemask;), but I'm not sure why antirez didn't do this.
i think that when we have such a long chain in a bucket, we have no chance to ever pick the items at the end, so this pathological issue isn't gonna be solved by any change to the bucket selection.
SSCANis part of how we maintain items in a set, yes. We do anSSCAN, move that data elsewhere (to offload huge amounts of data to disk) and thenSREMthose entries. Now that I think about it, we only started seeing the behavior of Redis hanging when we enabled said feature and this one is the only one doing a combination ofSSCAN+SREMwhile the main system is the one runningSRANDMEMBERas the primary way to get items from it.
@Fusl so did you use this pattern on older versions (before 7.0.8)? or just on the on later ones? in any case, i advise to avoid this pattern, redis really doesn't like it.
Comment From: Fusl
i advise to avoid this pattern, redis really doesn't like it.
I'm aware of that now but we have to do it this way due to how we store data in sets. Even if it is advised to avoid it, it shouldn't cause Redis to hang forever. A possible workaround for us would be to use negative numbers in srandmember to make it not try to find unique entries and allow duplicates to be returned (we'd be handing duplicates within our application then) but I don't know what the downsides are here internally inside Redis.
did you use this pattern on older versions
Yes, we used the same pattern before on older versions as well without Redis hanging, although never at the scale that we've done so recently.