description
When running make test on the current stable release, 6.2.5, I got the following error:
[err]: Active defrag in tests/unit/memefficiency.tcl
defrag didn't stop.
I tried ./runtest --single unit/memefficiency and got the same result. All other tests passed.
My device is a Raspberry Pi 3B running Manjaro ARM Minimal.
log of ./runtest --single unit/memefficiency --verbose --no-latency --only "Active defrag": run.log
lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: ARM
Model name: Cortex-A53
Model: 4
Thread(s) per core: 1
Core(s) per cluster: 4
Socket(s): -
Cluster(s): 1
Stepping: r0p4
CPU max MHz: 1200.0000
CPU min MHz: 600.0000
BogoMIPS: 38.40
Flags: fp asimd evtstrm crc32 cpuid
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Spec store bypass: Not affected
Spectre v1: Mitigation; __user pointer sanitization
Spectre v2: Not affected
Srbds: Not affected
Tsx async abort: Not affected
Comment From: enjoy-binbin
Does this test always fail on your machine? The test does have some problems: it has failed several times in our daily CI, see: https://github.com/redis/redis/pull/7289#issuecomment-885431100
Comment From: KernelErr
Does this test always fail on your machine? The test does have some problems: it has failed several times in our daily CI, see: #7289 (comment)
That's strange, lol. Last night and this morning I tried at least five times and got the same error each time, but after your reply it ran without error.
Testing solo test
=== (defrag) Starting server 127.0.0.1:21112 ok
frag 1.52
frag 1.04
hits: 299858
misses: 2290312
max latency 9
{command 1630037541 7481 7481} {active-defrag-cycle 1630037555 7 9}
{1630037542 7} {1630037543 7} {1630037544 9} {1630037545 7} {1630037546 8} {1630037547 7} {1630037548 7} {1630037549 7} {1630037550 7} {1630037551 7} {1630037552 7} {1630037553 7} {1630037554 7} {1630037555 7}
AOF loading:
frag 1.08
hits: 559647
misses: 5585973
max latency 0
{command 1630037599 32959 32959} {while-blocked-cron 1630037598 33 40} {active-defrag-cycle 1630037598 7 9}
Comment From: KernelErr
By the way, I've compiled the source code again before this successful test.
Comment From: oranagra
This doesn't seem to be the same issue as the one discussed in the other issue.
First, it's not the same test (this one is the plain Active defrag test, and there it's the new Active defrag edge case test).
Secondly, the failure there is a stagnation error that happens when the jemalloc bin utilization is exactly 0.5.
Anyway, I think that in this case it's just a matter of insufficient time.
So you can probably reproduce it again by changing the wait_for_condition in that test to be slightly shorter, and then we can maybe conclude what's the right value to use to be sure it always passes.
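For readers unfamiliar with the helper, here is a rough shell sketch of the polling pattern that wait_for_condition follows: check a condition up to a maximum number of tries, sleep a fixed delay between tries, and fail once the tries run out. The names wait_for and check_counter are hypothetical; the real helper is a Tcl proc in the Redis test suite, and this is not its implementation.

```shell
#!/bin/sh
# Hypothetical sketch of a "retry until true" helper, loosely modelled on the
# wait_for_condition <maxtries> <delay-ms> pattern discussed above.
# Usage: wait_for MAXTRIES DELAY_MS COMMAND [ARGS...]
wait_for() {
    maxtries=$1
    delay_ms=$2
    shift 2
    tries=0
    while [ "$tries" -lt "$maxtries" ]; do
        # Succeed as soon as the condition command returns 0.
        if "$@"; then
            return 0
        fi
        tries=$((tries + 1))
        # Fractional sleep is an assumption: it works with GNU and BSD sleep.
        sleep "$(printf '%d.%03d' $((delay_ms / 1000)) $((delay_ms % 1000)))"
    done
    # Budget exhausted: the caller decides how to report it
    # (the test above reports "defrag didn't stop.").
    return 1
}

# Toy condition (hypothetical): becomes true on the third call.
COUNTER_FILE=${TMPDIR:-/tmp}/wait_for_demo.$$
echo 0 > "$COUNTER_FILE"
check_counter() {
    n=$(cat "$COUNTER_FILE")
    n=$((n + 1))
    echo "$n" > "$COUNTER_FILE"
    [ "$n" -ge 3 ]
}

if wait_for 5 10 check_counter; then
    echo "condition met"
else
    echo "gave up"
fi
rm -f "$COUNTER_FILE"
```

Shortening the wait, as suggested above, corresponds to lowering the first argument: the condition gets fewer chances before the helper gives up.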
Comment From: enjoy-binbin
this one is the plain Active defrag test
oh... sorry, i did run ./runtest --single unit/memefficiency --verbose --no-latency --only "Active defrag"
we now support regular expressions in --only, so in this case it matches every test whose name begins with Active defrag.
so i ran it and took it for granted that it was the same failure,
and didn't bother to look at the log.
Active defrag: wait_for_condition 150 100, it only takes 3-6 seconds on my machine
Active defrag big list: wait_for_condition 500 100, it only takes 1 second on my machine
but i did get an error once (defrag big list):
Execution time of different units:
1 seconds - unit/memefficiency
373 seconds - defrag
!!! WARNING The following tests failed:
*** [err]: Active defrag big list in tests/unit/memefficiency.tcl
defrag didn't stop.
btw, in the unstable branch, --only supports regular expressions,
so on the unstable branch we should use:
- ./runtest --single unit/memefficiency --verbose --no-latency --only "Active defrag big list$"
- ./runtest --single unit/memefficiency --verbose --no-latency --only "Active defrag$"
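To make the difference concrete, here is a small sketch of unanchored versus anchored regular-expression matching, with grep -E as a stand-in. It assumes --only does an unanchored regex match against the test name, as described above. The matches function is hypothetical, and the test names are the ones mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if NAME ($1) matches the extended regex PATTERN ($2).
# grep -E is only a stand-in here; the test suite does its matching in Tcl.
matches() {
    printf '%s\n' "$1" | grep -Eq -- "$2"
}

for pattern in 'Active defrag' 'Active defrag$' 'Active defrag big list$'; do
    echo "pattern: $pattern"
    for name in 'Active defrag' 'Active defrag big list' 'Active defrag edge case'; do
        if matches "$name" "$pattern"; then
            echo "  runs:  $name"
        else
            echo "  skips: $name"
        fi
    done
done
```

Without the trailing $, the first pattern selects all three names; with it, only the test whose name ends in Active defrag is selected.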
Comment From: oranagra
@KernelErr i suppose the Raspberry Pi is just a bit too slow for these thresholds.
please try changing the wait_for_condition 150 100 in the Active defrag test in tests/unit/memefficiency.tcl to something like wait_for_condition 500 100. run the test a couple of times (using the last command in the post above), and tell us how much time it takes to complete (assuming it succeeds)
Comment From: KernelErr
@KernelErr i suppose the Raspberry Pi is just a bit too slow for these thresholds. please try changing the
wait_for_condition 150 100 in the Active defrag test in tests/unit/memefficiency.tcl to something like wait_for_condition 500 100. run the test a couple of times (using the last command in the post above), and tell us how much time it takes to complete (assuming it succeeds)
Okay, I will run these tests later today.
Comment From: KernelErr
In the unstable branch, I performed the test after altering the wait_for_condition 150 100 to wait_for_condition 500 100.
./runtest --single unit/memefficiency --verbose --no-latency --only "Active defrag$"
It passed. Here are the execution times from five runs:
Execution time of different units:
0 seconds - unit/memefficiency
170 seconds - defrag
Execution time of different units:
1 seconds - unit/memefficiency
170 seconds - defrag
Execution time of different units:
1 seconds - unit/memefficiency
171 seconds - defrag
Execution time of different units:
0 seconds - unit/memefficiency
170 seconds - defrag
Execution time of different units:
1 seconds - unit/memefficiency
169 seconds - defrag
Comment From: oranagra
thanks. so it's clear that 150*100 (15 seconds) was nowhere near enough. was this the only one that failed? or are there others that need adjustment?
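As a quick check of the arithmetic: the two arguments are a maximum number of tries and a delay in milliseconds, so the time budget is roughly their product. The budget_seconds function below is a hypothetical helper for illustration only, and it ignores the time spent evaluating the condition itself.

```shell
#!/bin/sh
# Hypothetical helper: approximate budget of wait_for_condition <tries> <delay-ms>,
# in whole seconds (time spent evaluating the condition is ignored).
budget_seconds() {
    echo $(( $1 * $2 / 1000 ))
}

echo "150 x 100 ms = $(budget_seconds 150 100) s"   # original threshold
echo "500 x 100 ms = $(budget_seconds 500 100) s"   # threshold tried on the Raspberry Pi
```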
Comment From: huangzhw
On my machine this test always fails.
!!! WARNING The following tests failed:
*** [err]: Active defrag in tests/unit/memefficiency.tcl
Expected 46 <= 30 (context: type eval line 68 cmd {assert {$max_latency <= 30}} proc ::test)
*** [err]: Active defrag big list in tests/unit/memefficiency.tcl
defrag didn't stop.
Comment From: KernelErr
I ran make test on the unstable branch and all tests passed (with memefficiency modified).
Execution time of different units:
0 seconds - unit/printver
63 seconds - unit/dump
2 seconds - unit/auth
1 seconds - unit/protocol
3 seconds - unit/keyspace
36 seconds - unit/scan
1 seconds - unit/info
27 seconds - unit/type/string
0 seconds - unit/type/incr
11 seconds - unit/type/list
56 seconds - unit/type/list-2
85 seconds - unit/type/list-3
25 seconds - unit/type/set
81 seconds - unit/type/zset
18 seconds - unit/type/hash
161 seconds - unit/type/stream
6 seconds - unit/type/stream-cgroups
58 seconds - unit/sort
17 seconds - unit/expire
25 seconds - unit/other
4 seconds - unit/multi
1 seconds - unit/quit
105 seconds - unit/aofrw
2 seconds - unit/acl
23 seconds - unit/latency-monitor
27 seconds - integration/block-repl
501 seconds - integration/replication
31 seconds - integration/replication-2
99 seconds - integration/replication-3
83 seconds - integration/replication-4
182 seconds - integration/replication-psync
11 seconds - integration/aof
29 seconds - integration/rdb
32 seconds - integration/corrupt-dump
21 seconds - integration/corrupt-dump-fuzzer
1 seconds - integration/convert-zipmap-hash-on-load
1 seconds - integration/convert-ziplist-hash-on-load
6 seconds - integration/logging
52 seconds - integration/psync2
22 seconds - integration/psync2-reg
20 seconds - integration/psync2-pingoff
7 seconds - integration/failover
18 seconds - integration/redis-cli
4 seconds - integration/redis-benchmark
8 seconds - integration/dismiss-mem
1 seconds - unit/pubsub
2 seconds - unit/slowlog
21 seconds - unit/scripting
176 seconds - unit/maxmemory
4 seconds - unit/introspection
7 seconds - unit/introspection-2
1 seconds - unit/limits
50 seconds - unit/obuf-limits
12 seconds - unit/bitops
4 seconds - unit/bitfield
251 seconds - unit/geo
12 seconds - unit/memefficiency
177 seconds - unit/hyperloglog
3 seconds - unit/lazyfree
5 seconds - unit/wait
1 seconds - unit/pause
3 seconds - unit/querybuf
21 seconds - unit/pendingquerybuf
0 seconds - unit/tls
3 seconds - unit/tracking
2 seconds - unit/oom-score-adj
1 seconds - unit/shutdown
3 seconds - unit/networking
607 seconds - defrag
\o/ All tests passed without errors!
So it seems to work on a Raspberry Pi 3B running Manjaro.
Comment From: oranagra
@huangzhw the first error there is an unreliable latency measurement. you can use --no-latency to hide it.
the second problem is the one mentioned here: https://github.com/redis/redis/pull/7289#issuecomment-885431100 i don't know how to resolve it yet.
if it happens a lot we may eventually decide to disable the test.
i.e. i invested a lot of effort in writing a test that can consistently reproduce a rare issue and thought that i had solved it, but it turns out i hadn't.