Describe the bug
redis-server --version
Redis server v=7.0.2 sha=00000000:0 malloc=jemalloc-5.2.1 bits=64 build=15c2c3973ed24647
free -h
total used free shared buff/cache available
Mem: 7.8Gi 231Mi 7.0Gi 1.0Mi 547Mi 7.3Gi
Swap: 1.8Gi 0B 1.8Gi
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04 LTS
Release: 22.04
Codename: jammy
make test
Cleanup: may take some time... OK
Starting test server at port 21079
...
...
...
!!! WARNING The following tests failed:
*** [err]: Active defrag in tests/unit/memefficiency.tcl
Expected 72 <= 30 (context: type eval line 113 cmd {assert {$max_latency <= 30}} proc ::test)
Cleanup: may take some time... OK
make[1]: *** [Makefile:427: test] Error 1
make[1]: Leaving directory '/opt/redis-stable/src'
make: *** [Makefile:6: test] Error 2
To reproduce
make test
Expected behavior
\o/ All tests passed without errors!
Additional information
I noticed that in a similar issue, #10053, the reporter was asked to run ./runtest --single unit/memefficiency --verbose; the output is included below.
Cleanup: may take some time... OK
Starting test server at port 21079
[ready]: 24144
Testing unit/memefficiency
[ready]: 24143
[ready]: 24145
[ready]: 24146
[ready]: 24147
[ready]: 24148
[ready]: 24149
[ready]: 24150
[ready]: 24151
[ready]: 24152
[ready]: 24153
[ready]: 24155
[ready]: 24156
[ready]: 24157
[ready]: 24154
[ready]: 24158
=== (memefficiency external:skip) Starting server 127.0.0.1:21611 ok
[ok]: Memory efficiency with values in range 32 (154 ms)
[ok]: Memory efficiency with values in range 64 (153 ms)
[ok]: Memory efficiency with values in range 128 (157 ms)
[ok]: Memory efficiency with values in range 1024 (190 ms)
[ok]: Memory efficiency with values in range 16384 (557 ms)
[1/1 done]: unit/memefficiency (1 seconds)
Testing solo test
=== (defrag external:skip) Starting server 127.0.0.1:21612 ok
frag 1.52
frag 1.02
hits: 393676
misses: 2141633
max latency 7
{active-defrag-cycle 1657221160 7 7} {command 1657221157 889 889}
{1657221158 7} {1657221159 7} {1657221160 7}
AOF loading:
frag 1.05
hits: 1360602
misses: 8892575
max latency 176
{while-blocked-cron 1657221183 85 176} {active-defrag-cycle 1657221183 7 7} {command 1657221183 21993 21993}
{1657221171 39} {1657221172 39} {1657221173 143} {1657221174 52} {1657221175 124} {1657221176 53} {1657221177 91} {1657221178 33} {1657221179 176} {1657221181 20} {1657221182 104} {1657221183 85}
[err]: Active defrag in tests/unit/memefficiency.tcl
Expected 176 <= 30 (context: type eval line 113 cmd {assert {$max_latency <= 30}} proc ::test)
used 54570536
rss 54878208
frag 1.01
frag_bytes 307672
used 29315448
rss 54050816
frag 1.84
frag_bytes 24735368
used 29356232
rss 30638080
frag 1.04
frag_bytes 1281848
[ok]: Active defrag eval scripts (3830 ms)
frag 1.86
frag 1.03
hits: 499087
misses: 1509516
max latency 8
{active-defrag-cycle 1657221200 7 8} {command 1657221198 729 729}
{1657221198 8} {1657221199 8} {1657221200 7}
[ok]: Active defrag big keys (13727 ms)
frag 2.00
frag 1.02
misses: 101247
hits: 98756
max latency 8
{active-defrag-cycle 1657221214 8 8} {command 1657221213 335 335}
{1657221214 8}
[ok]: Active defrag big list (13411 ms)
=== (defrag) Starting server 127.0.0.1:21613 ok
frag 1.96
frag 1.05
hits: 626861
misses: 1716405
[ok]: Active defrag edge case (17771 ms)
[1/1 done]: defrag (87 seconds)
The End
Execution time of different units:
1 seconds - unit/memefficiency
87 seconds - defrag
!!! WARNING The following tests failed:
*** [err]: Active defrag in tests/unit/memefficiency.tcl
Expected 176 <= 30 (context: type eval line 113 cmd {assert {$max_latency <= 30}} proc ::test)
Cleanup: may take some time... OK
I have tried building version 6.2.7 on the same machine, and there make test succeeds.
Comment From: oranagra
Looks like a timing / threshold issue (although the gap seems high). Is it consistent? Maybe the machine was busy at that time.
Comment From: judeng
Are you using WSL?
Comment From: 0xD4n
@oranagra it fails every single time I run the test; I'll attach output from another run below.
The output in my original post was generated in a VirtualBox VM, but I can also reproduce it on the host.
I'm running this on a bare-metal machine with:
AMD Ryzen 9 5900X 12-Core Processor
load average: 0.14, 0.29, 0.22
total used free shared buff/cache available
Mem: 125Gi 2.9Gi 119Gi 60Mi 3.1Gi 121Gi
Output from another run of ./runtest --single unit/memefficiency --verbose:
Cleanup: may take some time... OK
Starting test server at port 21079
[ready]: 5256
Testing unit/memefficiency
[ready]: 5258
[ready]: 5257
[ready]: 5259
[ready]: 5260
[ready]: 5261
[ready]: 5262
[ready]: 5263
[ready]: 5264
[ready]: 5265
[ready]: 5266
[ready]: 5267
[ready]: 5269
[ready]: 5268
[ready]: 5271
[ready]: 5270
=== (memefficiency external:skip) Starting server 127.0.0.1:21111 ok
[ok]: Memory efficiency with values in range 32 (147 ms)
[ok]: Memory efficiency with values in range 64 (132 ms)
[ok]: Memory efficiency with values in range 128 (133 ms)
[ok]: Memory efficiency with values in range 1024 (151 ms)
[ok]: Memory efficiency with values in range 16384 (449 ms)
[1/1 done]: unit/memefficiency (1 seconds)
Testing solo test
=== (defrag external:skip) Starting server 127.0.0.1:21112 ok
frag 1.52
frag 1.02
hits: 377763
misses: 2165981
max latency 7
{active-defrag-cycle 1657835794 7 7} {command 1657835791 762 762}
{1657835792 7} {1657835793 7} {1657835794 7}
AOF loading:
frag 1.52
hits: 0
misses: 0
max latency 0
{active-defrag-cycle 1657835794 7 7} {command 1657835798 2745 2745}
[err]: Active defrag in tests/unit/memefficiency.tcl
Expected 0 > 100000 (context: type eval line 106 cmd {assert {$hits > 100000}} proc ::test)
used 54563040
rss 54910976
frag 1.01
frag_bytes 347936
used 29336704
rss 54042624
frag 1.84
frag_bytes 24705920
used 29372000
rss 30605312
frag 1.04
frag_bytes 1233312
[ok]: Active defrag eval scripts (2601 ms)
frag 1.86
frag 1.03
hits: 498676
misses: 1510522
max latency 8
{active-defrag-cycle 1657835812 8 8} {command 1657835811 646 646}
{1657835811 8} {1657835812 8}
[ok]: Active defrag big keys (12693 ms)
frag 1.99
frag 1.01
misses: 101263
hits: 98740
max latency 7
{active-defrag-cycle 1657835825 7 7} {command 1657835825 297 297}
{1657835825 7}
[ok]: Active defrag big list (12314 ms)
=== (defrag) Starting server 127.0.0.1:21113 ok
frag 1.96
frag 1.05
hits: 625759
misses: 1718340
[ok]: Active defrag edge case (16879 ms)
[1/1 done]: defrag (62 seconds)
The End
Execution time of different units:
1 seconds - unit/memefficiency
62 seconds - defrag
!!! WARNING The following tests failed:
*** [err]: Active defrag in tests/unit/memefficiency.tcl
Expected 0 > 100000 (context: type eval line 106 cmd {assert {$hits > 100000}} proc ::test)
Cleanup: may take some time... OK
Comment From: reybog90
I don't know if it's related, but I am also seeing intermittent Active defrag test failures on 6.2.7, running on a somewhat busy bare-metal i7-7700K CPU @ 4.20GHz. So far the test fails about 10% of the time. Are these tests expected to fail on busy machines?
!!! WARNING The following tests failed:
*** [err]: Active defrag in tests/unit/memefficiency.tcl
Expected 1.52 < 1.4 (context: type eval line 109 cmd {assert {$frag < 1.4}} proc ::test)
$ ./src/redis-server --version
Redis server v=6.2.7 sha=e6f67092:0 malloc=jemalloc-5.1.0 bits=64 build=d8ac2d77249547e3
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.6 LTS
Release: 18.04
Codename: bionic
$ gcc --version
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
$ free -h
total used free shared buff/cache available
Mem: 31G 14G 13G 204M 3,3G 16G
Swap: 2,0G 326M 1,7G
$ uptime
15:34:19 up 32 days, 3:08, 1 user, load average: 1,39, 1,83, 3,00
Comment From: oranagra
These tests are a bit sensitive; they're expected to fail on a very busy or slow machine. We should probably improve them, but it's complicated.
Note that the first failure mentioned here is due to a latency threshold, the second is about not seeing enough defragmentation (hits) during AOF loading, and the third is about defrag not reaching its target fragmentation threshold.
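For reference, a minimal Tcl sketch of the three assertion styles, using the exact assert expressions quoted in the failure contexts above (the set lines are placeholder values taken from the failures in this thread, not part of the real test, and assert is the test suite's helper proc, not a core Tcl command):
set max_latency 72 ;# first failure: defrag cycle latency threshold
assert {$max_latency <= 30}
set hits 0 ;# second failure: defrag hits during AOF loading
assert {$hits > 100000}
set frag 1.52 ;# third failure: fragmentation ratio target
assert {$frag < 1.4}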
Comment From: bsdelf
I have the same issue on both 7.0.5 and unstable. The weird thing is, the unit test fails in tmux but passes in konsole.
CPU: 6-core 12th Gen Intel Core i5-12500 (-MT MCP-) speed/min/max: 3343/800/5900 MHz
Kernel: 6.0.10_1 x86_64
Mem: 5330.5/64093.2 MiB (8.3%)
% ./src/redis-server --version
Redis server v=7.0.5 sha=1571907e:0 malloc=jemalloc-5.2.1 bits=64 build=bed131ec13dc8ff6
% gcc --version
gcc (GCC) 10.2.1 20201203
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
% lsb_release -a
LSB Version: 1.0
Distributor ID: VoidLinux
Description: Void Linux
Release: rolling
Codename: void
redis git:(7.0.5) % ./runtest --single unit/memefficiency --verbose
Cleanup: may take some time... OK
Starting test server at port 21079
[ready]: 16464
Testing unit/memefficiency
[ready]: 16465
[ready]: 16463
[ready]: 16466
[ready]: 16467
[ready]: 16468
[ready]: 16469
[ready]: 16470
[ready]: 16471
[ready]: 16472
[ready]: 16473
[ready]: 16474
[ready]: 16475
[ready]: 16478
[ready]: 16477
[ready]: 16476
=== (memefficiency external:skip) Starting server 127.0.0.1:21611 ok
[ok]: Memory efficiency with values in range 32 (109 ms)
[ok]: Memory efficiency with values in range 64 (109 ms)
[ok]: Memory efficiency with values in range 128 (110 ms)
[ok]: Memory efficiency with values in range 1024 (118 ms)
[ok]: Memory efficiency with values in range 16384 (257 ms)
[1/1 done]: unit/memefficiency (1 seconds)
Testing solo test
=== (defrag external:skip) Starting server 127.0.0.1:21612 ok
frag 1.52
frag 1.04
hits: 372063
misses: 2181656
max latency 7
{command 1670386874 465 465} {active-defrag-cycle 1670386878 7 7}
{1670386875 7} {1670386876 7} {1670386877 7} {1670386878 7}
AOF loading:
frag 1.52
hits: 0
misses: 0
max latency 0
{command 1670386882 2723 2723} {active-defrag-cycle 1670386878 7 7}
[err]: Active defrag in tests/unit/memefficiency.tcl
Expected 0 > 100000 (context: type eval line 106 cmd {assert {$hits > 100000}} proc ::test)
used 54309504
rss 54632448
frag 1.01
frag_bytes 322944
used 29064688
rss 53747712
frag 1.85
frag_bytes 24683024
used 29100352
rss 30318592
frag 1.04
frag_bytes 1218240
[ok]: Active defrag eval scripts (2601 ms)
frag 1.87
frag 1.03
hits: 499297
misses: 1509532
max latency 8
{command 1670386893 386 386} {active-defrag-cycle 1670386896 6 8}
{1670386893 8} {1670386894 8} {1670386895 8} {1670386896 6}
[ok]: Active defrag big keys (11916 ms)
frag 2.00
frag 1.01
misses: 101248
hits: 98755
max latency 8
{command 1670386906 183 183} {active-defrag-cycle 1670386907 8 8}
{1670386907 8}
[ok]: Active defrag big list (11490 ms)
=== (defrag) Starting server 127.0.0.1:21613 ok
frag 1.96
frag 1.05
hits: 627696
misses: 1715533
[ok]: Active defrag edge case (15351 ms)
[1/1 done]: defrag (58 seconds)
The End
Execution time of different units:
1 seconds - unit/memefficiency
58 seconds - defrag
!!! WARNING The following tests failed:
*** [err]: Active defrag in tests/unit/memefficiency.tcl
Expected 0 > 100000 (context: type eval line 106 cmd {assert {$hits > 100000}} proc ::test)
Cleanup: may take some time... OK
Comment From: oranagra
@bsdelf please try to play with the value in the following line and see if it helps (e.g. try -200)
r config set key-load-delay -50 ;# sleep on average 1/50 usec
Comment From: bsdelf
Changing it to -200 or even -2000 does not help; I've tried several times. Runs in konsole can also fail, so the terminal emulator is irrelevant. But I do observe that if I add some constant load to the system instead of leaving it idle before running that test case, the success rate increases.
Comment From: oranagra
i'm sorry, i mixed it up; please try to make it lower, e.g. -25 or -5 (effectively making AOF loading slower, and giving the defragger a chance to get triggered and run)
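Concretely, this amounts to editing the key-load-delay line in tests/unit/memefficiency.tcl, e.g. (a sketch following the original line's comment format; the exact value to use is what this thread is trying to establish):
r config set key-load-delay -25 ;# sleep on average 1/25 usec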
Comment From: bsdelf
It works, thanks.
Comment From: oranagra
so i suppose your machine is too fast for that test. i'll consider changing the threshold if we see it again.
Comment From: oranagra
actually, i just noticed that the person who opened this issue had this failure too (among others), so this is the second time we see it. @bsdelf please tell me which value you used that seemed to consistently work.
Comment From: bsdelf
-40 is good.
For anyone who wants to find the optimal parameter automatically, here is a small Node.js script:
a.js
const fs = require('fs');
const { spawn } = require('child_process');
async function delay(ms) {
  return new Promise((resolve) => {
    setTimeout(resolve, ms);
  });
}
// Patch tests/unit/memefficiency.tcl: lower the key-load-delay magnitude
// by `step` after each failed run.
function tune() {
  const step = 5;
  const testFilePath = 'tests/unit/memefficiency.tcl';
  const pattern =
    /r config set key-load-delay -(?<value>\d+) ;# sleep on average/;
  const origData = fs.readFileSync(testFilePath, 'utf-8');
  const matched = origData.match(pattern);
  if (!matched) {
    throw new Error('pattern not found');
  }
  const line = matched[0];
  const value = Number.parseInt(matched.groups.value, 10);
  if (!Number.isInteger(value)) {
    throw new Error('pattern not found');
  }
  console.log('original:', { line, value });
  const newValue = value - step;
  const newLine = line.replace(`${value}`, `${newValue}`);
  console.log('patched:', { value: newValue, line: newLine });
  const newData = origData.replaceAll(line, newLine);
  fs.writeFileSync(testFilePath, newData, { encoding: 'utf-8' });
}
async function loop() {
  let stop = true;
  const cp = spawn('./runtest', ['--single', 'unit/memefficiency'], {
    detached: true,
  });
  for (const event of ['SIGINT', 'uncaughtException', 'unhandledRejection']) {
    process.on(event, () => {
      process.kill(-cp.pid);
    });
  }
  cp.stdout.on('data', (data) => {
    const text = data.toString();
    console.log(text);
    if (/\[.*err.*\]:/.test(text)) {
      stop = false;
      process.kill(-cp.pid);
    }
  });
  cp.stderr.on('data', (data) => {
    console.log(data.toString());
    stop = true;
    process.kill(-cp.pid);
  });
  cp.on('close', async (code) => {
    console.log(`child process exited with code: ${code}`);
    if (!stop) {
      tune();
      await delay(6000);
      loop();
    }
  });
}
(async () => {
  try {
    await loop();
  } catch (err) {
    console.log(err);
  }
})();
Start the iterations and wait:
for run in {1..10}; do node a.js; done
Check the result:
git diff
Comment From: oranagra
ok, i'll aim for -25 to keep a safe distance.