There appears to no eviction happening during evalsha (as well as eval) commands even with allkeys-lru set. This is causing OOM errors while running scripts.
Repo:
Redis server was set with:
maxmemory 10000000
maxmemory-policy allkeys-lru
We will use a simple script to insert data:
local function commit()
local key = ARGV[1]
local value = ARGV[2]
redis.call('HSET', key, 'value', value)
end
return commit()
Insert the script:
echo -E "script load \"local function commit()\n local key = ARGV[1]\n local value = ARGV[2]\n redis.call('HSET', key, 'value', value)\nend\n\nreturn commit()\"" | redis-cli --pipe
Than populate the db:
for i in {1..100}; do; redis-cli evalsha ac5b6905f1d2186b86ee8868aaacb38eda65596e 0 "$i" "$(head -c 100000000 /dev/urandom | tr -dc A-Za-z0-9 | head -c 100000 ; echo '')"; done
You should see a few keys being added, followed by the max memory being hit:
(error) ERR Error running script (call to f_ac5b6905f1d2186b86ee8868aaacb38eda65596e): @user_script:4: @user_script: 4: -OOM command not allowed when used memory > 'maxmemory'.
Expected:
The expected result is the same as other hset calls, to evict keys based on the eviction policy. The issue appears to be in scripting.c throwing the exception instead of attempting to evict keys using freeMemoryIfNeededAndSafe()
I attempted to add that call to line 527 before pushing the error and it appears to fix the issue, however I worry that it may not exist because of some conflict with replication.
Comment From: brian-mcnamara
Appears to be related to #5250 and https://github.com/antirez/redis/commit/c6c71abe55ef2cbd02eea389b66f4f37c69c029b. I know 5.0 command replication is the default (over script replication), so would a check for whether command replication is enabled to call to freeMemory. I may be missing a side case, but since scripts are not replicated, the master/slave should be kept consistent.
Comment From: antirez
That looks like a bug indeed. Even when keys are not evicted, AFAIK the correct semantics should be to abort only immediately if there is an OOM condition when the script is called, or never abort till the end of the script. I'll check this, seems important. Thanks for reporting.
Comment From: kalenp
Running into this same issue using using Redisson client library which makes heavy use of Lua scripting. In particular, we're using the BinaryStream wrapper which invokes redis.call('append'...) which fails during OOM conditions instead of evicting other keys.
Comment From: alexander-akhmetov
We ran into the same issue with 5.0 and it looks like in v4 eviction during Lua scripts execution is working fine.
Appears to be related to #5250 and c6c71ab. I know 5.0 command replication is the default (over script replication), so would a check for whether command replication is enabled to call to freeMemory. I may be missing a side case, but since scripts are not replicated, the master/slave should be kept consistent.
Does it mean that we can run into the situation described in c6c71ab with v4.0?
Comment From: brian-mcnamara
Yes, the issue described in that commit can happen in 4 (the changes were part of version 5). That being said, if you enable command replication in version 4, I believe you will prevent running into that bug.
Comment From: patpatbear
@antirez I think this is the borderline issue mentioned by @oranagra in
https://github.com/antirez/redis/pull/5250#issuecomment-416050622
proof:
I added some log in redis-server, repo this issue using script provided by @brian-mcnamara :
6211:M 19 Jan 2020 11:31:47.408 # + start process command:evalsha
6211:M 19 Jan 2020 11:31:47.408 # getMaxmemoryState:OK, used:9923008, maxmmeory:10000000, tofree:0 // time 1
6211:M 19 Jan 2020 11:31:47.408 # + start call command:evalsha // time 2
6211:M 19 Jan 2020 11:31:47.408 # getMaxmemoryState:ERR, used:10037712, maxmmeory:10000000, tofree:37712 // time 3
6211:M 19 Jan 2020 11:31:47.409 # - end call command:evalsha // time 4
6211:M 19 Jan 2020 11:31:47.409 # - end process command:evalsha
The results show that the error was caused by memory borderline:
time 1, evalsha was called, used memory is slightly under maxmemory. maxmemory checked OK, no eviction happened. time 2, lua env was created, some amount of memmory consumed, (~4kb for stack, ~100kb for args), used memory is slightty beyond maxmemory. time 3, lua start to call hset command, maxmemory was checked ERR, eviction was not allowed intentionally(due to consistency issue, discussed in #5250); time 4, lua env was freed, used memory back to below maxmemory again.
I looked into #5250 and found out that other issues are solved, the borderline problem were omitted.
proposal:
IMHO we can solve this by:
- detect and save OOM state into
server.lua_oomat script start - check
server.lua_oomuntill firstDENYOOMcommand with script
so that:
lua_oomstate would be consistent, and not interfered by memory used by lua.- memory will not grow uncontrolled by eval/evalsha (because we checked
lua_oomwithin script)
I'd like to know your opinions @antirez @soloestoy @oranagra .
Comment From: oranagra
@antirez @soloestoy @guybe7 i think we should solve this problem ASAP.
I see two options, one is #6797 which was actually proposed by Salvatore in https://github.com/antirez/redis/pull/5250#issuecomment-416173111.
Alternatively, since we stopped replicating scripts, and we now only replicate their effects, maybe it's time to re-introduce eviction to scripts, and possibly also trim all the code that deals with script replication from redis 6.0?
Comment From: antirez
@oranagra thank you, I merged #6797, looks like the safer bet 24 days before GA :-) Thanks.
Comment From: asgeirrr
Hi, I'd like to report I've replicated this in Redis 5.0.8. We have a very similar configuration to @brian-mcnamara i.e.
appendonly yes
aof-use-rdb-preamble yes
auto-aof-rewrite-min-size 64mb
maxmemory 1gb
maxmemory-policy allkeys-lru
we're also running user-defined Lua scripts. The following errors started to happen after migration from 4.0.1 to 5.0.8:
Error running script (call to f_4d6cd7ce9e84598704a316c7e78198d5f8e93565): @user_script:12: @user_script: 12: -OOM command not allowed when used memory > 'maxmemory'.
Is there a chance the fix will get backported to the 5.0.x line? Thanks.
Comment From: antirez
Hello @asgeirrr, I backported the fix to 5.0 (already in the branch), and will push a new release ASAP.
Comment From: guybe7
@antirez this is a reminder that before releasing another 5.0 we should tackle https://github.com/antirez/redis/issues/7105 - the minimum is to have streamPropagateGroupID use propagate instead of alsoPropagate
Comment From: antirez
@guybe7 thank you, you are right, I commented in the issue. Maybe we can fix it right away.
Comment From: kalenp
It looks like this was backported to the 5.0 branch, but never released, since it came in after 5.0.9. Is there a plan to make another 5.0 release with this fix in it?
Comment From: oranagra
in general we're waiting for some critical issue that will justify another release of 5.0, and in the meanwhile collecting a list of less critical issues to be backported when that happens. maybe this issue justifies a release on it's own, i'll try to gather a list of candidates and decide.
Comment From: kalenp
Thank you for the update.
Comment From: schmidp
we are also running into this problem and would be happy about a new release.
Comment From: guybe7
@schmidp IIUC the fix is in 5.0.10
Comment From: oranagra
Thanks @guybe7 . I meant to post here but forgot..