There appears to no eviction happening during evalsha (as well as eval) commands even with allkeys-lru set. This is causing OOM errors while running scripts.

Repo:

Redis server was set with:

maxmemory 10000000
maxmemory-policy allkeys-lru

We will use a simple script to insert data:

local function commit()
        local key = ARGV[1]
        local value = ARGV[2]
        redis.call('HSET', key, 'value', value)
end

return commit()

Insert the script: echo -E "script load \"local function commit()\n local key = ARGV[1]\n local value = ARGV[2]\n redis.call('HSET', key, 'value', value)\nend\n\nreturn commit()\"" | redis-cli --pipe

Than populate the db: for i in {1..100}; do; redis-cli evalsha ac5b6905f1d2186b86ee8868aaacb38eda65596e 0 "$i" "$(head -c 100000000 /dev/urandom | tr -dc A-Za-z0-9 | head -c 100000 ; echo '')"; done

You should see a few keys being added, followed by the max memory being hit: (error) ERR Error running script (call to f_ac5b6905f1d2186b86ee8868aaacb38eda65596e): @user_script:4: @user_script: 4: -OOM command not allowed when used memory > 'maxmemory'.

Expected:

The expected result is the same as other hset calls, to evict keys based on the eviction policy. The issue appears to be in scripting.c throwing the exception instead of attempting to evict keys using freeMemoryIfNeededAndSafe()

I attempted to add that call to line 527 before pushing the error and it appears to fix the issue, however I worry that it may not exist because of some conflict with replication.

Comment From: brian-mcnamara

Appears to be related to #5250 and https://github.com/antirez/redis/commit/c6c71abe55ef2cbd02eea389b66f4f37c69c029b. I know 5.0 command replication is the default (over script replication), so would a check for whether command replication is enabled to call to freeMemory. I may be missing a side case, but since scripts are not replicated, the master/slave should be kept consistent.

Comment From: antirez

That looks like a bug indeed. Even when keys are not evicted, AFAIK the correct semantics should be to abort only immediately if there is an OOM condition when the script is called, or never abort till the end of the script. I'll check this, seems important. Thanks for reporting.

Comment From: kalenp

Running into this same issue using using Redisson client library which makes heavy use of Lua scripting. In particular, we're using the BinaryStream wrapper which invokes redis.call('append'...) which fails during OOM conditions instead of evicting other keys.

Comment From: alexander-akhmetov

We ran into the same issue with 5.0 and it looks like in v4 eviction during Lua scripts execution is working fine.

Appears to be related to #5250 and c6c71ab. I know 5.0 command replication is the default (over script replication), so would a check for whether command replication is enabled to call to freeMemory. I may be missing a side case, but since scripts are not replicated, the master/slave should be kept consistent.

Does it mean that we can run into the situation described in c6c71ab with v4.0?

Comment From: brian-mcnamara

Yes, the issue described in that commit can happen in 4 (the changes were part of version 5). That being said, if you enable command replication in version 4, I believe you will prevent running into that bug.

Comment From: patpatbear

@antirez I think this is the borderline issue mentioned by @oranagra in https://github.com/antirez/redis/pull/5250#issuecomment-416050622

proof:

I added some log in redis-server, repo this issue using script provided by @brian-mcnamara :

6211:M 19 Jan 2020 11:31:47.408 # + start process command:evalsha
6211:M 19 Jan 2020 11:31:47.408 #     getMaxmemoryState:OK, used:9923008, maxmmeory:10000000, tofree:0              // time 1
6211:M 19 Jan 2020 11:31:47.408 #  + start call command:evalsha                                                     // time 2
6211:M 19 Jan 2020 11:31:47.408 #     getMaxmemoryState:ERR, used:10037712, maxmmeory:10000000, tofree:37712        // time 3
6211:M 19 Jan 2020 11:31:47.409 #  - end call command:evalsha                                                       // time 4
6211:M 19 Jan 2020 11:31:47.409 # - end process command:evalsha

The results show that the error was caused by memory borderline:

time 1, evalsha was called, used memory is slightly under maxmemory. maxmemory checked OK, no eviction happened. time 2, lua env was created, some amount of memmory consumed, (~4kb for stack, ~100kb for args), used memory is slightty beyond maxmemory. time 3, lua start to call hset command, maxmemory was checked ERR, eviction was not allowed intentionally(due to consistency issue, discussed in #5250); time 4, lua env was freed, used memory back to below maxmemory again.

I looked into #5250 and found out that other issues are solved, the borderline problem were omitted.

proposal:

IMHO we can solve this by:

  • detect and save OOM state into server.lua_oom at script start
  • check server.lua_oom untill first DENYOOM command with script

so that:

  • lua_oom state would be consistent, and not interfered by memory used by lua.
  • memory will not grow uncontrolled by eval/evalsha (because we checked lua_oom within script)

I'd like to know your opinions @antirez @soloestoy @oranagra .

Comment From: oranagra

@antirez @soloestoy @guybe7 i think we should solve this problem ASAP.

I see two options, one is #6797 which was actually proposed by Salvatore in https://github.com/antirez/redis/pull/5250#issuecomment-416173111.

Alternatively, since we stopped replicating scripts, and we now only replicate their effects, maybe it's time to re-introduce eviction to scripts, and possibly also trim all the code that deals with script replication from redis 6.0?

Comment From: antirez

@oranagra thank you, I merged #6797, looks like the safer bet 24 days before GA :-) Thanks.

Comment From: asgeirrr

Hi, I'd like to report I've replicated this in Redis 5.0.8. We have a very similar configuration to @brian-mcnamara i.e.

appendonly yes
aof-use-rdb-preamble yes
auto-aof-rewrite-min-size 64mb
maxmemory 1gb
maxmemory-policy allkeys-lru

we're also running user-defined Lua scripts. The following errors started to happen after migration from 4.0.1 to 5.0.8:

Error running script (call to f_4d6cd7ce9e84598704a316c7e78198d5f8e93565): @user_script:12: @user_script: 12: -OOM command not allowed when used memory > 'maxmemory'.

Is there a chance the fix will get backported to the 5.0.x line? Thanks.

Comment From: antirez

Hello @asgeirrr, I backported the fix to 5.0 (already in the branch), and will push a new release ASAP.

Comment From: guybe7

@antirez this is a reminder that before releasing another 5.0 we should tackle https://github.com/antirez/redis/issues/7105 - the minimum is to have streamPropagateGroupID use propagate instead of alsoPropagate

Comment From: antirez

@guybe7 thank you, you are right, I commented in the issue. Maybe we can fix it right away.

Comment From: kalenp

It looks like this was backported to the 5.0 branch, but never released, since it came in after 5.0.9. Is there a plan to make another 5.0 release with this fix in it?

Comment From: oranagra

in general we're waiting for some critical issue that will justify another release of 5.0, and in the meanwhile collecting a list of less critical issues to be backported when that happens. maybe this issue justifies a release on it's own, i'll try to gather a list of candidates and decide.

Comment From: kalenp

Thank you for the update.

Comment From: schmidp

we are also running into this problem and would be happy about a new release.

Comment From: guybe7

@schmidp IIUC the fix is in 5.0.10

Comment From: oranagra

Thanks @guybe7 . I meant to post here but forgot..