Edit: see current conclusion in https://github.com/redis/redis/issues/9926#issuecomment-992833835
Describe the bug
I've created a scenario where used_memory is bigger than maxmemory
and where maxmemory-policy is set to noeviction. When I use a command
wrapped in MULTI, it behaves differently than when I use it standalone.
To reproduce
- config set maxmemory-policy noeviction
- config set maxmemory 12026784
- write keys until OOM command not allowed when used memory > 'maxmemory'
- ignore that error and brute force to write even more keys (some go through) to make sure there's really memory > maxmemory
- try to delete a key wrapped in multi:
> MULTI
OK
> DEL foobar
(error) OOM command not allowed when used memory > 'maxmemory'.
> EXEC
(error) EXECABORT Transaction discarded because of previous errors.
Expected behavior
I would expect any non-write command to work in MULTI. It only works without MULTI:
> DEL foobar
(integer) 1
Same behavior with INFO. There are probably more commands where it would make sense to allow them in MULTI in that scenario (In my case CLIENT PAUSE was part of it, but given that it buffers requests, I can see why it wouldn't be allowed).
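For example, the INFO case looks like this (a reconstructed transcript on the same over-maxmemory instance as above):
> MULTI
OK
> INFO
(error) OOM command not allowed when used memory > 'maxmemory'.
> EXEC
(error) EXECABORT Transaction discarded because of previous errors.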
Additional information
Tested in Redis 5.0.14
Comment From: itamarhaber
A quicker way to reproduce (also on unstable): set maxmemory 1mb :)
Comment From: markuszoeller
A quicker way to reproduce (also on unstable):
set maxmemory 1mb :)
haha, yeah, I just used the value I had in my test :)
Comment From: madolson
I believe this is intended behavior because multi-execs consume additional memory when building up the object, so we don't want people building up objects when you're at maxmemory.
I don't know if I agree with this behavior, it's really no different than accumulating commands in the query buffer or loading up commands into a lua script. I would prefer we just make sure there are no "use-memory" commands inside the multi-exec.
Comment From: oranagra
Indeed.. This behavior was added on purpose in redis 5.0, see #5454
Comment From: markuszoeller
Indeed.. This behavior was added on purpose in redis 5.0, see #5454
Does it make sense in its current form though?
I was also under the impression that maxmemory is only limiting the keyspace and not memory the Redis process would need to respond to something, like multi-execs. Is that assumption not correct?
Comment From: oranagra
nothing major was changed since then in that respect, so this change makes as much sense now as it did then.
redis counts memory usage of (almost) everything, including client buffers and other non-keyspace allocations. the maxmemory limit, however, is enforced by evicting keys or rejecting commands. redis does avoid rejecting some commands, namely read commands and delete commands, or actually any command that it thinks isn't gonna increase the memory usage.
i think the reason MULTI queuing is rejected is that the client could otherwise abuse redis by sending more and more commands to be queued and never sending EXEC, which would result in unbounded memory usage.
maybe it makes sense to give the queuing some grace, and not reject them too soon, so that we don't interrupt naive workloads by a temporary memory overuse. however, if the queued commands contain any "use-memory" commands, then EXEC will fail when it arrives, so it might not help much (other than for read / delete workloads)
Comment From: markuszoeller
redis counts memory usage of (almost) everything, including client buffers and other non-keyspace allocations. the maxmemory limit however, is enforced by evicting keys, or rejecting commands.
I understand how it's enforced, but it's not 100% clear to me which actions exactly increase/decrease the memory count that is limited by maxmemory. I was trying to say that I thought only actions like set foo bar (keyspace actions) increase the portion of memory usage that is limited by maxmemory, which is why I was surprised to see "non-keyspace actions" like INFO being blocked inside multi-exec.
(other than for read / delete workloads)
That would already help I think. Not being able to call INFO or DEL in multi-exec feels a bit too restrictive.
Comment From: eladbern
Now that we have client eviction that accounts for multi memory (via mstate.argv_len_sums), should we just allow non-'denyoom' transactions?
That would allow performing reads, deletes, etc. inside transactions.
Comment From: oranagra
client eviction is off by default, so i don't think it changes the picture here. i still think we should just allow some grace, i.e. smaller multi-execs should be allowed in OOM state (unless they contain DENYOOM commands). i suppose that instead of just allowing them till a threshold of 128 commands or alike, we can use argv_len_sums (e.g. some constant of 32kb?)
anyone interested in coding this and its test?
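for concreteness, with such a grace the desired outcome on an over-maxmemory instance would be something like this (a hypothetical transcript of the proposed behavior, not what current versions do):
> MULTI
OK
> DEL foobar
QUEUED
> EXEC
1) (integer) 1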
Comment From: ranshid
(e.g. some constant of 32kb?)
@oranagra why a constant size and not something configurable? I would suspect that in some cases, on large machines and with specific workloads that use very large key sizes, it might make sense to use different values. I think having a config with a default of 32kb is also fine.
anyone interested in coding this and it's test?
I think we can take it. I understand this will be for the 8.0 release? In case we add a config that defaults to keeping the existing behavior, can we consider releasing this change in a minor release?
Comment From: madolson
I think having a config with default of 32kb is also fine.
I think a configuration is sensible here, but I was going to advocate we just use the proto_max_bulk_len configuration, since we've used it elsewhere as the max transient memory limit (for stralgo) when there are no deny-oom commands. If we introduce a new configuration, I would rather it be generic to all the "transient" memory that can be used by the system as a buffer.
I think we can take it. I understand this will be for 8.0 release?
Yeah, it would be for Redis 8.
Comment From: oranagra
as much as i don't like hard coded constants, i'm not sure this case calls for a config. commands with large arguments are usually write commands (e.g. SETting a new value for a key), and here we aim to allow GET and DEL commands, which only carry a key-name, not a value.
we're aiming to solve two things:
1. if the OOM state is temporary (very short lived), there's no reason to fail queuing of short multi-exec sequences (e.g. 2 to 10 commands), unlike someone who's trying to flood the memory by queuing 10,000 commands, which is the case for which this limit was added.
2. even if the OOM state is not temporary, we should allow read-only commands and deletions to be executed in a multi-exec (as long as the queued sequence doesn't get huge).
as such, i don't see a reason for a massive memory limit for that; even using proto_max_bulk_len (default of 512mb) seems far too high to me. a hard coded small value seems sufficient. or alternatively it can be a factor of the used memory (e.g. used_memory / 1000).
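to put numbers on that: with the ~12mb maxmemory from the original report, used_memory / 1000 would come out to roughly 12kb, and with a 10gb maxmemory it would be about 10mb.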
Comment From: ranshid
O.K., the reason I suggested the config was to help users avoid inconsistent states where they sometimes fail to execute their non-deny-oom requests inside transactions and sometimes don't. I agree this is probably an edge case, but a config can also let users completely allow non-deny-oom commands, or manage some safe limit on transaction growth in case they fear a risk.
Anyway, I am fine with a hard coded limit; we can change it in the future if we think it is needed.
Comment From: madolson
as such, i don't see a reason for a massive memory limit for that, even using proto_max_bulk_len (default to 512mb) seems far too high to me. a hard coded small value seems sufficient. or alternatively if can be a factor of the used memory (e.g. used_memory / 1000).
The use case that came to me is just doing a lot of small deletes, which I have seen done in the hundreds or thousands before. I think that still practically means a limit of around 1MB is probably high enough (32K seems small). I would prefer we have most constants be tunable, and I'm not sure I really see the problem with having a high limit here, since it needs to be an authenticated connection (and malicious actors can already consume up to proto_max_bulk_len).
Comment From: oranagra
i don't like hard coded values, but i also don't like confusing users with too many low level configs. in this case i think it is for a very specific corner case, and i don't think it should have a config.
yes, users can consume up to max_bulk_len per argument, in each argument of each command, and they can always just avoid filling the last argument, so that argv will never be complete and the command won't be executed. so maybe this limit is more to protect against bugs and innocent abuse rather than an attack.
@soloestoy maybe you can provide the reason behind https://github.com/redis/redis/pull/5454?
see short summary of the problems we want to solve in https://github.com/redis/redis/issues/9926#issuecomment-1869342111
Comment From: soloestoy
redis counts memory usage of (almost) everything, including client buffers and other non-keyspace allocations. the maxmemory limit, however, is enforced by evicting keys or rejecting commands. redis does avoid rejecting some commands, namely read commands and delete commands, or actually any command that it thinks isn't gonna increase the memory usage.
i think the reason MULTI queuing is rejected is that the client could otherwise abuse redis by sending more and more commands to be queued and never sending EXEC, which would result in unbounded memory usage.
@oranagra 's response has already provided a very good explanation of my purpose in #5454 .
Transactions are special in that they are not executed immediately; instead, the commands are cached. Therefore, regardless of whether the commands themselves consume memory during execution, they consume memory while being cached. From this perspective, cached commands such as get/del/set in a transaction are all the same in an OOM scenario. Perhaps I am being inflexible in this respect, but I don't think we need to make any changes. I don't see any difference in an OOM scenario between multi/del key and multi/keys * or multi/set key value.
Here, I also want to explain the reason for the restriction I made at that time in #5454. Before this, we had users who encountered situations where Redis consumed a lot of memory exceeding maxmemory (which could potentially cause machine OOM). After investigation, we found that this was due to users using transactions extensively, and each transaction had many cached commands. Additionally, there were also many connections. As a result, the cached commands in transactions consumed a lot of memory (and since the commands in the transactions were all read-only, no OOM error was triggered). In this scenario, multi/del key is no different from multi/get key; they both consume a large amount of memory.
I don't know the original intention of multi/del key/exec in this issue. If it is to delete data when OOM is encountered, then there is no need to use transactions. If it is a normal requirement in the business, then, based on my understanding, there must be other transactional commands like multi/set key value/exec and multi/get key/exec, which will still encounter OOM errors. Therefore, there is no need to specifically exempt multi/del/exec. At this point, what the user should truly do is to scale up or clean up invalid data.
Comment From: eladbern
I agree that under OOM, users should not be allowed to perform huge multi sequences and ideally nor should they be allowed to accumulate large single commands. It does make sense to couple these and have some configurable limit of total used mem per client when at OOM (that should be much less than proto_max_bulk_len by default).
Comment From: oranagra
i'd like to remind us that in many cases (past and present) we refuse to add variants for commands (like the creation of GETDEL, a variadic HINCRBY, an HGET+HDEL, or any other wild combinations) while arguing that the user can achieve that with a lua-script or multi-exec. so i do like to make sure it's possible to use these short scripts in any place where such a new command variant would be possible. (for scripts we recently added the shebang flags that should hopefully take care of that.) so if a user can do GETDEL in OOM state, i'd like them to also be able to do the similar MULTI based one.
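for illustration, the MULTI based equivalent of GETDEL i have in mind is simply the following (key name and value here are made up):
> MULTI
OK
> GET mykey
QUEUED
> DEL mykey
QUEUED
> EXEC
1) "some-value"
2) (integer) 1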
@soloestoy as you mentioned, if the user is abusing redis, they can use the KEYS command with or without MULTI. and they can consume memory by simply queuing a non-MULTI command without completing it (sending a *1000\r\n, and then providing just 999 large arguments). so there's no reason to block small transactions, we only need to block large ones.
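as a concrete illustration of that incomplete-command trick, a client can send the multibulk header and all but the last argument and then just stop, e.g. sending only *3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n; the final bulk string never arrives, the command is never executed, yet the partial arguments keep sitting in the query buffer.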
Comment From: madolson
Here, I also want to explain the reason for the restriction I made at that time in https://github.com/redis/redis/pull/5454. Before this, we had users who encountered situations where Redis consumed a lot of memory exceeding maxmemory (which could potentially cause machine OOM). After investigation, we found that this was due to users using transactions extensively, and each transaction had many cached commands.
How did this user respond to their multi-execs being blocked? It seems like they would be broken by this change and would need to adapt. A better solution for that user would have been better memory management to protect against OOM.
Comment From: eladbern
@madolson @oranagra Let's see if we can move forward with this.
It seems from the discussion that the high level requirement is to control the buffer memory clients can use while the server is OOM. This is not covered well by maxmemory-clients since it doesn't distinguish between the steady state and the OOM state.
Suggested options:
1. Deal only with multi with hard-coded limit.
2. Deal only with multi with a configured value.
3. Deal with any client memory by having a maxmemory-clients-during-oom config.
IMO (3) provides the most value since it will allow more use cases. Thoughts?
Comment From: oranagra
i think the complexity of 3 isn't justified for what it comes to solve, and i'd rather not add unnecessary configs. these are hard to understand, and the chances of anyone ever setting them is low, instead redis should just do the right thing by default.
@eladbern i suggest re-purposing your existing PR to handle a hard coded limit of 32kb, and i'll try to get it approved and merged. i'd argue that compared to what we have now, it doesn't expose any OOM abuse risks, and it does solve the problem for most / all innocent clients.
Comment From: madolson
i think the complexity of 3 isn't justified for what it comes to solve, and i'd rather not add unnecessary configs. these are hard to understand, and the chances of anyone ever setting them is low, instead redis should just do the right thing by default.
The default value of the config would be the right value by default, though. Having obscure configs that somebody might someday tune doesn't seem like a big issue as long as most people don't touch them. I think we should be more okay embracing obscure configs that most people won't ever touch, because redis behaves correctly by default, but somebody might someday need to tune it. Either that or we have to come up with a more comprehensive memory management story.
I also still think 32kb is too low, at the very least we should get consensus on that before making a further change.
Comment From: oranagra
i'll raise it at the next core-team meeting. but i have a feeling it'll be hard to agree on something, and it worries me since the current code is somewhat broken.
Comment From: soloestoy
so if a user can do GETDEL in OOM state, i'd like them to also be able do the similar MULTI based one.
I believe the root of the problem lies not in the size of the transaction, but in whether to allow transaction caching during OOM. If executing small transactions is allowed, then in my opinion executing large transactions should also be allowed. It is different from the case where unauthenticated clients trigger a DOS attack with large parameters. If we think that OOM should focus on command attributes (whether they are of the CMD_DENYOOM type) rather than on transaction caching, then I think we can directly revert #5454; I do not object to doing so, as I think it is unnecessary to limit the size of transactions during OOM, either through configuration options or hard coded values.
Comment From: soloestoy
Moreover, I believe the root cause is the lack of memory partitioning. The maxmemory limit restricts all the memory used by Redis, but the action to reduce memory is by deleting data, which is unfair.
If we want to fundamentally address this issue, the theoretically correct approach would be to separate the memory into the data area (maxmemory-dataset) and the runtime area (maxmemory-runtime). Only when the data exceeds maxmemory-dataset should we reject CMD_DENYOOM commands and delete data. In other cases, when the runtime memory exceeds maxmemory-runtime, we should not delete data or reject CMD_DENYOOM commands.
We have already taken a step in this direction by introducing maxmemory-clients. Perhaps in 8.0, we should enable maxmemory-clients by default, for example, setting the default configuration to 100%. In this way, even if we revert #5454 and do not limit transaction execution during OOM, maxmemory-clients will help disconnect connections with excessive transaction cached commands.
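For example, something like the following (assuming the percentage syntax that maxmemory-clients already accepts):
> CONFIG SET maxmemory-clients 100%
OK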
Comment From: oranagra
i'm ok reverting it, and i'm ok setting a hard coded sensible limit. what do you mean by setting maxmemory-clients to 100? 100 bytes? 100% of maxmemory?
Comment From: soloestoy
what do you mean by setting maxmemory-clients to 100? 100 bytes? 100% of maxmeomry?
100% of maxmemory... 😅
Comment From: madolson
In this way, even if we revert https://github.com/redis/redis/pull/5454 and do not limit transaction execution during OOM, maxmemory-clients will help disconnect connections with excessive transaction cached commands.
The only thing that I don't like about this behavior is that you'll get a "disconnect" as opposed to an explicit error. Disconnects can be hard to root cause.
Comment From: soloestoy
Disconnects can be hard to root cause.
Good point. Disconnection typically occurs in three situations:
- Network broken
- Client-initiated disconnection
- Server-initiated disconnection
Here, we are particularly concerned with the third case, which is server-initiated disconnection. This can occur when a client times out, exceeds buffer limits, or encounters other issues. Currently, there is no notification to the client about the reason for the disconnection.
However, there is an interesting exception when a client establishes a new connection. If the number of connections exceeds the maxclients limit, Redis will first reply to the client with "-ERR max number of clients reached" before disconnecting. Maybe we can extend this mechanism (sending the reason to the client before disconnecting) to other scenarios of disconnection.
Comment From: madolson
However, there is an interesting exception when a client establishes a new connection. If the number of connections exceeds the maxclients limit, Redis will first reply to the client with "-ERR max number of clients reached" before disconnecting. Maybe we can extend this mechanism (sending the reason to the client before disconnecting) to other scenarios of disconnection.
We do this for ACL forced disconnection (if you delete your own user for example, you still get the OK before getting the disconnect). This normally causes a slight latency bump, as it's not till the next time the client is pulled out that it's checked and sees it is disconnected.
Comment From: soloestoy
We do this for ACL forced disconnection (if you delete your own user for example, you still get the OK before getting the disconnect). This normally causes a slight latency bump, as it's not till the next time the client is pulled out that it's checked and sees it is disconnected.
Yes, I know, and we have the CLIENT_CLOSE_AFTER_REPLY mechanism in place for scenarios such as the QUIT command, the CLIENT KILL command, and disconnecting blocked clients when the master becomes a replica. However, these mechanisms still rely on the normal request-reply (ping-pong) flow being satisfied. In cases like disconnection due to the query or output buffer limits, that can lead to exceptional handling of the client protocol. I'm wondering if we can have a more general approach to handle such scenarios, perhaps utilizing the PUSH message in RESP3.
Comment From: oranagra
This topic was discussed in a core-team meeting, and our conclusion was that:
1. we wanna disconnect the tie between MULTI state memory usage and the server OOM state, meaning that we'll limit the total memory a single client can queue, and do that unconditionally, regardless of the server being OOM or not.
2. we don't wanna add a new config, and we can use the query buffer limit for that. the reasoning is that it's ok to send a single command with a huge argument, and it's ok to send a long MULTI queue with many small commands, but it's probably not a practical use case to send a long MULTI queue with many commands carrying a huge argument.
3. when the rules are violated, we think it's ok to disconnect the client (rather than return an OOM error), same as we do on query buffer size exhaustion.
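for reference, the existing directive we have in mind here is the query buffer limit in redis.conf (shown with what i recall is its default value):
client-query-buffer-limit 1gb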
@soloestoy i suppose your PR is closer to that end goal? either way, let's make sure the above reasoning is reflected in the commit comment.