I didn't see a past ask for this, so I'm starting a new thread as something to consider for Redis 8.

A common theme of asks from our customers at ElastiCache is that they want explicit throttling when they send more requests than Redis can reasonably serve. The most common reason for this need is what we refer to as "microbursts": very short bursts of a very high number of requests. These requests queue up and cause significant latencies for all requests (since we only write out to clients at the end). Redis is uniquely bad at handling these bursts compared to systems like memcached because of our single-threaded scale-out architecture.

The common ask is for a new explicit error code, like -TOOMANY, that can be returned immediately once we know we have too many pending requests. We need to start sending this error ASAP, so that clients don't begin to retry the timed-out requests. The error can then be explicitly handled client side to execute the fallback code for a cache miss.

Although it's possible to implement this as a global configuration, it should far more likely be implemented as a client option. Each client would configure its maximum latency tolerance, and once that tolerance has been exceeded its commands would stop being executed and would instead return an immediate error.
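To illustrate how a client library could surface this, here is a minimal Python sketch of the fallback path described above. Everything here is hypothetical: -TOOMANY is only a proposal, and `TooManyError`, `cached_get`, and `OverloadedCache` are names invented for this example.

```python
# Sketch of client-side handling for a hypothetical -TOOMANY error:
# treat server-side load shedding like a cache miss and take the
# fallback path instead of retrying against the overloaded server.

class TooManyError(Exception):
    """Raised by the client lib when the server replies -TOOMANY."""

def cached_get(cache, backing_store, key):
    """Try the cache first; on -TOOMANY, fall back as if it were a miss."""
    try:
        value = cache.get(key)
        if value is not None:
            return value
    except TooManyError:
        pass  # server is shedding load: fall through to the backing store
    return backing_store[key]

class OverloadedCache:
    """Tiny stand-in cache that always sheds load, for demonstration."""
    def get(self, key):
        raise TooManyError("-TOOMANY pending requests")

print(cached_get(OverloadedCache(), {"user:1": "alice"}, "user:1"))
```

The key design point is that the error is caught inside the caller's cache-read path, not propagated as a generic failure, which is why opting in at the client level matters.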

This isn't a great solution on its own, since sending this error also steals resources away from executing other Redis commands. It's more useful to do this on the IO threads, however, since they don't need to take main-thread resources to process this type of request. It's even more useful for the various Redis proxies, which can more easily choose to throw this error when the number of requests to the backend is too high. We may consider just standardizing an error message in the core, encouraging proxies to implement it, and investigating implementing it in the core over time.

@JohnSully for reference.

Comment From: oranagra

i think this is a good thing to explore (the back-off / -TOOMANY thing), but i have some related thoughts:

> The common ask is for a new explicit error code, like -TOOMANY, that can be returned immediately once we know we have too many pending requests. We need to start sending this error ASAP, so that clients don't begin to retry the timed-out requests. The error can then be explicitly handled client side to execute the fallback code for a cache miss.

if we want to avoid letting clients wait too long for their replies while we process other clients, we can also break out of the loops in networking.c, both in processInputBuffer and in the ones calling readQueryFromClient. for many cases this can reduce the latency from the client's perspective without any back-off complications (for the client libs or the app).
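The loop-break idea above can be sketched abstractly: stop draining client input once a per-iteration time budget is spent, so already-produced replies get flushed before more input is consumed. This is an illustrative Python simulation, not actual networking.c code; the budget value and the `process_clients` shape are assumptions.

```python
# Illustrative sketch of breaking out of per-event-loop processing once a
# time budget is exceeded; remaining clients are deferred to the next
# iteration so that pending replies can be written out sooner.
import time

def process_clients(clients, handle_command, budget):
    """Process queued commands until `budget` seconds elapse; return the
    clients that were deferred to the next event-loop iteration."""
    start = time.monotonic()
    deferred = []
    for client in clients:
        if time.monotonic() - start > budget:
            deferred.append(client)  # re-queued, replies flushed first
            continue
        for cmd in client["pending"]:
            handle_command(cmd)
        client["pending"] = []
    return deferred

slow = lambda cmd: time.sleep(0.01)  # stand-in for real command execution
clients = [{"pending": ["GET a"]}, {"pending": ["GET b"]}, {"pending": ["GET c"]}]
left_over = process_clients(clients, slow, budget=0.005)
# the first client is served; the other two are deferred
```

Note this addresses the same latency symptom as -TOOMANY without any client-visible protocol change, which is why it avoids the back-off complications.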

> Although it's possible to implement this as a global configuration, it should far more likely be implemented as a client option. Each client would configure its maximum latency tolerance, and once that tolerance has been exceeded its commands would stop being executed and would instead return an immediate error.

how will we measure latency? time since the event loop started? it could be that some requests have been waiting in the socket since before the previous event loop ended, or they may have arrived just before we started handling the current client (in the current event loop, while processing the previous client).

Comment From: yossigo

@madolson Did you consider the implication of returning a new type of error in practically every possible state? We'll need a detailed specification of handling it and how it interacts with client states. It may also introduce some backward compatibility challenges.

Besides that, I have some doubts about the scenarios where this mechanism can be effective. With pipelining clients, we may experience a "microburst" before they even read and process the reply, and by the time they do, it may be over and unnecessary to re-send the commands. Non-pipelining clients that implement a timeout have no choice but to reconnect, and we can apply throttling at that time.

Comment From: zuiderkwast

We do load shedding in the "ered" Erlang client. When the number of outstanding commands on the connection is too high (i.e. commands waiting for a reply from Redis), the client lib returns an error to the caller without sending the command to Redis. (We reuse connections so we have only one connection for each application node to each Redis Cluster node.)
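The "ered" approach above can be sketched in a few lines of Python. This is an illustrative re-implementation of the idea, not ered's actual code; the threshold, class name, and error type are arbitrary choices for the example.

```python
# Sketch of client-side load shedding: track commands that have been sent
# but not yet answered, and reject new commands locally once the count of
# outstanding commands on the (shared) connection crosses a threshold.

class LoadSheddingConnection:
    def __init__(self, max_outstanding=2):
        self.max_outstanding = max_outstanding
        self.outstanding = 0  # commands awaiting a reply from the server

    def send(self, command, transport_send):
        """Send `command`, or fail fast if the connection is overloaded."""
        if self.outstanding >= self.max_outstanding:
            raise RuntimeError("too many outstanding commands; not sent")
        self.outstanding += 1
        transport_send(command)

    def on_reply(self, reply):
        """Called by the reader loop for every reply received."""
        self.outstanding -= 1
        return reply

sent = []
conn = LoadSheddingConnection(max_outstanding=2)
conn.send("GET a", sent.append)
conn.send("GET b", sent.append)
# a third send before any reply arrives raises instead of queueing
```

Because the counter lives entirely in the client, this works against any server version, which is the main argument for clients supporting it regardless of a server-side -TOOMANY.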

Comment From: tezc

These requests queue up, and cause significant latencies for all requests (since we only write out to clients at the end)

epoll() has a maxevents parameter. It looks like we currently set it to the current fd count, but we could lower it, so that we handle fewer clients in an iteration. Maybe this is what we need here?
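The maxevents semantics being suggested can be demonstrated with Python's `select.epoll` wrapper (Linux-only); this is just a demonstration of the syscall's behavior, not Redis code. With two ready fds and maxevents capped at 1, a single epoll_wait returns only one event; the other is reported on the next call.

```python
# Demonstrates epoll's maxevents cap: even when multiple fds are ready,
# one poll() call returns at most maxevents of them (level-triggered, so
# the rest show up on the next call). Linux-only (select.epoll).
import os
import select

ep = select.epoll()
r1, w1 = os.pipe()
r2, w2 = os.pipe()
ep.register(r1, select.EPOLLIN)
ep.register(r2, select.EPOLLIN)
os.write(w1, b"x")
os.write(w2, b"x")

first = ep.poll(0, 1)   # timeout=0, maxevents=1
second = ep.poll(0, 1)  # the second ready fd is still pending
print(len(first), len(second))  # 1 1
```

This caps how many *clients* are picked up per iteration, which is exactly the heuristic concern raised below: one event can still carry many pipelined commands.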

Comment From: madolson

> Did you consider the implication of returning a new type of error in practically every possible state? We'll need a detailed specification of handling it and how it interacts with client states. It may also introduce some backward compatibility challenges.

Yeah, which is why I think clients need to opt in. I sort of alluded to that, but didn't say it explicitly. If we just start throwing errors, clients will likely propagate them up. Ideally, clients should be able to catch this error and take their fallback path for a cache miss. I also think this "too many" error approach is highly optimized for caching workloads, where you have a fallback path. Primary database workloads would want it as well, though, since the client would like a more explicit indicator to back off.

> We do load shedding in the "ered" Erlang client. When the number of outstanding commands on the connection is too high (i.e. commands waiting for a reply from Redis), the client lib returns an error to the caller without sending the command to Redis. (We reuse connections so we have only one connection for each application node to each Redis Cluster node.)

I generally also agree clients should support this.

> epoll() has a maxevents parameter. It looks like we currently set it to the current fd count, but we could lower it, so that we handle fewer clients in an iteration. Maybe this is what we need here?

I'm a little worried this is more of a heuristic; ideally we would want to break up after some amount of time. A single client might produce one event but have 20 commands in it.

Comment From: zuiderkwast

> Yeah, which is why I think clients need to opt in. I sort of alluded to that, but didn't say it explicitly.

I like the idea of opt-in features.

One for throttling (-TOOMANY), another for async blocking commands (-ASYNC) #12716, and maybe others in the future. For clients, it's a problem if the server doesn't support a particular opt-in feature when the client doesn't know the server version in advance. Perhaps we should consider some client-server feature negotiation in HELLO, which doesn't fail HELLO for unknown features but lets the client see what it got in the HELLO response.

Comment From: madolson

> Perhaps we should consider some client-server feature negotiation in HELLO, which doesn't fail HELLO for unknown features but lets the client see what it got in the HELLO response.

I think this has been decided, right? The plan is just to have clients send all of the options they want to enable up front, and they get errors back when those options are not supported. Doing a handshake requires an extra hop, which client developers didn't seem to like much.

Comment From: zuiderkwast

We talked about it but didn't design how it should work.

Every time we add an option to HELLO, it becomes a syntax error in older Redis versions. But since a client can use a pipeline when initializing the connection, we don't really need it to be included in HELLO. The client can just send separate commands in a pipeline and check whether the result was -ERR Unknown command or +OK. (CLIENT TRACKING is another such feature.)
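The pipelined feature probe described above boils down to simple reply inspection. This sketch uses a canned list of replies instead of a live connection, so it only demonstrates the detection logic; the probed command strings come from the thread, and `detect_features` is a name invented here.

```python
# Sketch of the pipelined feature probe: send the optional commands right
# after HELLO in one pipeline, then mark each one supported unless the
# server answered it with an error reply.

def detect_features(probes, replies):
    """Map each probed command to True if the server accepted it."""
    supported = {}
    for probe, reply in zip(probes, replies):
        supported[probe] = not reply.startswith("-ERR")
    return supported

probes = ["CLIENT TRACKING on", "CLIENT SETINFO lib-name mylib"]
replies = ["+OK", "-ERR Unknown command"]  # e.g. replies from an older server
print(detect_features(probes, replies))
# {'CLIENT TRACKING on': True, 'CLIENT SETINFO lib-name mylib': False}
```

Since the probes ride in the same initial pipeline as HELLO/AUTH, this adds no extra round trip, which addresses the handshake-hop objection.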

Btw, you say "be returned immediately once we know we have too many pending requests", but later it's only about latency, since we only parse one command at a time. If it's just about latency, maybe we should consider another error message, like -TIMEOUT or -TOOSLOW?

Comment From: madolson

> Every time we add an option to HELLO, it becomes a syntax error in older Redis versions. But since a client can use a pipeline when initializing the connection, we don't really need it to be included in HELLO. The client can just send separate commands in a pipeline and check whether the result was -ERR Unknown command or +OK. (CLIENT TRACKING is another such feature.)

Yeah, that is what we document here: https://redis.io/commands/client-setinfo/.

> Client libraries are expected to pipeline this command after authentication on all connections and ignore failures since they could be connected to an older version that doesn't support them.

> Btw, you say "be returned immediately once we know we have too many pending requests", but later it's only about latency, since we only parse one command at a time. If it's just about latency, maybe we should consider another error message, like -TIMEOUT or -TOOSLOW?

I was originally basing it on the HTTP 429 status code, "Too Many Requests", to indicate a generic "you're sending too many commands", which could be for a bunch of different reasons. TIMEOUT makes some amount of sense as well; it just seemed a bit less generic. TOOSLOW seems to imply the client is being too slow, not the server.

Comment From: asafpamzn

I like this suggestion. It can also be useful when the server is overloaded because it is executing some background operation like slot migration or a full sync. In such a scenario the client should throttle and reduce the load for a short period of time to let the server finish the operation.

Comment From: madolson

> I like this suggestion. It can also be useful when the server is overloaded because it is executing some background operation like slot migration or a full sync. In such a scenario the client should throttle and reduce the load for a short period of time to let the server finish the operation.

The full sync case is a really good point, since it would be a way to limit copy-on-write for edge cases. Instead of forcing a disconnect because of client output buffer (CoB) limits, we could throw the explicit error.