@antirez we have some user use cases where clients issue WAIT with a timeout of about 50ms. however, what actually releases these clients is clientsCron, which, apart from being dependent on hz, also tries to process each client only approximately once per second.
I would like to check what you think about splitting the loop in clientsCron into two: one that handles timeouts for blocked clients, and another that handles the rest of the (less urgent) house cleaning.
this way, if there are not many blocked clients, they'll be released very quickly (hopefully within one cron tick).
for WAIT clients this is easy since we already have a dedicated list of these clients, but considering that redis 6.0 adds fractional timeouts for BLPOP and others, maybe we want to create a list that includes all blocked clients, and serve them too.
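roughly something like this (just a sketch of the idea, not actual redis code; server.blocked_clients is a hypothetical list holding every blocked client, and the split is only illustrative):

/* Urgent pass: walk the (hopefully short) list of blocked clients on
 * every cron tick, so short timeouts get honored promptly. */
void clientsCronHandleBlockedTimeouts(void) {
    mstime_t now = mstime();
    listIter li;
    listNode *ln;

    listRewind(server.blocked_clients,&li);
    while ((ln = listNext(&li)) != NULL) {
        client *c = listNodeValue(ln);
        if (c->bpop.timeout != 0 && now >= c->bpop.timeout) {
            replyToBlockedClientTimedOut(c);
            unblockClient(c);
        }
    }
}

void clientsCron(void) {
    /* First the urgent pass... */
    clientsCronHandleBlockedTimeouts();

    /* ...then the rest of the (less urgent) house cleaning, which keeps
     * processing roughly numclients/server.hz clients per call as today. */
    /* ... existing loop over server.clients ... */
}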
what do you think?
Comment From: antirez
Ok no, sorry for the previous comment: I see the thing here is different compared to the other issue I mentioned.
Comment From: antirez
To reply to your question @oranagra, I'm not against it, but there are a few things to check:
- In order to guarantee that each client is served within one HZ period at most, if we have 10k clients, how many milliseconds of latency will this add on low-end VMs?
- Maybe it's actually a good idea to have a separate list of clients in blocking operations, so that we can handle a much shorter list, as you said? (I don't remember why the WAITing clients have their own list btw.)
- But anyway, given that we may have 10k clients all blocking at the same time... so:
Maybe the right solution is to have blocking clients in a separate radix tree indexed by the time they'll expire. We can then refactor the current code to make it more robust, and the radix tree can even be handled in a very easy way without any expensive cleanup. It would work like this:
1. Every time a new client blocks with a timeout, we add it to the "ClientsWithTimeout" radix tree (we'll call it differently I guess), using as key the time it will unblock plus the client ID; since the ID is unique, the key always identifies the entry uniquely.
2. Every HZ loop, we run the radix tree up to the current time. We have to resolve the client ID with a lookup, and check if the client needs to do any unblocking.
So basically step 2 would call the same function that the current code (after the refactoring) calls when we run the list of clients. Basically this would just be an additional mechanism that we can use to react much faster in situations like this. As a further refinement, we could add to such a radix tree only the clients having timeouts smaller than 2 seconds, for instance. This would amount to a very small amount of code that would not require any cleanup in freeClient().
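A rough sketch of how this could look (not final code: clients_timeout_table, the key length and checkBlockedClientTimeout() are placeholder names here; the iteration uses the rax API from rax.h):

#define CLIENT_TO_KEYLEN 16   /* 8 bytes of unblock time + 8 bytes of client ID. */

/* Build the rax key: big endian, so lexicographic order == numeric order. */
void encodeTimeoutKey(unsigned char *buf, uint64_t timeout, uint64_t id) {
    timeout = htonu64(timeout);
    id = htonu64(id);
    memcpy(buf,&timeout,8);
    memcpy(buf+8,&id,8);
}

/* Step 1: called when a client blocks with a timeout. */
void addClientToTimeoutTable(client *c) {
    unsigned char buf[CLIENT_TO_KEYLEN];
    encodeTimeoutKey(buf,c->bpop.timeout,c->id);
    raxTryInsert(server.clients_timeout_table,buf,sizeof(buf),NULL,NULL);
}

/* Step 2: called from the cron loop: walk from the smallest key and stop
 * at the first entry whose unblock time is still in the future. */
void handleBlockedClientsTimeout(void) {
    uint64_t now = mstime();
    raxIterator ri;
    raxStart(&ri,server.clients_timeout_table);
    raxSeek(&ri,"^",NULL,0);
    while (raxNext(&ri)) {
        uint64_t timeout, id;
        memcpy(&timeout,ri.key,8);
        memcpy(&id,ri.key+8,8);
        timeout = ntohu64(timeout);
        id = ntohu64(id);
        if (timeout > now) break;              /* All remaining entries expire later. */
        client *c = lookupClientByID(id);      /* Resolve the ID with a lookup. */
        if (c) checkBlockedClientTimeout(c,now);
        raxRemove(server.clients_timeout_table,ri.key,ri.key_len,NULL);
        raxSeek(&ri,"^",NULL,0);               /* Re-seek: the tree was modified. */
    }
    raxStop(&ri);
}

The nice property is that the scan stops at the first key in the future, so its cost is proportional to the number of clients actually timing out, not to the total number of blocked clients.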
Comment From: antirez
P.S. What do you think? It was my initial idea.
Comment From: oranagra
@antirez i didn't intend to guarantee unblocking each client at the exact right time, certainly not at the expense of latency. i just wanted to suggest iterating over blocked clients separately from the rest (using the same mechanism), so that if there aren't many, they'll get unblocked sooner.
however, your suggestion is obviously far better, assuming we're willing to accept the extra code complexity and maybe memory utilization.
the only thing i have to add is that if we have them sorted, we can do that processing in beforeSleep rather than clientsCron, and that would probably produce quite accurate timing (assuming there's traffic).
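a minimal sketch of that, assuming the hypothetical handleBlockedClientsTimeout() from the sketch above:

void beforeSleep(struct aeEventLoop *eventLoop) {
    UNUSED(eventLoop);
    /* ... existing beforeSleep work ... */

    /* The entries are sorted by unblock time, so this scan is cheap and
     * can run before every event loop iteration; as long as traffic keeps
     * the loop spinning, timeouts fire very close to the requested time. */
    handleBlockedClientsTimeout();
}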
Comment From: antirez
Implementation of my proposal above:
- "precise-timeout" branch only implements this for very short timeouts.
- "precise-timeout-2" branch applies it as the only method (my preferred of the two)
To check for the improvement, run the same test against the normal and the patched version:
- In one terminal create idle connections with: ./redis-benchmark -c 200 -I (even 200 are enough)
- In another terminal run WAIT to check for latency with redis-cli:
Before the change:
127.0.0.1:6379> wait 1 550
(integer) 0
(1.24s)
127.0.0.1:6379> wait 1 550
(integer) 0
(0.94s)
127.0.0.1:6379> wait 1 550
(integer) 0
(1.10s)
127.0.0.1:6379> wait 1 550
(integer) 0
(1.11s)
After the change:
127.0.0.1:6379> wait 1 550
(integer) 0
(0.56s)
127.0.0.1:6379> wait 1 550
(integer) 0
(0.62s)
127.0.0.1:6379> wait 1 550
(integer) 0
(0.58s)
127.0.0.1:6379> wait 1 550
(integer) 0
(0.62s)
Comment From: oranagra
@antirez sorry for the delay, was very busy... it looks great, and works great (i also prefer precise-timeout-2: no hard-coded thresholds and no two separate code paths).
my only concern is about cleanup, both for disconnected clients and for ones that got unblocked naturally (by an LPUSH or a REPLCONF ACK).
if the usage pattern is that many clients use long timeouts (e.g. hours), but get unblocked or disconnected quickly, we may end up with a lot of dead records in the rax.
unless i'm missing something, it is quite easy to check the flag and timeout when unblocking or disconnecting a client, re-compute the rax key and delete the record. i.e. the timeout, blocked flag and id are supposed to be unchanged at that point, right?
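i.e. something along these lines (just a sketch, reusing the hypothetical names from the earlier sketch, plus an assumed CLIENT_IN_TO_TABLE flag marking clients that have an entry in the rax):

/* Called both when a client gets unblocked early (LPUSH, REPLCONF ACK)
 * and from freeClient(), so no dead records accumulate in the rax. */
void removeClientFromTimeoutTable(client *c) {
    if (!(c->flags & CLIENT_IN_TO_TABLE)) return;
    unsigned char buf[CLIENT_TO_KEYLEN];
    /* The timeout, blocked flag and id are still unchanged at this point,
     * so the key can be recomputed exactly as it was at insert time. */
    encodeTimeoutKey(buf,c->bpop.timeout,c->id);
    raxRemove(server.clients_timeout_table,buf,sizeof(buf),NULL);
    c->flags &= ~CLIENT_IN_TO_TABLE;
}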
Comment From: antirez
@oranagra LOL are you kidding Oran? I make you wait months sometimes. Just reply when you want :-D It is extremely simple to clean up the radix tree, and I agree with you that once we handle everything with it, it's a good idea. I'll do it tomorrow and merge after some stress testing. Thanks for the feedback :-)
Comment From: antirez
I updated the code adding the cleanup part. Now I'm going to try to avoid the lookup in yet another branch: given that we now handle this carefully and have full control, we can probably avoid the client ID lookup entirely.
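For example (sketch only, not necessarily what will end up committed): the key could store the client pointer itself instead of the ID, which is just as unique for a live client, and is safe only because the cleanup above guarantees the entry is removed in freeClient():

void encodeTimeoutKey(unsigned char *buf, uint64_t timeout, client *c) {
    timeout = htonu64(timeout);
    memcpy(buf,&timeout,sizeof(timeout));
    memcpy(buf+8,&c,sizeof(c));                /* Pointer, not ID: no lookup needed. */
    if (sizeof(c) == 4) memset(buf+12,0,4);    /* Zero padding on 32 bit targets. */
}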
Comment From: oranagra
code was merged to unstable and 6.0 RC3. closing. thanks.