Hi All,

I have a dedupe script that looks like this, and it runs in Process A:

```
func GetDedupeScript() string {
    return `if redis.call("EXISTS", ARGV[1]) == 0 then
        redis.call("SETEX", ARGV[1], ARGV[2], "1")
        redis.call("XADD", ARGV[3], "MAXLEN", "~", ARGV[4], "*",
            "jsonRpcMethodName", ARGV[5],
            "value", ARGV[6])
    end`
}
```

I invoke this script in a loop like the one below:

```
for i := 0; i < len(messages); i++ { // len(messages) ~ 500
    msg := messages[i]
    go func() {
        redisClient.Run(GetDedupeScript(), msg)
    }()
}
```
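
For reference, redisClient.Run just sends the Lua script and its arguments to Redis (via EVAL or EVALSHA). Roughly, a single call maps to something like the following; this is only a sketch assuming go-redis (v8+ API), and the Message field names are hypothetical:

```
// Sketch only: assumes go-redis and a hypothetical Message struct; the
// real redisClient.Run wrapper differs in detail.
// No KEYS are used; everything goes through ARGV:
//   ARGV[1]=dedupe key, ARGV[2]=TTL seconds, ARGV[3]=stream name,
//   ARGV[4]=MAXLEN, ARGV[5]=jsonRpcMethodName, ARGV[6]=value.
func runDedupe(ctx context.Context, rdb *redis.Client, msg Message) error {
    return rdb.Eval(ctx, GetDedupeScript(), nil,
        msg.DedupeKey, msg.TTLSeconds, msg.Stream, msg.MaxLen,
        msg.Method, msg.Value,
    ).Err()
}
```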

I have another process, call it Process B, that runs XREAD with COUNT 200 and BLOCK 5000 against the same stream Process A is writing to.
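
For concreteness, Process B's read loop is along these lines; again just a sketch assuming go-redis, with an illustrative stream name and minimal error handling:

```
// Sketch of Process B's reader, assuming go-redis.
func readLoop(ctx context.Context, rdb *redis.Client, stream string) {
    lastID := "$" // start from new entries
    for {
        streams, err := rdb.XRead(ctx, &redis.XReadArgs{
            Streams: []string{stream, lastID},
            Count:   200,
            Block:   5000 * time.Millisecond,
        }).Result()
        if err == redis.Nil { // BLOCK timed out with nothing new
            continue
        }
        if err != nil {
            log.Println(err)
            continue
        }
        for _, s := range streams {
            for _, m := range s.Messages {
                lastID = m.ID
                // process m.Values ...
            }
        }
    }
}
```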

When I run redis-cli monitor I see the following pattern:

```
EXISTS SETEX XADD
EXISTS SETEX XADD
EXISTS SETEX XADD
... (500 times)
XREAD
```

As you can see, XREAD is only called after the entire loop finishes, even though XADD and XREAD run in separate processes (on the same machine). This pattern results in increased latency of around 100-200 ms. Any ideas why this is happening?

I wondered what would happen if I introduced a sleep after every iteration, so I did this:

```
for i := 0; i < len(messages); i++ { // len(messages) ~ 500
    msg := messages[i]
    go func() {
        redisClient.Run(GetDedupeScript(), msg)
    }()
    time.Sleep(2 * time.Millisecond)
}
```

This resulted in the following pattern:

```
EXISTS
SETEX
XADD
XREAD

EXISTS
SETEX
XADD
XREAD

EXISTS
SETEX
XADD
XREAD

... 500 times
```

Now I see the calls getting interleaved nicely, and the latency drops to 1 ms, which is awesome. How can I achieve the same result without introducing time.Sleep? I tried pipelining, but that did not help; it produced the same pattern as before the sleep was introduced (500 XADDs and then one XREAD).
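
For reference, the pipelining attempt looked roughly like this (again only a sketch, assuming go-redis):

```
// Sketch of the pipelining attempt, assuming go-redis. All queued commands
// are written out on a single connection and executed back to back.
func publishPipelined(ctx context.Context, rdb *redis.Client, messages []Message) error {
    pipe := rdb.Pipeline()
    for _, msg := range messages {
        pipe.Eval(ctx, GetDedupeScript(), nil,
            msg.DedupeKey, msg.TTLSeconds, msg.Stream, msg.MaxLen,
            msg.Method, msg.Value)
    }
    _, err := pipe.Exec(ctx)
    return err
}
```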

Comment From: yoav-steinberg

First, I'd like to say that this isn't really the place to post this question; there are Redis user forums out there. We'd like to see bug reports and feature requests here.

Having said that, this is what I'm guessing is happening: you're using a goroutine, so there are actually 500 simultaneous calls to GetDedupeScript(). In the background your Redis client (go-redis?) is using a connection pool, or even a single connection. So in practice 500 commands are being pushed onto very few connections to the Redis server. This creates a pipeline of many back-to-back GetDedupeScript() calls on the connections to the server. When Redis finds lots of commands on a single connection, it processes them one after the other until there are no more commands on that connection. Only then does it look at other connections to see if there are other commands to be handled. So after processing a big pipeline of GetDedupeScript() calls from Process A, it looks at Process B's connection and starts handling the XREADs.

This is done for the sake of performance. If you try the same thing with, say, 50,000 commands instead of 500 you'll probably see some XREADs interleaved between the GetDedupeScript() calls. If you want to improve latency you might try one of the following:

* Don't use a goroutine; call GetDedupeScript() synchronously from Process A.
* Configure your client to use more connections and distribute the commands evenly between them instead of pipelining everything on a single connection (see the sketch below).
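
I'm not familiar with the details of the Go client, but the second option would look roughly like this (purely illustrative, assuming go-redis):

```
// Illustrative only, assuming go-redis: a larger pool so that concurrent
// script calls check out separate connections instead of queueing on one.
rdb := redis.NewClient(&redis.Options{
    Addr:     "localhost:6379",
    PoolSize: 100, // illustrative value; the default is 10 per CPU
})
```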

Comment From: kant777

@yoav-steinberg Hi, thanks for the response. I had already tried all of those options, but it did not help; it shows me the exact same pattern.

  1. If I drop the goroutine, it is a lot worse; latency goes to 200 ms.
  2. GetDedupeScript() is a simple function that returns a Lua script as a string. redisClient.Run is the function that sends the Lua script and args to Redis to run, and yes, it will be called 500 times (or whatever the size of that list is).
  3. By default the pool has 10 connections per CPU, but I am not 100% sure whether all 10 connections are being used or only one. I ran lsof -i:port and saw 10 established sockets, so I raised the pool to 1000 connections, reran the experiment, and lsof -i:port now shows 800 established connections on both the client machine and the Redis machine (probably because the list is of size 800 at that instant).

Finally, with all these experiments on the client process I am still not seeing XREAD and XADD getting multiplexed. It follows the same pattern: all the XADDs first and then one XREAD, causing 200 ms of latency, which is not acceptable for what I do.

Comment From: yoav-steinberg

Again, what you need is for the calls to the Lua script to be sent simultaneously over multiple connections instead of pipelined on a single (or very few) connection. I'm no expert on the Go Redis client, so I don't know how it distributes the calls between the connections in its pool. One way to hack around this is to create lots of redisClients and use them in some round-robin way instead of sending all the commands through the same client; something along those lines is sketched below. Please consider moving this discussion to the go-redis client forum.
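
Just a sketch (assuming go-redis; the client count is arbitrary):

```
// Sketch only, assuming go-redis: several clients, each with its own
// connection pool, picked round-robin so commands spread across connections.
clients := make([]*redis.Client, 8) // arbitrary count
for i := range clients {
    clients[i] = redis.NewClient(&redis.Options{Addr: "localhost:6379"})
}
for i, msg := range messages {
    rdb := clients[i%len(clients)] // round-robin selection
    m := msg
    go func() {
        rdb.Eval(ctx, GetDedupeScript(), nil,
            m.DedupeKey, m.TTLSeconds, m.Stream, m.MaxLen,
            m.Method, m.Value)
    }()
}
```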

Comment From: kant777

@yoav-steinberg I tried creating 5000 clients and distributed the commands among them using the mod operation. It did not help; latency is still 70-100 ms. After all these experiments I am strongly inclined to think the problem is in the Redis server rather than the client. We have a very similar app in Node.js as well and the experience has been the same, so it is highly likely the problem is on the server side.