Present model

The publisher can publish to any node in a cluster, and the receiving node broadcasts the message to the other nodes. For some reason, there is a delay of 30 to 40 milliseconds when broadcasting, which results in ~20x slower performance. However, I could not debug the cause, as I am not a C developer (I would also like to understand why the delay occurs).

Expected

The delay should be reduced for better performance.

I have built a lightweight RPC server using Redis. Because of this issue, with random-node subscription I can process only 63 req/sec, whereas with targeted subscription I can process 1250 req/sec.

Comment From: hpatro

Hi @giridharkannan,

I have a few questions.

  • How are you measuring the delay/lag?
  • Is there any other workload running in the cluster (e.g. data caching)?
  • Are you seeing the delay in broadcasting while publishing on a particular node or on all nodes?
  • Which Redis version are you using? We've made certain improvements to cluster bus pub/sub message propagation in Redis 7.

Comment From: giridharkannan

Hi @hpatro ,

  • How are you measuring the delay/lag?: I am using Wireshark for debugging (I can share the packets if needed).
  • Any other workload?: No, and the result is consistent.
  • Are you seeing the delay in broadcasting while publishing on a particular node or on all nodes?: Publishing starts after 30 ms, so on all nodes.
  • Redis version: 7.0.8

Comment From: hpatro

@giridharkannan Do you mean the message is sent from the source node after 30 ms? And there is still additional time for the message to be sent over the network to the target node and published to the subscriber(s).

Also, what's the payload size? Did you try connecting a subscriber to the same node where the publish is done, and what was the behavior?

Comment From: giridharkannan

@hpatro The source node (the Redis node that received the PUBLISH message) notifies the other nodes in the cluster after 30 ms. Communication from Redis to client subscribers is always fast.

The payload size is < 100 bytes.

When connecting subscribers on the same node, it is fast (20x faster).

Comment From: zuiderkwast

Maybe you want to try the new sharded pub/sub commands, SSUBSCRIBE and SPUBLISH? Shard channels are sharded just like keys. Each channel is owned by one cluster node, and clients get MOVED redirects to the right node.
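
Since shard channels are mapped to nodes the same way keys are, the slot that owns a channel can be worked out on the client side. Here is a minimal sketch in Python, assuming the standard Redis Cluster key hashing (CRC16/XMODEM modulo 16384, with hash tags honored). The function name `channel_slot` is made up for illustration and is not part of any client library.

```python
from binascii import crc_hqx

HASH_SLOTS = 16384  # number of hash slots in a Redis Cluster


def channel_slot(channel: str) -> int:
    """Return the cluster hash slot for a key or shard channel name."""
    data = channel.encode()
    # Honor hash tags: if the name contains "{...}" with a non-empty body,
    # only the part between the first "{" and the next "}" is hashed.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:
            data = data[start + 1:end]
    # crc_hqx with an initial value of 0 is the CRC16/XMODEM variant.
    return crc_hqx(data, 0) % HASH_SLOTS


if __name__ == "__main__":
    for name in ("ch1", "foo", "{foo}.replies"):
        print(name, channel_slot(name))
```

Channels that share a hash tag, such as `{foo}.requests` and `{foo}.replies`, land in the same slot, so an RPC-style request/reply pair can be kept on one shard. With sharded pub/sub the message should only need to propagate within that shard rather than to every node in the cluster.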

Comment From: hpatro

@giridharkannan I tried Wireshark over the weekend, and the observed network behavior is in line with expectations.

  1. No subscribers

publisher -> server (6380) [publish ch1 hello]    1677910337.119609000
server (6380) -> server (17379)                   1677910337.120162000

  2. Subscribe on node (6380) and node (7379)

publisher -> server (6380) [publish ch1 hello]    1677912229.360370000
server -> subscriber (6380)                       1677912229.360599000
server (6380) -> server (17379)                   1677912229.360715000
server (7379) -> subscriber (7379)                1677912229.360918000

Roughly 600 microseconds to transmit from publisher client -> source node -> target node -> subscriber client on target node. Not sure what's different on your end. Could you try setting up a dev environment with a smaller set of nodes and try it out?
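
The elapsed time at each hop can be read off the capture timestamps above. Here is a small Python sketch of that arithmetic for the trace with subscribers. It uses `Decimal` because epoch timestamps with nanosecond digits are not represented exactly as floats. The helper name `micros_since_start` is made up for illustration.

```python
from decimal import Decimal

# (hop, capture timestamp in epoch seconds) copied from the trace with subscribers
trace = [
    ("publisher -> server (6380)", "1677912229.360370000"),
    ("server -> subscriber (6380)", "1677912229.360599000"),
    ("server (6380) -> server (17379)", "1677912229.360715000"),
    ("server (7379) -> subscriber (7379)", "1677912229.360918000"),
]


def micros_since_start(rows):
    """Return (hop, microseconds elapsed since the first packet) pairs."""
    start = Decimal(rows[0][1])
    return [(hop, int((Decimal(ts) - start) * 1_000_000)) for hop, ts in rows]


if __name__ == "__main__":
    for hop, us in micros_since_start(trace):
        print(f"{us:>5} us  {hop}")
```

The last entry is the end-to-end time from the publisher's packet to delivery on the remote subscriber. For this capture it comes out at 548 microseconds, which is the "roughly 600 microseconds" figure.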