Will stream consumption support a push mode over a long-lived connection in the future? Remote consumption in pull mode performs poorly because each read pays two network delays (request plus response).
Comment From: sundb
Push reduces latency, but causes more problems. 1. When there are multiple consumers, how do we decide which consumer we should push the message to? 2. If a consumer is busy, it is not reasonable to actively push to that consumer.
Comment From: chendali16
Redis would push only real-time data to consumers that subscribe to the stream. This means Redis itself would consume the data and push it to consumers from multiple threads. A busy consumer is allowed to lose data, and it can compensate for the lost data later with XREAD or similar.
Comment From: sundb
It feels like you are missing something. XREADGROUP is mutually exclusive: if message A is consumed by consumer c1, c2 will not receive message A.
If we use push, we would need to maintain the status of all the consumers (which ones are busy and which ones are idle); without that information we would have to pick a random consumer to push to, which would be very complicated anyway.
On the other hand, if we push a message to a very busy consumer while all the other consumers are idle, it will not be worth the cost and will cause higher latency.
Comment From: chendali16
It feels like you are missing something. XREADGROUP is mutually exclusive: if message A is consumed by consumer c1, c2 will not receive message A. If we use push, we would need to maintain the status of all the consumers (which ones are busy and which ones are idle); without that information we would have to pick a random consumer to push to, which would be very complicated anyway. On the other hand, if we push a message to a very busy consumer while all the other consumers are idle, it will not be worth the cost and will cause higher latency.
Instead of XREADGROUP, there could be a new command that allows consumers to consume shared data. If multiple consumers consume shared data, each consumer consumes the entries that match certain rules; for example, consumer1 consumes entries whose identity hashes to 1, and consumer2 consumes entries whose identity hashes to 2.
Another problem is that multiple threads push data to consumers. Each thread sets a watermark; if the consumer is busy, only one thread will lose data.
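The hash-based routing rule proposed above could be sketched roughly like this. This is purely illustrative: the `identity` value, the consumer count, and the `route_entry` function are all hypothetical, not part of any existing Redis API.

```python
import hashlib

def route_entry(entry_identity: str, num_consumers: int) -> int:
    """Map a stream entry's identity to one consumer slot.

    Uses a stable hash so the same identity always lands on the
    same consumer, as the proposal above describes.
    """
    digest = hashlib.md5(entry_identity.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_consumers

# Entries with the same identity always go to the same consumer,
# so each consumer sees a disjoint share of the stream.
entries = ["user:1", "user:2", "user:1", "user:3"]
slots = [route_entry(e, 2) for e in entries]
```

With such a rule, Redis would not need to track which consumer is busy: ownership of an entry is decided by its identity alone.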
Comment From: sundb
Normally, the network latency would be much less than the time a consumer takes to process a message, so why do you need such low latency?
Comment From: chendali16
Normally, the network latency would be much less than the time a consumer takes to process a message, so why do you need such low latency?
Our services are deployed in the cloud. One node writes to the Redis stream and multiple nodes consume it. Normally, consuming a message takes only 1 ms. Assume cross-border network transmission takes 100 ms and cross-city transmission takes more than 15 ms. With pull mode, each read pays a full round trip, so the delay becomes roughly 200 ms or 30 ms, which is unacceptable for systems with strict real-time requirements.
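As a back-of-the-envelope check of these numbers, using the one-way delays stated above as assumptions:

```python
def pull_latency_ms(one_way_delay_ms: float, processing_ms: float = 1.0) -> float:
    # Pull mode pays a full round trip per read: the request travels
    # to Redis and the reply travels back, i.e. two network delays.
    return 2 * one_way_delay_ms + processing_ms

def push_latency_ms(one_way_delay_ms: float, processing_ms: float = 1.0) -> float:
    # Push mode would only pay the one-way delivery delay.
    return one_way_delay_ms + processing_ms

cross_border = pull_latency_ms(100)  # ~201 ms, matching the ~200 ms figure above
cross_city = pull_latency_ms(15)     # ~31 ms, matching the ~30 ms figure above
```

This is why push halves the network cost: the message rides the already-open connection instead of waiting for the next request.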
Comment From: madolson
@chendali16 Do you effectively want to have a way to subscribe, like pubsub subscribe, to a stream? I feel like this is a feature that has been asked for in the past. I think this is a really good feature and worth implementing.
Comment From: chendali16
@chendali16 Do you effectively want to have a way to subscribe, like pubsub subscribe, to a stream? I feel like this is a feature that has been asked for in the past. I think this is a really good feature and worth implementing.
Yeah, that is what I want.
Comment From: zuiderkwast
Since streams are keys which are sharded in Redis Cluster, perhaps we should use sharded subscribe? And a special channel prefix to disambiguate from regular channels? Idea: SSUBSCRIBE __redis_stream__:mystream
Comment From: madolson
I think we should add a new command, because it would be good to "resume" consuming from a stream with the same command the caller originally used.
XSUBSCRIBE STREAMS <stream_1> <id> <stream_2> <id_2>
XSUBSCRIBEGROUP STREAMS <stream_1> <id> <stream_2> <id_2> <whatever group arguments are>
To resume consuming from the ids they last saw on disconnect. It handles the "failure" mode better.
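The resume behavior described here is essentially the per-stream ID bookkeeping a client already does with XREAD today. A rough sketch of that bookkeeping, with fake entries standing in for server replies (XSUBSCRIBE does not exist yet, so the reply shape is borrowed from XREAD):

```python
def advance_last_ids(last_ids: dict, reply: list) -> dict:
    """Update the per-stream last-seen ID after a delivery.

    `reply` mimics the XREAD reply shape: a list of
    (stream_name, [(entry_id, fields), ...]) pairs. On reconnect,
    the client would pass these IDs back so the server resumes
    from where it left off instead of replaying or skipping entries.
    """
    for stream, entries in reply:
        if entries:
            last_ids[stream] = entries[-1][0]
    return last_ids

last_ids = {"stream_1": "0-0", "stream_2": "0-0"}
fake_reply = [("stream_1", [("1-1", {"a": "1"}), ("1-2", {"a": "2"})]),
              ("stream_2", [])]
advance_last_ids(last_ids, fake_reply)
```

A hypothetical XSUBSCRIBE would take these saved IDs as its starting points, which is what makes the disconnect/reconnect "failure" mode safe.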
Comment From: hpatro
Interesting, I've seen the reverse request as well: persisting PUBSUB data to a stream. We need to build some form of interoperability between these two.
Redis would push only real-time data to consumers that subscribe to the stream. This means Redis itself would consume the data and push it to consumers from multiple threads. A busy consumer is allowed to lose data, and it can compensate for the lost data later with XREAD or similar.
@chendali16 For the above, how do you intend to detect that data was lost on the consumer side in pubsub mode?
Comment From: chendali16
Interesting, I've seen the reverse request as well: persisting PUBSUB data to a stream. We need to build some form of interoperability between these two.
Redis would push only real-time data to consumers that subscribe to the stream. This means Redis itself would consume the data and push it to consumers from multiple threads. A busy consumer is allowed to lose data, and it can compensate for the lost data later with XREAD or similar.
@chendali16 For the above, how do you intend to detect that data was lost on the consumer side in pubsub mode?
The data contains an incrementing sequence number.
Comment From: hpatro
I was thinking more on this.
@chendali16 Doesn't the BLOCK option in the XREAD/XREADGROUP commands solve this problem for you?
Comment From: hpatro
@itamarhaber What are your thoughts on this?
Comment From: chendali16
I was thinking more on this.
@chendali16 Doesn't the BLOCK option in the XREAD/XREADGROUP commands solve this problem for you?
The BLOCK option still cannot solve the delay problem.
Comment From: hpatro
@chendali16 If you set a very large timeout in the BLOCK option, wouldn't the behavior be similar to pub/sub, with no delay in receiving the messages?
Comment From: hpatro
Realized that the client gets unblocked after a single message is added to the stream. The doc needs to be updated: https://github.com/redis/redis-doc/issues/1450
So, the latency/delay issue still exists.
Comment From: joshgarnett
Hi All,
It looks like this has stalled, so I wanted to give another use case for the feature. I'm currently working on an app that is leveraging Redis for passing messages between clients (think chat system). Utilizing the sharded pubsub commands, I've been able to support over 500K clients connected to a cluster of Go apps, exchanging messages on 200K+ active Redis pubsub channels, with a client notification rate of close to 2M/s. I've been impressed with how far I've been able to push Redis and how low the latency is between clients.
There are some use cases, though, where having a history of the messages would be super useful. Streams seem like a really good fit for this, but it unfortunately quickly gets bottlenecked on XREAD calls. To reduce the number of active connections, I've compressed the keyspace down to 200 shard keys (the same pattern used for sharded pubsub). A goroutine is created for each shard key, which then polls for new messages with XREAD. At scale, each XREAD call can be requesting messages for hundreds if not thousands of streams. Engine CPU utilization quickly maxes out on each node in the cluster, resulting in much higher p95 end-to-end latency and greatly reduced throughput.
Adding a subscribe pattern for streams should address this bottleneck as the redis server would avoid having to constantly recheck the same streams by polling commands.
Let me know if I can help clarify the use case further, and thanks for everyone's hard work on the project.
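For context on the polling pattern described above: each poller batches many streams into one XREAD-style request by maintaining a stream-to-last-ID map per shard key. A simplified sketch of that bookkeeping (shard count and stream names are made up; the real app is in Go, but a Python sketch keeps it short):

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 200  # assumption: matches the 200 shard keys mentioned above

def shard_of(stream_name: str) -> int:
    # Stable hash so a stream always maps to the same shard/poller.
    return zlib.crc32(stream_name.encode()) % NUM_SHARDS

# One poller per shard keeps a stream -> last-delivered-ID map; each
# poll would hand the whole map to a single XREAD call, which is why
# one call ends up covering hundreds of streams at scale.
pollers = defaultdict(dict)  # shard -> {stream: last_id}

def track(stream_name: str) -> None:
    pollers[shard_of(stream_name)].setdefault(stream_name, "0-0")

for s in ("room:1", "room:2", "room:3"):
    track(s)

# The per-shard batches a single XREAD would have to re-check every poll:
batches = {shard: sorted(streams) for shard, streams in pollers.items()}
```

The server re-scans every stream in the batch on each poll even when nothing changed, which is the CPU cost a subscribe model would avoid.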