I wish the new Stream data structure (designed as a message queue) could offload part of its data to disk, because when millions of messages must not be lost, must be guaranteed to be consumed, must support expansion, and must be allowed to accumulate, Redis cannot offer a complete solution, while Kafka can.
If messages arrive in a stream faster than they are consumed until memory is full, Redis becomes very slow, which can make other services, such as the cache system, crash. We could limit the stream's maximum memory size; beyond that size, messages would be stored on disk.
How could such a strategy be implemented?
* Do not load all of a Stream's data into memory; when Redis starts, load only the latest data, up to a configured maximum memory size.
* When that maximum is exceeded, offload the oldest data to disk rather than discarding it.
Comment From: zuiderkwast
This is interesting. It is a little controversial because Redis is an in-memory database, but it's an important problem to solve.
Normally, consumers read the oldest data, so I think the oldest and the newest data need to be in memory; should the data in the middle be offloaded to disk?
This could be a complex feature, so it needs some design effort even before a decision can be made. How can it be done without slowing down the rest of Redis? Should the files be handled by a separate thread? What do the files look like? If data in the middle of the stream is accessed, how do we make sure it is not moved back and forth between memory and disk too often?
Comment From: bnuzhouwei
In most cases, the consume speed can be much slower than the produce speed; that's the value of an MQ. So offloading the oldest data to disk may be acceptable. It is a much easier strategy:
- Keep two linked lists, one in memory and the other on disk, and store the oldest in-memory message id.
- When the stream is oversized, push the oldest data onto the on-disk linked list and update that oldest in-memory message id.
- When a request arrives for any message id, the Redis server knows whether to load the data from disk or from memory.
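The two-list strategy above can be sketched roughly as follows. This is a hypothetical illustration, not Redis internals: `TieredStream`, its fields, and the dict standing in for the on-disk list are all invented names, and integer ids stand in for real stream ids.

```python
from collections import OrderedDict

class TieredStream:
    """Sketch of a stream split into a hot in-memory segment and a cold
    on-disk segment, with one boundary id routing every read."""

    def __init__(self, max_in_memory=3):
        self.memory = OrderedDict()     # newest entries: id -> payload
        self.disk = OrderedDict()       # stand-in for the on-disk linked list
        self.max_in_memory = max_in_memory
        self.oldest_in_memory_id = None

    def add(self, entry_id, payload):
        self.memory[entry_id] = payload
        if self.oldest_in_memory_id is None:
            self.oldest_in_memory_id = entry_id
        # When oversized, push the oldest entry to the disk segment
        # and advance the boundary id, as described above.
        while len(self.memory) > self.max_in_memory:
            old_id, old_payload = self.memory.popitem(last=False)
            self.disk[old_id] = old_payload
            self.oldest_in_memory_id = next(iter(self.memory))

    def get(self, entry_id):
        # Route by the boundary id: at or after it -> memory, before it -> disk.
        if self.oldest_in_memory_id is not None and entry_id >= self.oldest_in_memory_id:
            return self.memory.get(entry_id)
        return self.disk.get(entry_id)
```

For example, with `max_in_memory=2`, after adding ids 1 through 4 the entries 1 and 2 would live in the disk segment and 3 and 4 in memory, yet `get` serves any id transparently.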
In most cases a slow consume speed is acceptable, but losing data is a disaster. A Stream is very different from the sets used for caching, which need high speed for both writes and reads. A Stream needs high append speed, but slow deletes or reads may matter far less than never losing data.
To offload the middle data to disk instead, three linked lists and two message ids could implement the strategy.
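The three-list variant keeps the oldest and newest entries hot and offloads only the span between two boundary ids. A minimal sketch of just the routing decision, with invented segment names and integer ids standing in for stream ids:

```python
def locate(entry_id, first_middle_id, first_tail_id):
    """Return which of three segments holds entry_id, given the two
    boundary ids (hypothetical names, not a Redis API)."""
    if entry_id < first_middle_id:
        return "head-in-memory"    # oldest entries, hot for consumers
    if entry_id < first_tail_id:
        return "middle-on-disk"    # offloaded span between the two ids
    return "tail-in-memory"        # newest entries, hot for producers
```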
Comment From: madolson
I would also guess this feature will likely never be implemented; the complexity of tiering off to disk is significant. I think having module APIs that hook into stream trimming could be interesting, though. The code that actually tiers the stream to disk could then be implemented separately.
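To make the hook idea concrete, here is a hypothetical sketch of what a trim hook might let a module do: observe each entry before trimming drops it. No such API exists in Redis; every name below is invented for illustration only.

```python
def xtrim_with_hook(entries, maxlen, on_trim):
    """Trim `entries` (a list of (id, payload) pairs, oldest first) down to
    `maxlen`, passing each evicted entry to the `on_trim` callback first.
    A module's callback could tier the evicted entry to disk."""
    while len(entries) > maxlen:
        on_trim(entries.pop(0))
    return entries

# A callback that "tiers" evicted entries by archiving them in a list.
archived = []
remaining = xtrim_with_hook(
    [("1-1", "a"), ("2-1", "b"), ("3-1", "c")], 1, archived.append)
```

The point of the hook shape is that Redis itself stays in-memory-only; any disk tiering lives entirely in the callback.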