This issue covers several high-level areas for improving Redis Cluster, ranked roughly by priority within each pillar.

**Improved use case support**

This pillar focuses on functionality that lives outside the core cluster code but improves the usability of cluster mode.

Pubsub scaling: Messages are published into a global channel space that doesn’t follow slot conventions. The proposal is to introduce new “pubsub local” functionality where clients direct messages to the correct nodes. The goal is to reduce write amplification. https://github.com/redis/redis/issues/8029 https://github.com/redis/redis/pull/8621 https://github.com/redis/redis/issues/3346
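
For reference, the linked PR goes in the direction of slot-scoped commands (SSUBSCRIBE/SPUBLISH), where the channel hashes to a slot like a key. A minimal sketch of the client side, assuming redis-py and placeholder addresses/channel names:

```python
import redis

# A cluster-aware client; with slot-scoped pubsub the channel is hashed
# like a key, so a hashtag controls which shard owns it.
r = redis.RedisCluster(host="127.0.0.1", port=7000)

# SPUBLISH only has to reach the shard that owns the channel's slot,
# rather than being broadcast to every node over the cluster bus.
r.execute_command("SPUBLISH", "orders:{eu}", "order-created")
```

A subscriber would issue SSUBSCRIBE against a node in that same shard, so messages never fan out cluster-wide.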

Cluster bus as HA for single shard: Allows the cluster bus to replace Sentinel as the HA mechanism for Redis. This will require voting replicas, which are discussed later. https://github.com/redis/redis/pull/10875

Request proxying (#11271): The idea here is that Redis server nodes could proxy incoming requests to the desired node instead of relying on heavy client-side logic to track the cluster topology. This simplifies the work for workloads that don’t want to maintain a heavy client. It would be an optional configuration.

Custom hashing support: Some applications want to have their own mechanism for determining slots, so we should extend the hashtag semantics to include information about what slot the request is intended for.
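
For context on what would be extended, this is the current mapping: the slot is CRC16 of the key modulo 16384, and a non-empty `{...}` hashtag restricts hashing to its content. A pure-Python sketch mirroring the server's keyHashSlot():

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the variant Redis uses for key hashing."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: bytes) -> int:
    """Mirrors keyHashSlot(): hash only the hashtag content when a
    non-empty {...} section is present, otherwise the whole key."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:  # ignore empty tags like {}
            key = key[start + 1:end]
    return crc16(key) % 16384


# Keys sharing a hashtag land in the same slot.
assert key_hash_slot(b"{user:1}:profile") == key_hash_slot(b"{user:1}:cart")
```

Extending the hashtag semantics as proposed would let an application state the intended slot directly instead of deriving it from CRC16.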

Hashtag scanning/atomic deletion: A common ask is the ability to use SCAN-like commands to find the keys in a hashtag without having to scan the entire keyspace. One proposal is to allow creating a group of keys that can be atomically deleted; a secondary index could also solve this issue. (I'm sure there is an issue for this, I'll find it)
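
To illustrate the status quo: the only generic workaround is a keyspace-wide SCAN with a MATCH filter, whose cost is proportional to the whole keyspace no matter how few keys the tag holds. A sketch with redis-py; the address and tag name are placeholders:

```python
import redis

r = redis.RedisCluster(host="127.0.0.1", port=7000)

# Enumerate every key in the hypothetical {user:42} hashtag. MATCH only
# filters what each cursor step returns, so the server still walks the
# entire keyspace even if the tag holds a handful of keys.
keys = list(r.scan_iter(match="*{user:42}*", count=1000))

# Deleting them in one DEL is already atomic, since hashtagged keys
# share a slot, but only after the expensive enumeration above.
if keys:
    r.delete(*keys)
```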

**Cluster management improvements**

This pillar focuses on improving the ease of managing Redis clusters.

Hostname support: Certain applications want hostname support for SNI (i.e., hostname validation for TLS), and it’s apparently also an ask for Kubernetes. https://github.com/redis/redis/issues/2186 https://github.com/redis/redis/pull/9530

Consensus based + atomic slot migration: Implement a server-driven slot migration command that migrates a slot's data from one node to another. (We have a solution we hopefully will someday post for this.) https://github.com/redis/redis/issues/2807
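
For contrast, here is roughly the manual sequence that a consensus-based, server-driven command would replace (it is what redis-cli --cluster reshard drives today); the slot number, ports, and batch size below are placeholders:

```python
import redis

SLOT = 1234                      # hypothetical slot to move
src = redis.Redis(port=7000)     # current owner
dst = redis.Redis(port=7001)     # new owner
src_id = src.execute_command("CLUSTER", "MYID")
dst_id = dst.execute_command("CLUSTER", "MYID")

# 1. Mark the slot as moving on both sides.
dst.execute_command("CLUSTER", "SETSLOT", SLOT, "IMPORTING", src_id)
src.execute_command("CLUSTER", "SETSLOT", SLOT, "MIGRATING", dst_id)

# 2. Move the keys in batches; clients racing with this see ASK redirects.
while True:
    keys = src.execute_command("CLUSTER", "GETKEYSINSLOT", SLOT, 100)
    if not keys:
        break
    src.execute_command("MIGRATE", "127.0.0.1", 7001, "", 0, 5000,
                        "KEYS", *keys)

# 3. Finalize ownership, destination first, then the source.
for node in (dst, src):
    node.execute_command("CLUSTER", "SETSLOT", SLOT, "NODE", dst_id)
```

Because each step is an independent command with no transactional link between the two nodes, a crash mid-sequence can leave the slot half-migrated, which is exactly what an atomic server-side migration would rule out.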

Improved metrics for slot performance: Add per-slot performance metrics to inform decisions about hot shards/keys; this makes it easier to identify slots that should be moved. Key accesses are the easy metric to grab; memory would ideally be better, but that's hard.
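
As a stopgap, the only per-slot figure available today is a key count via CLUSTER COUNTKEYSINSLOT, which says nothing about access frequency or memory. A rough polling sketch, assuming redis-py and a placeholder port:

```python
import redis

r = redis.Redis(port=7000)   # one master; it only reports its own slots

# Key count is the only per-slot number exposed today; the proposal
# would add access (and ideally memory) metrics per slot.
counts = {
    slot: r.execute_command("CLUSTER", "COUNTKEYSINSLOT", slot)
    for slot in range(16384)   # 16384 round trips: fine for a sketch only
}
top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:10]
print("largest slots:", top)
```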

Dynamic slot ownership: For all-master clusters in caching use cases, data durability is not needed, and nodes in a cluster can simply take over slots from other nodes when a node dies. Newly added nodes can likewise automatically take over slot ownership from existing nodes. https://github.com/redis/redis/issues/4160

Auto scaling: Support automatic rebalancing of clusters when adding or removing nodes, as well as during steady state when there is a traffic load mismatch. https://github.com/redis/redis/issues/3009

Moving the cluster bus to a separate thread / improved reliability when the server is busy: Today, if the main thread is busy, the node may not respond to a health-check ping even though it is still up and healthy. Refactoring the cluster bus onto its own thread would make it more responsive.

Refactor abstractions in cluster.c: Several abstractions in cluster.c are hard to follow and should be broken up, including cluster bus and node handling, slot awareness, and health monitoring.

Human readable names for nodes: Today individual Redis nodes report their hexadecimal names, which are not human readable. We should additionally assign each node a more readable name that is either logical or corresponds to its primary. https://github.com/redis/redis/pull/9564

Gossiped node deletion: Typically you need to send CLUSTER FORGET to each node in a cluster to delete a node. If you don't do this fast enough, the node will be re-added through gossip. Ideally you would only need to forget a node once, and it would eventually be forgotten throughout the cluster. https://github.com/redis/redis/pull/10875

Module support for different consensus algorithms: Today Redis only supports the cluster bus as a consensus mechanism, but we could also support module hooks for other forms of consensus.

**Cluster HA improvements**

This pillar focuses on improving the high-availability aspects of Redis Cluster, in particular failover and health checks.

Reduce messages sent for node health decisions: The Redis cluster bus maintains an NxN full mesh of gossip health messages. This can cause performance degradation and instability in large clusters, as health detection and voting authorization become slow. There are several ways to address this, such as making failovers shard-local or being smarter about how information is propagated. https://github.com/redis/redis/issues/3929

Voting replicas: (group this with the other consensus items) Today replicas don’t take part in leader election; this would be useful for smaller cluster sizes, especially single shards. https://github.com/redis/redis/pull/10875

Avoiding cascading failovers leading to data loss: It's possible for a replica without data to be promoted to the master role, losing all data in the shard. This is typically the result of a cascading failover. Ideally we should add a stopgap to prevent this last data-holding node from being demoted.

Placement awareness: Today individual nodes have no concept of how they are placed relative to each other, and will happily allow all the primaries to end up in the same zone. This may also include a notion of multi-region awareness.

RESP3 topology updates: Today clients learn about topology changes only when they send a request to the wrong node, and recovering is inefficient because they must call CLUSTER SLOTS to re-learn the entire topology. Nodes could instead proactively notify clients when a topology change occurs: a client opts in to topology updates, and from that point on it receives information about just what has changed. https://github.com/redis/redis/issues/10150
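
For a sense of the cost: a single MOVED redirect typically triggers a full CLUSTER SLOTS fetch, re-transferring every slot range and node address even when only one slot moved. A sketch of that full fetch with redis-py; the port is a placeholder:

```python
import redis

r = redis.Redis(port=7000)

# Raw CLUSTER SLOTS reply: one entry per contiguous slot range, each
# carrying the master and replica addresses. A client re-fetches all
# of this on a topology change, even if a single slot moved.
for start, end, master, *replicas in r.execute_command("CLUSTER", "SLOTS"):
    print(f"slots {start}-{end} -> {master[0].decode()}:{master[1]}")
```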

Comment From: iakkus

For 'human-readable names', I guess the basic idea is similar to the way Docker assigns names to started containers (unless one is given by the user).

I think it would be useful:
- to let users assign 'aliases' to nodes,
- to have replicas be named after their primaries (preferably in a transparent and automated fashion), and
- to let users create pools for the random names picked for the nodes.

Comment From: dmitrypol

here are a few more ideas:
- Better integration story for Redis / Cluster / Sentinels.
- Integrate Sentinel support into Redis, making it easier to do failovers without needing Sentinels (not really cluster related, but similar).
- Support multiple databases in Cluster.
- When Redis Cluster does a failover, send a PubSub message just like Sentinel does, so that someone is notified in case of hardware failure. Right now you have to poll Redis Cluster for its health (see the sketch below).
- Better user experience for setting up a cluster via redis-cli.
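
For concreteness, a minimal sketch of the polling workaround mentioned in the failover bullet above, assuming redis-py (which parses CLUSTER INFO into a dict) and a placeholder port:

```python
import time
import redis

r = redis.Redis(port=7000)

# No failover push notification exists, so operators poll cluster health.
while True:
    info = r.execute_command("CLUSTER INFO")   # redis-py parses to a dict
    if info.get("cluster_state") != "ok":
        print("cluster degraded:", info.get("cluster_state"))
    time.sleep(5)
```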

Comment From: zuiderkwast

  • +1 for builtin failover/sentinel
  • Use one db for cluster (db 0 like now), other db numbers for non-cluster (e.g. local cache for colocated app server)

Comment From: hwware

I have one more idea for cluster slot addition and deletion: currently, we can only add or delete slots individually, such as cluster addslots 1 2 3 .... 5000. If we want to add multiple slots in a range, we need to use a bash shell. I think we could add a command like cluster addslots -r 1 5000, which means add slots from 1 to 5000, and we could implement a similar command for deleting slots.
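
Until a range form exists, the shell loop isn't strictly necessary: CLUSTER ADDSLOTS is variadic, so a client can expand the range itself. A sketch with redis-py; the port is a placeholder:

```python
import redis

r = redis.Redis(port=7000)

# CLUSTER ADDSLOTS already accepts many slots in one call, so a client
# can expand the range itself instead of looping in bash.
r.execute_command("CLUSTER", "ADDSLOTS", *range(1, 5001))
```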

Comment From: zuiderkwast

Does "Gossiped node deletion" involve a timed blacklist as described in #1410?

Comment From: chenyang8094

> Request proxying: The idea here is that Redis server nodes could proxy incoming requests to the desired node instead of relying on heavy client side logic to know the cluster topology. Simplifies some of the work for workloads that don’t want to maintain a heavy client. This would be an optional configuration.

It seems to be somewhat related to my issue #10307

Comment From: judeng

> Use one db for cluster (db 0 like now), other db numbers for non-cluster (e.g. local cache for colocated app server)

I don't know the history of the cluster very well. Why doesn't the cluster support multi-db mode? Are there technical difficulties in implementing it? Has it been discussed in our community?

Comment From: judeng

@madolson Thanks for answering! In my scenarios, a cluster would be shared by multiple callers, and using multiple dbs could reduce the dict overhead. I'd like to try it.

Comment From: judeng

Hi everyone, any update on ClusterV2? Could we replace gossip in 8.0?