Following the discussion in #12192, I'd like to start a conversation about how the various ways to deploy Redis can converge in the future into a single interface that can be supported by clients universally.
Background
The simplest form of Redis deployment is a single, standalone server. For some use cases, it is enough, but when users look for better performance, memory capacity, or availability, they need a distributed deployment with multiple servers.
Today, there are two completely different options:
* Redis Sentinel is a standalone component that orchestrates multiple Redis instances to achieve high availability. To do that, it exposes a dedicated RESP-based interface that clients can use for service discovery.
* Redis Cluster is a built-in Redis mode where multiple Redis instances self-orchestrate to form a highly available cluster. In addition to high availability, Redis Cluster provides sharding support and handles data organization in hash slots across different nodes. To do that, it implements the Redis Cluster Specification, a set of RESP commands and responses that clients need to implement.
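For readers less familiar with hash slots, here is a minimal Python sketch of the key-to-slot mapping that cluster-aware clients compute today (CRC16 of the key modulo 16384, with hash-tag handling). The function name `hash_slot` is only illustrative; it relies on the standard library's `binascii.crc_hqx`, which with an initial value of 0 should match the CRC16 variant used by Redis Cluster.

```python
import binascii

SLOT_COUNT = 16384


def hash_slot(key: bytes) -> int:
    """Return the Redis Cluster hash slot for a key."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' is used.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # crc_hqx with an initial value of 0 is CRC16-CCITT (XMODEM).
    return binascii.crc_hqx(key, 0) % SLOT_COUNT
```

Keys that share a hash tag, such as `{user1000}.following` and `{user1000}.followers`, land in the same slot, which is how multi-key operations stay possible in a sharded cluster.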
Both Redis Sentinel and Redis Cluster provide high availability, but they don't entirely overlap:
| Item | Redis Sentinel | Redis Cluster |
|---|---|---|
| Redis Standalone Compatibility | Full | Limited: No cross-slot operations, single database |
| Orchestration | Supports more tuning and customizations: parallel-syncs, custom scripts, Lua -BUSY handling | Basic |
| Quorum | Flexible configuration, number, and topology of Sentinels can be different than data nodes | Less flexible, primaries are the quorum[1] |
| Service Discovery | More advanced: multiple named instances, Pub/Sub event notifications | Basic |
[1] This is true for the current Redis Cluster implementation but will change with Cluster V2.
Problems
Having Redis Cluster and Redis Sentinel as two high-availability options makes little sense to users and is more challenging to maintain at the project level.
Users who start using Redis Sentinel and later need to scale up must migrate to Redis Cluster and adapt to a completely different architecture.
Client libraries must maintain two sets of commands to interact with the two deployment modes. Not all clients support both modes, which adds complexity for users.
Having multiple modes also complicates matters when considering enhancements (such as the -REDIRECT feature discussed in #12192) that have some overlap with other Redis Cluster or Sentinel capabilities.
Proposal: Redis Cluster everywhere
We propose refreshing and extending the Redis Cluster Specification and adopting it as a universal interface that applies to all forms of Redis deployment, regardless of what the server-side implements.
At its core, the Redis Cluster Specification allows clients to interrogate server-side Redis about its topology - node endpoints, replication roles, and hash slot mapping - everything clients need to handle failovers and switchovers. We must extend it to support the additional use cases below, which are not part of Redis Cluster today.
Non-sharded Cluster
A non-sharded cluster provides high availability but no sharding - all nodes contain the same data and hash slots.
It may operate in full standalone Redis compatibility mode, which means:
* Cross-slot operations are allowed
* Multiple databases are supported

Or, it may operate as a regular Redis Cluster, which happens to have all hash slots mapped to the same node(s).
Today, clients assume that cluster mode implies sharding. They may accept nodes assigned with all hash slots but will still perform hash slot calculation and avoid cross-slot operations. Part of the work on the specification is removing these implicit assumptions and providing a mechanism for clients to determine how the server expects them to interact with it.
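As a rough illustration of what a client can and cannot infer today, the sketch below checks whether a topology maps every hash slot to a single primary. The `SlotRange` tuple is a simplified, hypothetical stand-in for one parsed `CLUSTER SLOTS` entry, not an actual client type. Note that this only detects the topology; it cannot tell the client whether cross-slot operations or multiple databases are allowed, which is exactly the gap the specification work would need to close.

```python
from typing import List, Tuple

SLOT_COUNT = 16384

# (start_slot, end_slot, primary_endpoint): hypothetical simplified entry.
SlotRange = Tuple[int, int, str]


def is_single_shard(ranges: List[SlotRange]) -> bool:
    """True when every hash slot is covered and all ranges share one primary."""
    if not ranges:
        return False
    primaries = {primary for _, _, primary in ranges}
    if len(primaries) != 1:
        return False
    covered = set()
    for start, end, _ in ranges:
        covered.update(range(start, end + 1))
    return covered == set(range(SLOT_COUNT))
```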
Standalone Cluster Interface
Standalone Redis (with or without Redis Sentinel) should also (conditionally) expose a subset of Redis Cluster interface to clients, including:
* Use a -MOVED reply when accessing a replica in a way that is not supported locally.
* Expose replicas through CLUSTER SLOTS and CLUSTER SHARDS.
A standalone Redis instance or a non-sharded cluster should appear the same to clients.
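To make "appear the same" concrete, here is a hedged sketch of the shape a standalone deployment might report: a single slot range covering 0 to 16383, owned by the primary, followed by its replicas. The helper name and the trimmed node entries (host and port only, without the node ID that real `CLUSTER SLOTS` replies carry) are assumptions for illustration, not part of the proposal.

```python
from typing import List, Tuple

Endpoint = Tuple[str, int]


def standalone_cluster_slots(primary: Endpoint, replicas: List[Endpoint]) -> list:
    """Build a CLUSTER SLOTS-shaped reply with one range owning every slot."""
    # Entry layout: [start_slot, end_slot, primary_node, replica_node, ...]
    entry = [0, 16383, [primary[0], primary[1]]]
    entry.extend([host, port] for host, port in replicas)
    return [entry]
```

A client that already parses `CLUSTER SLOTS` could consume such a reply without knowing whether a cluster or a standalone primary with replicas sits behind it.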
Service Discovery Improvements
The extended Redis Cluster Specification may allow clients to register for and receive topology change events without polling or lazily waiting for -MOVED replies.
Open Questions
- When Redis Cluster becomes a superset of all other modes, do we want to deprecate and eventually end-of-life the Sentinel API, or would we instead continue to support a subset on top of Redis Cluster?
Comment From: madolson
When Redis Cluster becomes a superset of all other modes, do we want to deprecate and eventually end-of-life the Sentinel API, or would we instead continue to support a subset on top of Redis Cluster?
I think we should look into supporting the Sentinel API on top of Redis Cluster. Given that it's the same API, there might be an easier way to support the "dataplane" commands for both. Given our recent conversations about clients disliking breaking changes, I think we should consider that very carefully.
The extended Redis Cluster Specification may allow clients to register for and receive topology change events without polling or lazily waiting for -MOVED replies.
Reminds me of https://github.com/redis/redis/pull/10358, which sort of did this. Maybe we can revive and finalize that discussion.
Standalone Redis (with or without Redis Sentinel) should also (conditionally) expose a subset of Redis Cluster interface to clients, including:
Just to jump on this train, we got some feedback from the engineers we have in AWS that work on clients that they dislike the way CLUSTER SHARDS is:
1. Not consistent between shards since it is ordered based on the internal hash.
2. Exposes transient metadata. This wasn't an issue we considered during the implementation, but it apparently helps to send a CLUSTER SHARDS to multiple nodes and basically ask "Which one is the most common" by doing a strcmp, and use that one as the source of truth. This is much more difficult when you have to start parsing.
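The "most common reply" approach described above could look roughly like this on the client side. This is a sketch, assuming the client has already collected the raw, unparsed reply bytes from each node; the function name and the node-to-reply dictionary are hypothetical. It only works when nodes that agree on the topology return byte-identical replies, which is why inconsistent ordering and transient metadata get in the way.

```python
from collections import Counter
from typing import Dict, Optional


def most_common_view(raw_replies: Dict[str, bytes]) -> Optional[bytes]:
    """Return the raw topology reply seen on the most nodes, or None if empty."""
    if not raw_replies:
        return None
    # Byte-for-byte comparison, the Python analogue of a strcmp on the payload.
    reply, _votes = Counter(raw_replies.values()).most_common(1)[0]
    return reply
```

Ties are resolved in favor of whichever reply was seen first; a real client would likely want a stricter rule.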
Comment From: soloestoy
Standalone Redis (with or without Redis Sentinel) should also (conditionally) expose a subset of Redis Cluster interface to clients, including:
- Use a -MOVED reply when accessing a replica in a way that is not supported locally.
- Expose replicas through CLUSTER SLOTS and CLUSTER SHARDS.
A standalone Redis instance or a non-sharded cluster should appear the same to clients.
Currently, I am focusing on the redirection part. I'm happy that the non-sharded cluster keeps "-MOVED" and that standalone can be compatible with it.
And then in the complete reply ("-MOVED slot ip:port"), regarding the slot, should a non-sharded cluster respond with a special slot value (such as -1), or should it also calculate the slot for a particular key before returning?
I lean towards using -1. Since it is a non-sharded cluster, it is not tied to any specific slot, and we also allow cross-slot command execution, using -1 seems like a good choice.
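To show what this would mean for clients, here is a small Python sketch of parsing a redirection error where a slot of -1 is treated as "not tied to any slot". The -1 convention is only the idea floated in this comment, not existing Redis behavior, and the `Moved` type and `parse_moved` name are illustrative.

```python
from typing import NamedTuple, Optional


class Moved(NamedTuple):
    slot: Optional[int]  # None when the server sent the proposed -1 value
    host: str
    port: int


def parse_moved(error: str) -> Moved:
    """Parse a 'MOVED <slot> <host>:<port>' error string."""
    kind, slot_text, endpoint = error.lstrip("-").split()
    if kind != "MOVED":
        raise ValueError(f"not a MOVED error: {error!r}")
    # Split on the last ':' so hosts that contain colons still parse.
    host, _, port_text = endpoint.rpartition(":")
    slot = int(slot_text)
    return Moved(None if slot == -1 else slot, host, int(port_text))
```

With a concrete slot the client can update a single slot mapping; with `slot=None` it would instead redirect all traffic to the new endpoint.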
Comment From: madolson
I lean towards using -1. Since it is a non-sharded cluster, it is not tied to any specific slot, and we also allow cross-slot command execution, using -1 seems like a good choice.
I agree; I don't think it should return the slot. We don't want to waste compute on calculating the slot value.