The problem/use-case that the feature addresses
Currently, Redis Cluster mode has a fixed number of slots (16384), and the key-to-slot mapping is hardcoded to `crc16(key) % 16384`. This fixed hashing mechanism is restrictive for applications that want control over how their data is partitioned across the cluster. A pluggable hash would let the client stay agnostic about the hashing algorithm while still colocating related data in a slot/shard.
Description of the feature
Provide a module API to register/unregister a callback that computes the hash slot for a key.
- Register a callback to compute the hash slot:

```c
void RM_RegisterCustomSlotHash(RedisModuleCtx *ctx, RedisModuleSlotHashCallback callback);
```

Callback signature (it returns the slot for the provided key):

```c
typedef int (*RedisModuleSlotHashCallback)(RedisModuleCtx *ctx, RedisModuleString *key);
```

- Unregister the callback:

```c
void RM_UnregisterCustomSlotHash(RedisModuleCtx *ctx);
```
Considerations:
- The custom hash slot mechanism should only be enabled if the keyspace is empty; otherwise the request will be rejected by the engine.
- The onus is on the administrator/client to load the module on all the nodes.
- Only a single module can register a callback.
- If a callback is already registered, it must be unregistered first before the slot hashing mechanism can be updated via the current/another module.
- Config: `custom-hash-slot-enabled` will be set to true if this feature is used, to ease client discoverability.
- Based on the above config, clients (e.g. redis-cli) can invoke `CLUSTER KEYSLOT` to determine the slot.
- Add an assert in the engine to restrict the slot range to [0, 16384). The server code/cli/benchmark assumes the maximum number of slots to be 16384.
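As a sketch of the operator workflow implied by the considerations above (the directive name comes from this proposal; the module path is a placeholder, and how the config flag is actually flipped is not yet specified):

```
# redis.conf (hypothetical) -- load the same module on every node while the
# keyspace is still empty; custom-hash-slot-enabled then reports true so
# clients know to route via CLUSTER KEYSLOT rather than computing
# crc16(key) % 16384 locally.
loadmodule /path/to/customslot.so
```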
Alternatives you've considered
The hashtag mechanism enables some of the use cases but doesn't provide complete control to the client/application.
Additional information
Previous discussions/implementations: #9604, #8948, #3919
Comment From: ranshid
I like the idea of letting modules decide on the key-to-slot mapping. Since we are talking about modules, we need to take care that all the cluster nodes agree on the same hashing methodology. For example, let's say we have two modules, a and b, attempting to register mappings A() and B() while a cluster node is loading. If on some nodes module a registers mapping A() and on others module b registers mapping B(), we could potentially have an endless MOVED ping-pong (right?). Maybe we can think of some way to verify this is synced across the cluster? Or we can just document it and have the top orchestration make sure the cluster is synced on the same hashing method.
It is probably important to provide a clear way for the mapping method to be reported in Redis INFO.
Comment From: itamarhaber
Another POV: how would clients pick up this "custom" hashing? I dislike relying on `-MOVED`.
Comment From: hpatro
@ranshid Since we are talking about modules, we need to take care that all the cluster nodes agree on the same hashing methodology. For example, let's say we have two modules, a and b, attempting to register mappings A() and B() while a cluster node is loading. If on some nodes module a registers mapping A() and on others module b registers mapping B(), we could potentially have an endless MOVED ping-pong (right?). Maybe we can think of some way to verify this is synced across the cluster?
This is an issue with any kind of config/ACL/function setup in Redis cluster mode. All of our solutions expect the orchestration layer/admin user to set it up correctly. I think we have to continue in the same direction in this case as well.
Comment From: yossigo
The hashtag mechanism enables some of the use cases but doesn't provide complete control to the client/application.
@hpatro Can you elaborate on this one? If it's a matter of convenience, we could consider a new type of tag that explicitly binds to a specific hash slot. But I wonder if that's the missing piece you're referring to.
Comment From: hpatro
@hpatro Can you elaborate on this one? If it's a matter of convenience, we could consider a new type of tag that explicitly binds to a specific hash slot. But I wonder if that's the missing piece you're referring to.
This was more of a generic statement: we don't allow the customer to decide how to distribute the data across the cluster. #3919 claims the current algorithm might not be distributing the data evenly and would like to have control over it.
Comment From: madolson
If it's a matter of convenience, we could consider a new type of tag that explicitly binds to a specific hash slot. But I wonder if that's the missing piece you're referring to.
This is what I've been advocating for in the past. We can introduce a V2 slot convention that clients can configure. Ideally, I think this should be more standardized, though, and not part of the module API. I've considered suggesting using 3 alphanumeric characters to represent the 2^14 slots. We could introduce a new syntax like [aB3], where aB3 uniquely identifies a slot. If we wanted invisible characters, it would bring it down to 2 characters, but I think that would hurt key readability.
This also allows us to reserve some "special" slots that aren't normally reachable. For example, "$R1" could refer to a reserved Redis keyspace, used for streams, that is local to the given node.