Redis [BUG] ASK redirection from replica nodes

Hello,

I'm implementing a Redis Cluster client in Ruby: https://github.com/redis-rb/redis-cluster-client

I'm testing the client under resharding and read-scaling (reading from replicas) conditions, but it seems that replica nodes don't reply with an ASK redirection error. Clients receive nil from replica nodes during resharding. Is there a way to correctly read the values of keys from replica nodes in the middle of resharding?

I'm testing with the Redis 7 Docker image.

Comment From: zuiderkwast

Let me see if I understand the problem.

 +-------------+     migrating    +------------+
 |  master 1   | ---------------->|  master 2  |
 +-------------+                  +------------+
       |                                |
       v                                v
 +-------------+                  +-------------+
 |  replica 1  |                  |  replica 2  |
 +-------------+                  +-------------+

Master 1 is migrating a slot to master 2.

Some keys are already migrated and deleted from master 1. If a client requests them from master 1, master 1 will reply with an ASK redirect to master 2. So far so good.
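For reference, the client side of this exchange can be sketched as below. Per the Redis Cluster specification, the error reply has the form `ASK <slot> <host>:<port>` (or `MOVED ...`). After an ASK, the client sends `ASKING` to the target node immediately before retrying the command there, and it does not update its slot map. After a MOVED, it updates the slot map and retries without `ASKING`. `Redirect`, `parse_redirect` and `retry_plan` are hypothetical helpers for illustration, not part of redis-cluster-client's API.

```ruby
# Hypothetical value type for a parsed redirection error.
Redirect = Struct.new(:type, :slot, :host, :port)

# Parse an error message (without the leading "-") such as
# "ASK 3999 127.0.0.1:6381" or "MOVED 3999 127.0.0.1:6381".
# Returns nil for any other kind of error.
def parse_redirect(message)
  type, slot, endpoint = message.split(" ", 3)
  return nil unless %w[ASK MOVED].include?(type) && endpoint

  # rpartition splits on the last ":", so the port is always the final field.
  host, _, port = endpoint.rpartition(":")
  Redirect.new(type.to_sym, Integer(slot), host, Integer(port))
end

# Decide how to retry a command after a redirect.
# ASK: one-off redirect, ASKING must precede the command, slot map unchanged.
# MOVED: permanent move, refresh the slot map and retry without ASKING.
def retry_plan(redirect, command)
  node = "#{redirect.host}:#{redirect.port}"
  case redirect.type
  when :ASK
    { node: node, commands: [["ASKING"], command], update_slot_map: false }
  when :MOVED
    { node: node, commands: [command], update_slot_map: true }
  end
end

plan = retry_plan(parse_redirect("ASK 3999 127.0.0.1:6381"), ["GET", "foo"])
p plan
```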

If replica 1 (replica of master 1) receives a read command from a client about a key which has already been migrated to master 2, you expect replica 1 to reply with ASK redirect to master 2, but instead it replies with nil. Correct?

This is what I think happens:

Replica 1 doesn't know that its master is migrating a slot to another master, because this information is not propagated to replicas. When a key has been migrated, master 1 simply replicates a DEL command to replica 1, and the replica treats the key as ordinarily deleted.
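The behaviour described above can be reproduced with a toy in-memory model. The classes below are hypothetical and simplified: slot numbers are passed in explicitly instead of being computed from the key, and the migrating state stands in for what `CLUSTER SETSLOT <slot> MIGRATING <node-id>` sets on the source master. The point it shows is that the replica only sees the replicated writes (`SET`, then `DEL`), so it has no way to tell a migrated key from a deleted one.

```ruby
# Hypothetical toy model, not Redis internals.
class Node
  attr_reader :data

  def initialize
    @data = {}
  end
end

class Replica < Node
  # Apply a write received through the replication stream.
  def apply(op, key, value = nil)
    if op == :set
      @data[key] = value
    else
      @data.delete(key)
    end
  end

  # No migration state on the replica, so a missing key is simply nil.
  def get(key, _slot)
    @data[key]
  end
end

class Master < Node
  attr_reader :migrating

  def initialize(replica)
    super()
    @replica = replica
    @migrating = {} # slot => target address; known only to the master
  end

  def set(key, value)
    @data[key] = value
    @replica.apply(:set, key, value)
  end

  # Moving a key deletes it locally and replicates a plain DEL.
  def migrate_key(key, target)
    target.data[key] = @data.delete(key)
    @replica.apply(:del, key)
  end

  # A missing key in a migrating slot gets an ASK redirect.
  def get(key, slot)
    return @data[key] if @data.key?(key)
    return "ASK #{slot} #{@migrating[slot]}" if @migrating.key?(slot)

    nil
  end
end

replica1 = Replica.new
master1 = Master.new(replica1)
master2 = Node.new
slot = 3999 # hypothetical slot for "foo"

master1.set("foo", "bar")
master1.migrating[slot] = "127.0.0.1:6381"
master1.migrate_key("foo", master2)

p master1.get("foo", slot)  # => "ASK 3999 127.0.0.1:6381"
p replica1.get("foo", slot) # => nil
```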

If the replica knew about migrating and importing slots, it could reply with an ASK redirect. I think this would be solved by #10517, which propagates the SETSLOT command to replicas, perhaps with some extra code and a test case for this scenario.

@PingXie it would be interesting to know what you are thinking about this scenario.

Comment From: PingXie

@zuiderkwast is spot-on. Replicas are unaware of the migration today. With #10517, this scenario can be supported. The simplest form of redirection would have the source replica redirect the request to the target primary, and this could be a reasonable starting point. Operationally, though, this might not be the best idea, because it adds load on the target primary, whereas the goal of read replicas is the opposite. So ultimately I think what needs to happen is to have the source replica pick a random replica in the target shard and redirect the traffic there.
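The two options discussed here can be sketched as a replica-side decision, assuming the replica has learned the migrating state (as propagating SETSLOT via #10517 would allow). This is a hypothetical illustration, not actual Redis behaviour: `Shard`, `replica_read` and the `policy` values are invented names. `:primary` is the simplest form of redirection and `:random_replica` is the load-spreading variant suggested above.

```ruby
# Hypothetical: target shard topology as a replica might know it.
Shard = Struct.new(:primary, :replicas)

# data:      the replica's local key space
# migrating: slot => target Shard (learned from propagated SETSLOT, assumed)
# policy:    :primary or :random_replica
def replica_read(data, key, slot, migrating, policy: :primary, rng: Random.new)
  # Keys that still exist locally are served, as a master would do.
  return data[key] if data.key?(key)

  target = migrating[slot]
  return nil unless target # slot not migrating: the key really is absent

  node =
    if policy == :random_replica && !target.replicas.empty?
      target.replicas.sample(random: rng)
    else
      target.primary
    end
  "ASK #{slot} #{node}"
end

shard2 = Shard.new("10.0.0.2:6379", ["10.0.0.12:6379", "10.0.0.22:6379"])
migrating = { 3999 => shard2 }

p replica_read({}, "foo", 3999, migrating) # => "ASK 3999 10.0.0.2:6379"
# Redirects to one of the two target replicas:
p replica_read({}, "foo", 3999, migrating, policy: :random_replica)
```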

Comment From: zuiderkwast

Good point @PingXie. Picking a random replica of the destination shard is a very sensible choice for an ASK redirect by a replica.

Comment From: supercaracal

Thank you for clarifying the issue.

> If replica 1 (replica of master 1) receives a read command from a client about a key which has already been migrated to master 2, you expect replica 1 to reply with ASK redirect to master 2, but instead it replies with nil. Correct?

Yes, I do.

> So ultimately I think what needs to happen is to have the source replica pick a random replica in the target shard and redirect the traffic there.

I think so too.

Comment From: zuiderkwast

@madolson Is this solved?

I think it is a bug that needs to be solved.

Comment From: madolson

It was originally just a question, and it had been answered. I suppose we can make this the canonical issue for the problem. There was a separate issue that Ping was working on, with an associated PR.

Comment From: zuiderkwast

Good. I haven't seen any other issue mention this problem.

It makes reading from READONLY replicas unreliable during slot migration, so in this sense reading from replicas is broken.

Comment From: zuiderkwast

It was discussed above that a replica should ASK-redirect to a random replica in the target shard. I think it's better that it ASK-redirects to the target master instead, for two reasons:

1. MOVED redirects always point to the master for the slot, so ASK should be consistent with that.
2. A replica in the target shard may not yet have received the data during an ongoing migration. The master is more likely to have it.
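The replication-lag argument in the second point can be illustrated with another toy model. The classes are hypothetical, and "receive now, apply later" is a simplification of asynchronous replication. The target primary holds an imported key as soon as it arrives, while a target replica can still return nil until it has applied its pending writes. A redirect to that replica would therefore not fix the nil reply.

```ruby
# Hypothetical toy model of asynchronous replication in the target shard.
class LaggingReplica
  def initialize
    @data = {}
    @backlog = [] # writes received from the primary but not yet applied
  end

  def receive(key, value)
    @backlog << [key, value]
  end

  # Catch up with the primary by applying all pending writes.
  def apply_pending
    @backlog.each { |key, value| @data[key] = value }
    @backlog.clear
  end

  def get(key)
    @data[key]
  end
end

class TargetPrimary
  def initialize(replica)
    @data = {}
    @replica = replica
  end

  # Called when a key arrives from the source master during migration.
  def import(key, value)
    @data[key] = value
    @replica.receive(key, value) # propagated, but applied later
  end

  def get(key)
    @data[key]
  end
end

replica2 = LaggingReplica.new
master2 = TargetPrimary.new(replica2)
master2.import("foo", "bar")

p master2.get("foo")  # => "bar"
p replica2.get("foo") # => nil (replica 2 has not caught up yet)
replica2.apply_pending
p replica2.get("foo") # => "bar"
```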