Hi. I ran into this problem during online resharding, so I am sharing it here.
We are running a Redis Cluster on version 4.0.10, with lettuce 5.1 as the client library. A few days ago, we did an online resharding of our cluster, which occasionally caused MOVED errors because the number of redirections exceeded lettuce's maxRedirects limit of 5.
https://github.com/lettuce-io/lettuce-core/wiki/Redis-Cluster#refreshing-the-cluster-topology-view
I investigated the cause because it seemed strange for five redirections to occur in a row, and I found that the problem could be in move_slot in redis-trib. After all the data in the slot has been moved, redis-trib runs CLUSTER SETSLOT NODE on all the nodes in the cluster, and MOVED errors can occur because the order in which the nodes receive the command is not guaranteed.
https://github.com/antirez/redis/blob/ff6db5f176caa426135bf9e603a1eb6f6dea4f83/src/redis-trib.rb#L1086
redis-trib was replaced by redis-cli in Redis 5, but redis-cli still seems to work the same way.
Thank you.
Worst Case
When migrating slots using redis-trib, the worst-case scenario is that MOVED errors repeat indefinitely during step 4 of the live resharding procedure: https://redis.io/commands/cluster-setslot#redis-cluster-live-resharding-explained
1. MIGRATING state gets cleared on source-node.
   a. From now on, this node is no longer the owner of the slot.
   b. This means that it can no longer process commands from clients for that slot, and it replies with a MOVED error, not ASK.
   c. Also, since destination-node is still not the slot owner, requests sent to it without ASKING get a MOVED error as well, pointing back to source-node.
   d. b and c can repeat indefinitely.
2. All nodes except source-node and destination-node are notified that the slot owner has changed.
3. IMPORTING state gets cleared on destination-node.
   - From now on, destination-node is the owner of the slot, and it can handle commands for that slot without the ASKING command.
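To make the loop in steps 1b and 1c easier to see, here is a minimal Python sketch of a toy model. It is not Redis or lettuce code: the Node class, make_nodes, and follow_redirects are hypothetical names I made up, and the model only tracks a single slot whose keys have all been moved already. The max_redirects default of 5 mirrors the lettuce limit mentioned above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    """Toy model of one node's view of a single slot (hypothetical)."""
    name: str
    owner_view: str                     # who this node believes owns the slot
    migrating_to: Optional[str] = None  # set while the slot is MIGRATING
    importing: bool = False             # set while the slot is IMPORTING

    def handle(self, asking: bool = False) -> str:
        """Reply "OK", "ASK <node>" or "MOVED <node>" for a key in the slot."""
        if self.owner_view == self.name:
            if self.migrating_to:
                # All keys were already moved, so the key is missing here.
                return f"ASK {self.migrating_to}"
            return "OK"
        if self.importing and asking:
            return "OK"
        return f"MOVED {self.owner_view}"

    def setslot_node(self, new_owner: str) -> None:
        """Simplified effect of CLUSTER SETSLOT <slot> NODE <new_owner>."""
        self.owner_view = new_owner
        self.migrating_to = None
        self.importing = False


def make_nodes() -> Dict[str, Node]:
    """State after all keys are migrated, before any SETSLOT NODE."""
    return {
        "src": Node("src", owner_view="src", migrating_to="dst"),
        "dst": Node("dst", owner_view="src", importing=True),
    }


def follow_redirects(nodes: Dict[str, Node], start: str,
                     max_redirects: int = 5) -> Optional[int]:
    """Return the number of redirects followed, or None if the limit is hit."""
    current, asking, redirects = start, False, 0
    while True:
        reply = nodes[current].handle(asking)
        if reply == "OK":
            return redirects
        if redirects == max_redirects:
            return None
        kind, target = reply.split()
        # ASKING is only sent for the single request that follows an ASK.
        current, asking = target, kind == "ASK"
        redirects += 1
```

In this model, calling setslot_node("dst") on "src" first leaves the client bouncing between the two nodes until the redirect limit is exhausted, while calling it on "dst" first resolves the request after a single ASK redirect.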
Workaround
Ensure that CLUSTER SETSLOT NODE is executed on the nodes in the following order:
1. destination-node
2. source-node
3. The nodes not involved in the resharding
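The ordering above could be sketched in Python roughly as follows. This is only an illustration under my own assumptions, not how redis-trib or redis-cli are implemented: setslot_order and setslot_commands are hypothetical helpers, and the node names stand in for real node IDs. The sketch only builds the command strings and leaves sending them to whatever client is in use.

```python
from typing import List, Sequence, Tuple


def setslot_order(all_nodes: Sequence[str], source: str,
                  destination: str) -> List[str]:
    """Order nodes as: destination, source, then everything else."""
    others = [n for n in all_nodes if n not in (source, destination)]
    return [destination, source, *others]


def setslot_commands(slot: int, all_nodes: Sequence[str], source: str,
                     destination: str) -> List[Tuple[str, str]]:
    """Pair each target node with the command to send, in safe order."""
    command = f"CLUSTER SETSLOT {slot} NODE {destination}"
    return [(node, command)
            for node in setslot_order(all_nodes, source, destination)]


if __name__ == "__main__":
    # Hypothetical four-node cluster; slot 42 moves from node-b to node-c.
    for node, command in setslot_commands(
            42, ["node-a", "node-b", "node-c", "node-d"],
            source="node-b", destination="node-c"):
        print(f"{node}: {command}")
```

Sending to destination-node first means that by the time source-node starts replying with MOVED, the destination already considers itself the owner and can serve the request.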
Comment From: ma2sql
@zuiderkwast Thank you!