We are running Redis in cluster mode with 6 pods and with persistence enabled. All the six pods are masters and there are no secondaries replicating the data.
On one of the setups, I ran:
redis-cli --verbose --cluster check localhost:6379
and it showed:
Node <ip1:port> has slots in importing state <slots list here>
Node <ip2:port> has slots in importing state <slots list here>
[WARNING] The following slots are open: <slots list here>
All 16384 slots are covered.
I am a bit confused about the state of the cluster. What can we conclude from it?
- Why would there be an "IMPORTING" state with no corresponding "MIGRATING" state for those slots?
- It says all 16384 slots are covered, but at the same time it lists a set of open slots. Those two statements sound contradictory to me. Or are they not? What is the difference between a slot being covered and a slot being open?
- Apart from the above, `redis-cli cluster info` shows that all 16384 slots are covered, but cluster_known_nodes is 6 while cluster_size is 4.
- In Redis Cluster, is a node counted as known but not included in cluster_size until it owns at least one slot that is neither migrating nor importing?
What are some situations that can lead to the above state? Could it be caused by a failed **rebalance** operation?
Thanks in advance.
I tried running redis-cluster fix with the replace option, after which the problems were fixed, but I still want to understand this correctly.
Comment From: madolson
- The most common case I've seen of this issue is when a node comes up with RDB data that has some keys for slots it does not own. In this scenario, the node marks those slots as "importing". Since data is distributed in cluster mode, there is no single authoritative owner of the data.
- My understanding is that the list of open slots includes those that are in migrating or importing state. So that is consistent.
- Cluster size is the number of masters that serve at least one hash slot.
- ^
The state could happen because of a failed rebalance as well, but that seems less likely because you would expect the corresponding migrating states to also exist. I'm not sure redis-cluster fix would have resolved the issues.
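To make the "node comes up with keys for slots it does not own" case concrete, here is a minimal Python sketch. It is a simplified model, not Redis source code. The key-to-slot mapping follows the Redis Cluster spec (CRC16/XMODEM of the key, or of its non-empty `{hash tag}`, modulo 16384). The startup check is my reading of how a node treats loaded keys: slots nobody owns are claimed, and slots owned by another node are marked as importing from that owner. The function names and the `slot_owner` dict are hypothetical.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 (XMODEM variant: poly 0x1021, init 0), as used for Redis Cluster key hashing."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: bytes) -> int:
    """Map a key to one of the 16384 slots, hashing only a non-empty {hash tag} if present."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


def classify_loaded_slots(my_id, slot_owner, loaded_keys):
    """Hypothetical model of the startup check on keys loaded from an RDB file.

    slot_owner maps slot -> owning node id (slots missing from it are unowned).
    Returns (claimed, importing): slots the node would take over because nobody
    owns them, and {slot: owner} for slots it would mark as importing.
    """
    claimed, importing = set(), {}
    for key in loaded_keys:
        slot = key_hash_slot(key)
        owner = slot_owner.get(slot)
        if owner is None:
            claimed.add(slot)          # unowned slot: the node takes responsibility for it
        elif owner != my_id:
            importing[slot] = owner    # owned elsewhere: slot is opened as "importing"
    return claimed, importing
```

In this model, node A restarting with keys in a slot that B now owns ends up with that slot importing from B, while B has no matching migrating entry. That matches the "IMPORTING without MIGRATING" output in the question.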
Comment From: zuiderkwast
- As long as a slot has an owner, it is counted as covered. Slots in migrating/importing state are owned by the migrating node until the slot is fully migrated, then the ownership is transferred. Only slots which have no owner at all are uncovered.
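The distinction between "covered", "open" and the cluster_info counters can be sketched by parsing `CLUSTER NODES` output. In that format, slot fields are `N` or `N-M` for owned slots, `[N->-<node-id>]` for a migrating slot and `[N-<-<node-id>]` for an importing slot. My understanding is that each node only reports its *own* importing/migrating slots, so the sketch assumes lines collected from every node. The function names and the short node ids in the example are hypothetical.

```python
import re

TOTAL_SLOTS = 16384
# Matches "[slot->-node]" (migrating) and "[slot-<-node]" (importing).
OPEN_RE = re.compile(r"^\[(\d+)-([<>])-([0-9a-f]+)\]$")


def parse_node_line(line):
    """Parse one CLUSTER NODES line into owned, importing and migrating slots."""
    fields = line.split()
    node = {"id": fields[0], "flags": fields[2].split(","),
            "owned": set(), "importing": {}, "migrating": {}}
    for tok in fields[8:]:  # slot fields start after the link-state column
        m = OPEN_RE.match(tok)
        if m:
            slot, arrow, peer = int(m.group(1)), m.group(2), m.group(3)
            if arrow == "<":
                node["importing"][slot] = peer
            else:
                node["migrating"][slot] = peer
        elif "-" in tok:
            lo, hi = map(int, tok.split("-"))
            node["owned"].update(range(lo, hi + 1))
        else:
            node["owned"].add(int(tok))
    return node


def summarize(lines):
    """Covered = slots with an owner; open = slots importing/migrating anywhere."""
    nodes = [parse_node_line(l) for l in lines if l.strip()]
    covered = set().union(*(n["owned"] for n in nodes))
    open_slots = set()
    for n in nodes:
        open_slots.update(n["importing"])
        open_slots.update(n["migrating"])
    # cluster_size counts masters that own at least one slot.
    cluster_size = sum(1 for n in nodes if "master" in n["flags"] and n["owned"])
    return {"covered": len(covered), "open": sorted(open_slots),
            "known_nodes": len(nodes), "cluster_size": cluster_size}
```

For example, a master that only has an importing slot is "known" but adds nothing to `cluster_size`, and the slot stays covered because its owner does not change:

```python
lines = [
    "a1 10.0.0.1:6379@16379 myself,master - 0 0 1 connected 0-8191",
    "b2 10.0.0.2:6379@16379 master - 0 0 2 connected 8192-16383",
    "c3 10.0.0.3:6379@16379 master - 0 0 3 connected [100-<-a1]",
]
print(summarize(lines))
```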
Comment From: vineelyalamarthy
@zuiderkwast So here we have all 16384 slots covered, but at the same time many slots are shown as open, and a couple of pods/nodes are shown with multiple slots in the IMPORTING state.
@antirez, any idea when this happens?
Comment From: vineelyalamarthy
Yes @madolson, we have an automated rebalance triggered by the Redis leader pod (selected via lease-based leader election using etcd on Kubernetes). In the cases where the rebalance times out, the corresponding MIGRATING states also exist along with the IMPORTING ones.
But having only IMPORTING states is a little weird.
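The difference described here (a timed-out rebalance leaves both MIGRATING and IMPORTING, whereas the puzzling case has IMPORTING alone) can be expressed as a small classifier. This is a hypothetical helper, not part of redis-cli. It assumes each node's importing/migrating slots have already been collected (for example from every node's `CLUSTER NODES` output), and the category labels only encode the hypotheses discussed in this thread.

```python
from collections import defaultdict


def classify_open_slots(nodes):
    """nodes: {node_id: {"importing": {slot: peer}, "migrating": {slot: peer}}}.

    Returns {slot: label}. Simplification: any importing + migrating pair on the
    same slot counts as one migration, without checking the peer ids match.
    """
    importers = defaultdict(set)
    migrators = defaultdict(set)
    for node_id, state in nodes.items():
        for slot in state.get("importing", {}):
            importers[slot].add(node_id)
        for slot in state.get("migrating", {}):
            migrators[slot].add(node_id)

    result = {}
    for slot in sorted(set(importers) | set(migrators)):
        if importers[slot] and migrators[slot]:
            result[slot] = "interrupted-migration"  # e.g. a rebalance that timed out
        elif importers[slot]:
            result[slot] = "importing-only"  # e.g. keys loaded for a slot owned elsewhere
        else:
            result[slot] = "migrating-only"
    return result
```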
This theory needs to be verified. I will test it a few times and update here.
It can happen when one node (A) goes down and its slots are reassigned to B and C, but its keys are not migrated. Those keys then get created anew on B and C. Now suppose A comes back online hours later: because persistence is enabled, it loads its RDB into memory, and we may end up with orphaned IMPORTING states. The remedy seems to be to run the fix command, which assigns the slot ownership on a case-by-case basis.
It also asked me to provide some additional parameters to the cluster fix command.