bin/redis-cli_cluster -h 10.8.132.236 -p 7000 --cluster reshard 10.8.132.238:7000 --cluster-from 50b9b79e5f3d0895c206d37a3dfebbae7796401c --cluster-to a27c9e8669cc08f2dd6c5b47bf042b2cf9af4bc6 --cluster-slots 2731 --cluster-yes
Moving slot 12282 from 10.8.132.238:7000 to 10.8.132.239:7000: .................................................................................................................................................................................... Node 10.8.132.238:7000 replied with error: CROSSSLOT Keys in request don't hash to the same slot
Redis server v=4.0.12 sha=00000000:0 malloc=jemalloc-4.0.3 bits=64 build=d83f223f9092475c
redis-cli 5.0.3
Comment From: antirez
There could be several reasons for this. The first thing to do is to upgrade redis-cli to the latest version of the 5.0 branch instead of using 5.0.3; let's see if there was a bug in redis-cli.
Comment From: vacheli
I updated redis-cli to the latest version, and the result is the same.
[root@localhost redis_4.12]# ./redis-cli -v
redis-cli 5.0.9
I tried running ./redis-cli --cluster fix and then --cluster reshard again. It still doesn't solve it.
Comment From: antirez
@vacheli This is extremely odd, it's as if GETKEYSINSLOT were returning keys belonging to different slots. If you want to fix it, you could try with --cluster-pipeline 1, even if this should not be needed in theory; that forces the keys to be processed one after the other. However, if we want to understand what the issue is, I can modify redis-cli to print more messages. Which would you prefer?
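For reference, a quick way to check by hand whether the keys reported for a slot really hash to that slot (a minimal sketch; the host, port and slot number below are just the ones from the failing reshard, adjust as needed):

```
# Ask the source node for a sample of keys it believes belong to the slot,
# then ask Redis which slot each key actually hashes to.
SLOT=12282
HOST=10.8.132.238
PORT=7000

redis-cli -h "$HOST" -p "$PORT" cluster getkeysinslot "$SLOT" 100 |
while read -r key; do
    actual=$(redis-cli -h "$HOST" -p "$PORT" cluster keyslot "$key")
    if [ "$actual" != "$SLOT" ]; then
        echo "MISMATCH: key '$key' hashes to slot $actual, not $SLOT"
    fi
done
```

Any MISMATCH line would support the GETKEYSINSLOT theory; no output means the sampled keys look consistent (keys containing whitespace would need a more careful script).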
Comment From: vacheli
./redis-cli -h 10.8.132.236 -p 7000 --cluster reshard 10.8.132.238:7000 --cluster-from 50b9b79e5f3d0895c206d37a3dfebbae7796401c --cluster-to 18d93ed19445e83003ce653387c673c3e00ed5d6 --cluster-slots 2731 --cluster-yes --cluster-pipeline 1
[root@localhost redis_4.12]# ./redis-cli -h 10.8.132.238 -p 7000 info memory
# Memory
used_memory:4961612760
used_memory_human:4.62G
used_memory_rss:6877581312
used_memory_rss_human:6.41G
used_memory_peak:6716644208
used_memory_peak_human:6.26G
used_memory_peak_perc:73.87%
used_memory_overhead:826644352
used_memory_startup:1458464
used_memory_dataset:4134968408
used_memory_dataset_perc:83.36%
total_system_memory:33568223232
total_system_memory_human:31.26G
used_memory_lua:93184
used_memory_lua_human:91.00K
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
mem_fragmentation_ratio:1.39
mem_allocator:jemalloc-4.0.3
active_defrag_running:0
lazyfree_pending_objects:0
It's been running for more than 24 hours and still isn't finished, so I killed it manually. I hope redis-cli can print more detailed error information for troubleshooting.
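As a side note, --cluster-pipeline 1 issues one MIGRATE per key instead of batching, so its runtime grows with the total number of keys in the slots being moved. A rough sketch for estimating that number up front (the slot range and node below are only illustrative, taken from this reshard):

```
# Sum CLUSTER COUNTKEYSINSLOT over the slot range being resharded.
# This must be run against the node that currently serves those slots.
HOST=10.8.132.238
PORT=7000
total=0
for slot in $(seq 12282 16383); do
    n=$(redis-cli -h "$HOST" -p "$PORT" cluster countkeysinslot "$slot")
    total=$((total + n))
done
echo "keys to move: $total"
```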
Comment From: vacheli
[root@localhost redis_4.12]# ./redis-cli -h 10.8.132.236 -p 7000 --cluster fix 10.8.132.236:7000
10.8.132.236:7000 (339cf60b...) -> 13467203 keys | 4096 slots | 1 slaves.
10.8.132.238:7001 (599d6c7d...) -> 17483474 keys | 5312 slots | 1 slaves.
10.8.132.238:7000 (50b9b79e...) -> 13496021 keys | 4102 slots | 1 slaves.
10.8.132.239:7000 (5dffbb4a...) -> 9454420 keys | 2874 slots | 0 slaves.
[OK] 53901118 keys in 4 masters.
3289.86 keys per slot on average.
>>> Performing Cluster Check (using node 10.8.132.236:7000)
M: 339cf60b195f68c13ffb4a7587ccf4f1a74a2eea 10.8.132.236:7000
slots:[1365-5460] (4096 slots) master
1 additional replica(s)
M: 599d6c7dc3c7372559fe6e428fbc2a86fdc8ba0e 10.8.132.238:7001
slots:[5611-10922] (5312 slots) master
1 additional replica(s)
M: 50b9b79e5f3d0895c206d37a3dfebbae7796401c 10.8.132.238:7000
slots:[12282-16383] (4102 slots) master
1 additional replica(s)
S: 0b8c238c3fceb6bed9603371d071bdccde54537d 10.8.132.237:7000
slots: (0 slots) slave
replicates 599d6c7dc3c7372559fe6e428fbc2a86fdc8ba0e
S: a7f53a14c54b00c317d85c8f1d0329fd9615191f 10.8.132.237:7001
slots: (0 slots) slave
replicates 339cf60b195f68c13ffb4a7587ccf4f1a74a2eea
S: 8a6ba7fb45761b5afdf4c2c8abb6c813bf0ea05e 10.8.132.236:7001
slots: (0 slots) slave
replicates 50b9b79e5f3d0895c206d37a3dfebbae7796401c
M: 5dffbb4a524d1faafca1aa5981ef07719b32f594 10.8.132.239:7000
slots:[0-1364],[5461-5610],[10923-12281] (2874 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
[WARNING] Node 10.8.132.238:7001 has slots in migrating state 5611.
[WARNING] Node 10.8.132.239:7000 has slots in importing state 5611.
[WARNING] The following slots are open: 5611.
>>> Fixing open slot 5611
*** Found keys about slot 5611 in non-owner node 10.8.132.239:7000!
Set as migrating in: 10.8.132.238:7001
Set as importing in: 10.8.132.239:7000
>>> Nobody claims ownership, selecting an owner...
*** Configuring 10.8.132.238:7001 as the slot owner
>>> Case 2: Moving all the 5611 slot keys to its owner 10.8.132.238:7001
Moving slot 5611 from 10.8.132.239:7000 to 10.8.132.238:7001: ....................................................................................................
>>> Setting 5611 as STABLE in 10.8.132.239:7000
>>> Check slots coverage...
[OK] All 16384 slots covered.
[root@localhost redis_4.12]# ./redis-cli -h 10.8.132.236 -p 7000 --cluster rebalance 10.8.132.239:7000 --cluster-use-empty-masters
>>> Performing Cluster Check (using node 10.8.132.239:7000)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Rebalancing across 4 nodes. Total weight = 4.00
Moving 1216 slots from 10.8.132.238:7001 to 10.8.132.239:7000
Node 10.8.132.238:7001 replied with error:
CROSSSLOT Keys in request don't hash to the same slot
Comment From: antirez
@vacheli Yes, --cluster-pipeline 1 can be very slow with large clusters, so it's not normally used. I'm going to modify redis-cli in Redis 6 to print more information when something like this happens, so that if you can try again with the new version of redis-cli I'll release, we can get more information about what is going on. Could you please also send me the CLUSTER NODES output for all your nodes?
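A small sketch for collecting the requested CLUSTER NODES output from every node in one go (the node list is taken from the cluster check output above):

```
# Dump each node's own view of the cluster into a single file.
for node in 10.8.132.236:7000 10.8.132.236:7001 \
            10.8.132.237:7000 10.8.132.237:7001 \
            10.8.132.238:7000 10.8.132.238:7001 \
            10.8.132.239:7000; do
    host=${node%:*}; port=${node#*:}
    echo "=== $node ==="
    redis-cli -h "$host" -p "$port" cluster nodes
done > cluster-nodes-all.txt
```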
Comment From: vacheli
I've added a new 10.8.132.239 master node, and I want to expand the capacity horizontally.
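For reference, the usual redis-cli flow for this kind of horizontal expansion is roughly the following sketch (addresses are the ones from this thread; the new master is assumed to be empty):

```
# 1. Join the new, empty master to the cluster through any existing node.
redis-cli --cluster add-node 10.8.132.239:7000 10.8.132.236:7000

# 2. Move slots onto it; empty masters only receive slots with this flag.
redis-cli --cluster rebalance 10.8.132.236:7000 --cluster-use-empty-masters
```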
Comment From: vacheli
Wow, the rebalance completes normally with redis-cli 6.0.1. I also need to test the other production environments.
[root@localhost sauser]# ./redis-cli -h 10.8.132.236 -p 7000 --cluster rebalance 10.8.132.239:7000 --cluster-use-empty-masters
>>> Performing Cluster Check (using node 10.8.132.239:7000)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Rebalancing across 4 nodes. Total weight = 4.00
Moving 1366 slots from 10.8.132.237:7000 to 10.8.132.239:7000
######################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################
Moving 1365 slots from 10.8.132.236:7000 to 10.8.132.239:7000
#####################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################
Moving 1365 slots from 10.8.132.238:7000 to 10.8.132.239:7000
#####################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################
Comment From: bsergean
Hey @vacheli,
If you feel adventurous ... I have a Python cluster tool which can print the state of each node in the cluster. It would tell you here whether some nodes have a different view of the cluster.
`rcc -v cluster-check -r redis://10.8.132.236:7000` for your use case.
Each node gets a signature, and the check is successful if all nodes have the same signature. Below are two runs: the first one is while a cluster is being set up, so some nodes are not replicas yet and the check fails. Once the cluster is properly set up, the check succeeds.
(I think it's close to what Redis does when it says "All nodes agree about slots configuration", but the -v verbosity option helps you see what's inconsistent in the cluster.)
```
(venv) rcc$ rcc -v cluster-check
2020-05-07 17:43:29 INFO redis://127.0.0.1:11000 c3059e787d979632cac7cc8ddad9605e balanced False coverage True
2020-05-07 17:43:29 INFO 62087639a34f6c9aeb16b95f56bd77eb34fa9927 127.0.0.1:11000 master 0-5460 92491c5755e4f44da4586219ceef3b8353a69b18 127.0.0.1:11001 master 5461-10922 9c6ecc2da8bedbf0a068a1d03c6af587b91571e6 127.0.0.1:11002 master 10923-16383 58cf0bd099b0c955ed037fc3f95ed4c3386bd434 127.0.0.1:11003 slave 942809c40ba26d12ed8ff695d877713184600951 127.0.0.1:11004 master ab7d0668703bac23481583bf4c672a6dee5645f1 127.0.0.1:11005 slave
2020-05-07 17:43:29 INFO redis://127.0.0.1:11001 9f8a64979db7ce7b1b2f3d5fdd5a6830 balanced False coverage True
2020-05-07 17:43:29 INFO 62087639a34f6c9aeb16b95f56bd77eb34fa9927 127.0.0.1:11000 master 0-5460 92491c5755e4f44da4586219ceef3b8353a69b18 127.0.0.1:11001 master 5461-10922 9c6ecc2da8bedbf0a068a1d03c6af587b91571e6 127.0.0.1:11002 master 10923-16383 58cf0bd099b0c955ed037fc3f95ed4c3386bd434 127.0.0.1:11003 master 942809c40ba26d12ed8ff695d877713184600951 127.0.0.1:11004 slave ab7d0668703bac23481583bf4c672a6dee5645f1 127.0.0.1:11005 slave
2020-05-07 17:43:29 INFO redis://127.0.0.1:11002 c3059e787d979632cac7cc8ddad9605e balanced False coverage True
2020-05-07 17:43:29 INFO 62087639a34f6c9aeb16b95f56bd77eb34fa9927 127.0.0.1:11000 master 0-5460 92491c5755e4f44da4586219ceef3b8353a69b18 127.0.0.1:11001 master 5461-10922 9c6ecc2da8bedbf0a068a1d03c6af587b91571e6 127.0.0.1:11002 master 10923-16383 58cf0bd099b0c955ed037fc3f95ed4c3386bd434 127.0.0.1:11003 slave 942809c40ba26d12ed8ff695d877713184600951 127.0.0.1:11004 master ab7d0668703bac23481583bf4c672a6dee5645f1 127.0.0.1:11005 slave
2020-05-07 17:43:29 INFO redis://127.0.0.1:11003 12976ed6fcf41d72ef77cec7bafa09c5 balanced True coverage True
2020-05-07 17:43:29 INFO 62087639a34f6c9aeb16b95f56bd77eb34fa9927 127.0.0.1:11000 master 0-5460 92491c5755e4f44da4586219ceef3b8353a69b18 127.0.0.1:11001 master 5461-10922 9c6ecc2da8bedbf0a068a1d03c6af587b91571e6 127.0.0.1:11002 master 10923-16383 58cf0bd099b0c955ed037fc3f95ed4c3386bd434 127.0.0.1:11003 slave 942809c40ba26d12ed8ff695d877713184600951 127.0.0.1:11004 slave ab7d0668703bac23481583bf4c672a6dee5645f1 127.0.0.1:11005 slave
2020-05-07 17:43:29 INFO redis://127.0.0.1:11004 835aa6f6f6b8a5c0bb32a401435b5071 balanced False coverage True
2020-05-07 17:43:29 INFO 62087639a34f6c9aeb16b95f56bd77eb34fa9927 127.0.0.1:11000 master 0-5460 92491c5755e4f44da4586219ceef3b8353a69b18 127.0.0.1:11001 master 5461-10922 9c6ecc2da8bedbf0a068a1d03c6af587b91571e6 127.0.0.1:11002 master 10923-16383 58cf0bd099b0c955ed037fc3f95ed4c3386bd434 127.0.0.1:11003 slave 942809c40ba26d12ed8ff695d877713184600951 127.0.0.1:11004 slave ab7d0668703bac23481583bf4c672a6dee5645f1 127.0.0.1:11005 master
2020-05-07 17:43:29 INFO redis://127.0.0.1:11005 c3059e787d979632cac7cc8ddad9605e balanced False coverage True
2020-05-07 17:43:29 INFO 62087639a34f6c9aeb16b95f56bd77eb34fa9927 127.0.0.1:11000 master 0-5460 92491c5755e4f44da4586219ceef3b8353a69b18 127.0.0.1:11001 master 5461-10922 9c6ecc2da8bedbf0a068a1d03c6af587b91571e6 127.0.0.1:11002 master 10923-16383 58cf0bd099b0c955ed037fc3f95ed4c3386bd434 127.0.0.1:11003 slave 942809c40ba26d12ed8ff695d877713184600951 127.0.0.1:11004 master ab7d0668703bac23481583bf4c672a6dee5645f1 127.0.0.1:11005 slave
2020-05-07 17:43:29 INFO 4 unique signatures cluster unhealthy. Re-run with -v
(venv) rcc$ rcc -v cluster-check
2020-05-07 17:43:42 INFO redis://127.0.0.1:11000 12976ed6fcf41d72ef77cec7bafa09c5 balanced True coverage True
2020-05-07 17:43:42 INFO 62087639a34f6c9aeb16b95f56bd77eb34fa9927 127.0.0.1:11000 master 0-5460 92491c5755e4f44da4586219ceef3b8353a69b18 127.0.0.1:11001 master 5461-10922 9c6ecc2da8bedbf0a068a1d03c6af587b91571e6 127.0.0.1:11002 master 10923-16383 58cf0bd099b0c955ed037fc3f95ed4c3386bd434 127.0.0.1:11003 slave 942809c40ba26d12ed8ff695d877713184600951 127.0.0.1:11004 slave ab7d0668703bac23481583bf4c672a6dee5645f1 127.0.0.1:11005 slave
2020-05-07 17:43:42 INFO redis://127.0.0.1:11001 12976ed6fcf41d72ef77cec7bafa09c5 balanced True coverage True
2020-05-07 17:43:42 INFO 62087639a34f6c9aeb16b95f56bd77eb34fa9927 127.0.0.1:11000 master 0-5460 92491c5755e4f44da4586219ceef3b8353a69b18 127.0.0.1:11001 master 5461-10922 9c6ecc2da8bedbf0a068a1d03c6af587b91571e6 127.0.0.1:11002 master 10923-16383 58cf0bd099b0c955ed037fc3f95ed4c3386bd434 127.0.0.1:11003 slave 942809c40ba26d12ed8ff695d877713184600951 127.0.0.1:11004 slave ab7d0668703bac23481583bf4c672a6dee5645f1 127.0.0.1:11005 slave
2020-05-07 17:43:42 INFO redis://127.0.0.1:11002 12976ed6fcf41d72ef77cec7bafa09c5 balanced True coverage True
2020-05-07 17:43:42 INFO 62087639a34f6c9aeb16b95f56bd77eb34fa9927 127.0.0.1:11000 master 0-5460 92491c5755e4f44da4586219ceef3b8353a69b18 127.0.0.1:11001 master 5461-10922 9c6ecc2da8bedbf0a068a1d03c6af587b91571e6 127.0.0.1:11002 master 10923-16383 58cf0bd099b0c955ed037fc3f95ed4c3386bd434 127.0.0.1:11003 slave 942809c40ba26d12ed8ff695d877713184600951 127.0.0.1:11004 slave ab7d0668703bac23481583bf4c672a6dee5645f1 127.0.0.1:11005 slave
2020-05-07 17:43:42 INFO redis://127.0.0.1:11003 12976ed6fcf41d72ef77cec7bafa09c5 balanced True coverage True
2020-05-07 17:43:42 INFO 62087639a34f6c9aeb16b95f56bd77eb34fa9927 127.0.0.1:11000 master 0-5460 92491c5755e4f44da4586219ceef3b8353a69b18 127.0.0.1:11001 master 5461-10922 9c6ecc2da8bedbf0a068a1d03c6af587b91571e6 127.0.0.1:11002 master 10923-16383 58cf0bd099b0c955ed037fc3f95ed4c3386bd434 127.0.0.1:11003 slave 942809c40ba26d12ed8ff695d877713184600951 127.0.0.1:11004 slave ab7d0668703bac23481583bf4c672a6dee5645f1 127.0.0.1:11005 slave
2020-05-07 17:43:42 INFO redis://127.0.0.1:11004 12976ed6fcf41d72ef77cec7bafa09c5 balanced True coverage True
2020-05-07 17:43:42 INFO 62087639a34f6c9aeb16b95f56bd77eb34fa9927 127.0.0.1:11000 master 0-5460 92491c5755e4f44da4586219ceef3b8353a69b18 127.0.0.1:11001 master 5461-10922 9c6ecc2da8bedbf0a068a1d03c6af587b91571e6 127.0.0.1:11002 master 10923-16383 58cf0bd099b0c955ed037fc3f95ed4c3386bd434 127.0.0.1:11003 slave 942809c40ba26d12ed8ff695d877713184600951 127.0.0.1:11004 slave ab7d0668703bac23481583bf4c672a6dee5645f1 127.0.0.1:11005 slave
2020-05-07 17:43:42 INFO redis://127.0.0.1:11005 12976ed6fcf41d72ef77cec7bafa09c5 balanced True coverage True
2020-05-07 17:43:42 INFO 62087639a34f6c9aeb16b95f56bd77eb34fa9927 127.0.0.1:11000 master 0-5460 92491c5755e4f44da4586219ceef3b8353a69b18 127.0.0.1:11001 master 5461-10922 9c6ecc2da8bedbf0a068a1d03c6af587b91571e6 127.0.0.1:11002 master 10923-16383 58cf0bd099b0c955ed037fc3f95ed4c3386bd434 127.0.0.1:11003 slave 942809c40ba26d12ed8ff695d877713184600951 127.0.0.1:11004 slave ab7d0668703bac23481583bf4c672a6dee5645f1 127.0.0.1:11005 slave
2020-05-07 17:43:42 INFO 1 unique signatures cluster ok
```
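For comparison, here is a minimal sketch of the same signature idea using only redis-cli and standard shell tools (not rcc itself): hash a normalized version of each node's CLUSTER NODES output and see whether all nodes agree. The node list below is only illustrative.

```
# For every node, keep only the node id, role and slot ranges from CLUSTER NODES,
# sort the result and hash it. Identical hashes mean the nodes share the same
# view of the slot layout.
for node in 10.8.132.236:7000 10.8.132.238:7000 10.8.132.238:7001 10.8.132.239:7000; do
    host=${node%:*}; port=${node#*:}
    sig=$(redis-cli -h "$host" -p "$port" cluster nodes |
          awk '{ role = ($3 ~ /master/) ? "master" : "slave";
                 printf "%s %s", $1, role;
                 for (i = 9; i <= NF; i++) printf " %s", $i;
                 print "" }' |
          sort | md5sum | cut -d" " -f1)
    echo "$node $sig"
done
```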
There's some doc [here](https://machinezone.github.io/rcc/moving_slots/) for a moving-slots command, which *does not handle busykey if a key exists in both nodes*, so maybe use it on a scratch/toy cluster. There is a --dry option.
Resharding is probably a good place to use ACLs (@antirez), so that only certain admin users can do it.
You can install it with:
```
curl -sL https://raw.githubusercontent.com/machinezone/rcc/master/tools/install.sh | sh
```
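On the ACL remark above, a hedged sketch of what that could look like with Redis 6 ACLs (the user names and passwords are made up; the idea is that only a dedicated admin user keeps the @admin/@dangerous command categories, which cover commands used during resharding such as CLUSTER and MIGRATE):

```
# Hypothetical users: the application user loses admin/dangerous commands,
# so only "clusteradmin" can drive resharding operations.
redis-cli ACL SETUSER clusteradmin on '>change-me-admin' '~*' +@all
redis-cli ACL SETUSER app on '>change-me-app' '~*' +@all -@admin -@dangerous
```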