I got an error when adding a slave node to an existing cluster (it happens randomly when adding nodes one-by-one). Afterwards, the newly added node (i.e. 10.229.35.245:10002 in the example below) becomes an empty master node.
I think redis-trib.rb should handle the error (and retry) during CLUSTER MEET; a sketch of what I mean follows the steps below.
versions: redis-3.0.0-rc2 (existing cluster) + redis-3.0.2 (newly added nodes)
steps:
- prepare an existing Redis cluster (redis-3.0.0-rc2)
- add slave nodes one-by-one (scripted)
- sometimes the above-mentioned error occurs
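
The traceback below shows the exception escaping from get_config_signature while wait_cluster_join polls every node for a consistent slot configuration. Here is a minimal sketch of the retry behaviour I have in mind, patching is_config_consistent? in redis-trib.rb (the method and its callers are taken from the traceback; the rescue logic is only my assumption, not an upstream fix):

# hypothetical patch sketch for redis-trib.rb, not the actual upstream code;
# a node that is still loading its RDB answers every command with
# "LOADING Redis is loading the dataset in memory", so treating that reply as
# "config not consistent yet" lets wait_cluster_join keep polling instead of crashing
def is_config_consistent?
    signatures = []
    @nodes.each do |n|
        begin
            signatures << n.get_config_signature
        rescue Redis::CommandError => e
            raise unless e.message.start_with?("LOADING")
            return false # node still loading; report inconsistency so the caller retries
        end
    end
    signatures.uniq.length == 1
end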
commands to add slave nodes
$ redis-trib.rb add-node --slave 10.229.35.245:10001 10.168.123.120:10001
...... add node successfully ......
$ redis-trib.rb add-node --slave 10.229.35.245:10002 10.168.123.120:10002
>>> Adding node 10.229.35.245:10002 to cluster 10.168.123.120:10002
Connecting to node 10.168.123.120:10002: OK
Connecting to node 10.150.82.149:10001: OK
Connecting to node 10.101.188.26:10004: OK
Connecting to node 10.146.171.150:10002: OK
Connecting to node 10.101.188.26:10001: OK
Connecting to node 10.146.171.150:10001: OK
Connecting to node 10.101.188.26:10002: OK
Connecting to node 10.150.82.149:10003: OK
Connecting to node 10.168.123.120:10004: OK
Connecting to node 10.168.123.120:10003: OK
Connecting to node 10.168.123.120:10001: OK
Connecting to node 10.229.35.245:10001: OK
Connecting to node 10.146.171.150:10004: OK
Connecting to node 10.150.82.149:10002: OK
Connecting to node 10.101.188.26:10003: OK
Connecting to node 10.146.171.150:10003: OK
Connecting to node 10.150.82.149:10004: OK
>>> Performing Cluster Check (using node 10.168.123.120:10002)
M: a5c82c2d9d794d0845d61e7c1f15f056b8b82a9c 10.168.123.120:10002
slots:10240-12287 (2048 slots) master
1 additional replica(s)
M: 8a0bcfeee13a5a307066caa853b755b14f944f28 10.150.82.149:10001
slots:8192-10239 (2048 slots) master
1 additional replica(s)
S: 08dadbc10a977b9c73bbb78bdc2f2962a96cc727 10.101.188.26:10004
slots: (0 slots) slave
replicates 25621d1402c3e980c4fd2708e90b2e46159be406
S: be5398fe9f4ba172e1cb503cbf2a88f638f658f7 10.146.171.150:10002
slots: (0 slots) slave
replicates a5c82c2d9d794d0845d61e7c1f15f056b8b82a9c
M: a0bd9f5e8da9d34d93a066852a123dd2d6a06af5 10.101.188.26:10001
slots:2048-4095 (2048 slots) master
1 additional replica(s)
M: 57dd45a22dc6c34bf89f535644552787e3ea8719 10.146.171.150:10001
slots:4096-6143 (2048 slots) master
1 additional replica(s)
S: ccd85785fa33cd6c2794238a2a694f725025e117 10.101.188.26:10002
slots: (0 slots) slave
replicates 8a0bcfeee13a5a307066caa853b755b14f944f28
S: 5ca6785c800305cc9a073948895b13c8fb4c373e 10.150.82.149:10003
slots: (0 slots) slave
replicates 57dd45a22dc6c34bf89f535644552787e3ea8719
M: 25621d1402c3e980c4fd2708e90b2e46159be406 10.168.123.120:10004
slots:12288-14335 (2048 slots) master
1 additional replica(s)
S: d55ce464a99d39f1191de766cca3e2db9ed4f78b 10.168.123.120:10003
slots: (0 slots) slave
replicates 0e73a08bb6f11a21ab12a2cb6c1537d30665cb6b
S: 458db384c1a02ec187645bbcae34656872e6daa6 10.168.123.120:10001
slots: (0 slots) slave
replicates beece73332a96ea7aaa9eb571657db3c884fdc07
S: d40dbceef1e1f8f741d3051c57bc73ffbe0aa74c 10.229.35.245:10001
slots: (0 slots) slave
replicates 0e73a08bb6f11a21ab12a2cb6c1537d30665cb6b
M: 0e73a08bb6f11a21ab12a2cb6c1537d30665cb6b 10.146.171.150:10004
slots:6144-8191 (2048 slots) master
2 additional replica(s)
M: beece73332a96ea7aaa9eb571657db3c884fdc07 10.150.82.149:10002
slots:14336-16383 (2048 slots) master
1 additional replica(s)
M: f5c3b99b37fcb0b9a177ee1f13e765931dfac8b8 10.101.188.26:10003
slots:0-2047 (2048 slots) master
1 additional replica(s)
S: d258d29aa83df4e5a36f363cbe71c70a776671f8 10.146.171.150:10003
slots: (0 slots) slave
replicates f5c3b99b37fcb0b9a177ee1f13e765931dfac8b8
S: b68c208c9daf8d320c9901bc13f3a135bea3a56c 10.150.82.149:10004
slots: (0 slots) slave
replicates a0bd9f5e8da9d34d93a066852a123dd2d6a06af5
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
Automatically selected master 10.168.123.120:10002
Connecting to node 10.229.35.245:10002: OK
>>> Send CLUSTER MEET to node 10.229.35.245:10002 to make it join the cluster.
Waiting for the cluster to join../var/lib/gems/1.9.1/gems/redis-3.2.1/lib/redis/client.rb:113:in `call': LOADING Redis is loading the dataset in memory (Redis::CommandError)
from /var/lib/gems/1.9.1/gems/redis-3.2.1/lib/redis.rb:2556:in `block in method_missing'
from /var/lib/gems/1.9.1/gems/redis-3.2.1/lib/redis.rb:37:in `block in synchronize'
from /usr/lib/ruby/1.9.1/monitor.rb:211:in `mon_synchronize'
from /var/lib/gems/1.9.1/gems/redis-3.2.1/lib/redis.rb:37:in `synchronize'
from /var/lib/gems/1.9.1/gems/redis-3.2.1/lib/redis.rb:2555:in `method_missing'
from /usr/local/bin/redis-trib.rb:262:in `get_config_signature'
from /usr/local/bin/redis-trib.rb:522:in `block in is_config_consistent?'
from /usr/local/bin/redis-trib.rb:521:in `each'
from /usr/local/bin/redis-trib.rb:521:in `is_config_consistent?'
from /usr/local/bin/redis-trib.rb:529:in `wait_cluster_join'
from /usr/local/bin/redis-trib.rb:1021:in `addnode_cluster_cmd'
from /usr/local/bin/redis-trib.rb:1345:in `<main>'
$ redis-trib.rb add-node --slave 10.229.35.245:10003 10.168.123.120:10003
...... continue to add slave nodes ......
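
For context on why the node ends up as an empty master: in addnode_cluster_cmd the CLUSTER MEET is sent first, and CLUSTER REPLICATE is only issued after wait_cluster_join returns. When the LOADING error kills the script inside the wait loop, the new node has therefore already joined the cluster but never receives the replicate command. Roughly paraphrased from the 3.0.x script (line numbers as in the traceback above), not a verbatim copy:

# paraphrased sketch of the tail of addnode_cluster_cmd (around line 1021)
new.r.cluster("meet", first[:host], first[:port])
if opt['slave']
    wait_cluster_join                              # <- the LOADING error escapes from here
    new.r.cluster("replicate", master.info[:name]) # never reached; node stays an empty master
end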
redis-cli cluster nodes (partial)
$ redis-cli -p 10001 cluster nodes | sort -k 2 | grep 10.229.35.245
d40dbceef1e1f8f741d3051c57bc73ffbe0aa74c 10.229.35.245:10001 slave 0e73a08bb6f11a21ab12a2cb6c1537d30665cb6b 0 1437635318148 18 connected
3ae1046601e3a8f0e706c1f45dcb690ce9e48c33 10.229.35.245:10002 master - 0 1437635314143 0 connected
377389198c176e121b1997a38c47a5a096d55f69 10.229.35.245:10003 slave 3ae1046601e3a8f0e706c1f45dcb690ce9e48c33 0 1437635312639 0 connected
712af4d9c82b8c54e95f5ba27abe1866793083b9 10.229.35.245:10004 slave 25621d1402c3e980c4fd2708e90b2e46159be406 0 1437635317647 32 connected
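
Since the stray node has already joined the cluster and holds no slots or keys, it can be repaired in place by pointing it at the master redis-trib had already selected (10.168.123.120:10002, ID a5c82c2d... in the cluster check output above), instead of removing and re-adding it. A sketch using the same redis gem that redis-trib relies on; note that CLUSTER REPLICATE only succeeds while the target node is still an empty master:

require 'redis'

# hypothetical manual repair; the address and node ID are taken from the outputs above
stray = Redis.new(:host => "10.229.35.245", :port => 10002)
master_id = "a5c82c2d9d794d0845d61e7c1f15f056b8b82a9c" # 10.168.123.120:10002
stray.cluster("replicate", master_id) # turn the empty master into a replica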
redis.log on 10.229.35.245:10002
6089:M 23 Jul 02:21:41.930 * Increased maximum number of open files to 10032 (it was originally set to 1024).
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 3.0.2 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in standalone mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 10002
| `-._ `._ / _.-' | PID: 6089
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'
6089:M 23 Jul 02:21:41.931 # Server started, Redis version 3.0.2
6089:M 23 Jul 02:21:41.931 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
6089:M 23 Jul 02:21:41.931 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
6089:M 23 Jul 02:21:41.931 * The server is now ready to accept connections on port 10002
6089:M 23 Jul 02:21:42.338 # User requested shutdown...
6089:M 23 Jul 02:21:42.338 * Saving the final RDB snapshot before exiting.
6089:M 23 Jul 02:21:42.343 * DB saved on disk
6089:M 23 Jul 02:21:42.343 * Removing the pid file.
6089:M 23 Jul 02:21:42.343 # Redis is now ready to exit, bye bye...
6211:M 23 Jul 02:22:00.813 * Increased maximum number of open files to 10032 (it was originally set to 1024).
6211:M 23 Jul 02:22:00.813 * No cluster configuration found, I'm 3ae1046601e3a8f0e706c1f45dcb690ce9e48c33
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 3.0.2 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in cluster mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 10002
| `-._ `._ / _.-' | PID: 6211
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'
6211:M 23 Jul 02:22:00.816 # Server started, Redis version 3.0.2
6211:M 23 Jul 02:22:00.816 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
6211:M 23 Jul 02:22:00.816 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
6211:M 23 Jul 02:22:00.816 * The server is now ready to accept connections on port 10002
6211:M 23 Jul 02:23:15.660 # IP address for this node updated to 10.229.35.245
6211:M 23 Jul 02:23:20.648 # Cluster state changed: ok
6211:M 23 Jul 02:23:31.049 * Slave 10.229.35.245:10003 asks for synchronization
6211:M 23 Jul 02:23:31.049 * Full resync requested by slave 10.229.35.245:10003
6211:M 23 Jul 02:23:31.049 * Starting BGSAVE for SYNC with target: disk
6211:M 23 Jul 02:23:31.049 * Background saving started by pid 6805
6805:C 23 Jul 02:23:31.052 * DB saved on disk
6805:C 23 Jul 02:23:31.052 * RDB: 0 MB of memory used by copy-on-write
6211:M 23 Jul 02:23:31.124 * Background saving terminated with success
6211:M 23 Jul 02:23:31.124 * Synchronization with slave 10.229.35.245:10003 succeeded
Comment From: dictcp
Closing stale issue.