Redis 6.0.3 and 6.0.4 fail test 50 of the test suite after a successful compile.
OS: SUSE SLES12
CC: GNU GCC v9.2.0
% make test
...
[50/54 done]: unit/type/list-3 (102 seconds)
prev sync_full: 2
prev sync_partial_ok: 14
prev sync_partial_err: 2
# Stats
total_connections_received:21
total_commands_processed:310627
instantaneous_ops_per_sec:3
total_net_input_bytes:6528657
total_net_output_bytes:11718777
instantaneous_input_kbps:0.11
instantaneous_output_kbps:2.42
rejected_connections:0
sync_full:2
sync_partial_ok:14
sync_partial_err:2
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:15
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:446
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
Master ID is 0
0: sync_full: 2
0: id1 : 9da28a301ca2e90cc4824f3f7e05aadea7da9a7b:6515013
0: id2 : 47473f20c1e242727fb17c570d2df8247f4c5e75:5374341
0: backlog : firstbyte=5466438 len=1048576
0: x var is : 310209
---
[ok]: Test replication partial resync: backlog expired (diskless: no, swapdb, reconnect: 1)
Logged warnings (pid 13352):
(none)
Logged warnings (pid 13308):
(none)
Logged warnings (pid 13230):
(none)
Logged warnings (pid 13146):
(none)
Logged warnings (pid 13071):
(none)
[exception]: Executing test client: couldn't open socket: connection refused.
couldn't open socket: connection refused
while executing
"{*}$r info"
(procedure "status" line 2)
invoked from within
"status $R($j) sync_full"
("uplevel" body line 11)
invoked from within
"uplevel 1 {
# The following is the regexp we use to match the log line
# time info. Logs are in the following form:
#
..."
(procedure "show_cluster_status" line 2)
invoked from within
"show_cluster_status"
("uplevel" body line 6)
invoked from within
"uplevel 1 $elsescript"
(procedure "wait_for_condition" line 12)
invoked from within
"wait_for_condition 50 1000 {
[status $R($master_id) sync_partial_ok] == $sync_partial + 1
} else {
puts "prev sync_ful..."
("uplevel" body line 13)
invoked from within
"uplevel 1 $code"
(procedure "test" line 47)
invoked from within
"test "PSYNC2: Partial resync after restart using RDB aux fields" {
# Pick a random slave
set slave_id [expr {($master_id+1)%5}]
..."
("uplevel" body line 198)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
set master_id 0 ; # Current master
set start_time [clock seconds] ; # Test start time
set counter_value..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
start_server {} {
set master_id 0 ; # Current master
set start_time [clock seconds] ; # Test start time
..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
start_server {} {
start_server {} {
set master_id 0 ; # Current master
set start_time [clock seconds] ; # T..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
start_server {} {
start_server {} {
start_server {} {
set master_id 0 ; # Current master
set start_time [clo..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {tags {"psync2"}} {
start_server {} {
start_server {} {
start_server {} {
start_server {} {
set master_id 0 ; # Curre..."
(file "tests/integration/psync2.tcl" line 73)
invoked from within
"source $path"
(procedure "execute_test_file" line 4)
invoked from within
"execute_test_file $data"
(procedure "test_client_main" line 10)
invoked from within
"test_client_main $::test_server_port "
Killing still running Redis server 12834
Killing still running Redis server 12900
Killing still running Redis server 12964
Killing still running Redis server 13203
Killing still running Redis server 13276
Killing still running Redis server 13308
Killing still running Redis server 13386
Killing still running Redis server 13778
Killing still running Redis server 18052
Killing still running Redis server 18079
Killing still running Redis server 18128
Killing still running Redis server 18177
Killing still running Redis server 18540
Makefile:318: recipe for target 'test' failed
make[1]: *** [test] Error 1
make[1]: Leaving directory '/hnfs/torcfs01n04a/vol/ret_users_rkedward/dev/tools/sles12/redis/redis-6.0.4/src'
Makefile:6: recipe for target 'test' failed
make: *** [test] Error 2
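For context on the exception above: wait_for_condition and status are helpers from tests/support/util.tcl. wait_for_condition polls an expression up to maxtries times with a delay between attempts, and runs the else-script once it gives up; here the else-script calls show_cluster_status, which queries every server with INFO, and if one of the servers is already dead that query raises the "couldn't open socket: connection refused" error seen in the trace. A simplified sketch of the two helpers (an approximation for illustration, not the actual util.tcl source):

proc wait_for_condition {maxtries delay e _else_ elsescript} {
    while {[incr maxtries -1] >= 0} {
        if {[uplevel 1 [list expr $e]]} return
        after $delay ;# sleep between polls
    }
    uplevel 1 $elsescript ;# timed out: run the diagnostic else-script
}

# status extracts a single field (e.g. sync_partial_ok) from INFO output.
proc status {r property} {
    set info [{*}$r info]
    if {[regexp "$property:(\[^\r\n\]+)" $info -> value]} {
        return $value
    }
}

So the harness timed out waiting for sync_partial_ok to increase, and then failed a second time while gathering diagnostics because one of the servers was no longer reachable.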
Comment From: oranagra
@rkedward is it consistent?
Can you try running ./runtest --single integration/psync2, and then look into tests/tmp to see if any of the log files contain a crash log?
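If it helps, a throwaway Tcl snippet along these lines can flag suspicious logs (hypothetical, assuming the tests/tmp/server.*/stdout layout shown later in this thread; the marker string is what Redis should print at the top of a crash report):

foreach log [glob -nocomplain tests/tmp/*/stdout] {
    set fd [open $log r]
    set data [read $fd]
    close $fd
    if {[string match "*REDIS BUG REPORT START*" $data]} {
        puts "possible crash report: $log"
    }
}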
Comment From: antirez
Also (but please only after testing what @oranagra said), please check whether doing a "make distclean; make" afterwards fixes the issue.
Comment From: antirez
@oranagra I have the feeling that after a test failure, the test suite should always, before the cleanup, scan all the /tmp files for crash reports, dump them, and not clean up the tmp dir at all in case something was found.
Comment From: oranagra
@antirez good idea... Maybe also check for "CRITICAL". But anyway, in this case we see a Tcl stack trace, so it didn't get to do any cleanup.
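Extending the scan sketched above, a pre-cleanup hook might look like this (a hypothetical sketch, not existing test-suite code; crash_evidence_found and cleanup_tmp are made-up names):

proc crash_evidence_found {tmpdir} {
    foreach log [glob -nocomplain -directory $tmpdir */stdout] {
        set fd [open $log r]
        set data [read $fd]
        close $fd
        # Match both full crash reports and CRITICAL-level log lines.
        if {[string match "*REDIS BUG REPORT START*" $data] ||
            [string match "*CRITICAL*" $data]} {
            puts "possible crash evidence in $log"
            return 1
        }
    }
    return 0
}

if {[crash_evidence_found tests/tmp]} {
    puts "leaving tests/tmp in place for inspection"
} else {
    # cleanup_tmp ;# hypothetical cleanup entry point
}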
Comment From: rkedward
@rkedward the test failures are not consistent. Two days ago I ran the test suite multiple times, each run after a fresh install. Most of the time it would fail at step 50, but a few times it would get further, though it never completed. Today I tried to replicate the issue. The first attempt went to test 53/54 and then hung. It does seem that each step past 50 takes minutes (~200 seconds) to complete.
I killed the process with a Ctrl-C and then invoked ./runtest --single integration/psync2, which failed. Here is the output:
% ./runtest --single integration/psync2
Cleanup: may take some time... OK
Starting test server at port 11111
[ready]: 11944
Testing integration/psync2
[ready]: 11946
[ready]: 11947
[ready]: 11952
[ready]: 11951
[ready]: 11949
[ready]: 11943
[ready]: 11953
[ready]: 11945
[ready]: 11954
[ready]: 11948
[ready]: 11955
[ready]: 11950
[ready]: 11958
[ready]: 11956
[ready]: 11957
[ok]: PSYNC2: --- CYCLE 1 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #1 as master
[ok]: PSYNC2: Set #4 to replicate from #1
[ok]: PSYNC2: Set #2 to replicate from #1
[ok]: PSYNC2: Set #3 to replicate from #1
[ok]: PSYNC2: Set #0 to replicate from #1
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 135011)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: --- CYCLE 2 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #0 as master
[ok]: PSYNC2: Set #1 to replicate from #0
[ok]: PSYNC2: Set #4 to replicate from #1
[ok]: PSYNC2: Set #3 to replicate from #1
[ok]: PSYNC2: Set #2 to replicate from #0
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 335797)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: --- CYCLE 3 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #0 as master
[ok]: PSYNC2: Set #2 to replicate from #0
[ok]: PSYNC2: Set #1 to replicate from #0
[ok]: PSYNC2: Set #4 to replicate from #0
[ok]: PSYNC2: Set #3 to replicate from #0
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 463369)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: --- CYCLE 4 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #3 as master
[ok]: PSYNC2: Set #4 to replicate from #3
[ok]: PSYNC2: Set #0 to replicate from #4
[ok]: PSYNC2: Set #1 to replicate from #0
[ok]: PSYNC2: Set #2 to replicate from #0
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 662364)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: --- CYCLE 5 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #0 as master
[ok]: PSYNC2: Set #1 to replicate from #0
[ok]: PSYNC2: Set #4 to replicate from #0
[ok]: PSYNC2: Set #2 to replicate from #1
[ok]: PSYNC2: Set #3 to replicate from #4
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 805151)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: --- CYCLE 6 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #0 as master
[ok]: PSYNC2: Set #3 to replicate from #0
[ok]: PSYNC2: Set #4 to replicate from #0
[ok]: PSYNC2: Set #2 to replicate from #0
[ok]: PSYNC2: Set #1 to replicate from #4
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 964215)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: Bring the master back again for next test
prev sync_full: 0
prev sync_partial_ok: 22
prev sync_partial_err: 0
# Stats
total_connections_received:28
total_commands_processed:964687
instantaneous_ops_per_sec:3
total_net_input_bytes:20265701
total_net_output_bytes:49600505
instantaneous_input_kbps:0.11
instantaneous_output_kbps:2.40
rejected_connections:0
sync_full:0
sync_partial_ok:22
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:16
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:413
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
Master ID is 0
0: sync_full: 0
0: id1 : 5d91a7caf2cd1f59ead2d4656f3b8f0a3b5fb53e:20249139
0: id2 : c12d8ec429cab70be655c7e8675ef3883821d5fa:13910064
0: backlog : firstbyte=19200564 len=1048576
0: x var is : 964215
---
Logged warnings (pid 12013):
(none)
Logged warnings (pid 12000):
(none)
Logged warnings (pid 11987):
(none)
Logged warnings (pid 11974):
(none)
Logged warnings (pid 11961):
(none)
[exception]: Executing test client: couldn't open socket: connection refused.
couldn't open socket: connection refused
while executing
"{*}$r info"
(procedure "status" line 2)
invoked from within
"status $R($j) sync_full"
("uplevel" body line 11)
invoked from within
"uplevel 1 {
# The following is the regexp we use to match the log line
# time info. Logs are in the following form:
#
..."
(procedure "show_cluster_status" line 2)
invoked from within
"show_cluster_status"
("uplevel" body line 6)
invoked from within
"uplevel 1 $elsescript"
(procedure "wait_for_condition" line 12)
invoked from within
"wait_for_condition 50 1000 {
[status $R($master_id) sync_partial_ok] == $sync_partial + 1
} else {
puts "prev sync_ful..."
("uplevel" body line 13)
invoked from within
"uplevel 1 $code"
(procedure "test" line 47)
invoked from within
"test "PSYNC2: Partial resync after restart using RDB aux fields" {
# Pick a random slave
set slave_id [expr {($master_id+1)%5}]
..."
("uplevel" body line 198)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
set master_id 0 ; # Current master
set start_time [clock seconds] ; # Test start time
set counter_value..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
start_server {} {
set master_id 0 ; # Current master
set start_time [clock seconds] ; # Test start time
..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
start_server {} {
start_server {} {
set master_id 0 ; # Current master
set start_time [clock seconds] ; # T..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
start_server {} {
start_server {} {
start_server {} {
set master_id 0 ; # Current master
set start_time [clo..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {tags {"psync2"}} {
start_server {} {
start_server {} {
start_server {} {
start_server {} {
set master_id 0 ; # Curre..."
(file "tests/integration/psync2.tcl" line 73)
invoked from within
"source $path"
(procedure "execute_test_file" line 4)
invoked from within
"execute_test_file $data"
(procedure "test_client_main" line 10)
invoked from within
"test_client_main $::test_server_port "
Killing still running Redis server 12000
@antirez, I then did a make distclean; make and then make test. This time it failed at step 50 as originally reported. Here is the tail of the output:
[ok]: Client output buffer soft limit is enforced if time is overreached
[50/54 done]: unit/obuf-limits (83 seconds)
prev sync_full: 0
prev sync_partial_ok: 5
prev sync_partial_err: 0
# Stats
total_connections_received:7
total_commands_processed:305040
instantaneous_ops_per_sec:3
total_net_input_bytes:6408952
total_net_output_bytes:2425523
instantaneous_input_kbps:0.13
instantaneous_output_kbps:2.42
rejected_connections:0
sync_full:0
sync_partial_ok:5
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:13
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:519
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
Master ID is 2
0: sync_full: 1
0: id1 : 6ab5f232018954fc0614da2ac120f7026f194305:6399699
0: id2 : 91c901f5a8e38f6ef244f40398bdf80e7d1f4dbf:4946050
0: backlog : firstbyte=5351124 len=1048576
0: x var is : 304715
---
1: sync_full: 0
1: id1 : 6ab5f232018954fc0614da2ac120f7026f194305:6399699
1: id2 : 91c901f5a8e38f6ef244f40398bdf80e7d1f4dbf:4946050
1: backlog : firstbyte=5351124 len=1048576
1: x var is : 304715
---
2: sync_full: 0
2: id1 : 6ab5f232018954fc0614da2ac120f7026f194305:6399699
2: id2 : 91c901f5a8e38f6ef244f40398bdf80e7d1f4dbf:4946050
2: backlog : firstbyte=5351124 len=1048576
2: x var is : 304715
---
Logged warnings (pid 26183):
(none)
Logged warnings (pid 26121):
(none)
Logged warnings (pid 26071):
(none)
Logged warnings (pid 26030):
(none)
Logged warnings (pid 25987):
(none)
[exception]: Executing test client: couldn't open socket: connection refused.
couldn't open socket: connection refused
while executing
"{*}$r info"
(procedure "status" line 2)
invoked from within
"status $R($j) sync_full"
("uplevel" body line 11)
invoked from within
"uplevel 1 {
# The following is the regexp we use to match the log line
# time info. Logs are in the following form:
#
..."
(procedure "show_cluster_status" line 2)
invoked from within
"show_cluster_status"
("uplevel" body line 6)
invoked from within
"uplevel 1 $elsescript"
(procedure "wait_for_condition" line 12)
invoked from within
"wait_for_condition 50 1000 {
[status $R($master_id) sync_partial_ok] == $sync_partial + 1
} else {
puts "prev sync_ful..."
("uplevel" body line 13)
invoked from within
"uplevel 1 $code"
(procedure "test" line 47)
invoked from within
"test "PSYNC2: Partial resync after restart using RDB aux fields" {
# Pick a random slave
set slave_id [expr {($master_id+1)%5}]
..."
("uplevel" body line 198)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
set master_id 0 ; # Current master
set start_time [clock seconds] ; # Test start time
set counter_value..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
start_server {} {
set master_id 0 ; # Current master
set start_time [clock seconds] ; # Test start time
..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
start_server {} {
start_server {} {
set master_id 0 ; # Current master
set start_time [clock seconds] ; # T..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
start_server {} {
start_server {} {
start_server {} {
set master_id 0 ; # Current master
set start_time [clo..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {tags {"psync2"}} {
start_server {} {
start_server {} {
start_server {} {
start_server {} {
set master_id 0 ; # Curre..."
(file "tests/integration/psync2.tcl" line 73)
invoked from within
"source $path"
(procedure "execute_test_file" line 4)
invoked from within
"execute_test_file $data"
(procedure "test_client_main" line 10)
invoked from within
"test_client_main $::test_server_port "
Killing still running Redis server 25715
Killing still running Redis server 25739
Killing still running Redis server 25775
Killing still running Redis server 26013
Killing still running Redis server 26030
Killing still running Redis server 26052
Killing still running Redis server 26185
Killing still running Redis server 26619
Killing still running Redis server 30239
Killing still running Redis server 30272
Killing still running Redis server 30317
Killing still running Redis server 30350
Killing still running Redis server 32036
Killing still running Redis server 32091
I/O error reading reply
while executing
"{*}$r type $k"
(procedure "createComplexDataset" line 43)
invoked from within
"createComplexDataset $r $ops"
(procedure "bg_complex_data" line 4)
invoked from within
"bg_complex_data [lindex $argv 0] [lindex $argv 1] [lindex $argv 2] [lindex $argv 3] [lindex $argv 4]"
(file "tests/helpers/bg_complex_data.tcl" line 12)
I/O error reading reply
while executing
"{*}$r rpush $k $v"
("uplevel" body line 1)
invoked from within
"uplevel 1 [lindex $args $path]"
(procedure "randpath" line 3)
invoked from within
"randpath {{*}$r lpush $k $v} {{*}$r rpush $k $v} {{*}$r lrem $k 0 $v} {{*}$r rpop $k} {{*}$r lpop $k}"
(procedure "createComplexDataset" line 51)
invoked from within
"createComplexDataset $r $ops"
(procedure "bg_complex_data" line 4)
invoked from within
"bg_complex_data [lindex $argv 0] [lindex $argv 1] [lindex $argv 2] [lindex $argv 3] [lindex $argv 4]"
(file "tests/helpers/bg_complex_data.tcl" line 12)
I/O error reading reply
while executing
"{*}$r type $k"
(procedure "createComplexDataset" line 27)
invoked from within
"createComplexDataset $r $ops"
(procedure "bg_complex_data" line 4)
invoked from within
"bg_complex_data [lindex $argv 0] [lindex $argv 1] [lindex $argv 2] [lindex $argv 3] [lindex $argv 4]"
(file "tests/helpers/bg_complex_data.tcl" line 12)
Makefile:318: recipe for target 'test' failed
make[1]: *** [test] Error 1
make[1]: Leaving directory '/hnfs/torcfs01n04a/vol/ret_users_rkedward/dev/tools/sles12/redis/redis-6.0.4/src'
Makefile:6: recipe for target 'test' failed
make: *** [test] Error 2
I then ran ./runtest --single integration/psync2 and all tests passed:
% ./runtest --single integration/psync2
Cleanup: may take some time... OK
Starting test server at port 11111
[ready]: 2029
Testing integration/psync2
[ready]: 2031
[ready]: 2033
[ready]: 2030
[ready]: 2034
[ready]: 2035
[ready]: 2032
[ready]: 2037
[ready]: 2036
[ready]: 2040
[ready]: 2038
[ready]: 2039
[ready]: 2041
[ready]: 2042
[ready]: 2043
[ready]: 2044
[ok]: PSYNC2: --- CYCLE 1 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #1 as master
[ok]: PSYNC2: Set #0 to replicate from #1
[ok]: PSYNC2: Set #4 to replicate from #0
[ok]: PSYNC2: Set #3 to replicate from #4
[ok]: PSYNC2: Set #2 to replicate from #0
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 167128)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: --- CYCLE 2 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #2 as master
[ok]: PSYNC2: Set #4 to replicate from #2
[ok]: PSYNC2: Set #3 to replicate from #4
[ok]: PSYNC2: Set #1 to replicate from #2
[ok]: PSYNC2: Set #0 to replicate from #2
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 313741)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: --- CYCLE 3 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #2 as master
[ok]: PSYNC2: Set #1 to replicate from #2
[ok]: PSYNC2: Set #4 to replicate from #2
[ok]: PSYNC2: Set #0 to replicate from #4
[ok]: PSYNC2: Set #3 to replicate from #0
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 483486)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: --- CYCLE 4 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #1 as master
[ok]: PSYNC2: Set #3 to replicate from #1
[ok]: PSYNC2: Set #2 to replicate from #1
[ok]: PSYNC2: Set #0 to replicate from #3
[ok]: PSYNC2: Set #4 to replicate from #1
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 633675)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: --- CYCLE 5 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #3 as master
[ok]: PSYNC2: Set #2 to replicate from #3
[ok]: PSYNC2: Set #1 to replicate from #2
[ok]: PSYNC2: Set #4 to replicate from #2
[ok]: PSYNC2: Set #0 to replicate from #2
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 814534)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: --- CYCLE 6 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #0 as master
[ok]: PSYNC2: Set #1 to replicate from #0
[ok]: PSYNC2: Set #3 to replicate from #0
[ok]: PSYNC2: Set #2 to replicate from #3
[ok]: PSYNC2: Set #4 to replicate from #3
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 966759)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: Bring the master back again for next test
[ok]: PSYNC2: Partial resync after restart using RDB aux fields
[ok]: PSYNC2: Replica RDB restart with EVALSHA in backlog issue #4483
[1/1 done]: integration/psync2 (47 seconds)
The End
Execution time of different units:
47 seconds - integration/psync2
\o/ All tests passed without errors!
Cleanup: may take some time... OK
I repeated the cycle and this time it failed at 49/54. Here is the tail.
[49/54 done]: unit/type/list-3 (103 seconds)
[ok]: Slave should be able to synchronize with the master
[ok]: Detect write load to master
prev sync_full: 0
prev sync_partial_ok: 12
prev sync_partial_err: 0
# Stats
total_connections_received:17
total_commands_processed:294201
instantaneous_ops_per_sec:3
total_net_input_bytes:6183520
total_net_output_bytes:8781050
instantaneous_input_kbps:0.11
instantaneous_output_kbps:2.40
rejected_connections:0
sync_full:0
sync_partial_ok:12
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:14
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:406
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
Master ID is 0
0: sync_full: 0
0: id1 : a42a2282454b2a18fe4fa343ce83a5613dfc359c:6170330
0: id2 : 9bdcad33fadf3c5b43a1419857af5efe9629fba8:5117522
0: backlog : firstbyte=5121755 len=1048576
0: x var is : 293794
---
Logged warnings (pid 16477):
(none)
Logged warnings (pid 16451):
(none)
Logged warnings (pid 16401):
(none)
Logged warnings (pid 16339):
(none)
Logged warnings (pid 16297):
(none)
[exception]: Executing test client: couldn't open socket: connection refused.
couldn't open socket: connection refused
while executing
"{*}$r info"
(procedure "status" line 2)
invoked from within
"status $R($j) sync_full"
("uplevel" body line 11)
invoked from within
"uplevel 1 {
# The following is the regexp we use to match the log line
# time info. Logs are in the following form:
#
..."
(procedure "show_cluster_status" line 2)
invoked from within
"show_cluster_status"
("uplevel" body line 6)
invoked from within
"uplevel 1 $elsescript"
(procedure "wait_for_condition" line 12)
invoked from within
"wait_for_condition 50 1000 {
[status $R($master_id) sync_partial_ok] == $sync_partial + 1
} else {
puts "prev sync_ful..."
("uplevel" body line 13)
invoked from within
"uplevel 1 $code"
(procedure "test" line 47)
invoked from within
"test "PSYNC2: Partial resync after restart using RDB aux fields" {
# Pick a random slave
set slave_id [expr {($master_id+1)%5}]
..."
("uplevel" body line 198)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
set master_id 0 ; # Current master
set start_time [clock seconds] ; # Test start time
set counter_value..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
start_server {} {
set master_id 0 ; # Current master
set start_time [clock seconds] ; # Test start time
..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
start_server {} {
start_server {} {
set master_id 0 ; # Current master
set start_time [clock seconds] ; # T..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
start_server {} {
start_server {} {
start_server {} {
set master_id 0 ; # Current master
set start_time [clo..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {tags {"psync2"}} {
start_server {} {
start_server {} {
start_server {} {
start_server {} {
set master_id 0 ; # Curre..."
(file "tests/integration/psync2.tcl" line 73)
invoked from within
"source $path"
(procedure "execute_test_file" line 4)
invoked from within
"execute_test_file $data"
(procedure "test_client_main" line 10)
invoked from within
"test_client_main $::test_server_port "
Killing still running Redis server 16051
Killing still running Redis server 16088
Killing still running Redis server 16120
Killing still running Redis server 16353
Killing still running Redis server 16384
Killing still running Redis server 16451
Killing still running Redis server 16502
Killing still running Redis server 17199
Killing still running Redis server 17674
Killing still running Redis server 27725
Killing still running Redis server 27801
Killing still running Redis server 27840
Killing still running Redis server 27892
Killing still running Redis server 29890
Killing still running Redis server 29921
Makefile:318: recipe for target 'test' failed
make[1]: *** [test] Error 1
make[1]: Leaving directory '/hnfs/torcfs01n04a/vol/ret_users_rkedward/dev/tools/sles12/redis/redis-6.0.4/src'
Makefile:6: recipe for target 'test' failed
make: *** [test] Error 2
In this case ./runtest --single integration/psync2 also failed with this output:
% ./runtest --single integration/psync2
Cleanup: may take some time... OK
Starting test server at port 11111
[ready]: 30388
Testing integration/psync2
[ready]: 30395
[ready]: 30392
[ready]: 30391
[ready]: 30393
[ready]: 30387
[ready]: 30394
[ready]: 30397
[ready]: 30389
[ready]: 30390
[ready]: 30396
[ready]: 30398
[ready]: 30400
[ready]: 30401
[ready]: 30399
[ready]: 30402
[ok]: PSYNC2: --- CYCLE 1 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #3 as master
[ok]: PSYNC2: Set #2 to replicate from #3
[ok]: PSYNC2: Set #1 to replicate from #3
[ok]: PSYNC2: Set #4 to replicate from #3
[ok]: PSYNC2: Set #0 to replicate from #4
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 166304)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: --- CYCLE 2 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #1 as master
[ok]: PSYNC2: Set #0 to replicate from #1
[ok]: PSYNC2: Set #4 to replicate from #0
[ok]: PSYNC2: Set #2 to replicate from #1
[ok]: PSYNC2: Set #3 to replicate from #4
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 326080)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: --- CYCLE 3 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #3 as master
[ok]: PSYNC2: Set #1 to replicate from #3
[ok]: PSYNC2: Set #0 to replicate from #1
[ok]: PSYNC2: Set #4 to replicate from #1
[ok]: PSYNC2: Set #2 to replicate from #3
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 492357)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: --- CYCLE 4 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #4 as master
[ok]: PSYNC2: Set #1 to replicate from #4
[ok]: PSYNC2: Set #3 to replicate from #4
[ok]: PSYNC2: Set #0 to replicate from #1
[ok]: PSYNC2: Set #2 to replicate from #0
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 649025)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: --- CYCLE 5 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #0 as master
[ok]: PSYNC2: Set #1 to replicate from #0
[ok]: PSYNC2: Set #3 to replicate from #0
[ok]: PSYNC2: Set #2 to replicate from #0
[ok]: PSYNC2: Set #4 to replicate from #1
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 791058)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: --- CYCLE 6 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #4 as master
[ok]: PSYNC2: Set #0 to replicate from #4
[ok]: PSYNC2: Set #2 to replicate from #4
[ok]: PSYNC2: Set #3 to replicate from #2
[ok]: PSYNC2: Set #1 to replicate from #4
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 934071)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: Bring the master back again for next test
prev sync_full: 1
prev sync_partial_ok: 10
prev sync_partial_err: 1
# Stats
total_connections_received:15
total_commands_processed:934474
instantaneous_ops_per_sec:3
total_net_input_bytes:19629186
total_net_output_bytes:25507136
instantaneous_input_kbps:0.11
instantaneous_output_kbps:2.43
rejected_connections:0
sync_full:1
sync_partial_ok:10
sync_partial_err:1
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:14
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:416
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
Master ID is 4
Logged warnings (pid 30463):
(none)
Logged warnings (pid 30447):
(none)
Logged warnings (pid 30431):
(none)
Logged warnings (pid 30418):
(none)
Logged warnings (pid 30405):
(none)
[exception]: Executing test client: couldn't open socket: connection refused.
couldn't open socket: connection refused
while executing
"{*}$r info"
(procedure "status" line 2)
invoked from within
"status $R($j) sync_full"
("uplevel" body line 11)
invoked from within
"uplevel 1 {
# The following is the regexp we use to match the log line
# time info. Logs are in the following form:
#
..."
(procedure "show_cluster_status" line 2)
invoked from within
"show_cluster_status"
("uplevel" body line 6)
invoked from within
"uplevel 1 $elsescript"
(procedure "wait_for_condition" line 12)
invoked from within
"wait_for_condition 50 1000 {
[status $R($master_id) sync_partial_ok] == $sync_partial + 1
} else {
puts "prev sync_ful..."
("uplevel" body line 13)
invoked from within
"uplevel 1 $code"
(procedure "test" line 47)
invoked from within
"test "PSYNC2: Partial resync after restart using RDB aux fields" {
# Pick a random slave
set slave_id [expr {($master_id+1)%5}]
..."
("uplevel" body line 198)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
set master_id 0 ; # Current master
set start_time [clock seconds] ; # Test start time
set counter_value..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
start_server {} {
set master_id 0 ; # Current master
set start_time [clock seconds] ; # Test start time
..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
start_server {} {
start_server {} {
set master_id 0 ; # Current master
set start_time [clock seconds] ; # T..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
start_server {} {
start_server {} {
start_server {} {
set master_id 0 ; # Current master
set start_time [clo..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {tags {"psync2"}} {
start_server {} {
start_server {} {
start_server {} {
start_server {} {
set master_id 0 ; # Curre..."
(file "tests/integration/psync2.tcl" line 73)
invoked from within
"source $path"
(procedure "execute_test_file" line 4)
invoked from within
"execute_test_file $data"
(procedure "test_client_main" line 10)
invoked from within
"test_client_main $::test_server_port "
Killing still running Redis server 30463
Comment From: rkedward
I looked under tests/tmp/server.XX.Y and here is the output of one of the stdout files. All of them have a line indicating that the connection with the master was lost:
30405:C 31 May 2020 16:35:38.185 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
30405:C 31 May 2020 16:35:38.185 # Redis version=6.0.4, bits=64, commit=00000000, modified=0, pid=30405, just started
30405:C 31 May 2020 16:35:38.185 # Configuration loaded
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 6.0.4 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in standalone mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 21611
| `-._ `._ / _.-' | PID: 30405
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'
30405:M 31 May 2020 16:35:38.186 # Server initialized
30405:M 31 May 2020 16:35:38.186 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
30405:M 31 May 2020 16:35:38.186 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
30405:M 31 May 2020 16:35:38.187 * Ready to accept connections
30405:M 31 May 2020 16:35:38.187 * The server is now ready to accept connections at /hnfs/torcfs01n04a/vol/ret_users_rkedward/dev/tools/sles12/redis/redis-6.0.4/tests/tmp/server.30388.1/socket
30405:M 31 May 2020 16:35:38.399 - Accepted 127.0.0.1:37905
30405:M 31 May 2020 16:35:38.399 - Client closed connection
30405:M 31 May 2020 16:35:38.403 - Accepted 127.0.0.1:39507
30405:S 31 May 2020 16:35:39.316 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
30405:S 31 May 2020 16:35:39.316 * REPLICAOF 127.0.0.1:21612 enabled (user request from 'id=5 addr=127.0.0.1:39507 fd=11 name= age=1 idle=0 flags=N db=9 sub=0 psub=0 multi=-1 qbuf=43 qbuf-free=32725 obl=0 oll=0 omem=0 events=r cmd=slaveof user=default')
30405:S 31 May 2020 16:35:40.103 - Accepted 127.0.0.1:42681
30405:S 31 May 2020 16:35:40.103 - Client closed connection
30405:S 31 May 2020 16:35:40.191 * Connecting to MASTER 127.0.0.1:21612
30405:S 31 May 2020 16:35:40.191 * MASTER <-> REPLICA sync started
30405:S 31 May 2020 16:35:40.191 * Non blocking connect for SYNC fired the event.
30405:S 31 May 2020 16:35:40.191 * Master replied to PING, replication can continue...
30405:S 31 May 2020 16:35:40.191 * Trying a partial resynchronization (request 1aa621c48aa89dbc334fadfd87cc8777adf364d2:1).
30405:S 31 May 2020 16:35:40.192 * Full resync from master: e14053b053ebaa389a401f5c481e79046ae72fec:0
30405:S 31 May 2020 16:35:40.192 * Discarding previously cached master state.
30405:S 31 May 2020 16:35:40.219 * MASTER <-> REPLICA sync: receiving 185 bytes from master to disk
30405:S 31 May 2020 16:35:40.419 * MASTER <-> REPLICA sync: Flushing old data
30405:S 31 May 2020 16:35:40.419 * MASTER <-> REPLICA sync: Loading DB in memory
30405:S 31 May 2020 16:35:40.421 * Loading RDB produced by version 6.0.4
30405:S 31 May 2020 16:35:40.421 * RDB age 0 seconds
30405:S 31 May 2020 16:35:40.421 * RDB memory usage when created 1.89 Mb
30405:S 31 May 2020 16:35:40.421 * MASTER <-> REPLICA sync: Finished with success
30405:S 31 May 2020 16:35:41.106 - Accepted 127.0.0.1:45503
30405:S 31 May 2020 16:35:41.106 * Replica 127.0.0.1:21615 asks for synchronization
30405:S 31 May 2020 16:35:41.106 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '755cf8a4bb3db36f134cb1fd50a811a807765e87', my replication IDs are 'e14053b053ebaa389a401f5c481e79046ae72fec' and '0000000000000000000000000000000000000000')
30405:S 31 May 2020 16:35:41.106 * Starting BGSAVE for SYNC with target: disk
30405:S 31 May 2020 16:35:41.107 * Background saving started by pid 30480
30480:C 31 May 2020 16:35:41.109 * DB saved on disk
30480:C 31 May 2020 16:35:41.109 * RDB: 6 MB of memory used by copy-on-write
30405:S 31 May 2020 16:35:41.193 * Background saving terminated with success
30405:S 31 May 2020 16:35:41.194 * Synchronization with replica 127.0.0.1:21615 succeeded
30405:S 31 May 2020 16:35:43.196 - DB 9: 1 keys (0 volatile) in 4 slots HT.
30405:S 31 May 2020 16:35:47.324 - Client closed connection
30405:S 31 May 2020 16:35:47.324 # Connection with replica 127.0.0.1:21615 lost.
30405:S 31 May 2020 16:35:47.325 # Connection with master lost.
30405:S 31 May 2020 16:35:47.325 * Caching the disconnected master state.
30405:S 31 May 2020 16:35:47.325 * REPLICAOF 127.0.0.1:21615 enabled (user request from 'id=5 addr=127.0.0.1:39507 fd=11 name= age=9 idle=0 flags=N db=9 sub=0 psub=0 multi=-1 qbuf=43 qbuf-free=32725 obl=0 oll=0 omem=0 events=r cmd=slaveof user=default')
30405:S 31 May 2020 16:35:47.423 - Accepted 127.0.0.1:46183
30405:S 31 May 2020 16:35:47.424 - Client closed connection
30405:S 31 May 2020 16:35:48.197 - DB 9: 1 keys (0 volatile) in 4 slots HT.
30405:S 31 May 2020 16:35:48.197 * Connecting to MASTER 127.0.0.1:21615
30405:S 31 May 2020 16:35:48.197 * MASTER <-> REPLICA sync started
30405:S 31 May 2020 16:35:48.197 * Non blocking connect for SYNC fired the event.
30405:S 31 May 2020 16:35:48.198 * Master replied to PING, replication can continue...
30405:S 31 May 2020 16:35:48.198 * Trying a partial resynchronization (request e14053b053ebaa389a401f5c481e79046ae72fec:3492506).
30405:S 31 May 2020 16:35:48.198 * Successful partial resynchronization with master.
30405:S 31 May 2020 16:35:48.198 # Master replication ID changed to e5caedb8cf69deae8faa799c6bf4ade169748da5
30405:S 31 May 2020 16:35:48.198 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
30405:S 31 May 2020 16:35:48.426 - Accepted 127.0.0.1:46701
30405:S 31 May 2020 16:35:48.426 * Replica 127.0.0.1:21612 asks for synchronization
30405:S 31 May 2020 16:35:48.426 * Partial resynchronization request from 127.0.0.1:21612 accepted. Sending 14 bytes of backlog starting from offset 3492506.
30405:S 31 May 2020 16:35:50.329 # Connection with master lost.
30405:S 31 May 2020 16:35:50.329 * Caching the disconnected master state.
30405:S 31 May 2020 16:35:51.200 * Connecting to MASTER 127.0.0.1:21615
30405:S 31 May 2020 16:35:51.200 * MASTER <-> REPLICA sync started
30405:S 31 May 2020 16:35:51.200 * Non blocking connect for SYNC fired the event.
30405:S 31 May 2020 16:35:51.200 * Master replied to PING, replication can continue...
30405:S 31 May 2020 16:35:51.200 * Trying a partial resynchronization (request e5caedb8cf69deae8faa799c6bf4ade169748da5:4181266).
30405:S 31 May 2020 16:35:51.200 * Successful partial resynchronization with master.
30405:S 31 May 2020 16:35:51.200 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
30405:S 31 May 2020 16:35:52.329 - Client closed connection
30405:S 31 May 2020 16:35:52.329 # Connection with replica 127.0.0.1:21612 lost.
30405:S 31 May 2020 16:35:52.430 - Accepted 127.0.0.1:34893
30405:S 31 May 2020 16:35:52.430 * Replica 127.0.0.1:21612 asks for synchronization
30405:S 31 May 2020 16:35:52.430 * Partial resynchronization request from 127.0.0.1:21612 accepted. Sending 66549 bytes of backlog starting from offset 5541149.
30405:S 31 May 2020 16:35:53.201 - DB 9: 1 keys (0 volatile) in 4 slots HT.
30405:S 31 May 2020 16:35:53.329 # Connection with master lost.
30405:S 31 May 2020 16:35:53.329 * Caching the disconnected master state.
30405:S 31 May 2020 16:35:54.202 * Connecting to MASTER 127.0.0.1:21615
30405:S 31 May 2020 16:35:54.202 * MASTER <-> REPLICA sync started
30405:S 31 May 2020 16:35:54.202 * Non blocking connect for SYNC fired the event.
30405:S 31 May 2020 16:35:54.202 * Master replied to PING, replication can continue...
30405:S 31 May 2020 16:35:54.202 * Trying a partial resynchronization (request e5caedb8cf69deae8faa799c6bf4ade169748da5:6189475).
30405:S 31 May 2020 16:35:54.202 * Successful partial resynchronization with master.
30405:S 31 May 2020 16:35:54.202 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
30405:S 31 May 2020 16:35:54.331 - Client closed connection
30405:S 31 May 2020 16:35:54.331 # Connection with replica 127.0.0.1:21612 lost.
30405:S 31 May 2020 16:35:54.332 # Connection with master lost.
30405:S 31 May 2020 16:35:54.332 * Caching the disconnected master state.
30405:S 31 May 2020 16:35:54.332 * REPLICAOF 127.0.0.1:21614 enabled (user request from 'id=5 addr=127.0.0.1:39507 fd=11 name= age=16 idle=0 flags=N db=9 sub=0 psub=0 multi=-1 qbuf=43 qbuf-free=32725 obl=0 oll=0 omem=0 events=r cmd=slaveof user=default')
30405:S 31 May 2020 16:35:55.204 * Connecting to MASTER 127.0.0.1:21614
30405:S 31 May 2020 16:35:55.204 * MASTER <-> REPLICA sync started
30405:S 31 May 2020 16:35:55.204 * Non blocking connect for SYNC fired the event.
30405:S 31 May 2020 16:35:55.204 * Master replied to PING, replication can continue...
30405:S 31 May 2020 16:35:55.204 * Trying a partial resynchronization (request e5caedb8cf69deae8faa799c6bf4ade169748da5:6847923).
30405:S 31 May 2020 16:35:55.204 * Successful partial resynchronization with master.
30405:S 31 May 2020 16:35:55.204 # Master replication ID changed to f1ee54626f73ec44c7536823c0568d06ac326945
30405:S 31 May 2020 16:35:55.204 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
30405:S 31 May 2020 16:35:58.206 - DB 9: 1 keys (0 volatile) in 4 slots HT.
30405:M 31 May 2020 16:36:00.337 # Connection with master lost.
30405:M 31 May 2020 16:36:00.337 * Caching the disconnected master state.
30405:M 31 May 2020 16:36:00.337 * Discarding previously cached master state.
30405:M 31 May 2020 16:36:00.337 # Setting secondary replication ID to f1ee54626f73ec44c7536823c0568d06ac326945, valid up to offset: 10339833. New replication ID is 0cf8d0c564c0b946c97c8cafb3687c34628b759c
30405:M 31 May 2020 16:36:00.337 * MASTER MODE enabled (user request from 'id=5 addr=127.0.0.1:39507 fd=11 name= age=22 idle=0 flags=N db=9 sub=0 psub=0 multi=-1 qbuf=34 qbuf-free=32734 obl=0 oll=0 omem=0 events=r cmd=slaveof user=default')
30405:M 31 May 2020 16:36:00.435 - Accepted 127.0.0.1:40315
30405:M 31 May 2020 16:36:00.436 * Replica 127.0.0.1:21612 asks for synchronization
30405:M 31 May 2020 16:36:00.436 * Partial resynchronization request from 127.0.0.1:21612 accepted. Sending 0 bytes of backlog starting from offset 10339833.
30405:M 31 May 2020 16:36:00.906 - Accepted 127.0.0.1:39745
30405:M 31 May 2020 16:36:00.907 * Replica 127.0.0.1:21614 asks for synchronization
30405:M 31 May 2020 16:36:00.907 * Partial resynchronization request from 127.0.0.1:21614 accepted. Sending 0 bytes of backlog starting from offset 10339833.
30405:M 31 May 2020 16:36:03.214 - DB 9: 1 keys (0 volatile) in 4 slots HT.
30405:M 31 May 2020 16:36:08.214 - DB 9: 1 keys (0 volatile) in 4 slots HT.
30405:M 31 May 2020 16:36:08.345 - Client closed connection
30405:M 31 May 2020 16:36:08.345 # Connection with replica 127.0.0.1:21614 lost.
30405:M 31 May 2020 16:36:08.345 - Client closed connection
30405:M 31 May 2020 16:36:08.345 # Connection with replica 127.0.0.1:21612 lost.
30405:S 31 May 2020 16:36:08.345 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
30405:S 31 May 2020 16:36:08.345 * REPLICAOF 127.0.0.1:21614 enabled (user request from 'id=5 addr=127.0.0.1:39507 fd=11 name= age=30 idle=0 flags=N db=9 sub=0 psub=0 multi=-1 qbuf=43 qbuf-free=32725 obl=0 oll=0 omem=0 events=r cmd=slaveof user=default')
30405:S 31 May 2020 16:36:09.216 * Connecting to MASTER 127.0.0.1:21614
30405:S 31 May 2020 16:36:09.216 * MASTER <-> REPLICA sync started
30405:S 31 May 2020 16:36:09.216 * Non blocking connect for SYNC fired the event.
30405:S 31 May 2020 16:36:09.216 * Master replied to PING, replication can continue...
30405:S 31 May 2020 16:36:09.217 * Trying a partial resynchronization (request 0cf8d0c564c0b946c97c8cafb3687c34628b759c:13629996).
30405:S 31 May 2020 16:36:09.217 * Successful partial resynchronization with master.
30405:S 31 May 2020 16:36:09.217 # Master replication ID changed to 5529be380465be590fe2e8e0e1580762b5c0e975
30405:S 31 May 2020 16:36:09.217 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
30405:S 31 May 2020 16:36:10.348 # Connection with master lost.
30405:S 31 May 2020 16:36:10.348 * Caching the disconnected master state.
30405:S 31 May 2020 16:36:11.217 * Connecting to MASTER 127.0.0.1:21614
30405:S 31 May 2020 16:36:11.217 * MASTER <-> REPLICA sync started
30405:S 31 May 2020 16:36:11.217 * Non blocking connect for SYNC fired the event.
30405:S 31 May 2020 16:36:11.217 * Master replied to PING, replication can continue...
30405:S 31 May 2020 16:36:11.217 * Trying a partial resynchronization (request 5529be380465be590fe2e8e0e1580762b5c0e975:14205825).
30405:S 31 May 2020 16:36:11.217 * Successful partial resynchronization with master.
30405:S 31 May 2020 16:36:11.217 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
30405:S 31 May 2020 16:36:12.348 # Connection with master lost.
30405:S 31 May 2020 16:36:12.348 * Caching the disconnected master state.
30405:S 31 May 2020 16:36:13.224 - DB 9: 1 keys (0 volatile) in 4 slots HT.
30405:S 31 May 2020 16:36:13.224 * Connecting to MASTER 127.0.0.1:21614
30405:S 31 May 2020 16:36:13.224 * MASTER <-> REPLICA sync started
30405:S 31 May 2020 16:36:13.224 * Non blocking connect for SYNC fired the event.
30405:S 31 May 2020 16:36:13.224 * Master replied to PING, replication can continue...
30405:S 31 May 2020 16:36:13.224 * Trying a partial resynchronization (request 5529be380465be590fe2e8e0e1580762b5c0e975:15423979).
30405:S 31 May 2020 16:36:13.224 * Successful partial resynchronization with master.
30405:S 31 May 2020 16:36:13.224 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
30405:S 31 May 2020 16:36:13.348 # Connection with master lost.
30405:S 31 May 2020 16:36:13.348 * Caching the disconnected master state.
30405:S 31 May 2020 16:36:14.225 * Connecting to MASTER 127.0.0.1:21614
30405:S 31 May 2020 16:36:14.225 * MASTER <-> REPLICA sync started
30405:S 31 May 2020 16:36:14.225 * Non blocking connect for SYNC fired the event.
30405:S 31 May 2020 16:36:14.225 * Master replied to PING, replication can continue...
30405:S 31 May 2020 16:36:14.225 * Trying a partial resynchronization (request 5529be380465be590fe2e8e0e1580762b5c0e975:16014513).
30405:S 31 May 2020 16:36:14.225 * Successful partial resynchronization with master.
30405:S 31 May 2020 16:36:14.225 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
30405:S 31 May 2020 16:36:14.348 # Connection with master lost.
30405:S 31 May 2020 16:36:14.348 * Caching the disconnected master state.
30405:M 31 May 2020 16:36:14.350 * Discarding previously cached master state.
30405:M 31 May 2020 16:36:14.350 # Setting secondary replication ID to 5529be380465be590fe2e8e0e1580762b5c0e975, valid up to offset: 16612796. New replication ID is a339d7a345367a1018ed958acf0be105f66b6ec5
30405:M 31 May 2020 16:36:14.350 * MASTER MODE enabled (user request from 'id=5 addr=127.0.0.1:39507 fd=11 name= age=36 idle=0 flags=N db=9 sub=0 psub=0 multi=-1 qbuf=34 qbuf-free=32734 obl=0 oll=0 omem=0 events=r cmd=slaveof user=default')
30405:M 31 May 2020 16:36:14.680 - Accepted 127.0.0.1:44903
30405:M 31 May 2020 16:36:14.680 * Replica 127.0.0.1:21613 asks for synchronization
30405:M 31 May 2020 16:36:14.680 * Partial resynchronization request from 127.0.0.1:21613 accepted. Sending 0 bytes of backlog starting from offset 16612796.
30405:M 31 May 2020 16:36:14.916 - Accepted 127.0.0.1:35331
30405:M 31 May 2020 16:36:14.916 * Replica 127.0.0.1:21614 asks for synchronization
30405:M 31 May 2020 16:36:14.916 * Partial resynchronization request from 127.0.0.1:21614 accepted. Sending 0 bytes of backlog starting from offset 16612796.
30405:M 31 May 2020 16:36:15.142 - Accepted 127.0.0.1:36717
30405:M 31 May 2020 16:36:15.143 * Replica 127.0.0.1:21615 asks for synchronization
30405:M 31 May 2020 16:36:15.143 * Partial resynchronization request from 127.0.0.1:21615 accepted. Sending 0 bytes of backlog starting from offset 16612796.
30405:M 31 May 2020 16:36:17.356 - Client closed connection
30405:M 31 May 2020 16:36:17.356 # Connection with replica 127.0.0.1:21615 lost.
30405:M 31 May 2020 16:36:18.144 - Accepted 127.0.0.1:46601
30405:M 31 May 2020 16:36:18.144 * Replica 127.0.0.1:21615 asks for synchronization
30405:M 31 May 2020 16:36:18.144 * Partial resynchronization request from 127.0.0.1:21615 accepted. Sending 499275 bytes of backlog starting from offset 17157034.
30405:M 31 May 2020 16:36:18.230 - DB 9: 1 keys (0 volatile) in 4 slots HT.
30405:M 31 May 2020 16:36:18.356 - Client closed connection
30405:M 31 May 2020 16:36:18.356 # Connection with replica 127.0.0.1:21615 lost.
30405:M 31 May 2020 16:36:19.145 - Accepted 127.0.0.1:45351
30405:M 31 May 2020 16:36:19.145 * Replica 127.0.0.1:21615 asks for synchronization
30405:M 31 May 2020 16:36:19.145 * Partial resynchronization request from 127.0.0.1:21615 accepted. Sending 510930 bytes of backlog starting from offset 17784381.
30405:M 31 May 2020 16:36:21.448 - Accepted 127.0.0.1:44431
30405:M 31 May 2020 16:36:21.448 * Replica 127.0.0.1:21612 asks for synchronization
30405:M 31 May 2020 16:36:21.448 * Partial resynchronization request from 127.0.0.1:21612 accepted. Sending 0 bytes of backlog starting from offset 19616190.
30405:M 31 May 2020 16:36:22.368 - Client closed connection
30405:M 31 May 2020 16:36:22.368 # Connection with replica 127.0.0.1:21615 lost.
30405:M 31 May 2020 16:36:23.236 - DB 9: 1 keys (0 volatile) in 4 slots HT.
30405:M 31 May 2020 16:36:28.246 - DB 9: 1 keys (0 volatile) in 4 slots HT.
30405:M 31 May 2020 16:36:33.254 - DB 9: 1 keys (0 volatile) in 4 slots HT.
30405:M 31 May 2020 16:36:38.263 - DB 9: 1 keys (0 volatile) in 4 slots HT.
30405:M 31 May 2020 16:36:42.070 * 10000 changes in 60 seconds. Saving...
30405:M 31 May 2020 16:36:42.070 * Background saving started by pid 30631
30631:C 31 May 2020 16:36:42.074 * DB saved on disk
30631:C 31 May 2020 16:36:42.074 * RDB: 4 MB of memory used by copy-on-write
30405:M 31 May 2020 16:36:42.171 * Background saving terminated with success
30405:M 31 May 2020 16:36:43.273 - DB 9: 1 keys (0 volatile) in 4 slots HT.
30405:M 31 May 2020 16:36:48.284 - DB 9: 1 keys (0 volatile) in 4 slots HT.
30405:M 31 May 2020 16:36:53.293 - DB 9: 1 keys (0 volatile) in 4 slots HT.
30405:M 31 May 2020 16:36:58.306 - DB 9: 1 keys (0 volatile) in 4 slots HT.
30405:M 31 May 2020 16:37:03.315 - DB 9: 1 keys (0 volatile) in 4 slots HT.
30405:M 31 May 2020 16:37:08.322 - DB 9: 1 keys (0 volatile) in 4 slots HT.
30405:M 31 May 2020 16:37:12.542 - Client closed connection
30405:M 31 May 2020 16:37:12.542 # Connection with replica 127.0.0.1:21614 lost.
30405:M 31 May 2020 16:37:12.610 - Client closed connection
30405:M 31 May 2020 16:37:12.610 # Connection with replica 127.0.0.1:21613 lost.
30405:M 31 May 2020 16:37:12.671 - Client closed connection
30405:M 31 May 2020 16:37:12.671 # Connection with replica 127.0.0.1:21612 lost.
30405:signal-handler (1590968232) Received SIGTERM scheduling shutdown...
30405:M 31 May 2020 16:37:12.729 # User requested shutdown...
30405:M 31 May 2020 16:37:12.729 * Saving the final RDB snapshot before exiting.
30405:M 31 May 2020 16:37:12.731 * DB saved on disk
30405:M 31 May 2020 16:37:12.731 * Removing the pid file.
30405:M 31 May 2020 16:37:12.731 * Removing the unix socket file.
30405:M 31 May 2020 16:37:12.731 # Redis is now ready to exit, bye bye...
Comment From: rkedward
I suspect the issue is related to running the tests on an NFS network volume. I was able to get the test suite to complete successfully on a local volume and on a different volume. I also had to use a newer version of Tcl.
Comment From: oranagra
@rkedward all these failures are the exact same one (maybe with the exception of the run that hung).
The reason it looks like it fails at a different test number each time is that multiple tests run in multiple threads in parallel, so sometimes more of them complete before it fails.
If you want to run them sequentially in one thread, you can do ./runtest --clients 1.
But anyway, you did manage to reproduce it by using ./runtest --single integration/psync2; please do it again, and then check the logs that are left in tests/tmp at the end of that failed run.
Comment From: oranagra
By the way, it may be that running just that single test will pass, since when it runs alone there's less stress on the machine. In that case you can run it with ./runtest --single integration/psync2 --loop --stop --clients 10 to have 10 of them running in parallel. But then it's harder to know which logs to look at, so if it fails when running alone as just one test, that's better.
Comment From: rkedward
@oranagra I was able to reproduce it using ./runtest --single integration/psync2. I am assuming the log file is tests/tmp/server.10964.X/stdout; let me know if it's a different file. stderr and socket have no content, and dump.rdb is binary. Each of the stdout files looks similar. Here is the content of tests/tmp/server.10964.9/stdout:
11035:C 01 Jun 2020 07:14:29.369 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
11035:C 01 Jun 2020 07:14:29.369 # Redis version=6.0.4, bits=64, commit=00000000, modified=0, pid=11035, just started
11035:C 01 Jun 2020 07:14:29.369 # Configuration loaded
[Redis ASCII-art startup banner: Redis 6.0.4 (00000000/0) 64 bit, standalone mode, Port: 21615, PID: 11035, http://redis.io]
11035:M 01 Jun 2020 07:14:29.370 # Server initialized
11035:M 01 Jun 2020 07:14:29.371 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
11035:M 01 Jun 2020 07:14:29.371 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
11035:M 01 Jun 2020 07:14:29.371 * Ready to accept connections
11035:M 01 Jun 2020 07:14:29.371 * The server is now ready to accept connections at /hnfs/torcfs01n04a/vol/ret_users_rkedward/dev/tools/sles12/redis/redis-6.0.4/tests/tmp/server.10964.9/socket
11035:M 01 Jun 2020 07:14:29.583 - Accepted 127.0.0.1:34427
11035:M 01 Jun 2020 07:14:29.583 - Client closed connection
11035:M 01 Jun 2020 07:14:29.586 - Accepted 127.0.0.1:37249
11035:S 01 Jun 2020 07:14:29.587 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
11035:S 01 Jun 2020 07:14:29.587 * REPLICAOF 127.0.0.1:21613 enabled (user request from 'id=5 addr=127.0.0.1:37249 fd=11 name= age=0 idle=0 flags=N db=9 sub=0 psub=0 multi=-1 qbuf=43 qbuf-free=32725 obl=0 oll=0 omem=0 events=r cmd=slaveof user=default')
11035:S 01 Jun 2020 07:14:30.150 - Accepted 127.0.0.1:36681
11035:S 01 Jun 2020 07:14:30.150 - Client closed connection
11035:S 01 Jun 2020 07:14:30.373 * Connecting to MASTER 127.0.0.1:21613
11035:S 01 Jun 2020 07:14:30.373 * MASTER <-> REPLICA sync started
11035:S 01 Jun 2020 07:14:30.373 * Non blocking connect for SYNC fired the event.
11035:S 01 Jun 2020 07:14:30.373 * Master replied to PING, replication can continue...
11035:S 01 Jun 2020 07:14:30.373 * Trying a partial resynchronization (request 959c9d9e9df1d9f9d21c361b3dc09f77820f5207:1).
11035:S 01 Jun 2020 07:14:30.374 * Full resync from master: 9d88b9bc648ad3918f91bbabbaa2e75c9e7f7542:14
11035:S 01 Jun 2020 07:14:30.374 * Discarding previously cached master state.
11035:S 01 Jun 2020 07:14:30.418 * MASTER <-> REPLICA sync: receiving 185 bytes from master to disk
11035:S 01 Jun 2020 07:14:30.626 * MASTER <-> REPLICA sync: Flushing old data
11035:S 01 Jun 2020 07:14:30.626 * MASTER <-> REPLICA sync: Loading DB in memory
11035:S 01 Jun 2020 07:14:30.627 * Loading RDB produced by version 6.0.4
11035:S 01 Jun 2020 07:14:30.627 * RDB age 0 seconds
11035:S 01 Jun 2020 07:14:30.627 * RDB memory usage when created 1.87 Mb
11035:S 01 Jun 2020 07:14:30.627 * MASTER <-> REPLICA sync: Finished with success
11035:S 01 Jun 2020 07:14:31.152 - Accepted 127.0.0.1:42161
11035:S 01 Jun 2020 07:14:31.153 * Replica 127.0.0.1:21614 asks for synchronization
11035:S 01 Jun 2020 07:14:31.153 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '6be4d9682d3573e2fba97cab17c1a52910bf076d', my replication IDs are '9d88b9bc648ad3918f91bbabbaa2e75c9e7f7542' and '0000000000000000000000000000000000000000')
11035:S 01 Jun 2020 07:14:31.153 * Starting BGSAVE for SYNC with target: disk
11035:S 01 Jun 2020 07:14:31.153 * Background saving started by pid 11050
11050:C 01 Jun 2020 07:14:31.155 * DB saved on disk
11050:C 01 Jun 2020 07:14:31.155 * RDB: 6 MB of memory used by copy-on-write
11035:S 01 Jun 2020 07:14:31.175 * Background saving terminated with success
11035:S 01 Jun 2020 07:14:31.176 * Synchronization with replica 127.0.0.1:21614 succeeded
11035:S 01 Jun 2020 07:14:32.592 - Client closed connection
11035:S 01 Jun 2020 07:14:32.592 # Connection with replica 127.0.0.1:21614 lost.
11035:S 01 Jun 2020 07:14:33.154 - Accepted 127.0.0.1:37169
11035:S 01 Jun 2020 07:14:33.154 * Replica 127.0.0.1:21614 asks for synchronization
11035:S 01 Jun 2020 07:14:33.155 * Partial resynchronization request from 127.0.0.1:21614 accepted. Sending 315938 bytes of backlog starting from offset 489219.
11035:S 01 Jun 2020 07:14:33.592 - Client closed connection
11035:S 01 Jun 2020 07:14:33.592 # Connection with replica 127.0.0.1:21614 lost.
11035:S 01 Jun 2020 07:14:34.154 - Accepted 127.0.0.1:45749
11035:S 01 Jun 2020 07:14:34.154 * Replica 127.0.0.1:21614 asks for synchronization
11035:S 01 Jun 2020 07:14:34.155 * Partial resynchronization request from 127.0.0.1:21614 accepted. Sending 316589 bytes of backlog starting from offset 1047476.
11035:S 01 Jun 2020 07:14:34.377 - DB 9: 1 keys (0 volatile) in 4 slots HT.
11035:S 01 Jun 2020 07:14:36.595 - Client closed connection
11035:S 01 Jun 2020 07:14:36.595 # Connection with master lost.
11035:S 01 Jun 2020 07:14:36.595 * Caching the disconnected master state.
11035:S 01 Jun 2020 07:14:36.595 # Connection with replica 127.0.0.1:21614 lost.
11035:S 01 Jun 2020 07:14:36.595 * REPLICAOF 127.0.0.1:21612 enabled (user request from 'id=5 addr=127.0.0.1:37249 fd=11 name= age=7 idle=0 flags=N db=9 sub=0 psub=0 multi=-1 qbuf=43 qbuf-free=32725 obl=0 oll=0 omem=0 events=r cmd=slaveof user=default')
11035:S 01 Jun 2020 07:14:37.379 * Connecting to MASTER 127.0.0.1:21612
11035:S 01 Jun 2020 07:14:37.379 * MASTER <-> REPLICA sync started
11035:S 01 Jun 2020 07:14:37.379 * Non blocking connect for SYNC fired the event.
11035:S 01 Jun 2020 07:14:37.379 * Master replied to PING, replication can continue...
11035:S 01 Jun 2020 07:14:37.379 * Trying a partial resynchronization (request 9d88b9bc648ad3918f91bbabbaa2e75c9e7f7542:2714372).
11035:S 01 Jun 2020 07:14:37.380 * Successful partial resynchronization with master.
11035:S 01 Jun 2020 07:14:37.380 # Master replication ID changed to e705a61940aacbccb2cf51c20c1401e53a829bc2
11035:S 01 Jun 2020 07:14:37.380 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
11035:S 01 Jun 2020 07:14:38.598 # Connection with master lost.
11035:S 01 Jun 2020 07:14:38.598 * Caching the disconnected master state.
11035:S 01 Jun 2020 07:14:39.380 - DB 9: 1 keys (0 volatile) in 4 slots HT.
11035:S 01 Jun 2020 07:14:39.380 * Connecting to MASTER 127.0.0.1:21612
11035:S 01 Jun 2020 07:14:39.380 * MASTER <-> REPLICA sync started
11035:S 01 Jun 2020 07:14:39.380 * Non blocking connect for SYNC fired the event.
11035:S 01 Jun 2020 07:14:39.380 * Master replied to PING, replication can continue...
11035:S 01 Jun 2020 07:14:39.380 * Trying a partial resynchronization (request e705a61940aacbccb2cf51c20c1401e53a829bc2:3506081).
11035:S 01 Jun 2020 07:14:39.380 * Successful partial resynchronization with master.
11035:S 01 Jun 2020 07:14:39.380 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
11035:S 01 Jun 2020 07:14:39.598 # Connection with master lost.
11035:S 01 Jun 2020 07:14:39.598 * Caching the disconnected master state.
11035:S 01 Jun 2020 07:14:40.381 * Connecting to MASTER 127.0.0.1:21612
11035:S 01 Jun 2020 07:14:40.381 * MASTER <-> REPLICA sync started
11035:S 01 Jun 2020 07:14:40.381 * Non blocking connect for SYNC fired the event.
11035:S 01 Jun 2020 07:14:40.381 * Master replied to PING, replication can continue...
11035:S 01 Jun 2020 07:14:40.381 * Trying a partial resynchronization (request e705a61940aacbccb2cf51c20c1401e53a829bc2:4335973).
11035:S 01 Jun 2020 07:14:40.381 * Successful partial resynchronization with master.
11035:S 01 Jun 2020 07:14:40.381 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
11035:S 01 Jun 2020 07:14:40.598 # Connection with master lost.
11035:S 01 Jun 2020 07:14:40.598 * Caching the disconnected master state.
11035:S 01 Jun 2020 07:14:41.382 * Connecting to MASTER 127.0.0.1:21612
11035:S 01 Jun 2020 07:14:41.382 * MASTER <-> REPLICA sync started
11035:S 01 Jun 2020 07:14:41.382 * Non blocking connect for SYNC fired the event.
11035:S 01 Jun 2020 07:14:41.382 * Master replied to PING, replication can continue...
11035:S 01 Jun 2020 07:14:41.382 * Trying a partial resynchronization (request e705a61940aacbccb2cf51c20c1401e53a829bc2:5168721).
11035:S 01 Jun 2020 07:14:41.382 * Successful partial resynchronization with master.
11035:S 01 Jun 2020 07:14:41.382 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
11035:S 01 Jun 2020 07:14:41.598 # Connection with master lost.
11035:S 01 Jun 2020 07:14:41.598 * Caching the disconnected master state.
11035:S 01 Jun 2020 07:14:42.383 * Connecting to MASTER 127.0.0.1:21612
11035:S 01 Jun 2020 07:14:42.383 * MASTER <-> REPLICA sync started
11035:S 01 Jun 2020 07:14:42.383 * Non blocking connect for SYNC fired the event.
11035:S 01 Jun 2020 07:14:42.383 * Master replied to PING, replication can continue...
11035:S 01 Jun 2020 07:14:42.383 * Trying a partial resynchronization (request e705a61940aacbccb2cf51c20c1401e53a829bc2:6003653).
11035:S 01 Jun 2020 07:14:42.383 * Successful partial resynchronization with master.
11035:S 01 Jun 2020 07:14:42.383 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
11035:S 01 Jun 2020 07:14:42.600 # Connection with master lost.
11035:S 01 Jun 2020 07:14:42.600 * Caching the disconnected master state.
11035:S 01 Jun 2020 07:14:42.600 * REPLICAOF 127.0.0.1:21611 enabled (user request from 'id=5 addr=127.0.0.1:37249 fd=11 name= age=13 idle=0 flags=N db=9 sub=0 psub=0 multi=-1 qbuf=43 qbuf-free=32725 obl=0 oll=0 omem=0 events=r cmd=slaveof user=default')
11035:S 01 Jun 2020 07:14:43.385 * Connecting to MASTER 127.0.0.1:21611
11035:S 01 Jun 2020 07:14:43.385 * MASTER <-> REPLICA sync started
11035:S 01 Jun 2020 07:14:43.385 * Non blocking connect for SYNC fired the event.
11035:S 01 Jun 2020 07:14:43.385 * Master replied to PING, replication can continue...
11035:S 01 Jun 2020 07:14:43.386 * Trying a partial resynchronization (request e705a61940aacbccb2cf51c20c1401e53a829bc2:6837976).
11035:S 01 Jun 2020 07:14:43.386 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
11035:S 01 Jun 2020 07:14:44.386 - DB 9: 1 keys (0 volatile) in 4 slots HT.
11035:S 01 Jun 2020 07:14:44.386 * Connecting to MASTER 127.0.0.1:21611
11035:S 01 Jun 2020 07:14:44.386 * MASTER <-> REPLICA sync started
11035:S 01 Jun 2020 07:14:44.386 * Non blocking connect for SYNC fired the event.
11035:S 01 Jun 2020 07:14:44.386 * Master replied to PING, replication can continue...
11035:S 01 Jun 2020 07:14:44.387 * Trying a partial resynchronization (request e705a61940aacbccb2cf51c20c1401e53a829bc2:6837976).
11035:S 01 Jun 2020 07:14:44.387 * Successful partial resynchronization with master.
11035:S 01 Jun 2020 07:14:44.387 # Master replication ID changed to 418549b24e5d1ddeaa29fdd67a8ef0060a382132
11035:S 01 Jun 2020 07:14:44.387 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
11035:S 01 Jun 2020 07:14:47.605 # Connection with master lost.
11035:S 01 Jun 2020 07:14:47.605 * Caching the disconnected master state.
11035:S 01 Jun 2020 07:14:48.387 * Connecting to MASTER 127.0.0.1:21611
11035:S 01 Jun 2020 07:14:48.387 * MASTER <-> REPLICA sync started
11035:S 01 Jun 2020 07:14:48.387 * Non blocking connect for SYNC fired the event.
11035:S 01 Jun 2020 07:14:48.387 * Master replied to PING, replication can continue...
11035:S 01 Jun 2020 07:14:48.387 * Trying a partial resynchronization (request 418549b24e5d1ddeaa29fdd67a8ef0060a382132:9039877).
11035:S 01 Jun 2020 07:14:48.387 * Successful partial resynchronization with master.
11035:S 01 Jun 2020 07:14:48.387 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
11035:S 01 Jun 2020 07:14:49.387 - DB 9: 1 keys (0 volatile) in 4 slots HT.
11035:S 01 Jun 2020 07:14:49.607 - Client closed connection
11035:S 01 Jun 2020 07:14:49.607 # Connection with master lost.
11035:S 01 Jun 2020 07:14:49.607 * Caching the disconnected master state.
11035:S 01 Jun 2020 07:14:49.608 * REPLICAOF would result into synchronization with the master we are already connected with. No operation performed.
11035:S 01 Jun 2020 07:14:50.389 * Connecting to MASTER 127.0.0.1:21611
11035:S 01 Jun 2020 07:14:50.389 * MASTER <-> REPLICA sync started
11035:S 01 Jun 2020 07:14:50.389 * Non blocking connect for SYNC fired the event.
11035:S 01 Jun 2020 07:14:50.389 * Master replied to PING, replication can continue...
11035:S 01 Jun 2020 07:14:50.389 * Trying a partial resynchronization (request 418549b24e5d1ddeaa29fdd67a8ef0060a382132:10486154).
11035:S 01 Jun 2020 07:14:50.389 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
11035:S 01 Jun 2020 07:14:51.392 * Connecting to MASTER 127.0.0.1:21611
11035:S 01 Jun 2020 07:14:51.392 * MASTER <-> REPLICA sync started
11035:S 01 Jun 2020 07:14:51.392 * Non blocking connect for SYNC fired the event.
11035:S 01 Jun 2020 07:14:51.392 * Master replied to PING, replication can continue...
11035:S 01 Jun 2020 07:14:51.392 * Trying a partial resynchronization (request 418549b24e5d1ddeaa29fdd67a8ef0060a382132:10486154).
11035:S 01 Jun 2020 07:14:51.392 * Successful partial resynchronization with master.
11035:S 01 Jun 2020 07:14:51.392 # Master replication ID changed to 24f02f7df91d7c4468cd4f124b2cbed961df0590
11035:S 01 Jun 2020 07:14:51.392 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
11035:S 01 Jun 2020 07:14:54.396 - DB 9: 1 keys (0 volatile) in 4 slots HT.
11035:S 01 Jun 2020 07:14:54.614 # Connection with master lost.
11035:S 01 Jun 2020 07:14:54.614 * Caching the disconnected master state.
11035:S 01 Jun 2020 07:14:55.397 * Connecting to MASTER 127.0.0.1:21611
11035:S 01 Jun 2020 07:14:55.397 * MASTER <-> REPLICA sync started
11035:S 01 Jun 2020 07:14:55.397 * Non blocking connect for SYNC fired the event.
11035:S 01 Jun 2020 07:14:55.397 * Master replied to PING, replication can continue...
11035:S 01 Jun 2020 07:14:55.397 * Trying a partial resynchronization (request 24f02f7df91d7c4468cd4f124b2cbed961df0590:11835588).
11035:S 01 Jun 2020 07:14:55.398 * Full resync from master: 24f02f7df91d7c4468cd4f124b2cbed961df0590:13026567
11035:S 01 Jun 2020 07:14:55.398 * Discarding previously cached master state.
11035:S 01 Jun 2020 07:14:55.491 * MASTER <-> REPLICA sync: receiving 191 bytes from master to disk
11035:S 01 Jun 2020 07:14:55.491 * MASTER <-> REPLICA sync: Flushing old data
11035:S 01 Jun 2020 07:14:55.491 * MASTER <-> REPLICA sync: Loading DB in memory
11035:S 01 Jun 2020 07:14:55.493 * Loading RDB produced by version 6.0.4
11035:S 01 Jun 2020 07:14:55.493 * RDB age 0 seconds
11035:S 01 Jun 2020 07:14:55.493 * RDB memory usage when created 1.94 Mb
11035:S 01 Jun 2020 07:14:55.493 * MASTER <-> REPLICA sync: Finished with success
11035:S 01 Jun 2020 07:14:56.619 # Connection with master lost.
11035:S 01 Jun 2020 07:14:56.619 * Caching the disconnected master state.
11035:S 01 Jun 2020 07:14:57.404 * Connecting to MASTER 127.0.0.1:21611
11035:S 01 Jun 2020 07:14:57.404 * MASTER <-> REPLICA sync started
11035:S 01 Jun 2020 07:14:57.404 * Non blocking connect for SYNC fired the event.
11035:S 01 Jun 2020 07:14:57.404 * Master replied to PING, replication can continue...
11035:S 01 Jun 2020 07:14:57.404 * Trying a partial resynchronization (request 24f02f7df91d7c4468cd4f124b2cbed961df0590:13442641).
11035:S 01 Jun 2020 07:14:57.404 * Full resync from master: 24f02f7df91d7c4468cd4f124b2cbed961df0590:14906263
11035:S 01 Jun 2020 07:14:57.404 * Discarding previously cached master state.
11035:S 01 Jun 2020 07:14:57.492 * MASTER <-> REPLICA sync: receiving 191 bytes from master to disk
11035:S 01 Jun 2020 07:14:57.492 * MASTER <-> REPLICA sync: Flushing old data
11035:S 01 Jun 2020 07:14:57.492 * MASTER <-> REPLICA sync: Loading DB in memory
11035:S 01 Jun 2020 07:14:57.494 * Loading RDB produced by version 6.0.4
11035:S 01 Jun 2020 07:14:57.494 * RDB age 0 seconds
11035:S 01 Jun 2020 07:14:57.494 * RDB memory usage when created 1.94 Mb
11035:S 01 Jun 2020 07:14:57.494 * MASTER <-> REPLICA sync: Finished with success
11035:S 01 Jun 2020 07:14:57.642 # Connection with master lost.
11035:S 01 Jun 2020 07:14:57.642 * Caching the disconnected master state.
11035:S 01 Jun 2020 07:14:57.642 * REPLICAOF 127.0.0.1:21612 enabled (user request from 'id=5 addr=127.0.0.1:37249 fd=11 name= age=28 idle=0 flags=N db=9 sub=0 psub=0 multi=-1 qbuf=43 qbuf-free=32725 obl=0 oll=0 omem=0 events=r cmd=slaveof user=default')
11035:S 01 Jun 2020 07:14:58.406 * Connecting to MASTER 127.0.0.1:21612
11035:S 01 Jun 2020 07:14:58.406 * MASTER <-> REPLICA sync started
11035:S 01 Jun 2020 07:14:58.406 * Non blocking connect for SYNC fired the event.
11035:S 01 Jun 2020 07:14:58.406 * Master replied to PING, replication can continue...
11035:S 01 Jun 2020 07:14:58.406 * Trying a partial resynchronization (request 24f02f7df91d7c4468cd4f124b2cbed961df0590:15105281).
11035:S 01 Jun 2020 07:14:58.406 * Successful partial resynchronization with master.
11035:S 01 Jun 2020 07:14:58.406 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
11035:S 01 Jun 2020 07:14:59.407 - DB 9: 1 keys (0 volatile) in 4 slots HT.
11035:S 01 Jun 2020 07:15:02.645 # Connection with master lost.
11035:S 01 Jun 2020 07:15:02.645 * Caching the disconnected master state.
11035:S 01 Jun 2020 07:15:03.407 * Connecting to MASTER 127.0.0.1:21612
11035:S 01 Jun 2020 07:15:03.408 * MASTER <-> REPLICA sync started
11035:S 01 Jun 2020 07:15:03.408 * Non blocking connect for SYNC fired the event.
11035:S 01 Jun 2020 07:15:03.408 * Master replied to PING, replication can continue...
11035:S 01 Jun 2020 07:15:03.408 * Trying a partial resynchronization (request 24f02f7df91d7c4468cd4f124b2cbed961df0590:17652252).
11035:S 01 Jun 2020 07:15:03.408 * Successful partial resynchronization with master.
11035:S 01 Jun 2020 07:15:03.408 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
11035:S 01 Jun 2020 07:15:03.675 - Client closed connection
11035:S 01 Jun 2020 07:15:03.675 # Connection with master lost.
11035:S 01 Jun 2020 07:15:03.675 * Caching the disconnected master state.
11035:S 01 Jun 2020 07:15:03.675 * REPLICAOF 127.0.0.1:21613 enabled (user request from 'id=5 addr=127.0.0.1:37249 fd=11 name= age=34 idle=0 flags=N db=9 sub=0 psub=0 multi=-1 qbuf=43 qbuf-free=32725 obl=0 oll=0 omem=0 events=r cmd=slaveof user=default')
11035:S 01 Jun 2020 07:15:04.410 - DB 9: 1 keys (0 volatile) in 4 slots HT.
11035:S 01 Jun 2020 07:15:04.410 * Connecting to MASTER 127.0.0.1:21613
11035:S 01 Jun 2020 07:15:04.410 * MASTER <-> REPLICA sync started
11035:S 01 Jun 2020 07:15:04.410 * Non blocking connect for SYNC fired the event.
11035:S 01 Jun 2020 07:15:04.410 * Master replied to PING, replication can continue...
11035:S 01 Jun 2020 07:15:04.410 * Trying a partial resynchronization (request 24f02f7df91d7c4468cd4f124b2cbed961df0590:18271010).
11035:S 01 Jun 2020 07:15:04.410 * Successful partial resynchronization with master.
11035:S 01 Jun 2020 07:15:04.410 # Master replication ID changed to 0b5bf72cee6f2f005d0e58498fdbb60aea42f7bf
11035:S 01 Jun 2020 07:15:04.410 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
11035:S 01 Jun 2020 07:15:06.678 # Connection with master lost.
11035:S 01 Jun 2020 07:15:06.678 * Caching the disconnected master state.
11035:S 01 Jun 2020 07:15:07.411 * Connecting to MASTER 127.0.0.1:21613
11035:S 01 Jun 2020 07:15:07.411 * MASTER <-> REPLICA sync started
11035:S 01 Jun 2020 07:15:07.411 * Non blocking connect for SYNC fired the event.
11035:S 01 Jun 2020 07:15:07.411 * Master replied to PING, replication can continue...
11035:S 01 Jun 2020 07:15:07.411 * Trying a partial resynchronization (request 0b5bf72cee6f2f005d0e58498fdbb60aea42f7bf:19388989).
11035:S 01 Jun 2020 07:15:07.411 * Successful partial resynchronization with master.
11035:S 01 Jun 2020 07:15:07.411 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
11035:S 01 Jun 2020 07:15:09.412 - DB 9: 1 keys (0 volatile) in 4 slots HT.
11035:S 01 Jun 2020 07:15:09.711 * REPLICAOF would result into synchronization with the master we are already connected with. No operation performed.
11035:S 01 Jun 2020 07:15:14.423 - DB 9: 1 keys (0 volatile) in 4 slots HT.
11035:S 01 Jun 2020 07:15:19.435 - DB 9: 1 keys (0 volatile) in 4 slots HT.
11035:S 01 Jun 2020 07:15:24.447 - DB 9: 1 keys (0 volatile) in 4 slots HT.
11035:S 01 Jun 2020 07:15:29.460 - DB 9: 1 keys (0 volatile) in 4 slots HT.
11035:S 01 Jun 2020 07:15:32.065 * 10000 changes in 60 seconds. Saving...
11035:S 01 Jun 2020 07:15:32.066 * Background saving started by pid 11178
11178:C 01 Jun 2020 07:15:32.068 * DB saved on disk
11178:C 01 Jun 2020 07:15:32.069 * RDB: 8 MB of memory used by copy-on-write
11035:S 01 Jun 2020 07:15:32.166 * Background saving terminated with success
11035:S 01 Jun 2020 07:15:34.471 - DB 9: 1 keys (0 volatile) in 4 slots HT.
11035:S 01 Jun 2020 07:15:39.481 - DB 9: 1 keys (0 volatile) in 4 slots HT.
11035:S 01 Jun 2020 07:15:44.491 - DB 9: 1 keys (0 volatile) in 4 slots HT.
11035:S 01 Jun 2020 07:15:49.502 - DB 9: 1 keys (0 volatile) in 4 slots HT.
11035:S 01 Jun 2020 07:15:54.512 - DB 9: 1 keys (0 volatile) in 4 slots HT.
11035:S 01 Jun 2020 07:15:59.522 - DB 9: 1 keys (0 volatile) in 4 slots HT.
11035:signal-handler (1591020959) Received SIGTERM scheduling shutdown...
11035:S 01 Jun 2020 07:15:59.823 # User requested shutdown...
11035:S 01 Jun 2020 07:15:59.823 * Saving the final RDB snapshot before exiting.
11035:S 01 Jun 2020 07:15:59.825 * DB saved on disk
11035:S 01 Jun 2020 07:15:59.825 * Removing the pid file.
11035:S 01 Jun 2020 07:15:59.825 * Removing the unix socket file.
11035:S 01 Jun 2020 07:15:59.825 # Redis is now ready to exit, bye bye...
Comment From: oranagra
so there's no crash in the log... i wonder what's causing that connection refused?
Comment From: oranagra
@rkedward please try to modify tests/integration/psync2.tcl.
there are two places that do:
catch {
    $R($slave_id) config rewrite
    $R($slave_id) debug restart
}
in both, please add the following after the catch block:
after 1000
then run the test again a few times to see if it helps.
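for clarity, the patched block would then look like this (a sketch of the suggested change; the indentation is mine, and "after 1000" is simply a fixed 1000 ms sleep in Tcl):
catch {
    $R($slave_id) config rewrite
    $R($slave_id) debug restart
}
# suggested workaround: sleep 1000 ms so the server restarted by
# "debug restart" has time to come back up before the test continues
after 1000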
Comment From: oranagra
@rkedward any news?
Comment From: rkedward
Here is the output:
% ./runtest --single integration/psync2
Cleanup: may take some time... OK
Starting test server at port 11111
[ready]: 10028
Testing integration/psync2
[ready]: 10029
[ready]: 10027
[ready]: 10030
[ready]: 10031
[ready]: 10033
[ready]: 10035
[ready]: 10032
[ready]: 10036
[ready]: 10034
[ready]: 10037
[ready]: 10039
[ready]: 10040
[ready]: 10042
[ready]: 10038
[ready]: 10041
[ok]: PSYNC2: --- CYCLE 1 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #2 as master
[ok]: PSYNC2: Set #1 to replicate from #2
[ok]: PSYNC2: Set #4 to replicate from #1
[ok]: PSYNC2: Set #3 to replicate from #1
[ok]: PSYNC2: Set #0 to replicate from #2
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 142661)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: --- CYCLE 2 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #1 as master
[ok]: PSYNC2: Set #4 to replicate from #1
[ok]: PSYNC2: Set #0 to replicate from #4
[ok]: PSYNC2: Set #2 to replicate from #1
[ok]: PSYNC2: Set #3 to replicate from #1
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 268231)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: --- CYCLE 3 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #1 as master
[ok]: PSYNC2: Set #4 to replicate from #1
[ok]: PSYNC2: Set #2 to replicate from #1
[ok]: PSYNC2: Set #0 to replicate from #2
[ok]: PSYNC2: Set #3 to replicate from #4
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 431418)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: --- CYCLE 4 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #4 as master
[ok]: PSYNC2: Set #0 to replicate from #4
[ok]: PSYNC2: Set #2 to replicate from #4
[ok]: PSYNC2: Set #3 to replicate from #0
[ok]: PSYNC2: Set #1 to replicate from #2
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 582268)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: --- CYCLE 5 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #1 as master
[ok]: PSYNC2: Set #0 to replicate from #1
[ok]: PSYNC2: Set #2 to replicate from #0
[ok]: PSYNC2: Set #4 to replicate from #2
[ok]: PSYNC2: Set #3 to replicate from #0
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 735705)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: --- CYCLE 6 ---
[ok]: PSYNC2: [NEW LAYOUT] Set #3 as master
[ok]: PSYNC2: Set #0 to replicate from #3
[ok]: PSYNC2: Set #4 to replicate from #0
[ok]: PSYNC2: Set #2 to replicate from #4
[ok]: PSYNC2: Set #1 to replicate from #3
[ok]: PSYNC2: cluster is consistent after failover
[ok]: PSYNC2: generate load while killing replication links
[ok]: PSYNC2: cluster is consistent after load (x = 891153)
[ok]: PSYNC2: total sum of full synchronizations is exactly 4
[ok]: PSYNC2: Bring the master back again for next test
prev sync_full: 0
prev sync_partial_ok: 6
prev sync_partial_err: 0
# Stats
total_connections_received:8
total_commands_processed:891500
instantaneous_ops_per_sec:4
total_net_input_bytes:18725302
total_net_output_bytes:8281407
instantaneous_input_kbps:0.16
instantaneous_output_kbps:2.40
rejected_connections:0
sync_full:0
sync_partial_ok:6
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:13
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:449
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
Master ID is 3
0: sync_full: 0
0: id1 : fc4537917e43948b61ee568c4403b69e700aaccd:18714860
0: id2 : 4978dd92785f27152708e761b6f4881e9d9527c2:15450346
0: backlog : firstbyte=17666285 len=1048576
0: x var is : 891153
---
1: sync_full: 2
1: id1 : fc4537917e43948b61ee568c4403b69e700aaccd:18714860
1: id2 : 4978dd92785f27152708e761b6f4881e9d9527c2:15450346
1: backlog : firstbyte=17666285 len=1048576
1: x var is : 891153
---
2: sync_full: 2
2: id1 : fc4537917e43948b61ee568c4403b69e700aaccd:18714860
2: id2 : 4978dd92785f27152708e761b6f4881e9d9527c2:15450346
2: backlog : firstbyte=17666285 len=1048576
2: x var is : 891153
---
3: sync_full: 0
3: id1 : fc4537917e43948b61ee568c4403b69e700aaccd:18714860
3: id2 : 4978dd92785f27152708e761b6f4881e9d9527c2:15450346
3: backlog : firstbyte=17666285 len=1048576
3: x var is : 891153
---
Logged warnings (pid 10095):
(none)
Logged warnings (pid 10082):
(none)
Logged warnings (pid 10069):
(none)
Logged warnings (pid 10056):
(none)
Logged warnings (pid 10045):
(none)
[exception]: Executing test client: couldn't open socket: connection refused.
couldn't open socket: connection refused
while executing
"{*}$r info"
(procedure "status" line 2)
invoked from within
"status $R($j) sync_full"
("uplevel" body line 11)
invoked from within
"uplevel 1 {
# The following is the regexp we use to match the log line
# time info. Logs are in the following form:
#
..."
(procedure "show_cluster_status" line 2)
invoked from within
"show_cluster_status"
("uplevel" body line 6)
invoked from within
"uplevel 1 $elsescript"
(procedure "wait_for_condition" line 12)
invoked from within
"wait_for_condition 50 1000 {
[status $R($master_id) sync_partial_ok] == $sync_partial + 1
} else {
puts "prev sync_ful..."
("uplevel" body line 14)
invoked from within
"uplevel 1 $code"
(procedure "test" line 47)
invoked from within
"test "PSYNC2: Partial resync after restart using RDB aux fields" {
# Pick a random slave
set slave_id [expr {($master_id+1)%5}]
..."
("uplevel" body line 198)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
set master_id 0 ; # Current master
set start_time [clock seconds] ; # Test start time
set counter_value..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
start_server {} {
set master_id 0 ; # Current master
set start_time [clock seconds] ; # Test start time
..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
start_server {} {
start_server {} {
set master_id 0 ; # Current master
set start_time [clock seconds] ; # T..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {} {
start_server {} {
start_server {} {
start_server {} {
set master_id 0 ; # Current master
set start_time [clo..."
("uplevel" body line 2)
invoked from within
"uplevel 1 $code "
(procedure "start_server" line 3)
invoked from within
"start_server {tags {"psync2"}} {
start_server {} {
start_server {} {
start_server {} {
start_server {} {
set master_id 0 ; # Curre..."
(file "tests/integration/psync2.tcl" line 73)
invoked from within
"source $path"
(procedure "execute_test_file" line 4)
invoked from within
"execute_test_file $data"
(procedure "test_client_main" line 10)
invoked from within
"test_client_main $::test_server_port "
Killing still running Redis server 10045
All stdout files show that the servers shut down successfully, except for server.10028.1 (pid 10045).
Here is server.10028.1/stdout:
10045:C 07 Jun 2020 23:19:48.968 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
10045:C 07 Jun 2020 23:19:48.969 # Redis version=6.0.4, bits=64, commit=00000000, modified=0, pid=10045, just started
10045:C 07 Jun 2020 23:19:48.969 # Configuration loaded
[Redis ASCII-art startup banner: Redis 6.0.4 (00000000/0) 64 bit, standalone mode, Port: 21611, PID: 10045, http://redis.io]
10045:M 07 Jun 2020 23:19:48.970 # Server initialized
10045:M 07 Jun 2020 23:19:48.970 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
10045:M 07 Jun 2020 23:19:48.970 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
10045:M 07 Jun 2020 23:19:48.970 * Ready to accept connections
10045:M 07 Jun 2020 23:19:48.970 * The server is now ready to accept connections at /hnfs/torcfs01n04a/vol/ret_users_rkedward/dev/tools/sles12/redis/redis-6.0.4/tests/tmp/server.10028.1/socket
10045:M 07 Jun 2020 23:19:49.088 - Accepted 127.0.0.1:33981
10045:M 07 Jun 2020 23:19:49.088 - Client closed connection
10045:M 07 Jun 2020 23:19:49.091 - Accepted 127.0.0.1:42299
10045:S 07 Jun 2020 23:19:49.995 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
10045:S 07 Jun 2020 23:19:49.995 * REPLICAOF 127.0.0.1:21614 enabled (user request from 'id=5 addr=127.0.0.1:42299 fd=11 name= age=0 idle=0 flags=N db=9 sub=0 psub=0 multi=-1 qbuf=43 qbuf-free=32725 obl=0 oll=0 omem=0 events=r cmd=slaveof user=default')
10045:S 07 Jun 2020 23:19:50.974 * Connecting to MASTER 127.0.0.1:21614
10045:S 07 Jun 2020 23:19:50.974 * MASTER <-> REPLICA sync started
10045:S 07 Jun 2020 23:19:50.974 * Non blocking connect for SYNC fired the event.
10045:S 07 Jun 2020 23:19:50.974 * Master replied to PING, replication can continue...
10045:S 07 Jun 2020 23:19:50.974 * Trying a partial resynchronization (request 36af7120ff374baf05fc0544933efcdbd2a3c764:1).
10045:S 07 Jun 2020 23:19:50.975 * Full resync from master: 69efe2f57d51a76ba8377af811c93d16c281c99c:0
10045:S 07 Jun 2020 23:19:50.975 * Discarding previously cached master state.
10045:S 07 Jun 2020 23:19:51.059 * MASTER <-> REPLICA sync: receiving 185 bytes from master to disk
10045:S 07 Jun 2020 23:19:51.270 * MASTER <-> REPLICA sync: Flushing old data
10045:S 07 Jun 2020 23:19:51.270 * MASTER <-> REPLICA sync: Loading DB in memory
10045:S 07 Jun 2020 23:19:51.285 * Loading RDB produced by version 6.0.4
10045:S 07 Jun 2020 23:19:51.285 * RDB age 1 seconds
10045:S 07 Jun 2020 23:19:51.285 * RDB memory usage when created 1.86 Mb
10045:S 07 Jun 2020 23:19:51.285 * MASTER <-> REPLICA sync: Finished with success
10045:S 07 Jun 2020 23:19:53.987 - DB 9: 1 keys (0 volatile) in 4 slots HT.
10045:S 07 Jun 2020 23:19:54.999 # Connection with master lost.
10045:S 07 Jun 2020 23:19:54.999 * Caching the disconnected master state.
10045:S 07 Jun 2020 23:19:55.988 * Connecting to MASTER 127.0.0.1:21614
10045:S 07 Jun 2020 23:19:55.988 * MASTER <-> REPLICA sync started
10045:S 07 Jun 2020 23:19:55.988 * Non blocking connect for SYNC fired the event.
10045:S 07 Jun 2020 23:19:55.988 * Master replied to PING, replication can continue...
10045:S 07 Jun 2020 23:19:55.988 * Trying a partial resynchronization (request 69efe2f57d51a76ba8377af811c93d16c281c99c:1702550).
10045:S 07 Jun 2020 23:19:55.988 * Successful partial resynchronization with master.
10045:S 07 Jun 2020 23:19:55.988 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
10045:S 07 Jun 2020 23:19:57.001 - Client closed connection
10045:S 07 Jun 2020 23:19:57.001 # Connection with master lost.
10045:S 07 Jun 2020 23:19:57.001 * Caching the disconnected master state.
10045:S 07 Jun 2020 23:19:57.001 * REPLICAOF would result into synchronization with the master we are already connected with. No operation performed.
10045:S 07 Jun 2020 23:19:57.787 - Accepted 127.0.0.1:38335
10045:S 07 Jun 2020 23:19:57.787 - Client closed connection
10045:S 07 Jun 2020 23:19:57.991 * Connecting to MASTER 127.0.0.1:21614
10045:S 07 Jun 2020 23:19:57.991 * MASTER <-> REPLICA sync started
10045:S 07 Jun 2020 23:19:57.991 * Non blocking connect for SYNC fired the event.
10045:S 07 Jun 2020 23:19:57.991 * Master replied to PING, replication can continue...
10045:S 07 Jun 2020 23:19:57.991 * Trying a partial resynchronization (request 69efe2f57d51a76ba8377af811c93d16c281c99c:2995989).
10045:S 07 Jun 2020 23:19:57.991 * Successful partial resynchronization with master.
10045:S 07 Jun 2020 23:19:57.991 # Master replication ID changed to 5704b4171be3437550ce0d9ea764a1e19722a48f
10045:S 07 Jun 2020 23:19:57.991 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
10045:S 07 Jun 2020 23:19:58.788 - Accepted 127.0.0.1:41203
10045:S 07 Jun 2020 23:19:58.789 * Replica 127.0.0.1:21615 asks for synchronization
10045:S 07 Jun 2020 23:19:58.789 * Partial resynchronization request from 127.0.0.1:21615 accepted. Sending 28 bytes of backlog starting from offset 2995989.
10045:S 07 Jun 2020 23:19:58.993 - DB 9: 1 keys (0 volatile) in 4 slots HT.
10045:S 07 Jun 2020 23:20:01.006 - Client closed connection
10045:S 07 Jun 2020 23:20:01.006 # Connection with replica 127.0.0.1:21615 lost.
10045:S 07 Jun 2020 23:20:01.789 - Accepted 127.0.0.1:46595
10045:S 07 Jun 2020 23:20:01.789 * Replica 127.0.0.1:21615 asks for synchronization
10045:S 07 Jun 2020 23:20:01.789 * Partial resynchronization request from 127.0.0.1:21615 accepted. Sending 411257 bytes of backlog starting from offset 3904339.
10045:S 07 Jun 2020 23:20:03.006 - Client closed connection
10045:S 07 Jun 2020 23:20:03.006 # Connection with replica 127.0.0.1:21615 lost.
10045:S 07 Jun 2020 23:20:03.790 - Accepted 127.0.0.1:46409
10045:S 07 Jun 2020 23:20:03.790 * Replica 127.0.0.1:21615 asks for synchronization
10045:S 07 Jun 2020 23:20:03.790 * Partial resynchronization request from 127.0.0.1:21615 accepted. Sending 460943 bytes of backlog starting from offset 5043596.
10045:S 07 Jun 2020 23:20:03.993 - DB 9: 1 keys (0 volatile) in 4 slots HT.
10045:S 07 Jun 2020 23:20:04.008 * REPLICAOF would result into synchronization with the master we are already connected with. No operation performed.
10045:S 07 Jun 2020 23:20:04.008 - Client closed connection
10045:S 07 Jun 2020 23:20:04.008 # Connection with replica 127.0.0.1:21615 lost.
10045:S 07 Jun 2020 23:20:04.113 - Accepted 127.0.0.1:38661
10045:S 07 Jun 2020 23:20:04.113 * Replica 127.0.0.1:21612 asks for synchronization
10045:S 07 Jun 2020 23:20:04.113 * Partial resynchronization request from 127.0.0.1:21612 accepted. Sending 0 bytes of backlog starting from offset 5633080.
10045:S 07 Jun 2020 23:20:06.010 - Client closed connection
10045:S 07 Jun 2020 23:20:06.010 # Connection with replica 127.0.0.1:21612 lost.
10045:S 07 Jun 2020 23:20:06.114 - Accepted 127.0.0.1:36961
10045:S 07 Jun 2020 23:20:06.114 * Replica 127.0.0.1:21612 asks for synchronization
10045:S 07 Jun 2020 23:20:06.114 * Partial resynchronization request from 127.0.0.1:21612 accepted. Sending 85911 bytes of backlog starting from offset 6358994.
10045:S 07 Jun 2020 23:20:08.995 - DB 9: 1 keys (0 volatile) in 4 slots HT.
10045:S 07 Jun 2020 23:20:10.010 - Client closed connection
10045:S 07 Jun 2020 23:20:10.010 # Connection with replica 127.0.0.1:21612 lost.
10045:M 07 Jun 2020 23:20:10.012 # Connection with master lost.
10045:M 07 Jun 2020 23:20:10.012 * Caching the disconnected master state.
10045:M 07 Jun 2020 23:20:10.012 * Discarding previously cached master state.
10045:M 07 Jun 2020 23:20:10.012 # Setting secondary replication ID to 5704b4171be3437550ce0d9ea764a1e19722a48f, valid up to offset: 9060091. New replication ID is f0f69f64c127a8c675a92a17d2c94aea0c2d240c
10045:M 07 Jun 2020 23:20:10.012 * MASTER MODE enabled (user request from 'id=5 addr=127.0.0.1:42299 fd=11 name= age=21 idle=0 flags=N db=9 sub=0 psub=0 multi=-1 qbuf=34 qbuf-free=32734 obl=0 oll=0 omem=0 events=r cmd=slaveof user=default')
10045:M 07 Jun 2020 23:20:10.348 - Accepted 127.0.0.1:45401
10045:M 07 Jun 2020 23:20:10.349 * Replica 127.0.0.1:21613 asks for synchronization
10045:M 07 Jun 2020 23:20:10.349 * Partial resynchronization request from 127.0.0.1:21613 accepted. Sending 0 bytes of backlog starting from offset 9060091.
10045:M 07 Jun 2020 23:20:10.795 - Accepted 127.0.0.1:43567
10045:M 07 Jun 2020 23:20:10.796 * Replica 127.0.0.1:21615 asks for synchronization
10045:M 07 Jun 2020 23:20:10.796 * Partial resynchronization request from 127.0.0.1:21615 accepted. Sending 0 bytes of backlog starting from offset 9060091.
10045:M 07 Jun 2020 23:20:14.000 - DB 9: 1 keys (0 volatile) in 4 slots HT.
10045:M 07 Jun 2020 23:20:17.019 - Client closed connection
10045:M 07 Jun 2020 23:20:17.019 # Connection with replica 127.0.0.1:21615 lost.
10045:M 07 Jun 2020 23:20:17.019 - Client closed connection
10045:M 07 Jun 2020 23:20:17.019 # Connection with replica 127.0.0.1:21613 lost.
10045:S 07 Jun 2020 23:20:17.019 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
10045:S 07 Jun 2020 23:20:17.019 * REPLICAOF 127.0.0.1:21613 enabled (user request from 'id=5 addr=127.0.0.1:42299 fd=11 name= age=28 idle=0 flags=N db=9 sub=0 psub=0 multi=-1 qbuf=43 qbuf-free=32725 obl=0 oll=0 omem=0 events=r cmd=slaveof user=default')
10045:S 07 Jun 2020 23:20:18.002 * Connecting to MASTER 127.0.0.1:21613
10045:S 07 Jun 2020 23:20:18.003 * MASTER <-> REPLICA sync started
10045:S 07 Jun 2020 23:20:18.003 * Non blocking connect for SYNC fired the event.
10045:S 07 Jun 2020 23:20:18.003 * Master replied to PING, replication can continue...
10045:S 07 Jun 2020 23:20:18.003 * Trying a partial resynchronization (request f0f69f64c127a8c675a92a17d2c94aea0c2d240c:12228062).
10045:S 07 Jun 2020 23:20:18.003 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
10045:S 07 Jun 2020 23:20:19.005 - DB 9: 1 keys (0 volatile) in 4 slots HT.
10045:S 07 Jun 2020 23:20:19.005 * Connecting to MASTER 127.0.0.1:21613
10045:S 07 Jun 2020 23:20:19.005 * MASTER <-> REPLICA sync started
10045:S 07 Jun 2020 23:20:19.005 * Non blocking connect for SYNC fired the event.
10045:S 07 Jun 2020 23:20:19.005 * Master replied to PING, replication can continue...
10045:S 07 Jun 2020 23:20:19.006 * Trying a partial resynchronization (request f0f69f64c127a8c675a92a17d2c94aea0c2d240c:12228062).
10045:S 07 Jun 2020 23:20:19.006 * Successful partial resynchronization with master.
10045:S 07 Jun 2020 23:20:19.006 # Master replication ID changed to 4978dd92785f27152708e761b6f4881e9d9527c2
10045:S 07 Jun 2020 23:20:19.006 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
10045:S 07 Jun 2020 23:20:20.023 # Connection with master lost.
10045:S 07 Jun 2020 23:20:20.023 * Caching the disconnected master state.
10045:S 07 Jun 2020 23:20:21.006 * Connecting to MASTER 127.0.0.1:21613
10045:S 07 Jun 2020 23:20:21.006 * MASTER <-> REPLICA sync started
10045:S 07 Jun 2020 23:20:21.006 * Non blocking connect for SYNC fired the event.
10045:S 07 Jun 2020 23:20:21.006 * Master replied to PING, replication can continue...
10045:S 07 Jun 2020 23:20:21.006 * Trying a partial resynchronization (request 4978dd92785f27152708e761b6f4881e9d9527c2:12907337).
10045:S 07 Jun 2020 23:20:21.006 * Successful partial resynchronization with master.
10045:S 07 Jun 2020 23:20:21.006 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
10045:S 07 Jun 2020 23:20:24.006 - DB 9: 1 keys (0 volatile) in 4 slots HT.
10045:S 07 Jun 2020 23:20:24.023 # Connection with master lost.
10045:S 07 Jun 2020 23:20:24.023 * Caching the disconnected master state.
10045:S 07 Jun 2020 23:20:24.026 * REPLICAOF 127.0.0.1:21615 enabled (user request from 'id=5 addr=127.0.0.1:42299 fd=11 name= age=35 idle=0 flags=N db=9 sub=0 psub=0 multi=-1 qbuf=43 qbuf-free=32725 obl=0 oll=0 omem=0 events=r cmd=slaveof user=default')
10045:S 07 Jun 2020 23:20:24.363 - Accepted 127.0.0.1:34517
10045:S 07 Jun 2020 23:20:24.363 - Client closed connection
10045:S 07 Jun 2020 23:20:25.009 * Connecting to MASTER 127.0.0.1:21615
10045:S 07 Jun 2020 23:20:25.009 * MASTER <-> REPLICA sync started
10045:S 07 Jun 2020 23:20:25.009 * Non blocking connect for SYNC fired the event.
10045:S 07 Jun 2020 23:20:25.009 * Master replied to PING, replication can continue...
10045:S 07 Jun 2020 23:20:25.009 * Trying a partial resynchronization (request 4978dd92785f27152708e761b6f4881e9d9527c2:15450346).
10045:S 07 Jun 2020 23:20:25.009 * Successful partial resynchronization with master.
10045:S 07 Jun 2020 23:20:25.009 # Master replication ID changed to fc4537917e43948b61ee568c4403b69e700aaccd
10045:S 07 Jun 2020 23:20:25.009 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
10045:S 07 Jun 2020 23:20:25.366 - Accepted 127.0.0.1:43221
10045:S 07 Jun 2020 23:20:25.366 * Replica 127.0.0.1:21613 asks for synchronization
10045:S 07 Jun 2020 23:20:25.366 * Partial resynchronization request from 127.0.0.1:21613 accepted. Sending 14 bytes of backlog starting from offset 15450346.
10045:S 07 Jun 2020 23:20:27.031 # Connection with master lost.
10045:S 07 Jun 2020 23:20:27.031 * Caching the disconnected master state.
10045:S 07 Jun 2020 23:20:28.013 * Connecting to MASTER 127.0.0.1:21615
10045:S 07 Jun 2020 23:20:28.013 * MASTER <-> REPLICA sync started
10045:S 07 Jun 2020 23:20:28.013 * Non blocking connect for SYNC fired the event.
10045:S 07 Jun 2020 23:20:28.013 * Master replied to PING, replication can continue...
10045:S 07 Jun 2020 23:20:28.013 * Trying a partial resynchronization (request fc4537917e43948b61ee568c4403b69e700aaccd:16027939).
10045:S 07 Jun 2020 23:20:28.013 * Successful partial resynchronization with master.
10045:S 07 Jun 2020 23:20:28.013 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
10045:S 07 Jun 2020 23:20:29.013 - DB 9: 1 keys (0 volatile) in 4 slots HT.
10045:S 07 Jun 2020 23:20:29.031 # Connection with master lost.
10045:S 07 Jun 2020 23:20:29.031 * Caching the disconnected master state.
10045:S 07 Jun 2020 23:20:30.014 * Connecting to MASTER 127.0.0.1:21615
10045:S 07 Jun 2020 23:20:30.014 * MASTER <-> REPLICA sync started
10045:S 07 Jun 2020 23:20:30.014 * Non blocking connect for SYNC fired the event.
10045:S 07 Jun 2020 23:20:30.014 * Master replied to PING, replication can continue...
10045:S 07 Jun 2020 23:20:30.014 * Trying a partial resynchronization (request fc4537917e43948b61ee568c4403b69e700aaccd:17236055).
10045:S 07 Jun 2020 23:20:30.014 * Successful partial resynchronization with master.
10045:S 07 Jun 2020 23:20:30.014 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
10045:S 07 Jun 2020 23:20:31.033 - Client closed connection
10045:S 07 Jun 2020 23:20:31.033 # Connection with replica 127.0.0.1:21613 lost.
10045:S 07 Jun 2020 23:20:31.033 # Connection with master lost.
10045:S 07 Jun 2020 23:20:31.033 * Caching the disconnected master state.
10045:S 07 Jun 2020 23:20:31.033 * REPLICAOF 127.0.0.1:21612 enabled (user request from 'id=5 addr=127.0.0.1:42299 fd=11 name= age=42 idle=0 flags=N db=9 sub=0 psub=0 multi=-1 qbuf=43 qbuf-free=32725 obl=0 oll=0 omem=0 events=r cmd=slaveof user=default')
10045:S 07 Jun 2020 23:20:32.017 * Connecting to MASTER 127.0.0.1:21612
10045:S 07 Jun 2020 23:20:32.017 * MASTER <-> REPLICA sync started
10045:S 07 Jun 2020 23:20:32.017 * Non blocking connect for SYNC fired the event.
10045:S 07 Jun 2020 23:20:32.017 * Master replied to PING, replication can continue...
10045:S 07 Jun 2020 23:20:32.017 * Trying a partial resynchronization (request fc4537917e43948b61ee568c4403b69e700aaccd:18714861).
10045:S 07 Jun 2020 23:20:32.017 * Successful partial resynchronization with master.
10045:S 07 Jun 2020 23:20:32.017 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
10045:S 07 Jun 2020 23:20:32.037 # CONFIG REWRITE executed with success.
10045:S 07 Jun 2020 23:20:32.038 # User requested shutdown...
10045:S 07 Jun 2020 23:20:32.038 * Saving the final RDB snapshot before exiting.
10045:S 07 Jun 2020 23:20:32.039 * DB saved on disk
10045:S 07 Jun 2020 23:20:32.039 * Removing the pid file.
10045:S 07 Jun 2020 23:20:32.039 * Removing the unix socket file.
10045:S 07 Jun 2020 23:20:32.039 # Redis is now ready to exit, bye bye...
10045:C 07 Jun 2020 23:20:32.046 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
10045:C 07 Jun 2020 23:20:32.046 # Redis version=6.0.4, bits=64, commit=00000000, modified=0, pid=10045, just started
10045:C 07 Jun 2020 23:20:32.046 # Configuration loaded
10045:S 07 Jun 2020 23:20:32.047 # Opening Unix socket: bind: Address already in use
Comment From: rkedward
BTW, I'm seeing the same issue with 6.0.5.
Comment From: oranagra
@rkedward the problem is this:
10045:S 07 Jun 2020 23:20:32.047 # Opening Unix socket: bind: Address already in use
i suspect that https://github.com/antirez/redis/pull/7404 may solve it. @antirez please share your thoughts.
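to illustrate the symptom (this is my reading, and the snippet below is a hypothetical illustration, not the actual change in that PR): the server that "debug restart" brings back up fails to re-bind its unix socket, apparently because the old socket file is still in place, and file cleanup may well be slower on an NFS volume. as a general pattern, when a test needs to wait for a restarted server, it can poll until the port accepts connections again instead of sleeping a fixed amount:
# hypothetical Tcl helper (an assumption for illustration, not code
# from the PR): poll the server's TCP port until it accepts a
# connection again after a restart, up to a timeout
proc wait_for_server {host port {timeout_ms 5000}} {
    set deadline [expr {[clock milliseconds] + $timeout_ms}]
    while {[clock milliseconds] < $deadline} {
        # a plain client connect; success means the listener is back
        if {![catch {close [socket $host $port]}]} {
            return 1
        }
        after 100 ;# wait 100 ms before retrying
    }
    return 0 ;# timed out: the server never came back
}
# usage (port number taken from the logs above): wait_for_server 127.0.0.1 21611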
Comment From: oranagra
@rkedward maybe you can try to apply the test suite changes in the above-mentioned PR and check if it solves that problem.
Comment From: rkedward
FYI, I ran the tests for the latest release v6.0.10 and did not get this failure. This issue seems to be resolved.
Comment From: oranagra
thanks. closing.