- I started a Redis server, then sent a "slaveof" command to turn it into a slave. The RDB file is about 1GB, while the disk holding dir does not have enough free space (281M).
$ redis 127.0.0.1:6379> config get dir
1) "dir"
2) "/home/shenlx/data"
[shenlx@snrsdevapp13 src]$ df -hl
Filesystem                   Size  Used Avail Use% Mounted on
...
tmpfs                         16G   16K   16G   1% /dev/shm
/dev/vda1                    194M  100M   84M  55% /boot
/dev/mapper/systemvg-homelv  2.0G  1.6G  281M  86% /home
...
- So the slave failed to write the temp file. The log is as follows:
10964:S 05 Jun 09:41:55.087 * MASTER <-> SLAVE sync: receiving 413389026 bytes from master
10964:S 05 Jun 09:41:59.129 # Write error or short write writing to the DB dump file needed for MASTER <-> SLAVE synchronization: Resource temporarily unavailable
10964:S 05 Jun 09:41:59.988 * Connecting to MASTER 10.245.71.252:6379
10964:S 05 Jun 09:41:59.988 * MASTER <-> SLAVE sync started
10964:S 05 Jun 09:41:59.991 * Non blocking connect for SYNC fired the event.
10964:S 05 Jun 09:41:59.994 * Master replied to PING, replication can continue...
10964:S 05 Jun 09:42:00.000 * (Non critical) Master does not understand REPLCONF capa: -ERR Unrecognized REPLCONF option: capa
10964:S 05 Jun 09:42:00.000 * Partial resynchronization not possible (no cached master)
10964:S 05 Jun 09:42:00.003 * Full resync from master: 511d485419f66d97b9bfc302626fe94fabd2f10c:6550301656
10964:S 05 Jun 09:42:09.106 * MASTER <-> SLAVE sync: receiving 413389026 bytes from master
10964:S 05 Jun 09:42:53.762 # Write error or short write writing to the DB dump file needed for MASTER <-> SLAVE synchronization: No space left on device
10964:S 05 Jun 09:42:54.150 * Connecting to MASTER 10.245.71.252:6379
10964:S 05 Jun 09:42:54.150 * MASTER <-> SLAVE sync started
10964:S 05 Jun 09:42:54.153 * Non blocking connect for SYNC fired the event.
10964:S 05 Jun 09:42:54.156 * Master replied to PING, replication can continue...
10964:S 05 Jun 09:42:54.162 * (Non critical) Master does not understand REPLCONF capa: -ERR Unrecognized REPLCONF option: capa
10964:S 05 Jun 09:42:54.162 * Partial resynchronization not possible (no cached master)
10964:S 05 Jun 09:42:54.165 * Full resync from master: 511d485419f66d97b9bfc302626fe94fabd2f10c:6550324985
10964:S 05 Jun 09:43:03.468 * MASTER <-> SLAVE sync: receiving 413389026 bytes from master
10964:S 05 Jun 09:43:29.543 # Write error or short write writing to the DB dump file needed for MASTER <-> SLAVE synchronization: Resource temporarily unavailable
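- The error messages are confusing and incorrect: the same failure is reported sometimes as "Resource temporarily unavailable" and sometimes as "No space left on device". The root cause is as follows: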
replication.c
if (write(server.repl_transfer_fd,buf,nread) != nread) {
    serverLog(LL_WARNING,"Write error or short write writing to the DB dump file needed for MASTER <-> SLAVE synchronization: %s", strerror(errno));
According to man 2 write
RETURN VALUE
On success, the number of bytes written is returned (zero indicates nothing was written). On error, -1 is returned, and errno is set appropriately.
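write() sets errno only when it returns -1. On a short write (a non-negative return value smaller than nread), errno is not updated and still holds whatever an earlier call left there, so strerror(errno) prints an unrelated message. That is presumably where "Resource temporarily unavailable" in the log comes from. So we need to check the return value first: only if it equals -1 is strerror(errno) meaningful.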
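To make the idea concrete, here is a minimal sketch of the check described above. It is not the Redis code and not necessarily what the PR does; `describe_write_result` and `checked_write` are hypothetical names used only for illustration. The sketch separates the real error case (return value -1, errno valid) from the short-write case (errno must not be consulted).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not part of Redis: turns the result of a write()
 * call into a message. Returns 0 if everything was written, -1 otherwise.
 * saved_errno is only used when nwritten == -1. */
int describe_write_result(ssize_t nwritten, ssize_t expected, int saved_errno,
                          char *msg, size_t msglen) {
    if (nwritten == -1) {
        /* Real error: errno was set by write(), so strerror() is valid. */
        snprintf(msg, msglen, "write error: %s", strerror(saved_errno));
        return -1;
    }
    if (nwritten != expected) {
        /* Short write: errno was NOT set by write(), so do not report it. */
        snprintf(msg, msglen, "short write: %zd of %zd bytes written",
                 nwritten, expected);
        return -1;
    }
    msg[0] = '\0';
    return 0;
}

/* Hypothetical wrapper: calls write() once and captures errno immediately,
 * before any other library call has a chance to overwrite it. */
int checked_write(int fd, const void *buf, size_t len,
                  char *msg, size_t msglen) {
    ssize_t nwritten = write(fd, buf, len);
    int saved_errno = errno;
    return describe_write_result(nwritten, (ssize_t)len, saved_errno,
                                 msg, msglen);
}
```

With a check like this, a short write caused by a full disk would be logged as a short write with byte counts instead of a stale errno string, and a caller could decide separately whether to retry the remaining bytes.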
Comment From: shenlongxing
I opened a PR to fix this problem. #4985
Comment From: drewboardman
I'm seeing something similar as well
Unrecognized REPLCONF option: capa