1. I start a Redis server, then send it a "slaveof" command to turn it into a slave. The RDB file is about 1GB, but the disk holding the configured dir does not have enough free space (281M).

$ redis 127.0.0.1:6379> config get dir
1) "dir"
2) "/home/shenlx/data"

[shenlx@snrsdevapp13 src]$ df -hl
Filesystem                   Size  Used Avail Use% Mounted on
...
tmpfs                         16G   16K   16G   1% /dev/shm
/dev/vda1                    194M  100M   84M  55% /boot
/dev/mapper/systemvg-homelv  2.0G  1.6G  281M  86% /home
...

  1. As a result, the slave fails to write the temp file. The errors are as follows:

10964:S 05 Jun 09:41:55.087 * MASTER <-> SLAVE sync: receiving 413389026 bytes from master
10964:S 05 Jun 09:41:59.129 # Write error or short write writing to the DB dump file needed for MASTER <-> SLAVE synchronization: Resource temporarily unavailable
10964:S 05 Jun 09:41:59.988 * Connecting to MASTER 10.245.71.252:6379
10964:S 05 Jun 09:41:59.988 * MASTER <-> SLAVE sync started
10964:S 05 Jun 09:41:59.991 * Non blocking connect for SYNC fired the event.
10964:S 05 Jun 09:41:59.994 * Master replied to PING, replication can continue...
10964:S 05 Jun 09:42:00.000 * (Non critical) Master does not understand REPLCONF capa: -ERR Unrecognized REPLCONF option: capa
10964:S 05 Jun 09:42:00.000 * Partial resynchronization not possible (no cached master)
10964:S 05 Jun 09:42:00.003 * Full resync from master: 511d485419f66d97b9bfc302626fe94fabd2f10c:6550301656
10964:S 05 Jun 09:42:09.106 * MASTER <-> SLAVE sync: receiving 413389026 bytes from master
10964:S 05 Jun 09:42:53.762 # Write error or short write writing to the DB dump file needed for MASTER <-> SLAVE synchronization: No space left on device
10964:S 05 Jun 09:42:54.150 * Connecting to MASTER 10.245.71.252:6379
10964:S 05 Jun 09:42:54.150 * MASTER <-> SLAVE sync started
10964:S 05 Jun 09:42:54.153 * Non blocking connect for SYNC fired the event.
10964:S 05 Jun 09:42:54.156 * Master replied to PING, replication can continue...
10964:S 05 Jun 09:42:54.162 * (Non critical) Master does not understand REPLCONF capa: -ERR Unrecognized REPLCONF option: capa
10964:S 05 Jun 09:42:54.162 * Partial resynchronization not possible (no cached master)
10964:S 05 Jun 09:42:54.165 * Full resync from master: 511d485419f66d97b9bfc302626fe94fabd2f10c:6550324985
10964:S 05 Jun 09:43:03.468 * MASTER <-> SLAVE sync: receiving 413389026 bytes from master
10964:S 05 Jun 09:43:29.543 # Write error or short write writing to the DB dump file needed for MASTER <-> SLAVE synchronization: Resource temporarily unavailable

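  1. The error messages are confusing and incorrect: the same out-of-space failure is reported sometimes as "Resource temporarily unavailable" and sometimes as "No space left on device". The root cause is as follows: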
replication.c
    if (write(server.repl_transfer_fd,buf,nread) != nread) {
        serverLog(LL_WARNING,"Write error or short write writing to the DB dump file needed for MASTER <-> SLAVE synchronization: %s", strerror(errno));

According to man 2 write:

RETURN VALUE
       On  success, the number of bytes written is returned (zero indicates nothing was written).  On error, -1 is returned, and errno is set appropriately.

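write() sets errno only when it returns -1. The check above also treats a short write (a return value that is non-negative but smaller than nread) as a failure, and in that case errno has not been set by write(), so strerror(errno) prints a stale value left over from some earlier call. We therefore need to check the return value first: only if it equals -1 is strerror(errno) meaningful.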
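
The check described above can be sketched as follows. This is a minimal illustration, not the Redis code and not necessarily what the PR does: checked_write and its message format are hypothetical names invented for this example. It consults errno only when write() really returned -1, and reports a short write separately with the byte counts.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of Redis): issues one write() of len bytes
 * and, on failure, fills msg with a description. errno is consulted only
 * when write() returned -1; a short write is reported by byte count instead.
 * Returns 0 if all len bytes were written, -1 otherwise. */
static int checked_write(int fd, const void *buf, size_t len,
                         char *msg, size_t msglen) {
    assert(buf != NULL && msg != NULL && msglen > 0);

    ssize_t nwritten = write(fd, buf, len);
    if (nwritten == -1) {
        /* Real error: errno was set by write(), so strerror() is valid. */
        int saved_errno = errno;
        snprintf(msg, msglen, "write error: %s", strerror(saved_errno));
        return -1;
    }
    if ((size_t)nwritten != len) {
        /* Short write: errno was NOT set by write(), so do not print it. */
        snprintf(msg, msglen, "short write: %zd of %zu bytes",
                 nwritten, len);
        return -1;
    }
    msg[0] = '\0';
    return 0;
}
```

With a helper like this, the caller would log msg rather than strerror(errno), so a short write caused by a nearly full disk is no longer labelled with an unrelated error string.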

Comment From: shenlongxing

I opened a PR to fix this problem. #4985

Comment From: drewboardman

I'm seeing something similar as well

Unrecognized REPLCONF option: capa