Hi guys

I faced with an issue caused by to low client-output-buffer-limit. I had 8Gb DB with very high traffic and replica was not able to start before I increased client-output-buffer-limit to 512Mb.

The issue is - logs (notice on master and verbose on slave) were almost useless. Here are some log snapshots from endless replication init loop on master:

[6027] 22 Jan 04:10:16.688 * Slave ask for synchronization
[6027] 22 Jan 04:10:16.688 * Starting BGSAVE for SYNC
[6027] 22 Jan 04:10:16.719 * Background saving started by pid 29323
[29323] 22 Jan 04:11:23.748 * DB saved on disk
[29323] 22 Jan 04:11:23.767 * RDB: 3631 MB of memory used by copy-on-write
[6027] 22 Jan 04:11:23.850 * Background saving terminated with success
[6027] 22 Jan 04:12:03.681 * Synchronization with slave succeeded
[6027] 22 Jan 04:13:46.009 # Client addr=10.xx.xx.218:38413 fd=499 name= age=210 idle=103 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=4102 omem=98005232 events=rw cmd=sync scheduled to be closed ASAP for overcoming of output buffer limits.

And slave logs:

[13888] 22 Jan 04:05:02.554 * Connecting to MASTER 10.xx.xx.220:6379
[13888] 22 Jan 04:05:02.554 * MASTER <-> SLAVE sync started
[13888] 22 Jan 04:05:02.554 * Non blocking connect for SYNC fired the event.
[13888] 22 Jan 04:05:02.554 * Master replied to PING, replication can continue...
[13888] 22 Jan 04:06:09.662 * MASTER <-> SLAVE sync: receiving 3246530094 bytes from master
[13888] 22 Jan 04:06:32.101 - Closing idle client
[13888] 22 Jan 04:06:50.065 * MASTER <-> SLAVE sync: Loading DB in memory
[13888] 22 Jan 04:10:15.889 * MASTER <-> SLAVE sync: Finished with success

[13888] 22 Jan 04:10:16.690 * Connecting to MASTER 10.xx.xx.220:6379
[13888] 22 Jan 04:10:16.690 * MASTER <-> SLAVE sync started
...

First, I would expect logs on master not to report about "I'm going to do somethings soon, you know", but report "I've just disconnected a guy who used all my buffers". Second, slave should say at least something about forced re-init. Currently it simply says "Connecting to MASTER" again.

redis_version:2.6.17

Alex

Comment From: mattsta

redis_version:2.6.17

Your Redis version is kinda old, but I don't think these error messages have changed since then.

The issue is - logs (notice on master and verbose on slave) were almost useless.

Are they?

The master reports scheduled to be closed ASAP for overcoming of output buffer limits on the client running cmd=sync so that should help you find output buffer limit settings. It seems the only way it could be more helpful is by pointing to the exact configuration option to change and by reporting by how much the buffer limit was exceeded. It could be even more helpful if Redis tracked buffer limit errors then allowed some kind of FIX-BUFFER-LIMITS command to increase limits on-demand based on past observed behavior, but that's never been considered before.

As for the slave not telling you much, that's just Redis being Redis. The slave doesn't know about a forced re-init (on the master the output buffer hit max size, so the master just drops the connection; there's no protocol for sending status reasons between master->replica). All the slave knows is the connection dropped and now it's automatically re-trying the connection. I think the "Connecting..." line is supposed to imply there was a previous disconnection, but it could be more clear. For another example of great status output, try telling a Redis instance to SLAVEOF to itself.

Comment From: alexkazeko

'Scheduled to be closed' and 'limit exceeded. connection closed' are two different messages. Pointing to exact parameter would be even better.

As for slave, message "Connecting..." right after "Finished with success" is also weird. Why won't it say "connection dropped"? Just to point that something is wrong...

Comment From: mattsta

'Scheduled to be closed' and 'limit exceeded. connection closed' are two different messages.

Interesting point. From a Redis point of view, everything connection related is async. In this case you get notified why the client is scheduled for disconnection. But, at actual disconnect time (which is typically just a few milliseconds after the scheduling), all Redis knows is "this client is in my list of connections to terminate" so it destroys the connection — but it doesn't know why it was asked to kill the connection at the time of the disconnect.

Why won't it say "connection dropped"? Just to point that something is wrong...

It's possible. I think it would go here: https://github.com/antirez/redis/blob/9802ec3c83cf6b61edad50bc41ddb3f6fdb13c6f/src/networking.c#L651-L663

Comment From: oranagra

putting aside the phrasing of the log message in the master. the one that was missing in the replica was added long ago. it now prints Connection with master lost, and really it doesn't have any other info it can show. i'm closing this issue, feel free to communicate or re-open if you feel i'm wrong.