https://github.com/cloudius-systems/osv/issues/865 reports that Redis 3.2.8 does not start up properly on the OSv operating system and instead reports a failure to bind the server socket. OSv (http://osv.io/) is an open-source Linux-compatible operating system. A relevant piece of information is that OSv does not support IPv6 at all.

The problem seems to be in server.c code which assumes that if anetTcp6Server() fails to bind an IPv6 socket, it will fail with EAFNOSUPPORT. However, why must it? What the code actually calls is anetV6Only() (anet.c) which does

setsockopt(s,IPPROTO_IPV6,IPV6_V6ONLY,&yes,sizeof(yes))

So on a system where IPv6 is not supported at all, the protocol (IPPROTO_IPV6) will not be known, and this will fail with EPROTONOSUPPORT, causing the entire anetTcp6Server() to fail with EPROTONOSUPPORT - not the EAFNOSUPPORT which server.c expects.

So I propose the following patch for better support for systems which don't support IPv6 at all:

--- server.c.orig   2017-03-22 13:36:14.699146635 +0200
+++ server.c    2017-03-22 13:37:01.809136644 +0200
@@ -1786,7 +1786,7 @@
             if (fds[*count] != ANET_ERR) {
                 anetNonBlock(NULL,fds[*count]);
                 (*count)++;
-            } else if (errno == EAFNOSUPPORT) {
+            } else if (errno == EAFNOSUPPORT || errno == EPROTONOSUPPORT) {
                 unsupported++;
                 serverLog(LL_WARNING,"Not listening to IPv6: unsupproted");
             }

Comment From: jaromil

Ping. This would be useful to merge.

Comment From: twk3

This also happens when running redis inside docker, when ipv6 hasn't been configured for the docker network.

It would be nice to have this fixed.

Comment From: jaromil

Same for LXC. @antirez I do recommend giving some attention. IMHO the fix is sane and has no regression.

Comment From: cduranleau

If this hasn't been merged, isn't "unsupproted" supposed to be "unsupported"?

M1711995:redis cd033922$ grep -r unsupproted .
./src/server.c:                serverLog(LL_WARNING,"Not listening to IPv6: unsupproted");
./src/server.c:                    serverLog(LL_WARNING,"Not listening to IPv4: unsupproted");

Comment From: badboy

I'll check this today.

Comment From: badboy

@nyh I wonder why the socket can be created as AF_INET6 if IPv6 is not enabled at all.

Comment From: nyh

@badboy this is an interesting point. So you are saying that although setsockopt should not have failed with EAFNOSUPPORT, the socket creation should have failed earlier, with this EAFNOSUPPORT. I'll recheck in OSv why this is happening and whether it makes sense or is a bug. But why is the same issue happening in containers too?

Comment From: badboy

setsockopt can't fail with EAFNOSUPPORT, that one has to come from the socket call instead. In my understanding, if IPv6 is not supported, that call should already reject creating a socket. I'll look into why this is happening on LXC as well if I can get an LXC setup going.

Comment From: nyh

Turns out you're right, I misunderstood where the error is coming from in OSv... It is indeed the socket() call creating the socket (AF_INET6, SOCK_STREAM, 0) which fails - but it fails with an EPROTONOSUPPORT instead of failing with an EAFNOSUPPORT.

The latter (which you originally checked) is more reasonable, although the Linux socket(2) manual page is very ambiguous on what should be returned in this case: Should socket() return EAFNOSUPPORT because the "address family" AF_INET6 is not supported, or EINVAL because the "protocol family" is not available, or EPROTONOSUPPORT because this protocol (0) is not supported within this domain?

I agree that EAFNOSUPPORT makes more sense than the two others and I'll look into changing this in OSv. So now I'm curious what happens in Linux containers.
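
To see which call fails on a given system, and with which errno, something like the following can be used. This is only a minimal sketch and not Redis code: the names ipv6_probe, probe_ipv6() and print_ipv6_probe() are made up for illustration. It mirrors the order discussed above, first socket() and then the IPV6_V6ONLY setsockopt().

```c
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper type (not part of Redis): records which step failed. */
struct ipv6_probe {
    const char *step; /* "socket", "setsockopt", or "ok" */
    int err;          /* errno saved right after the failing call, else 0 */
};

/* Try to create an IPv6 TCP socket and mark it IPv6-only. */
static struct ipv6_probe probe_ipv6(void) {
    struct ipv6_probe r = { "ok", 0 };
    int yes = 1;

    int s = socket(AF_INET6, SOCK_STREAM, 0);
    if (s == -1) {
        /* Without IPv6 support the failure is expected here,
         * before setsockopt() is ever reached. */
        r.step = "socket";
        r.err = errno;
        return r;
    }
    if (setsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY, &yes, sizeof(yes)) == -1) {
        r.step = "setsockopt";
        r.err = errno;
    }
    close(s);
    return r;
}

/* Print the result in a form that can be pasted into a bug report. */
static void print_ipv6_probe(void) {
    struct ipv6_probe r = probe_ipv6();
    if (r.err == 0)
        printf("IPv6 TCP socket created\n");
    else
        printf("%s() failed: %s (errno %d)\n", r.step, strerror(r.err), r.err);
}
```

Calling print_ipv6_probe() from main() on the affected system (OSv, a container, a kernel booted without IPv6) shows whether the errno is EAFNOSUPPORT, EPROTONOSUPPORT or something else.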

Comment From: badboy

Thanks for checking! (I tried digging into OSv, but it is just too huge) Given that it still fails in lxc containers (according to jaromil), I guess your fix would still be valid (I have it in my fork at the moment). I'll try looking into lxc tomorrow and will submit a PR

Comment From: nyh

I just sent a patch to OSv to change the socket() errno for AF_INET6 to EAFNOSUPPORT, as redis expected and as is indeed more logical - and now the unpatched redis worked correctly on OSv. So sorry about the mess, I don't actually need this patch for OSv. Whether it's still relevant to Linux containers, and why, is a separate question...

Comment From: jaromil

FYI in ISC DHCP, which is a fairly portable and standard implementation, several types of errors are checked on return from the socket and bind calls, which I believe is a sane way to catch all possible behaviours across implementations.

        if (errno == ENOPROTOOPT || errno == EPROTONOSUPPORT ||
            errno == ESOCKTNOSUPPORT || errno == EPFNOSUPPORT ||
            errno == EAFNOSUPPORT) {

I've filed this fix as PR #4108

edit: removed EINVAL check

Comment From: badboy

But we do want to fail under certain conditions and not ignore them. Especially things like EINVAL seem far too broad for that check.

Comment From: jaromil

Yes. The reason I put it as a PR already is that it may make it easier to test (yet I don't see travis builds connected to this repo). I agree with your assessment and will now remove EINVAL.
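
For illustration, the check without EINVAL could be factored into a small predicate like the one below. This is a sketch under assumptions, not the code from the PR: the name errno_means_no_ipv6() is hypothetical, and the #ifdef guards are there because ESOCKTNOSUPPORT and EPFNOSUPPORT are not defined on every platform.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper (not Redis code): returns true when err is one of
 * the "family/protocol not supported" codes from the list above.
 * EINVAL is deliberately left out because it is too broad. */
static bool errno_means_no_ipv6(int err) {
    if (err == EAFNOSUPPORT || err == EPROTONOSUPPORT || err == ENOPROTOOPT)
        return true;
#ifdef ESOCKTNOSUPPORT
    if (err == ESOCKTNOSUPPORT)
        return true;
#endif
#ifdef EPFNOSUPPORT
    if (err == EPFNOSUPPORT)
        return true;
#endif
    return false;
}
```

The else-if branch in server.c would then read `else if (errno_means_no_ipv6(errno))`, while any other errno (EINVAL, EADDRINUSE, ...) still takes the normal failure path.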

Comment From: badboy

Yeah, we don't have automatic Travis tests (sadly). Then again I don't think there are any tests for that code path in the first place.

Comment From: jaromil

Yep. I was also wondering how to test this easily and widely...

OT: is there a particular reason why there are no Travis tests? Or is it just because no one has set them up?

Comment From: badboy

OT: There is http://ci.redis.io/ which runs regularly on unstable and latest. Several PRs enabling Travis are available, but were rejected because it was considered meaningless. Check the search ;)

I tried getting LXC containers up and running to test this, but I was unable to get a configuration with otherwise disabled IPv6. Linux booted with ipv6.disable=1 in the cmdline gives an EAFNOSUPPORT.

Comment From: jaromil

OT: awesome. Pity it doesn't build PRs. make test on my Devuan install failed at a certain point.

My work schedule is insanely packed until August, yet I will try to reproduce the problem and provide detailed instructions for a more deterministic approach if I find any more time at hand. Meanwhile I completely understand if you want to leave this behind because it is not reproducible and lacks information about that. The problem does persist on my side and in any case I use redis from source so I can simply apply my patch and be happy.

Comment From: oranagra

solved by #5598 (redis 5.0.3) and improved by #7936