Redis replication test(s) failing on Solaris 10

I'm working on getting Redis 6 working on Solaris 10, and the set of tests at the bottom of tests/integration/replication.tcl is failing because the process that is sent SIGTERM is never reaped: its parent never calls one of the wait() family of functions for it.

As I noted in this Stack Overflow question, I won't be able to actively look into this for a few days because I don't have access to the Solaris machines while at home.

I know this function is executing and I believe wait3() really is being called (though admittedly I didn't pay enough attention to line 1783).

If you can offer suggestions for things to look into / try before I get back to work that would be very helpful.

Comment From: oranagra

@Phantal when you run redis (master+replica) manually (not through the tests), do you see similar problems? your SO post mentioned a grandchild, do you mean that the test is the parent, and the grandchild is the forked process redis creates?

Comment From: Phantal

I don't know all the details, but it looked like Redis had fork()ed, as had its child, and the grandchild became a zombie.

I haven't tried it without the test yet.

Comment From: oranagra

@Phantal the child process in redis doesn't fork. please dig deeper when you get a chance and let me know what you find. thanks.

Comment From: Phantal

@oranagra Unfortunately I'm learning how to use Redis and getting it running on Solaris at the same time, so it's a bit slow going.

I wrote a simple bash script to test this based on the "no" portion of the repl test by copying the config used by that test. The script looks basically like this; my apologies if there are typos, the machine I'm doing this on is air-gapped:

#!/bin/bash
redis-server tests/tmp/redis.conf.6895.2 &
sleep 2;
redis-cli -h localhost -p 21212 config set repl-diskless-sync yes
redis-cli -h localhost -p 21212 config set repl-diskless-sync-delay 1
redis-cli -h localhost -p 21212 debug populate 20000 test 10000
redis-cli -h localhost -p 21212 config set rdbcompression no

redis-server tests/tmp/redis.conf.6895.4 &
redis-server tests/tmp/redis.conf.6895.6 &
sleep 5;

redis-cli -h localhost -p 21213 config set repl-diskless-load swapdb
redis-cli -h localhost -p 21213 config set key-load-delay 1

redis-cli -h localhost -p 21213 replicaof localhost 21212
redis-cli -h localhost -p 21214 replicaof localhost 21212

In the "no replicas drop" test it never sends SIGTERM, thus no kill in the above.

Initially 3 servers are running (master + 2 replicas), then when one or both of the last 2 lines execute the master fork()s to make the background RDB transfer process. After some amount of time the background RDB process becomes defunct.

Somewhere in there I see messages along these lines from the replicas and/or master:

replica: # MASTER timeout: no data nor PING received...
replica: # Connection with master lost.
replica: - Caching the disconnected master state.
replica: * Connecting to MASTER localhost:21212
master: - Client closed connection

Here's what I suspect is happening:

1. The background RDB process fails for some reason and dies while communicating with one of the replicas
2. The master doesn't know it died, so it never calls wait()
3. The replicas time out and eventually reconnect
4. The master doesn't know it died, so it never resumes the background RDB transfer

Comment From: oranagra

@Phantal i tried to reproduce this (found another bug: https://github.com/redis/redis/pull/7518), but as far as i can tell the child is always reaped.

the code in checkChildrenDone was indeed modified in v6 and will skip calling wait3 if server.rdb_pipe_conns is not NULL. i did have some suspicion that there may be a problem if the child is killed (e.g. with kill -9) before it has the chance to close rdb_pipe_write, but as far as i can tell (and test) it doesn't matter: the parent closes that handle as soon as fork() succeeds, so if the child dies unexpectedly, the parent will still be able to detect it, print "Diskless rdb transfer, done reading from pipe, %d replicas still up" and nullify rdb_pipe_conns so that wait3 can be called.

maybe if you send me the full log files i can find out something new.
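The detection mechanism described here can be sketched in isolation. The following is a simplified stand-alone C example, not the Redis implementation: fds[1] plays the role of rdb_pipe_write, and the function name is hypothetical. It shows that once the parent has closed its own copy of the write end, a child killed with SIGKILL still produces EOF on the read end, after which the parent can reap it.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper, not part of Redis.
 * Returns 1 if the parent saw EOF on the pipe and then reaped a child
 * that was terminated by SIGKILL, 0 if not, -1 on error. */
int detect_killed_child_via_pipe(void) {
    int fds[2];
    if (pipe(fds) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) { close(fds[0]); close(fds[1]); return -1; }
    if (pid == 0) {
        close(fds[0]);
        for (;;) pause();       /* child: never closes the write end itself */
    }

    close(fds[1]);              /* parent drops its write end right after fork() */
    kill(pid, SIGKILL);         /* child dies with no chance to clean up */

    /* The kernel closes the dead child's descriptors, so the last writer
     * is gone and read() returns 0 (EOF) instead of blocking forever. */
    char buf[64];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0) { /* drain */ }
    close(fds[0]);
    if (n != 0) return -1;

    int status;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}
```

If the parent kept its copy of the write end open, the read would never see EOF, and a wait call gated on "done reading from pipe" would never run.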

Comment From: Phantal

I'm going to try your changes in #7518 and see what happens. I'll also see what I can do about the logs.

Comment From: Phantal

Unfortunately I wasn't able to figure it out and I've been told I'm spending too much time on this. We're moving away from Solaris in the not too distant future, so at worst we delay using Redis until we're off Solaris.

That said, I tried running the master under truss to see whether anything jumped out at me. The process finishes writing the *-temp.rdb file, closes it, then calls _exit(0). For the RDB transfer process I didn't see anything out of the ordinary, including signals; I was hoping to see something easy like SIGBUS or SIGSEGV, but alas.

Some time after that process terminated I started seeing ignored SIGPIPEs in the master, but that's it. Unfortunately I didn't have truss emitting timestamps so I don't know how much time passed between the two events.
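As background on the ignored SIGPIPEs: when SIGPIPE is ignored, a write to a pipe or socket whose other end is closed fails with EPIPE instead of killing the writer. The minimal C sketch below (hypothetical function name, a plain pipe standing in for a replica connection) shows that behaviour; it is only an illustration and makes no claim about which descriptor the master was writing to.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper, not part of Redis.
 * Writes to a pipe whose read end is already closed while SIGPIPE is
 * ignored. Returns the errno of the failed write (EPIPE is expected),
 * 0 if the write unexpectedly succeeded, -1 on setup failure. */
int write_to_closed_pipe_errno(void) {
    int fds[2];
    signal(SIGPIPE, SIG_IGN);   /* without this the default action kills us */
    if (pipe(fds) == -1) return -1;

    close(fds[0]);              /* nobody will ever read from this pipe */
    ssize_t n = write(fds[1], "x", 1);
    int err = (n == -1) ? errno : 0;

    close(fds[1]);
    return err;
}
```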

Comment From: oranagra

@Phantal can you please just run ./runtest --single integration/replication --only "diskless no replicas drop during rdb pipe" --dont-clean and then send me the log files from tests/tmp/?

BTW, assuming all your Solaris troubles are in the diskless master replication mode (the one that uses a pipe to send the rdb content to the parent to forward to replicas), you can just not use this feature (which is anyway not the default).

I mean, comment out the few tests that enable repl-diskless-sync, if the tests pass you should be safe to use Redis right now and not wait to move away from Solaris.

Comment From: Phantal

@oranagra Could you either point me at the appropriate code / documentation for how diskless is implemented or give me a short run-down of it?

Comment From: Phantal

I'll see what I can do about the logs.

It's also failing these replication-psync tests:

Test replication partial resync: no backlog (yes, disabled, 1)
Test replication partial resync: backlog expired (yes, disabled, 1)
Test replication partial resync: no backlog (yes, swapdb, 1)

I'm a little surprised it didn't fail backlog expired (yes, swapdb, 1) given the trend (diskless in all cases).

Comment From: oranagra

@Phantal so all the tests that are failing are related to diskless replication, not sure why some fail and others don't. maybe if i have the logs i'll figure it out. but anyway, it is disabled by default, so if you don't explicitly enable it, you should be safe.

Before redis 6.0, the way diskless replication worked is that the fork child process wrote the rdb content directly to the sockets (of multiple replicas), and would exit if they all disconnected. if the child exited successfully, the master would wait for one REPLCONF ACK from the replicas and then start sending the command stream to them. Since the rdb file size isn't known in advance, the replicas look for a 40 byte EOF marker at the end of each payload they read so that they know the rdb ended, and they send that ACK which enables the rest of the stream.
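To illustrate the EOF-marker idea, here is a minimal C sketch of how a receiver can detect a 40 byte trailer in a stream of unknown length by remembering only the last 40 bytes it has seen. This is not the Redis source: the type and function names are hypothetical, and it assumes the receiver already knows the marker before the payload starts.

```c
#include <stddef.h>
#include <string.h>

#define EOF_MARK_LEN 40

/* Hypothetical scanner state, not a Redis structure. */
typedef struct {
    char mark[EOF_MARK_LEN];    /* marker the sender will end with (assumed known) */
    char tail[EOF_MARK_LEN];    /* the most recent EOF_MARK_LEN bytes received */
    size_t seen;                /* total number of bytes received so far */
} eof_scanner;

void eof_scanner_init(eof_scanner *s, const char *mark) {
    memcpy(s->mark, mark, EOF_MARK_LEN);
    memset(s->tail, 0, EOF_MARK_LEN);
    s->seen = 0;
}

/* Feed the next chunk. Returns 1 once the bytes received so far end with
 * the marker, 0 otherwise. */
int eof_scanner_feed(eof_scanner *s, const char *buf, size_t len) {
    if (len >= EOF_MARK_LEN) {
        /* The chunk alone determines the new tail. */
        memcpy(s->tail, buf + len - EOF_MARK_LEN, EOF_MARK_LEN);
    } else if (len > 0) {
        /* Slide the old tail left and append the new bytes. */
        memmove(s->tail, s->tail + len, EOF_MARK_LEN - len);
        memcpy(s->tail + EOF_MARK_LEN - len, buf, len);
    }
    s->seen += len;
    return s->seen >= EOF_MARK_LEN &&
           memcmp(s->tail, s->mark, EOF_MARK_LEN) == 0;
}
```

The scanner works regardless of how the stream is split into chunks, which matters because the marker can straddle two reads. A random marker makes an accidental match inside the payload very unlikely, though this sketch does nothing to rule it out.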

Anyway, in redis 6.0 due to TLS support, this was changed, now there's a pipe between the parent and the child which is used to transfer this rdb content to the parent, and the parent is the one who sends the data to the replicas. see 5a47794606
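The 6.0 data path described above can be pictured with a small stand-alone C sketch: the child writes a payload into a pipe, and the parent reads it and relays it to every "replica". This is a toy model under stated assumptions, not the Redis implementation: socketpairs stand in for replica connections, all I/O is blocking, the function name is hypothetical, and error paths leak descriptors for brevity.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not part of Redis.
 * Returns how many of the n stand-in replicas received exactly msg,
 * or -1 on error. */
int relay_through_parent(const char *msg, int n) {
    enum { MAX_REPLICAS = 8, MAX_LEN = 256 };
    int pipefds[2], conns[MAX_REPLICAS][2];
    size_t len = strlen(msg);
    if (n < 1 || n > MAX_REPLICAS || len == 0 || len > MAX_LEN) return -1;

    for (int i = 0; i < n; i++)
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, conns[i]) == -1) return -1;
    if (pipe(pipefds) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        /* Child: produces the payload and only ever touches the pipe. */
        close(pipefds[0]);
        ssize_t w = write(pipefds[1], msg, len);
        _exit(w == (ssize_t)len ? 0 : 1);
    }
    close(pipefds[1]);          /* parent keeps only the read end */

    /* Parent: relay every chunk from the pipe to each replica. */
    char buf[64];
    ssize_t r;
    while ((r = read(pipefds[0], buf, sizeof buf)) > 0)
        for (int i = 0; i < n; i++)
            if (write(conns[i][0], buf, (size_t)r) != r) return -1;
    close(pipefds[0]);
    waitpid(pid, NULL, 0);      /* pipe hit EOF, so the child can be reaped */

    /* Check what each stand-in replica received. */
    int delivered = 0;
    for (int i = 0; i < n; i++) {
        char got[MAX_LEN];
        size_t total = 0;
        close(conns[i][0]);
        while (total < len &&
               (r = read(conns[i][1], got + total, len - total)) > 0)
            total += (size_t)r;
        close(conns[i][1]);
        if (total == len && memcmp(got, msg, len) == 0) delivered++;
    }
    return delivered;
}
```

For example, relay_through_parent("payload", 2) should report 2 deliveries. Note that in this arrangement the child never needs the replica sockets at all; it merely inherits them across fork().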

Comment From: Phantal

An idea just occurred to me. Prior to 6.x the master would fork(), build the RDB in memory then send it directly to the replicas. I'm guessing it already had sockets open to the replicas from prior to the fork().

Now it no longer needs those socket connections but they're probably still open after the fork(). My hypothesis is the forked child does something to shutdown() / close() the replica connections which causes them to disconnect from the real master.

There's at least two ways I could test this hypothesis but unfortunately I don't have the time today for the first one and don't have root for the 2nd one:

1. Run truss & watch for close() / shutdown() calls on the descriptor in the forked child
2. Run wireshark and look for FIN or something similar being sent from the master to the replicas

(1) has another problem: when I use truss I often cannot reproduce the problem(s) because it's sensitive to timing.

On a positive note one of the tests in tests/unit is also able to fail, though it took a bit of makefile tweaking to make it build.

Comment From: oranagra

@Phantal i really don't see any way the fork child can close (and certainly not shutdown) a socket, the portion of the codebase it runs is quite narrow, and when done it calls exit.

Comment From: yossigo

I think this can be a result of different buffering strategies between Linux and Solaris, resulting in the child info pipe blocking. Just changing pipe(server.child_info_pipe) to pipe2(server.child_info_pipe, O_NONBLOCK) seems to improve this. @Phantal can you please see if that makes a difference for you?

Comment From: oranagra

when testing, apply that change to this code block too (which is probably the problematic one in your case):

    if (pipe(pipefds) == -1) return C_ERR;
    server.rdb_pipe_read = pipefds[0];
    server.rdb_pipe_write = pipefds[1];

Comment From: yossigo

@oranagra I don't think this would be necessary as there's an anetNonBlock() call right after that which has the same effect.
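For anyone testing this on a platform where pipe2() is not available, the same effect can be had with pipe() followed by fcntl(). The C sketch below is an illustration with hypothetical function names, not a patch against Redis: it marks the write end non-blocking and shows that writing to a full pipe then fails with EAGAIN instead of blocking the caller.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: set O_NONBLOCK on an open descriptor. */
int set_nonblock(int fd) {
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: fill a non-blocking pipe that nobody reads.
 * Returns the errno of the write that failed once the pipe was full
 * (EAGAIN is expected), or -1 on setup failure. */
int fill_nonblocking_pipe(void) {
    int fds[2];
    char chunk[4096] = {0};
    if (pipe(fds) == -1) return -1;
    if (set_nonblock(fds[1]) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    int err = 0;
    for (;;) {
        /* With a blocking pipe this loop would hang once the buffer fills. */
        ssize_t n = write(fds[1], chunk, sizeof chunk);
        if (n == -1) { err = errno; break; }
    }

    close(fds[0]);
    close(fds[1]);
    return err;
}
```

How many writes succeed before EAGAIN depends on the pipe buffer size, which differs between operating systems; that difference is the kind of thing that could make a blocking pipe behave well on one platform and stall on another.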

Comment From: Phantal

I'll give it a try this week.