It is no longer possible to get both a core file and a crash report at the same time.
When this line https://github.com/redis/redis/blame/unstable/src/server.c#L6452 was added, it disabled core files. I can enable them again by setting crash-log-enabled to false, but then I don't get the crash log.
Comment From: oranagra
I'm not sure that it's related to #8004.
Maybe somehow related to one of the changes in https://github.com/redis/redis/commit/90b717e72340081ee87f0b85b4ef00b2a5bd2bf2
Although I still see that the signal handler ends with bugReportEnd(1, sig);
@daniel-house are you referring to a crash report after an assertion or a signal?
P.S. I personally don't think core dumps on Redis are very useful. Redis usually consumes massive amounts of memory, so the core dump is huge and also slows down restarts. Also, if you have both a crash report and a core dump, the core dump may be corrupted by the report generation.
Comment From: daniel-house
This line is also part of https://github.com/redis/redis/pull/8004 : https://github.com/redis/redis/blame/unstable/src/debug.c#L915
Actually, I was unaware of the RedisModule_Assert, so I coded my own (abend with varargs) in the same manner as I always do in C/Unix. abend logs a message and then calls abort(). Asserts are nice but I often want the message to include additional contextual information so that it is easier to debug. As soon as I moved to a version of Redis that included https://github.com/redis/redis/pull/8004, my call to abort stopped generating core files.
In general, core dumps in production are a bad idea. Copying them from one machine to another can cause them to expand enormously, because holes that take no space in the original sparse core file are written out in full. They slow down restarts, as you say, and may expose sensitive data. It is good practice to use ulimit -c 0 to prevent core files in production. This is the default value of ulimit -c on many versions of Unix.
During development and testing, core files are wonderful. As a consequence, I routinely use ulimit -c unlimited in all my test and development environments. In my experience the stack printed in the Redis crash-log is far inferior to the stack reported by gdb, even when both the crash-log and the core are generated, but, as you say, sometimes the core file does get corrupted during report generation. I like the Redis crash-log and do not want to turn it off, but I also like the core file. If something goes wrong, I don't want to have to re-run it just to get the other form of information. If there is a SIGSEGV, or any other signal that has a default action of dropping a core file, and it causes a crash-log, I would like to also see a core file.
Core files are OS-level objects. Their creation can be enabled and disabled at the OS-level. It is a bad idea for any executable to deliberately interfere with OS-level behavior, especially when it is redundant with the OS-level. It is a far worse idea for the OS-level of a production environment to trust any executable in any way, whatsoever. Hence the existence of the ulimit command.
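As a rough illustration of the OS-level control that ulimit -c exposes, here is a sketch using the POSIX getrlimit()/setrlimit() calls on RLIMIT_CORE. The function names core_dumps_allowed and set_core_soft_limit are made up for this example; note that the limit is expressed in bytes here, whereas the shell's ulimit -c uses blocks.

```c
#include <sys/resource.h>

/* Return 1 if the soft RLIMIT_CORE limit is non-zero (a core file may be
 * written), 0 if it is zero (as after `ulimit -c 0`), or -1 on error. */
int core_dumps_allowed(void) {
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) return -1;
    return rl.rlim_cur != 0;
}

/* Set the soft core size limit for this process, clamped to the hard limit.
 * Passing RLIM_INFINITY raises the soft limit as far as the hard limit
 * allows. Returns 0 on success, -1 on error. */
int set_core_soft_limit(rlim_t want) {
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) return -1;
    if (rl.rlim_max != RLIM_INFINITY &&
        (want == RLIM_INFINITY || want > rl.rlim_max)) {
        want = rl.rlim_max; /* an unprivileged process cannot exceed this */
    }
    rl.rlim_cur = want;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

The hard limit stays under the control of whoever launched the process, which is the point made above: an executable can lower its own limit, but it cannot grant itself more than the environment allows.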
Comment From: oranagra
ok, so you're using abort() which sends SIGABRT.
before that PR you mentioned, you would have got only a core dump, but no crash report (so for production systems, where core dumps should be disabled, that's quite bad).
however, as i mentioned in my initial response, i don't see why core dumps won't happen, since we re-emit the signal when we're done printing the crash log (bugReportEnd(1, sig);).
i've just tested that on both 6.2.5 and unstable, and it seemed to work on both:
diff --git a/src/debug.c b/src/debug.c
index d29d48673..42154cc52 100644
--- a/src/debug.c
+++ b/src/debug.c
@@ -474,6 +474,8 @@ NULL
         addReplyHelp(c, help);
     } else if (!strcasecmp(c->argv[1]->ptr,"segfault")) {
         *((char*)-1) = 'x';
+    } else if (!strcasecmp(c->argv[1]->ptr,"abort")) {
+        abort();
     } else if (!strcasecmp(c->argv[1]->ptr,"panic")) {
         serverPanic("DEBUG PANIC called at Unix time %lld", (long long)time(NULL));
     } else if (!strcasecmp(c->argv[1]->ptr,"restart") ||
oran@Oran-laptop:~/work/redis$ ulimit -c unlimited
oran@Oran-laptop:~/work/redis$ src/redis-server
1897:C 23 Sep 2021 09:34:14.674 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1897:C 23 Sep 2021 09:34:14.674 # Redis version=255.255.255, bits=64, commit=7c376398, modified=1, pid=1897, just started
.
.
.
=== REDIS BUG REPORT START: Cut & paste starting from here ===
1897:M 23 Sep 2021 09:34:16.650 # Redis 255.255.255 crashed by signal: 6, si_code: -6
1897:M 23 Sep 2021 09:34:16.650 # Killed by PID: 1897, UID: 1000
1897:M 23 Sep 2021 09:34:16.650 # Crashed running the instruction at: 0x7f0c9b6e3fb7
------ STACK TRACE ------
EIP:
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f0c9b6e3fb7]
Backtrace:
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7f0c9baa8980]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f0c9b6e3fb7]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f0c9b6e5921]
src/redis-server *:6379(debugCommand+0x102)[0x562999eec334]
.
.
.
=== REDIS BUG REPORT END. Make sure to include from START to END. ===
Please report the crash by opening an issue on github:
http://github.com/redis/redis/issues
Suspect RAM error? Use redis-server --test-memory to verify it.
Aborted (core dumped)