Crash report
I have experienced a crash of redis. I am using redis for memory caching of a Nextcloud instance.
In the log I see these entries:
38944:M 27 May 2022 18:57:00.109 # Background saving terminated by signal 7
38944:M 27 May 2022 18:57:06.025 * 10 changes in 300 seconds. Saving...
38944:M 27 May 2022 18:57:06.026 * Background saving started by pid 49016
Those are repeated for about 15 min, after which I realized the nextcloud server was down. I then restarted redis with
$ systemctl restart redis-server
In the log the following entries were added. I assume this is due to systemctl shutting down the server, although redis still says that it crashed by a SIGBUS (7):
=== REDIS BUG REPORT START: Cut & paste starting from here ===
38944:M 27 May 2022 18:59:05.716 # Redis 6.0.16 crashed by signal: 7, si_code: 2
38944:M 27 May 2022 18:59:05.716 # Crashed running the instruction at: 0xaaaab5798748
38944:M 27 May 2022 18:59:05.716 # Accessing address: 0xfffff4567f64
38944:M 27 May 2022 18:59:05.716 # Failed assertion: <no assertion failed> (<no file>:0)
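The signal number and si_code on the crash line can be decoded mechanically. The following is a minimal sketch, assuming Linux signal numbering (where signal 7 is SIGBUS) and the standard SIGBUS si_code values; the function name `decode_crash_line` is made up for illustration and is not part of redis.

```python
import re

# SIGBUS si_code values as defined on Linux (assumption: Linux numbering).
SIGBUS_CODES = {
    1: "BUS_ADRALN (invalid address alignment)",
    2: "BUS_ADRERR (nonexistent physical address)",
    3: "BUS_OBJERR (object-specific hardware error)",
}

# Matches the "crashed by signal" line that redis writes to its log.
CRASH_RE = re.compile(r"crashed by signal: (\d+), si_code: (-?\d+)")


def decode_crash_line(line):
    """Return (signal_number, si_code, description), or None if no match."""
    m = CRASH_RE.search(line)
    if m is None:
        return None
    signo, code = int(m.group(1)), int(m.group(2))
    if signo == 7:  # SIGBUS on Linux arm64 and x86
        desc = SIGBUS_CODES.get(code, "unknown SIGBUS code")
    else:
        desc = "not SIGBUS; si_code meaning depends on the signal"
    return signo, code, desc


line = "38944:M 27 May 2022 18:59:05.716 # Redis 6.0.16 crashed by signal: 7, si_code: 2"
print(decode_crash_line(line))
```

For the line above this yields BUS_ADRERR, which the kernel typically reports when it cannot provide the memory backing an address, rather than for an alignment fault.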
After redis was restarted, everything appeared to work fine. It was installed from the official Debian 11 repos.
Note that this is a Raspberry Pi 4 server with 2 GB of memory. I installed the official Debian arm64 port yesterday, and the crash happened today. I upgraded the server from 32-bit Raspbian (also based on Debian 11) to Debian 11 arm64. The Raspbian installation had been running for a couple of years without problems, and the workload is basically the same now. The redis version is exactly the same in both cases. So it seems to me that this is an arm64-specific problem, or at least something that was never triggered with the 32-bit build of redis.
The redis log mentions the overcommit and THP settings of the kernel. These are outputs of the sar command for the period in which redis crashed:
$ sar -r
17:40:04 kbmemfree kbavail kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
17:50:04 241584 1309844 462148 23,73 95552 1007828 1098712 37,27 549116 954508 1592
18:00:01 201548 1405976 364768 18,73 103420 1134004 1218084 41,32 576824 959104 424
18:10:04 175332 1318532 452596 23,24 99152 1078108 1188888 40,33 572956 990472 700
18:15:01 145872 1303404 467068 23,98 100220 1090952 1248996 42,37 578084 1012752 396
18:20:04 121068 1303496 467356 24,00 100892 1115164 1248256 42,34 591840 1024860 464
18:25:05 416296 1233720 537288 27,59 83212 769576 1317416 44,69 451120 874844 1040
18:30:01 376324 1258648 512300 26,31 85520 832544 1281672 43,48 512112 852532 8752
18:35:04 120716 1246988 523940 26,90 87036 1075020 1285004 43,59 634496 985616 7856
18:40:05 38108 1236336 534408 27,44 86604 1147596 1304160 44,24 729288 974832 420
18:45:01 96916 1244552 526164 27,02 71780 1111980 1294972 43,93 1092000 551748 352
18:50:05 110608 1244844 517532 26,57 72372 1106088 1294740 43,92 1081900 548252 340
18:55:05 178780 1314304 444348 22,82 114880 1069356 1220264 41,39 1026900 534376 37500
19:00:01 102188 1307832 472536 24,26 118284 1115260 1225784 41,58 953444 687724 288
19:05:05 210968 1296764 464112 23,83 107748 1026756 1258652 42,70 847996 685696 464
$ sar -H
17:40:04 kbhugfree kbhugused %hugused kbhugrsvd kbhugsurp
17:50:04 0 0 0,00 0 0
18:00:01 0 0 0,00 0 0
18:10:04 0 0 0,00 0 0
18:15:01 0 0 0,00 0 0
18:20:04 0 0 0,00 0 0
18:25:05 0 0 0,00 0 0
18:30:01 0 0 0,00 0 0
18:35:04 0 0 0,00 0 0
18:40:05 0 0 0,00 0 0
18:45:01 0 0 0,00 0 0
18:50:05 0 0 0,00 0 0
18:55:05 0 0 0,00 0 0
19:00:01 0 0 0,00 0 0
19:05:05 0 0 0,00 0 0
19:10:00 0 0 0,00 0 0
19:15:01 0 0 0,00 0 0
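In the sar -r table above, %memused stays below 28% and %commit below 45% for the whole period, so memory exhaustion looks unlikely. A minimal sketch of how such output could be checked programmatically, assuming whitespace-separated columns and a decimal comma as in the tables above; `parse_sar` is a hypothetical helper, and the sample contains only three of the rows:

```python
def parse_sar(text):
    """Parse whitespace-separated sar output into a list of row dicts."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    # The first header cell is a timestamp, not a column name.
    names = ["time"] + header[1:]
    parsed = []
    for cells in rows:
        row = {"time": cells[0]}
        for name, cell in zip(names[1:], cells[1:]):
            # The locale used here prints a decimal comma (e.g. 26,57).
            row[name] = float(cell.replace(",", "."))
        parsed.append(row)
    return parsed


sample = """\
17:40:04 kbmemfree kbavail kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
18:50:05 110608 1244844 517532 26,57 72372 1106088 1294740 43,92 1081900 548252 340
18:55:05 178780 1314304 444348 22,82 114880 1069356 1220264 41,39 1026900 534376 37500
19:00:01 102188 1307832 472536 24,26 118284 1115260 1225784 41,58 953444 687724 288
"""

rows = parse_sar(sample)
peak = max(rows, key=lambda r: r["%memused"])
print(peak["time"], peak["%memused"])
```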
Additional information
- Debian 11 arm64
- Running redis for memory caching of Nextcloud.
Comment From: oranagra
@cquike there's no stack trace in the crash log? @yossigo do you remember any difference between our repo and the debian repo or a recent fix that could explain a SIGBUS? @cquike can you try the latest 6.2.7 from our repo?
Comment From: cquike
Hi,
I think there is a problem with the hardware, since I now see this error in the logs:
may 27 18:52:10 hostname kernel: Read-error on swap-device (8:16:6609360)
may 27 18:52:10 hostname kernel: blk_update_request: I/O error, dev sdb, sector 6609352 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
So most likely not a problem with redis but with the underlying hardware...
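Kernel messages like the two above can be picked out of a log automatically. This is a rough sketch, assuming the message formats are exactly as shown in the lines quoted here (other kernel versions may word them differently); `find_storage_errors` is a hypothetical helper name.

```python
import re

# Patterns taken from the two kernel log lines quoted above.
SWAP_RE = re.compile(r"Read-error on swap-device \((\d+):(\d+):(\d+)\)")
BLK_RE = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def find_storage_errors(log_text):
    """Return a list of (kind, device, sector) tuples found in the log text."""
    hits = []
    for line in log_text.splitlines():
        m = SWAP_RE.search(line)
        if m:
            # Device is reported as major:minor, followed by the sector.
            device = f"{m.group(1)}:{m.group(2)}"
            hits.append(("swap-read-error", device, int(m.group(3))))
            continue
        m = BLK_RE.search(line)
        if m:
            hits.append(("block-io-error", m.group(1), int(m.group(2))))
    return hits


log = """\
may 27 18:52:10 hostname kernel: Read-error on swap-device (8:16:6609360)
may 27 18:52:10 hostname kernel: blk_update_request: I/O error, dev sdb, sector 6609352 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
"""

for hit in find_storage_errors(log):
    print(hit)
```

Any hit on the device that holds swap would point at storage rather than at redis, since a page that cannot be read back from swap cannot be made available to the process.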
Just FYI: no, there is no stack trace. There is a line with this:
------ STACK TRACE ------
but what follows is the startup of the redis process. So it seems the stack trace is completely empty.
Thank you for your help. I guess this can be closed as it is likely not a redis problem.
Comment From: oranagra
Something was seriously broken. The signal handler seems to have crashed while trying to print the stack trace. Maybe something to do with the code being swapped out and being unable to swap in?
Anyway I've never seen anything like it, so I assume it's indeed a hardware issue. Closing.