Crash report

I have experienced a crash of redis. I am using redis for memory caching of a Nextcloud instance.

In the log I see these entries:

38944:M 27 May 2022 18:57:00.109 # Background saving terminated by signal 7
38944:M 27 May 2022 18:57:06.025 * 10 changes in 300 seconds. Saving...
38944:M 27 May 2022 18:57:06.026 * Background saving started by pid 49016

These entries repeated for about 15 minutes, until I noticed that the Nextcloud server was down. I then restarted redis with:

$ systemctl restart redis-server

The restart added the following entries to the log. I assume they were triggered by systemctl shutting down the server, although redis still reports that it crashed with SIGBUS (signal 7):

=== REDIS BUG REPORT START: Cut & paste starting from here ===
38944:M 27 May 2022 18:59:05.716 # Redis 6.0.16 crashed by signal: 7, si_code: 2
38944:M 27 May 2022 18:59:05.716 # Crashed running the instruction at: 0xaaaab5798748
38944:M 27 May 2022 18:59:05.716 # Accessing address: 0xfffff4567f64
38944:M 27 May 2022 18:59:05.716 # Failed assertion: <no assertion failed> (<no file>:0)
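
For anyone reading the crash line above: on Linux arm64 (and x86), signal 7 is SIGBUS, and for SIGBUS an si_code of 2 is BUS_ADRERR, which sigaction(2) describes as "nonexistent physical address". Below is a minimal sketch for decoding such a line. The function name `decode_crash_line` and the lookup tables are illustrative only and are not part of redis; the signal numbering assumes the generic Linux layout and differs on some other architectures.

```python
import re

# Signal numbers in the generic Linux layout (arm64, x86). Assumption: the
# log comes from such a system; other architectures number SIGBUS differently.
SIGNAL_NAMES = {7: "SIGBUS", 11: "SIGSEGV"}

# si_code values that Linux defines for SIGBUS.
SIGBUS_CODES = {
    1: "BUS_ADRALN (invalid address alignment)",
    2: "BUS_ADRERR (nonexistent physical address)",
    3: "BUS_OBJERR (object-specific hardware error)",
}

# Matches the "crashed by signal: N, si_code: M" part of the redis crash line.
CRASH_RE = re.compile(r"crashed by signal: (\d+), si_code: (-?\d+)")


def decode_crash_line(line):
    """Return (signal name, si_code description), or None if the line does not match."""
    match = CRASH_RE.search(line)
    if match is None:
        return None
    signo, code = int(match.group(1)), int(match.group(2))
    name = SIGNAL_NAMES.get(signo, f"signal {signo}")
    if signo == 7:
        detail = SIGBUS_CODES.get(code, f"si_code {code}")
    else:
        # si_code meanings depend on the signal; only SIGBUS is decoded here.
        detail = f"si_code {code}"
    return name, detail


if __name__ == "__main__":
    line = "38944:M 27 May 2022 18:59:05.716 # Redis 6.0.16 crashed by signal: 7, si_code: 2"
    print(decode_crash_line(line))
```

Only the SIGBUS codes are decoded, because the same si_code number means something different for other signals.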

After redis was restarted, everything appeared to work fine. Redis was installed from the official Debian 11 repos, and the server is a Raspberry Pi 4 with 2 GB of memory.

I installed the official Debian arm64 port only yesterday, and the crash happened today. Before that, I upgraded the server from Raspbian 32-bit (also based on Debian 11) to Debian 11 arm64. The Raspbian installation had been running for a couple of years without problems, and the workload is basically the same now. The version of redis is exactly the same in both cases, so this looks to me like an arm64-specific problem, or at least something that was never triggered with the 32-bit build of redis.

The redis log mentions the overcommit and THP settings of the kernel. These are the outputs of the sar command for the period in which redis crashed:

$ sar -r 
17:40:04    kbmemfree   kbavail kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
17:50:04       241584   1309844    462148     23,73     95552   1007828   1098712     37,27    549116    954508      1592
18:00:01       201548   1405976    364768     18,73    103420   1134004   1218084     41,32    576824    959104       424
18:10:04       175332   1318532    452596     23,24     99152   1078108   1188888     40,33    572956    990472       700
18:15:01       145872   1303404    467068     23,98    100220   1090952   1248996     42,37    578084   1012752       396
18:20:04       121068   1303496    467356     24,00    100892   1115164   1248256     42,34    591840   1024860       464
18:25:05       416296   1233720    537288     27,59     83212    769576   1317416     44,69    451120    874844      1040
18:30:01       376324   1258648    512300     26,31     85520    832544   1281672     43,48    512112    852532      8752
18:35:04       120716   1246988    523940     26,90     87036   1075020   1285004     43,59    634496    985616      7856
18:40:05        38108   1236336    534408     27,44     86604   1147596   1304160     44,24    729288    974832       420
18:45:01        96916   1244552    526164     27,02     71780   1111980   1294972     43,93   1092000    551748       352
18:50:05       110608   1244844    517532     26,57     72372   1106088   1294740     43,92   1081900    548252       340
18:55:05       178780   1314304    444348     22,82    114880   1069356   1220264     41,39   1026900    534376     37500
19:00:01       102188   1307832    472536     24,26    118284   1115260   1225784     41,58    953444    687724       288
19:05:05       210968   1296764    464112     23,83    107748   1026756   1258652     42,70    847996    685696       464
$ sar -H
17:40:04    kbhugfree kbhugused  %hugused kbhugrsvd kbhugsurp
17:50:04            0         0      0,00         0         0
18:00:01            0         0      0,00         0         0
18:10:04            0         0      0,00         0         0
18:15:01            0         0      0,00         0         0
18:20:04            0         0      0,00         0         0
18:25:05            0         0      0,00         0         0
18:30:01            0         0      0,00         0         0
18:35:04            0         0      0,00         0         0
18:40:05            0         0      0,00         0         0
18:45:01            0         0      0,00         0         0
18:50:05            0         0      0,00         0         0
18:55:05            0         0      0,00         0         0
19:00:01            0         0      0,00         0         0
19:05:05            0         0      0,00         0         0
19:10:00            0         0      0,00         0         0
19:15:01            0         0      0,00         0         0
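
The overcommit and THP settings that the redis log mentions can be read directly from the kernel. Below is a minimal sketch, assuming the standard Linux locations `/proc/sys/vm/overcommit_memory` and `/sys/kernel/mm/transparent_hugepage/enabled`; the helper names are made up for illustration, and both helpers return None when a file is missing or unreadable.

```python
import re
from pathlib import Path

# Standard Linux locations for the two settings.
OVERCOMMIT_PATH = "/proc/sys/vm/overcommit_memory"
THP_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def read_overcommit(path=OVERCOMMIT_PATH):
    """Return vm.overcommit_memory as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def read_thp_mode(path=THP_PATH):
    """Return the active THP mode, i.e. the bracketed word in the file, or None."""
    try:
        text = Path(path).read_text()
    except OSError:
        return None
    # The file lists all modes and brackets the active one,
    # for example "always [madvise] never".
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print("vm.overcommit_memory:", read_overcommit())
    print("THP mode:", read_thp_mode())
```

The path parameters exist only so that the helpers can be pointed at test files.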

Additional information

  1. Debian 11 arm64
  2. Running redis for memory caching of Nextcloud.

Comment From: oranagra

@cquike there's no stack trace in the crash log? @yossigo do you remember any difference between our repo and the debian repo or a recent fix that could explain a SIGBUS? @cquike can you try the latest 6.2.7 from our repo?

Comment From: cquike

Hi,

I think there is a problem with the hardware, since I now see this error in the logs:

may 27 18:52:10 hostname kernel: Read-error on swap-device (8:16:6609360)
may 27 18:52:10 hostname kernel: blk_update_request: I/O error, dev sdb, sector 6609352 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0

So most likely not a problem with redis but with the underlying hardware...
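
In case it helps someone with similar symptoms, below is a minimal sketch of how kernel log lines like the ones above could be scanned for block-device I/O errors. The function name `find_io_errors` is hypothetical, and the pattern only targets the "I/O error, dev ..., sector ..." wording shown in my log; other kernel versions may phrase the message differently.

```python
import re

# Matches the device name and sector number in kernel block-layer error lines
# such as "blk_update_request: I/O error, dev sdb, sector 6609352 op 0x0:(READ) ...".
IO_ERROR_RE = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def find_io_errors(lines):
    """Return a list of (device, sector) tuples for every matching log line."""
    hits = []
    for line in lines:
        match = IO_ERROR_RE.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2))))
    return hits


if __name__ == "__main__":
    sample = [
        "may 27 18:52:10 hostname kernel: Read-error on swap-device (8:16:6609360)",
        "may 27 18:52:10 hostname kernel: blk_update_request: I/O error, dev sdb, "
        "sector 6609352 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0",
    ]
    for device, sector in find_io_errors(sample):
        print(f"I/O error on {device} at sector {sector}")
```

The input lines could come from `journalctl -k` or `dmesg`. Repeated errors on the device that holds swap would point at the disk rather than at redis.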

Just FYI: no, there is no stack trace. The log contains a line reading ------ STACK TRACE ------, but what follows it is the startup of the redis process again. So it seems the stack trace is completely empty.

Thank you for your help. I guess this can be closed as it is likely not a redis problem.

Comment From: oranagra

Something was seriously broken. The signal handler seems to have crashed while trying to print the stack trace. Maybe something to do with the code being swapped out and being unable to swap in?

Anyway I've never seen anything like it, so I assume it's indeed a hardware issue. Closing.