Crash report

Paste the complete crash log between the quotes below. Please include a few lines from the log preceding the crash report to provide some context.

1436:M 24 Aug 2022 06:22:05.613 # Redis 6.2.6 crashed by signal: 7, si_code: 2
61436:M 24 Aug 2022 06:22:05.614 # Accessing address: 0x7f9ad5be2000
61436:M 24 Aug 2022 06:22:05.614 # Crashed running the instruction at: 0x7f9ad5be1fff
191897:C 24 Aug 2022 06:47:41.689 # Write error writing append only file on disk: Connection timed out

Additional information

1. OS distribution and version

uname -a
Linux dc6-cdb-001 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

2. core file was dumped. adding the debugger to the core file, it seems the crash occurred in loadAppendOnl;yFile, which seems to get the wrong file name

Core was generated by `/home/ip/shared/inf/crystal/1.0/63730111/bin/crs --lua_path /home/ip/shared/inf'.
Program terminated with signal 7, Bus error.
#0  0x0000000000482910 in loadAppendOnlyFile (filename=0x4000000000000000 <Address 0x4000000000000000 out of bounds>) at aof.c:883
883     aof.c: No such file or directory.
(gdb) info threads
  Id   Target Id         Frame 
  18   LWP 246709        0x00007f216ae06eb3 in ?? ()
  17   LWP 246716        0x00007f216bbd8de2 in ?? ()
  16   LWP 246720        0x00007f216bbd8de2 in ?? ()
  15   LWP 246715        0x00007f216bbd8de2 in ?? ()
  14   LWP 246707        0x00007f216bbd8a35 in ?? ()
  13   LWP 246708        0x00007f216bbd8a35 in ?? ()
  12   LWP 246705        0x00007f216ad3f58a in ?? ()
  11   LWP 246714        0x00007f216bbd8de2 in ?? ()
  10   LWP 246706        0x00007f216bbd8a35 in ?? ()
  9    LWP 209289        0x00007f216ae06eb3 in ?? ()
  8    LWP 246722        0x00007f216bbd8de2 in ?? ()
  7    LWP 246721        0x00007f216bbd8de2 in ?? ()
  6    LWP 246718        0x00007f216bbd8de2 in ?? ()
  5    LWP 246717        0x00007f216bbd8de2 in ?? ()
  4    LWP 246719        0x00007f216bbd8de2 in ?? ()
  3    LWP 246713        0x00007f216bbd8de2 in ?? ()
  2    LWP 246712        0x00007f216adfd9a3 in ?? ()
* 1    LWP 246711        0x0000000000482910 in loadAppendOnlyFile (filename=0x4000000000000000 <Address 0x4000000000000000 out of bounds>) at aof.c:883
(gdb) bt
#0  0x0000000000482910 in loadAppendOnlyFile (filename=0x4000000000000000 <Address 0x4000000000000000 out of bounds>) at aof.c:883
#1  0x00007f215ebd1da0 in ?? ()
#2  0x00007f216bdf74d8 in ?? ()
#3  0x9555a74d00000000 in ?? ()
#4  0x0000000000000000 in ?? ()
(gdb) 

Comment From: oranagra

@chenyang8094 can you please have a look?

Comment From: ambarsarkar

Apologies for the item #2 in the Additional Information. While related, that core file is NOT from redis-server, but a application of the cpp_redis::client (a c++ client for redis).

It still is not clear why the redis server crash was seen in the original report and would highly appreciate any hints on why the redis-server crashed at the specific instruction.

Comment From: oranagra

@ambarsarkar so did redis at all crash? Can you provide evidence from the log file? Is what you posted is the entire log file? All I can see is that the fork child doing an AOF rewrite had a timeout writing to the disk. But I don't see any crash or error message from Redis's main process.

Comment From: ambarsarkar

Hi @oranagra thanks for responding.

The main redis-server thread did crash.

  1. A corefile was created, and the main server process (61436) has disappeared. The core file is too small unfortunately. -rw------- 1 svc_crystal hardware 1024 Aug 24 16:14 core.61436

  2. You can see the log file dumped by Redis below, It shows the PID (61436) on the top as well as a crash report in the bottom

1436:C 31 May 2022 09:29:26.031 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
61436:C 31 May 2022 09:29:26.031 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=61436, just started
61436:C 31 May 2022 09:29:26.032 # Configuration loaded
61436:M 31 May 2022 09:29:26.034 # You requested maxclients of 10000 requiring at least 10032 max file descriptors.
61436:M 31 May 2022 09:29:26.034 # Server can't set maximum open files to 10032 because of OS error: Operation not permitted.
61436:M 31 May 2022 09:29:26.035 # Current maximum open files is 8192. maxclients has been reduced to 8160 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
61436:M 31 May 2022 09:29:26.036 * monotonic clock: POSIX clock_gettime
                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 6.2.6 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6382
 |    `-._   `._    /     _.-'    |     **PID: 61436**
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           https://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'

61436:M 31 May 2022 09:29:26.045 # Server initialized
61436:M 31 May 2022 09:29:26.160 * Module 'CrystalUtils' loaded from /home/ip/shared/inf/crystal/1.0/62192159/bin/CrystalUtils.so
61436:M 31 May 2022 09:29:26.168 * Module 'CrystalTransactions' loaded from /home/ip/shared/inf/crystal/1.0/62192159/bin/CrystalTransaction.so
61436:M 31 May 2022 09:29:26.171 * Ready to accept connections
61436:M 31 May 2022 09:51:03.287 * Starting automatic rewriting of AOF on 3356077900% growth
61436:M 31 May 2022 09:51:03.289 * Background append only file rewriting started by pid 68245
61436:M 31 May 2022 09:51:03.618 * AOF rewrite child asks to stop sending diffs.
68245:C 31 May 2022 09:51:03.619 * Parent agreed to stop sending diffs. Finalizing AOF...
68245:C 31 May 2022 09:51:03.619 * Concatenating 0.07 MB of AOF diff received from parent.
68245:C 31 May 2022 09:51:03.622 * SYNC append only file rewrite performed
68245:C 31 May 2022 09:51:03.624 * AOF rewrite: 2 MB of memory used by copy-on-write

**<similar stuff deleted, no warnings or errors>**

06168:C 24 Aug 2022 02:47:06.977 * AOF rewrite: 54245 MB of memory used by copy-on-write
61436:M 24 Aug 2022 02:47:21.482 * Background AOF rewrite terminated with success
61436:M 24 Aug 2022 02:47:21.483 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
61436:M 24 Aug 2022 02:47:21.485 * Background AOF rewrite finished successfully
61436:M 24 Aug 2022 03:38:46.729 * Starting automatic rewriting of AOF on 10% growth
61436:M 24 Aug 2022 03:38:52.930 * Background append only file rewriting started by pid 141211
61436:M 24 Aug 2022 04:29:09.967 * AOF rewrite child asks to stop sending diffs.
141211:C 24 Aug 2022 04:29:09.968 * Parent agreed to stop sending diffs. Finalizing AOF...
141211:C 24 Aug 2022 04:29:09.969 * Concatenating 1746.95 MB of AOF diff received from parent.
141211:C 24 Aug 2022 04:29:16.054 * SYNC append only file rewrite performed
141211:C 24 Aug 2022 04:29:21.422 * AOF rewrite: 43253 MB of memory used by copy-on-write
61436:M 24 Aug 2022 04:29:35.154 * Background AOF rewrite terminated with success
61436:M 24 Aug 2022 04:29:35.204 * Residual parent diff successfully flushed to the rewritten AOF (53.20 MB)
61436:M 24 Aug 2022 04:29:35.272 * Background AOF rewrite finished successfully
61436:M 24 Aug 2022 05:53:09.142 * Starting automatic rewriting of AOF on 10% growth
61436:M 24 Aug 2022 05:53:15.191 * Background append only file rewriting started by pid 191897


=== REDIS BUG REPORT START: Cut & paste starting from here ===
61436:M 24 Aug 2022 06:22:05.613 # Redis 6.2.6 crashed by signal: 7, si_code: 2
61436:M 24 Aug 2022 06:22:05.614 # Accessing address: 0x7f9ad5be2000
**61436:M 24 Aug 2022 06:22:05.614 # Crashed running the instruction at: 0x7f9ad5be1fff**
191897:C 24 Aug 2022 06:47:41.689 # Write error writing append only file on disk: Connection timed out
  1. Fyi, I was able to restart the server from the last dumped aof, and be happy to provide any information from that.

Comment From: chenyang8094

=== REDIS BUG REPORT START: Cut & paste starting from here === 61436:M 24 Aug 2022 06:22:05.613 # Redis 6.2.6 crashed by signal: 7, si_code: 2 61436:M 24 Aug 2022 06:22:05.614 # Accessing address: 0x7f9ad5be2000 61436:M 24 Aug 2022 06:22:05.614 # Crashed running the instruction at: 0x7f9ad5be1fff 191897:C 24 Aug 2022 06:47:41.689 # Write error writing append only file on disk: Connection timed out

I'm curious how the "Connection timed out" error occurs when AOFRW (It seems a tcp connect err)? I need more information.

And the error code is BUS_ADRERR, it maybe jemalloc error (use mmap) ?

/* Codes for SIGBUS */
#if !defined(_POSIX_C_SOURCE) || defined(_DARWIN_C_SOURCE)
#define BUS_NOOP        0       /* if only I knew... */
#endif
#define BUS_ADRALN      1       /* [XSI] Invalid address alignment */
#define BUS_ADRERR      2       /* [XSI] Nonexistent physical address -NOTIMP */
#define BUS_OBJERR      3       /* [XSI] Object-specific HW error - NOTIMP */

Comment From: oranagra

I wonder why we don't have a proper crash log with stack trace... Since we don't have a stack trace, can you please upload your executable binary or check which function that address resides in..

Comment From: ambarsarkar

This is probably not a Redis issue unless it could have given more diagnostic information or handled the observed exception better.

Appreciate any input you may have.

In our application, we load Redis modules as shared objects (.so). These modules work fine and do not have any known issues. These modules were already loaded in and had been working fine. It had been days since the modules were loaded. The db is very large (~100Gb).

What happened was that the NFS filer mounts these .so objects reside on became suddenly inaccessible. We suspect that may have cause the redis server to quit midstream. The address involved seems to be in the range, though we do not know for sure as the original process is gone and the core file is insignificant.

The puzzling part is that the .so objects were already loaded using module load, so it is not fully clear why the .so objects becoming unavailable after the module was loaded should matter. We are suspecting it may have to do with how this object is actually paged in to memory.

We had also recently switched aof_use_rdb_preamble from no to yes, and wondering if it exposed some path not seen before by our runs.

Do you have any ideas on how the filer being unavailable can cause a crash and share any recommendations?

As for the core being limited to 1K, it's likely due to the default coredumpsize of 1K , which makes sense as we do not wish to leave huge core dumps. We will consider increasing that.

Thanks for looking into this.

Comment From: oranagra

AFAIR linux can execute files from loading them into the file system cache, so a portion of the executable code can be paged out and that can lead to issues if it can't be re-loaded.

however, i don't understand why the crash log is missing, we do see the first line (crashed by signal), but not any other lines, which should have been printed regardless of that code being available (at least the stack trace).

I generally don't recommend using core-dumps on redis, and i don't see why aof_use_rdb_preamble could be related to this.

Comment From: ambarsarkar

Okay, thanks for additional validation on how fs issues can get us here.

Is there anything we can do to help with the missing crash log issue? We agree as well, we do not wish to use core_dumps here in general, and do not think rdb_preamble caused the issue.

Comment From: oranagra

can you try to spin up another redis instance on the same infra and use DEBUG SEGFAULT to see if the crash log is printed?

Comment From: ambarsarkar

Fyi, DEBUG SEGFAULT did print the crash log on a similar infra/setup. So the question remains why it was not dumped in case of the reported exception.

=== REDIS BUG REPORT START: Cut & paste starting from here === 22574:M 29 Aug 2022 07:18:56.785 # Redis 6.2.6 crashed by signal: 11, si_code: 1 22574:M 29 Aug 2022 07:18:56.786 # Accessing address: 0xffffffffffffffff 22574:M 29 Aug 2022 07:18:56.786 # Crashed running the instruction at: 0x48a9c8

------ STACK TRACE ------ EIP: /home/utils/redis-6.2.6/bin/redis-server *:8888(debugCommand+0x1a8)[0x48a9c8]

Backtrace: /lib64/libpthread.so.0(+0xf630)[0x7f7813f3b630] /home/utils/redis-6.2.6/bin/redis-server :8888(debugCommand+0x1a8)[0x48a9c8] /home/utils/redis-6.2.6/bin/redis-server :8888(call+0x9f)[0x43930f] /home/utils/redis-6.2.6/bin/redis-server :8888(processCommand+0x89d)[0x43df3d] /home/utils/redis-6.2.6/bin/redis-server :8888(processCommandAndResetClient+0x24)[0x44ddc4] /home/utils/redis-6.2.6/bin/redis-server :8888(processInputBuffer+0xd5)[0x4522c5] /home/utils/redis-6.2.6/bin/redis-server :8888[0x4de5db] /home/utils/redis-6.2.6/bin/redis-server :8888(aeProcessEvents+0x258)[0x4327d8] /home/utils/redis-6.2.6/bin/redis-server :8888(aeMain+0x1d)[0x432a7d] /home/utils/redis-6.2.6/bin/redis-server :8888(main+0x447)[0x440587] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f7813b80555] /home/utils/redis-6.2.6/bin/redis-server :8888[0x42fae9]

------ REGISTERS ------ 22574:M 29 Aug 2022 07:18:56.789 # RAX:0000000000000000 RBX:00007f781372a380 RCX:0000000000000010 RDX:0000000000000010 RDI:00007f66a3bbb8b0 RSI:000000000056ad80 RBP:00007f778b70b220 RSP:00007ffcc7fe9e70 R8 :0000000000000000 R9 :000000000000000a R10:fffffffffffff8ca R11:0011e144f5c50b52 R12:00007f66a3bbb8b3 R13:0000000000000002 R14:0000000000000000 R15:0000000000000000 RIP:000000000048a9c8 EFL:0000000000010246 CSGSFS:0000000000000033 22574:M 29 Aug 2022 07:18:56.790 # (00007ffcc7fe9e7f) -> 00007f7813c25705 22574:M 29 Aug 2022 07:18:56.791 # (00007ffcc7fe9e7e) -> 0000000000000000 22574:M 29 Aug 2022 07:18:56.791 # (00007ffcc7fe9e7d) -> 00007ffcc7fea110 22574:M 29 Aug 2022 07:18:56.792 # (00007ffcc7fe9e7c) -> 00007ffcc7fea100 22574:M 29 Aug 2022 07:18:56.793 # (00007ffcc7fe9e7b) -> 00007ffcc7fe9f50 22574:M 29 Aug 2022 07:18:56.794 # (00007ffcc7fe9e7a) -> 00007ffcc7fe9f40 22574:M 29 Aug 2022 07:18:56.794 # (00007ffcc7fe9e79) -> 0000000000000000 22574:M 29 Aug 2022 07:18:56.795 # (00007ffcc7fe9e78) -> 0000000000000000 22574:M 29 Aug 2022 07:18:56.796 # (00007ffcc7fe9e77) -> 0000000000000001 22574:M 29 Aug 2022 07:18:56.796 # (00007ffcc7fe9e76) -> 0000000000000000 22574:M 29 Aug 2022 07:18:56.798 # (00007ffcc7fe9e75) -> 000000000000582e 22574:M 29 Aug 2022 07:18:56.798 # (00007ffcc7fe9e74) -> 00007ffcc7fe9f30 22574:M 29 Aug 2022 07:18:56.799 # (00007ffcc7fe9e73) -> 00007ffcc7fea784 22574:M 29 Aug 2022 07:18:56.800 # (00007ffcc7fe9e72) -> 0000000000000000 22574:M 29 Aug 2022 07:18:56.801 # (00007ffcc7fe9e71) -> 00007ffcc7fe9e80 22574:M 29 Aug 2022 07:18:56.801 # (00007ffcc7fe9e70) -> 00007ffcc7fe9f40

------ INFO OUTPUT ------

Server

redis_version:6.2.6

redis_git_sha1:00000000

redis_git_dirty:0

redis_build_id:b0a4475eea151e9d

redis_mode:standalone

os:Linux 3.10.0-1127.el7.x86_64 x86_64

arch_bits:64

multiplexing_api:epoll

atomicvar_api:sync-builtin

gcc_version:4.4.7

process_id:22574

process_supervised:no

run_id:e6a730ac3a14233ea1d1713db18441a0d2c76184

tcp_port:8888

server_time_usec:1661782736783799

uptime_in_seconds:5385

uptime_in_days:0

hz:10

configured_hz:10

lru_clock:838352

executable:/home/utils/redis-6.2.6/bin/redis-server

config_file:

io_threads_active:0

Clients

connected_clients:1

cluster_connections:0

maxclients:8160

client_recent_max_input_buffer:0

client_recent_max_output_buffer:0

blocked_clients:0

tracking_clients:0

clients_in_timeout_table:0

Memory

used_memory:136693110760

used_memory_human:127.31G

used_memory_rss:139600097280

used_memory_rss_human:130.01G

used_memory_peak:137634707760

used_memory_peak_human:128.18G

used_memory_peak_perc:99.32%

used_memory_overhead:44527328048

used_memory_startup:794432

used_memory_dataset:92165782712

used_memory_dataset_perc:67.43%

allocator_allocated:136693137432

allocator_active:138115117056

allocator_resident:139607891968

total_system_memory:1621650100224

total_system_memory_human:1.47T

used_memory_lua:37888

used_memory_lua_human:37.00K

used_memory_scripts:0

used_memory_scripts_human:0B

number_of_cached_scripts:0

maxmemory:0

maxmemory_human:0B

maxmemory_policy:noeviction

allocator_frag_ratio:1.01

allocator_frag_bytes:1421979624

allocator_rss_ratio:1.01

allocator_rss_bytes:1492774912

rss_overhead_ratio:1.00

rss_overhead_bytes:-7794688

mem_fragmentation_ratio:1.02

mem_fragmentation_bytes:2907050280

mem_not_counted_for_evict:108

mem_replication_backlog:0

mem_clients_slaves:0

mem_clients_normal:0

mem_aof_buffer:112

mem_allocator:jemalloc-5.1.0

active_defrag_running:0

lazyfree_pending_objects:0

lazyfreed_objects:0

Persistence

loading:0

current_cow_size:0

current_cow_size_age:0

current_fork_perc:0.00

current_save_keys_processed:0

current_save_keys_total:0

rdb_changes_since_last_save:1

rdb_bgsave_in_progress:0

rdb_last_save_time:1661781705

rdb_last_bgsave_status:ok

rdb_last_bgsave_time_sec:2723

rdb_current_bgsave_time_sec:-1

rdb_last_cow_size:3362816

aof_enabled:1

aof_rewrite_in_progress:0

aof_rewrite_scheduled:0

aof_last_rewrite_time_sec:-1

aof_current_rewrite_time_sec:-1

aof_last_bgrewrite_status:ok

aof_last_write_status:ok

aof_last_cow_size:0

module_fork_in_progress:0

module_fork_last_cow_size:0

aof_current_size:92958209852

aof_base_size:92958209800

aof_pending_rewrite:0

aof_buffer_length:0

aof_rewrite_buffer_length:0

aof_pending_bio_fsync:0

aof_delayed_fsync:0

Stats

total_connections_received:531

total_commands_processed:200709983

instantaneous_ops_per_sec:0

total_net_input_bytes:7464

total_net_output_bytes:10176

instantaneous_input_kbps:0.00

instantaneous_output_kbps:0.00

rejected_connections:0

sync_full:0

sync_partial_ok:0

sync_partial_err:0

expired_keys:0

expired_stale_perc:0.00

expired_time_cap_reached_count:0

expire_cycle_cpu_milliseconds:45

evicted_keys:0

keyspace_hits:0

keyspace_misses:0

pubsub_channels:0

pubsub_patterns:0

latest_fork_usec:3245448

total_forks:1

migrate_cached_sockets:0

slave_expires_tracked_keys:0

active_defrag_hits:0

active_defrag_misses:0

active_defrag_key_hits:0

active_defrag_key_misses:0

tracking_total_keys:0

tracking_total_items:0

tracking_total_prefixes:0

unexpected_error_replies:0

total_error_replies:154

dump_payload_sanitizations:0

total_reads_processed:1061

total_writes_processed:530

io_threaded_reads_processed:0

io_threaded_writes_processed:0

Replication

role:master

connected_slaves:0

master_failover_state:no-failover

master_replid:bbdd5eb7f1194f432407612743765299b4e35739

master_replid2:0000000000000000000000000000000000000000

master_repl_offset:0

second_repl_offset:-1

repl_backlog_active:0

repl_backlog_size:1048576

repl_backlog_first_byte_offset:0

repl_backlog_histlen:0

CPU

used_cpu_sys:237.454246

used_cpu_user:1393.248767

used_cpu_sys_children:939.101560

used_cpu_user_children:1694.724304

used_cpu_sys_main_thread:237.414281

used_cpu_user_main_thread:1393.241446

Modules

module:name=CrystalTransactions,ver=1,api=1,filters=0,usedby=[],using=[],options=[]

Commandstats

cmdstat_set:calls=1,usec=6,usec_per_call=6.00,rejected_calls=0,failed_calls=0

cmdstat_hset:calls=2,usec=29,usec_per_call=14.50,rejected_calls=0,failed_calls=0

cmdstat_ping:calls=375,usec=5084,usec_per_call=13.56,rejected_calls=154,failed_calls=0

Errorstats

errorstat_LOADING:count=154

Cluster

cluster_enabled:0

Keyspace

db0:keys=898414972,expires=0,avg_ttl=0

------ CLIENT LIST OUTPUT ------ id=534 addr=10.128.65.138:47370 laddr=10.128.65.138:8888 fd=9 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=29 qbuf-free=40925 argv-mem=13 obl=0 oll=0 omem=0 tot-mem=61469 events=r cmd=debug user=default redir=-1

------ CURRENT CLIENT INFO ------ id=534 addr=10.128.65.138:47370 laddr=10.128.65.138:8888 fd=9 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=29 qbuf-free=40925 argv-mem=13 obl=0 oll=0 omem=0 tot-mem=61469 events=r cmd=debug user=default redir=-1 argv[0]: 'debug' argv[1]: 'segfault'

------ MODULES INFO OUTPUT ------

------ FAST MEMORY TEST ------ 22574:M 29 Aug 2022 07:18:56.809 # Bio thread for job type #0 terminated 22574:M 29 Aug 2022 07:18:56.810 # Bio thread for job type #1 terminated 22574:M 29 Aug 2022 07:18:56.810 # Bio thread for job type #2 terminated *** Preparing to test memory region 7cb000 (2281472 bytes) *** Preparing to test memory region 2902000 (266240 bytes) *** Preparing to test memory region 2943000 (262144 bytes) *** Preparing to test memory region 7f5098000000 (135168 bytes) *** Preparing to test memory region 7f509efc6000 (33792000 bytes) *** Preparing to test memory region 7f50a1000000 (2097152 bytes) *** Preparing to test memory region 7f50a1478000 (8192 bytes) *** Preparing to test memory region 7f50a1899000 (8192 bytes) *** Preparing to test memory region 7f50a1cb4000 (24576 bytes) *** Preparing to test memory region 7f50a232f000 (16384 bytes) *** Preparing to test memory region 7f50a344f000 (12288 bytes) *** Preparing to test memory region 7f50a3fe0000 (28672 bytes) *** Preparing to test memory region 7f50a47a6000 (40960 bytes) *** Preparing to test memory region 7f50a53ff000 (4096 bytes) *** Preparing to test memory region 7f50a5400000 (158464999424 bytes) *** Preparing to test memory region 7f758a9c5000 (10737418240 bytes) *** Preparing to test memory region 7f780a9c6000 (33792000 bytes) *** Preparing to test memory region 7f780ca00000 (6291456 bytes) *** Preparing to test memory region 7f780d150000 (33792000 bytes) *** Preparing to test memory region 7f7813200000 (8388608 bytes) *** Preparing to test memory region 7f7813f27000 (20480 bytes) *** Preparing to test memory region 7f7814144000 (16384 bytes) *** Preparing to test memory region 7f78145e4000 (16384 bytes) *** Preparing to test memory region 7f7815166000 (28672 bytes) *** Preparing to test memory region 7f781519f000 (4096 bytes) *** Preparing to test memory region 7f78151a0000 (4096 bytes) *** Preparing to test memory region 7f78151a3000 (4096 bytes) .O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.

------ DUMPING CODE AROUND EIP ------ Symbol: debugCommand (base: 0x48a820) Module: /home/utils/redis-6.2.6/bin/redis-server *:8888 (base 0x400000) $ xxd -r -p /tmp/dump.hex /tmp/dump.bin $ objdump --adjust-vma=0x48a820 -D -b binary -m i386:x86-64 /tmp/dump.bin


22574:M 29 Aug 2022 07:27:22.512 # dump of function (hexdump of 552 bytes): 48895c24d048896c24d84889fb4c896424e04c896c24e84c897424f04c897c24f84881ec38110000448b6f484183fd020f8482010000488b6f50488b45084c8b6008be89ad56004c89e7e8a149faff85c00f8451010000be4b7c56004c89e7e88c49faff85c00f84e9110000be92ad56004c89e7e87749faff85c00f85870000004531e4837b480248c78424f8100000000000000f8f0e020000488b4508be92ad5600488b7808e84449faff83f8014c89e619ff83e703e8841dfbffbef0b856004889dfe87762fcff488b9c2408110000488bac24101100004c8ba424181100004c8bac24201100004c8bb424281100004c8bbc24301100004881c438110000c30f1f8000000000be9aad56004c89e7e8db48faff85c00f8464ffffffbee89a57004c89e7e8c648faff85c00f84ee020000beacad56004c89e7e8b148faff85c00f848c140000be598855004c89e7e89c48faff85c0750a4183fd030f840e020000be679458004c89e7e88148faff85c00f85890000004183fd030f857f000000488b4510488b7808e8926afbff6690488b35117b34004889dfe8c17bfcffe925ffffff0f1f4000c60425ffffffff78e914ffffff0f1f00488b6f50be7a7f5500488b45084c8b60084c89e7e81f48faff85c00f8569feffff488d7c2440bec0ca5600b94f000000f348a5488d7424404889dfe8285dfcffe9ccfeffff0f1f00bec8ad56004c89e7e8e347faff85c00f85cb0000004531e44183fd02c7442430010000000f8e8605 === REDIS BUG REPORT END. Make sure to include from START to END. ===

   Please report the crash by opening an issue on github:

       http://github.com/redis/redis/issues

Suspect RAM error? Use redis-server --test-memory to verify it.

Comment From: oranagra

Ok, so the platform and build are ok, but there was something specific with that incident that made the process die before printing the stack trace. If you still have access to the kernel's syslog, maybe there's a hint there.

Comment From: ambarsarkar

All we see in /var/log/messages is Aug 24 06:22:15 abrt-server: Executable '/home/utils/redis-6.2.6/bin/redis-server' doesn't belong to any package and ProcessUnpackaged is set to 'no' Aug 26 11:12:06 abrt-hook-ccpp: Process 269390 (redis-server) of user 28745 killed by SIGBUS - dumping core Aug 26 11:12:10 abrt-server: Executable '/home/utils/redis-6.2.6/bin/redis-server' doesn't belong to any package and ProcessUnpackaged is set to 'no'

Note that the core-dump was not complete due to default limit of only 1K size. Is it possible that if the core-dump fails, the stack trace is not printed? The stack trace would definitely have been handy. Thanks

Comment From: oranagra

No, this is unrelated. First, redis catches the signal, and prints its crash report, and then it throws the signal again, which generated a core file. In this case I guess redis generated a sigbus while trying to print the stack trace. I have no idea why. Maybe @yossigo can think of something.

Comment From: yossigo

If I understand correctly, we're dealing with modules loaded from NFS which may have their text pages unavailable. I believe this can easily explain SIGBUS trying to access them.

Comment From: oranagra

@yossigo yes, but why would it SIGBUG again when we print the stack trace? ohh, maybe calling backtrace_symbols_fd

Comment From: ambarsarkar

So do we now know why the stack trace was not printed and can anything be done to make it print (handle exception etc)? Also, will the behavior be any different with the latest 7.x versions? Thanks

Comment From: oranagra

Yes, when we print the stack trace, other than getting the addresses of the functions on the stack, we also need to translate them to symbols (function names), and that part probably crashed on SIGBUS too since the ELF is inaccessible.

In theory, we can catch that, and resort to printing just the addresses, but it won't be very useful and considering this case is rare, i don't think we want to do that.

in think the outcome of this case would have been the same on any version of redis. I think we should close this case without any modification to redis.