Crash report
Paste the complete crash log between the quotes below. Please include a few lines from the log preceding the crash report to provide some context.
61436:M 24 Aug 2022 06:22:05.613 # Redis 6.2.6 crashed by signal: 7, si_code: 2
61436:M 24 Aug 2022 06:22:05.614 # Accessing address: 0x7f9ad5be2000
61436:M 24 Aug 2022 06:22:05.614 # Crashed running the instruction at: 0x7f9ad5be1fff
191897:C 24 Aug 2022 06:47:41.689 # Write error writing append only file on disk: Connection timed out
Additional information
1. OS distribution and version
uname -a
Linux dc6-cdb-001 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
2. A core file was dumped. Attaching the debugger to the core file, it seems the crash occurred in loadAppendOnlyFile, which seems to get the wrong file name
Core was generated by `/home/ip/shared/inf/crystal/1.0/63730111/bin/crs --lua_path /home/ip/shared/inf'.
Program terminated with signal 7, Bus error.
#0 0x0000000000482910 in loadAppendOnlyFile (filename=0x4000000000000000 <Address 0x4000000000000000 out of bounds>) at aof.c:883
883 aof.c: No such file or directory.
(gdb) info threads
Id Target Id Frame
18 LWP 246709 0x00007f216ae06eb3 in ?? ()
17 LWP 246716 0x00007f216bbd8de2 in ?? ()
16 LWP 246720 0x00007f216bbd8de2 in ?? ()
15 LWP 246715 0x00007f216bbd8de2 in ?? ()
14 LWP 246707 0x00007f216bbd8a35 in ?? ()
13 LWP 246708 0x00007f216bbd8a35 in ?? ()
12 LWP 246705 0x00007f216ad3f58a in ?? ()
11 LWP 246714 0x00007f216bbd8de2 in ?? ()
10 LWP 246706 0x00007f216bbd8a35 in ?? ()
9 LWP 209289 0x00007f216ae06eb3 in ?? ()
8 LWP 246722 0x00007f216bbd8de2 in ?? ()
7 LWP 246721 0x00007f216bbd8de2 in ?? ()
6 LWP 246718 0x00007f216bbd8de2 in ?? ()
5 LWP 246717 0x00007f216bbd8de2 in ?? ()
4 LWP 246719 0x00007f216bbd8de2 in ?? ()
3 LWP 246713 0x00007f216bbd8de2 in ?? ()
2 LWP 246712 0x00007f216adfd9a3 in ?? ()
* 1 LWP 246711 0x0000000000482910 in loadAppendOnlyFile (filename=0x4000000000000000 <Address 0x4000000000000000 out of bounds>) at aof.c:883
(gdb) bt
#0 0x0000000000482910 in loadAppendOnlyFile (filename=0x4000000000000000 <Address 0x4000000000000000 out of bounds>) at aof.c:883
#1 0x00007f215ebd1da0 in ?? ()
#2 0x00007f216bdf74d8 in ?? ()
#3 0x9555a74d00000000 in ?? ()
#4 0x0000000000000000 in ?? ()
(gdb)
Comment From: oranagra
@chenyang8094 can you please have a look?
Comment From: ambarsarkar
Apologies for item #2 in the Additional Information. While related, that core file is NOT from redis-server, but from an application using cpp_redis::client (a C++ client for Redis).
It is still not clear why the redis server crash was seen in the original report, and we would highly appreciate any hints on why redis-server crashed at that specific instruction.
Comment From: oranagra
@ambarsarkar so did redis crash at all? Can you provide evidence from the log file? Is what you posted the entire log file? All I can see is that the fork child doing an AOF rewrite had a timeout writing to the disk, but I don't see any crash or error message from Redis's main process.
Comment From: ambarsarkar
Hi @oranagra thanks for responding.
The main redis-server thread did crash.
- A corefile was created, and the main server process (61436) has disappeared. The core file is unfortunately too small: -rw------- 1 svc_crystal hardware 1024 Aug 24 16:14 core.61436
- You can see the log file dumped by Redis below. It shows the PID (61436) at the top as well as a crash report at the bottom.
61436:C 31 May 2022 09:29:26.031 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
61436:C 31 May 2022 09:29:26.031 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=61436, just started
61436:C 31 May 2022 09:29:26.032 # Configuration loaded
61436:M 31 May 2022 09:29:26.034 # You requested maxclients of 10000 requiring at least 10032 max file descriptors.
61436:M 31 May 2022 09:29:26.034 # Server can't set maximum open files to 10032 because of OS error: Operation not permitted.
61436:M 31 May 2022 09:29:26.035 # Current maximum open files is 8192. maxclients has been reduced to 8160 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
61436:M 31 May 2022 09:29:26.036 * monotonic clock: POSIX clock_gettime
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 6.2.6 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in standalone mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 6382
| `-._ `._ / _.-' | **PID: 61436**
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | https://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'
61436:M 31 May 2022 09:29:26.045 # Server initialized
61436:M 31 May 2022 09:29:26.160 * Module 'CrystalUtils' loaded from /home/ip/shared/inf/crystal/1.0/62192159/bin/CrystalUtils.so
61436:M 31 May 2022 09:29:26.168 * Module 'CrystalTransactions' loaded from /home/ip/shared/inf/crystal/1.0/62192159/bin/CrystalTransaction.so
61436:M 31 May 2022 09:29:26.171 * Ready to accept connections
61436:M 31 May 2022 09:51:03.287 * Starting automatic rewriting of AOF on 3356077900% growth
61436:M 31 May 2022 09:51:03.289 * Background append only file rewriting started by pid 68245
61436:M 31 May 2022 09:51:03.618 * AOF rewrite child asks to stop sending diffs.
68245:C 31 May 2022 09:51:03.619 * Parent agreed to stop sending diffs. Finalizing AOF...
68245:C 31 May 2022 09:51:03.619 * Concatenating 0.07 MB of AOF diff received from parent.
68245:C 31 May 2022 09:51:03.622 * SYNC append only file rewrite performed
68245:C 31 May 2022 09:51:03.624 * AOF rewrite: 2 MB of memory used by copy-on-write
**<similar stuff deleted, no warnings or errors>**
06168:C 24 Aug 2022 02:47:06.977 * AOF rewrite: 54245 MB of memory used by copy-on-write
61436:M 24 Aug 2022 02:47:21.482 * Background AOF rewrite terminated with success
61436:M 24 Aug 2022 02:47:21.483 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
61436:M 24 Aug 2022 02:47:21.485 * Background AOF rewrite finished successfully
61436:M 24 Aug 2022 03:38:46.729 * Starting automatic rewriting of AOF on 10% growth
61436:M 24 Aug 2022 03:38:52.930 * Background append only file rewriting started by pid 141211
61436:M 24 Aug 2022 04:29:09.967 * AOF rewrite child asks to stop sending diffs.
141211:C 24 Aug 2022 04:29:09.968 * Parent agreed to stop sending diffs. Finalizing AOF...
141211:C 24 Aug 2022 04:29:09.969 * Concatenating 1746.95 MB of AOF diff received from parent.
141211:C 24 Aug 2022 04:29:16.054 * SYNC append only file rewrite performed
141211:C 24 Aug 2022 04:29:21.422 * AOF rewrite: 43253 MB of memory used by copy-on-write
61436:M 24 Aug 2022 04:29:35.154 * Background AOF rewrite terminated with success
61436:M 24 Aug 2022 04:29:35.204 * Residual parent diff successfully flushed to the rewritten AOF (53.20 MB)
61436:M 24 Aug 2022 04:29:35.272 * Background AOF rewrite finished successfully
61436:M 24 Aug 2022 05:53:09.142 * Starting automatic rewriting of AOF on 10% growth
61436:M 24 Aug 2022 05:53:15.191 * Background append only file rewriting started by pid 191897
=== REDIS BUG REPORT START: Cut & paste starting from here ===
61436:M 24 Aug 2022 06:22:05.613 # Redis 6.2.6 crashed by signal: 7, si_code: 2
61436:M 24 Aug 2022 06:22:05.614 # Accessing address: 0x7f9ad5be2000
**61436:M 24 Aug 2022 06:22:05.614 # Crashed running the instruction at: 0x7f9ad5be1fff**
191897:C 24 Aug 2022 06:47:41.689 # Write error writing append only file on disk: Connection timed out
- Fyi, I was able to restart the server from the last dumped AOF, and would be happy to provide any information from that.
Comment From: chenyang8094
=== REDIS BUG REPORT START: Cut & paste starting from here ===
61436:M 24 Aug 2022 06:22:05.613 # Redis 6.2.6 crashed by signal: 7, si_code: 2
61436:M 24 Aug 2022 06:22:05.614 # Accessing address: 0x7f9ad5be2000
61436:M 24 Aug 2022 06:22:05.614 # Crashed running the instruction at: 0x7f9ad5be1fff
191897:C 24 Aug 2022 06:47:41.689 # Write error writing append only file on disk: Connection timed out
I'm curious how the "Connection timed out" error occurred during the AOF rewrite (it looks like a TCP connection error)? I need more information.
Also, the si_code is BUS_ADRERR; maybe it is a jemalloc error (jemalloc uses mmap)? The SIGBUS codes are listed below, with a handler sketch after the listing:
/* Codes for SIGBUS */
#if !defined(_POSIX_C_SOURCE) || defined(_DARWIN_C_SOURCE)
#define BUS_NOOP 0 /* if only I knew... */
#endif
#define BUS_ADRALN 1 /* [XSI] Invalid address alignment */
#define BUS_ADRERR 2 /* [XSI] Nonexistent physical address -NOTIMP */
#define BUS_OBJERR 3 /* [XSI] Object-specific HW error - NOTIMP */
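For reference, a minimal sketch (not Redis code; names are illustrative) of a SIGBUS handler installed with SA_SIGINFO, logging the same si_code / si_addr fields that appear in the crash report above:

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void on_sigbus(int sig, siginfo_t *info, void *ucontext) {
    (void)ucontext;
    /* si_code 2 == BUS_ADRERR: nonexistent physical address, e.g. a
     * file-backed page whose backing store is no longer reachable. */
    fprintf(stderr, "crashed by signal: %d, si_code: %d, accessing address: %p\n",
            sig, info->si_code, info->si_addr);
    _exit(1);
}

int main(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = on_sigbus;
    sigaction(SIGBUS, &sa, NULL);
    /* ... workload; any SIGBUS is now reported with its si_code ... */
    return 0;
}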
Comment From: oranagra
I wonder why we don't have a proper crash log with a stack trace... Since we don't have a stack trace, can you please upload your executable binary, or check which function that address resides in?
Comment From: ambarsarkar
This is probably not a Redis issue, though perhaps Redis could have given more diagnostic information or handled the observed exception better.
Appreciate any input you may have.
In our application, we load Redis modules as shared objects (.so). These modules have no known issues; they had been loaded for days and had been working fine. The db is very large (~100GB).
What happened was that the NFS filer mounts these .so objects reside on suddenly became inaccessible. We suspect that may have caused the redis server to quit midstream. The address involved seems to be in the range where those objects were mapped, though we do not know for sure as the original process is gone and the core file is insignificant.
The puzzling part is that the .so objects were already loaded using module load, so it is not fully clear why the .so objects becoming unavailable after the modules were loaded should matter. We suspect it may have to do with how these objects are actually paged into memory.
We had also recently switched aof_use_rdb_preamble from no to yes, and we are wondering if it exposed some code path not exercised by our runs before.
Do you have any ideas on how the filer being unavailable could cause a crash, and can you share any recommendations?
As for the core being limited to 1K, it's likely due to the default coredumpsize of 1K, which makes sense as we do not wish to leave huge core dumps. We will consider increasing that.
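For reference, a minimal sketch (not Redis code) of inspecting and raising that limit in-process via RLIMIT_CORE, the same knob the shell's coredumpsize / ulimit -c controls:

#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rlimit rl;
    /* report the current soft/hard core-file size limits */
    if (getrlimit(RLIMIT_CORE, &rl) == 0)
        printf("core limit: soft=%llu hard=%llu\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);
    /* raise the soft limit as far as the hard limit allows */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_CORE, &rl) != 0)
        perror("setrlimit");
    return 0;
}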
Thanks for looking into this.
Comment From: oranagra
AFAIR linux can execute files by loading them into the file system cache, so a portion of the executable code can be paged out, and that can lead to issues if it can't be re-loaded.
however, i don't understand why the crash log is missing. we do see the first line (crashed by signal), but not any other lines, which should have been printed regardless of that code being available (at least the stack trace).
I generally don't recommend using core-dumps on redis, and i don't see why aof_use_rdb_preamble could be related to this.
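To illustrate that failure mode, a minimal sketch (hypothetical /tmp/demo path, not Redis code): touching a page of a file-backed mapping whose backing store has disappeared raises SIGBUS, analogous to paged-out .so text on an unreachable NFS mount:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = open("/tmp/demo", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return 1;
    ftruncate(fd, 4096);               /* one page of backing store */
    char *p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) return 1;
    printf("first byte: %d\n", p[0]);  /* fine: the page is backed */
    ftruncate(fd, 0);                  /* backing store vanishes */
    printf("first byte: %d\n", p[0]);  /* SIGBUS (BUS_ADRERR) here */
    return 0;
}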
Comment From: ambarsarkar
Okay, thanks for additional validation on how fs issues can get us here.
Is there anything we can do to help with the missing crash log issue? We agree as well; we do not wish to use core dumps here in general, and do not think rdb_preamble caused the issue.
Comment From: oranagra
can you try to spin up another redis instance on the same infra and use DEBUG SEGFAULT to see if the crash log is printed?
Comment From: ambarsarkar
Fyi, DEBUG SEGFAULT did print the crash log on a similar infra/setup. So the question remains why it was not dumped in case of the reported exception.
=== REDIS BUG REPORT START: Cut & paste starting from here ===
22574:M 29 Aug 2022 07:18:56.785 # Redis 6.2.6 crashed by signal: 11, si_code: 1
22574:M 29 Aug 2022 07:18:56.786 # Accessing address: 0xffffffffffffffff
22574:M 29 Aug 2022 07:18:56.786 # Crashed running the instruction at: 0x48a9c8
------ STACK TRACE ------
EIP:
/home/utils/redis-6.2.6/bin/redis-server *:8888(debugCommand+0x1a8)[0x48a9c8]
Backtrace:
/lib64/libpthread.so.0(+0xf630)[0x7f7813f3b630]
/home/utils/redis-6.2.6/bin/redis-server *:8888(debugCommand+0x1a8)[0x48a9c8]
/home/utils/redis-6.2.6/bin/redis-server *:8888(call+0x9f)[0x43930f]
/home/utils/redis-6.2.6/bin/redis-server *:8888(processCommand+0x89d)[0x43df3d]
/home/utils/redis-6.2.6/bin/redis-server *:8888(processCommandAndResetClient+0x24)[0x44ddc4]
/home/utils/redis-6.2.6/bin/redis-server *:8888(processInputBuffer+0xd5)[0x4522c5]
/home/utils/redis-6.2.6/bin/redis-server *:8888[0x4de5db]
/home/utils/redis-6.2.6/bin/redis-server *:8888(aeProcessEvents+0x258)[0x4327d8]
/home/utils/redis-6.2.6/bin/redis-server *:8888(aeMain+0x1d)[0x432a7d]
/home/utils/redis-6.2.6/bin/redis-server *:8888(main+0x447)[0x440587]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f7813b80555]
/home/utils/redis-6.2.6/bin/redis-server *:8888[0x42fae9]
------ REGISTERS ------
22574:M 29 Aug 2022 07:18:56.789 #
RAX:0000000000000000 RBX:00007f781372a380
RCX:0000000000000010 RDX:0000000000000010
RDI:00007f66a3bbb8b0 RSI:000000000056ad80
RBP:00007f778b70b220 RSP:00007ffcc7fe9e70
R8 :0000000000000000 R9 :000000000000000a
R10:fffffffffffff8ca R11:0011e144f5c50b52
R12:00007f66a3bbb8b3 R13:0000000000000002
R14:0000000000000000 R15:0000000000000000
RIP:000000000048a9c8 EFL:0000000000010246
CSGSFS:0000000000000033
22574:M 29 Aug 2022 07:18:56.790 # (00007ffcc7fe9e7f) -> 00007f7813c25705
22574:M 29 Aug 2022 07:18:56.791 # (00007ffcc7fe9e7e) -> 0000000000000000
22574:M 29 Aug 2022 07:18:56.791 # (00007ffcc7fe9e7d) -> 00007ffcc7fea110
22574:M 29 Aug 2022 07:18:56.792 # (00007ffcc7fe9e7c) -> 00007ffcc7fea100
22574:M 29 Aug 2022 07:18:56.793 # (00007ffcc7fe9e7b) -> 00007ffcc7fe9f50
22574:M 29 Aug 2022 07:18:56.794 # (00007ffcc7fe9e7a) -> 00007ffcc7fe9f40
22574:M 29 Aug 2022 07:18:56.794 # (00007ffcc7fe9e79) -> 0000000000000000
22574:M 29 Aug 2022 07:18:56.795 # (00007ffcc7fe9e78) -> 0000000000000000
22574:M 29 Aug 2022 07:18:56.796 # (00007ffcc7fe9e77) -> 0000000000000001
22574:M 29 Aug 2022 07:18:56.796 # (00007ffcc7fe9e76) -> 0000000000000000
22574:M 29 Aug 2022 07:18:56.798 # (00007ffcc7fe9e75) -> 000000000000582e
22574:M 29 Aug 2022 07:18:56.798 # (00007ffcc7fe9e74) -> 00007ffcc7fe9f30
22574:M 29 Aug 2022 07:18:56.799 # (00007ffcc7fe9e73) -> 00007ffcc7fea784
22574:M 29 Aug 2022 07:18:56.800 # (00007ffcc7fe9e72) -> 0000000000000000
22574:M 29 Aug 2022 07:18:56.801 # (00007ffcc7fe9e71) -> 00007ffcc7fe9e80
22574:M 29 Aug 2022 07:18:56.801 # (00007ffcc7fe9e70) -> 00007ffcc7fe9f40
------ INFO OUTPUT ------
Server
redis_version:6.2.6
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:b0a4475eea151e9d
redis_mode:standalone
os:Linux 3.10.0-1127.el7.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:sync-builtin
gcc_version:4.4.7
process_id:22574
process_supervised:no
run_id:e6a730ac3a14233ea1d1713db18441a0d2c76184
tcp_port:8888
server_time_usec:1661782736783799
uptime_in_seconds:5385
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:838352
executable:/home/utils/redis-6.2.6/bin/redis-server
config_file:
io_threads_active:0
Clients
connected_clients:1
cluster_connections:0
maxclients:8160
client_recent_max_input_buffer:0
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
Memory
used_memory:136693110760
used_memory_human:127.31G
used_memory_rss:139600097280
used_memory_rss_human:130.01G
used_memory_peak:137634707760
used_memory_peak_human:128.18G
used_memory_peak_perc:99.32%
used_memory_overhead:44527328048
used_memory_startup:794432
used_memory_dataset:92165782712
used_memory_dataset_perc:67.43%
allocator_allocated:136693137432
allocator_active:138115117056
allocator_resident:139607891968
total_system_memory:1621650100224
total_system_memory_human:1.47T
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.01
allocator_frag_bytes:1421979624
allocator_rss_ratio:1.01
allocator_rss_bytes:1492774912
rss_overhead_ratio:1.00
rss_overhead_bytes:-7794688
mem_fragmentation_ratio:1.02
mem_fragmentation_bytes:2907050280
mem_not_counted_for_evict:108
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:0
mem_aof_buffer:112
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
Persistence
loading:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:1
rdb_bgsave_in_progress:0
rdb_last_save_time:1661781705
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:2723
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:3362816
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
aof_current_size:92958209852
aof_base_size:92958209800
aof_pending_rewrite:0
aof_buffer_length:0
aof_rewrite_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0
Stats
total_connections_received:531
total_commands_processed:200709983
instantaneous_ops_per_sec:0
total_net_input_bytes:7464
total_net_output_bytes:10176
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:45
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:3245448
total_forks:1
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:154
dump_payload_sanitizations:0
total_reads_processed:1061
total_writes_processed:530
io_threaded_reads_processed:0
io_threaded_writes_processed:0
Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:bbdd5eb7f1194f432407612743765299b4e35739
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
CPU
used_cpu_sys:237.454246
used_cpu_user:1393.248767
used_cpu_sys_children:939.101560
used_cpu_user_children:1694.724304
used_cpu_sys_main_thread:237.414281
used_cpu_user_main_thread:1393.241446
Modules
module:name=CrystalTransactions,ver=1,api=1,filters=0,usedby=[],using=[],options=[]
Commandstats
cmdstat_set:calls=1,usec=6,usec_per_call=6.00,rejected_calls=0,failed_calls=0
cmdstat_hset:calls=2,usec=29,usec_per_call=14.50,rejected_calls=0,failed_calls=0
cmdstat_ping:calls=375,usec=5084,usec_per_call=13.56,rejected_calls=154,failed_calls=0
Errorstats
errorstat_LOADING:count=154
Cluster
cluster_enabled:0
Keyspace
db0:keys=898414972,expires=0,avg_ttl=0
------ CLIENT LIST OUTPUT ------
id=534 addr=10.128.65.138:47370 laddr=10.128.65.138:8888 fd=9 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=29 qbuf-free=40925 argv-mem=13 obl=0 oll=0 omem=0 tot-mem=61469 events=r cmd=debug user=default redir=-1
------ CURRENT CLIENT INFO ------
id=534 addr=10.128.65.138:47370 laddr=10.128.65.138:8888 fd=9 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=29 qbuf-free=40925 argv-mem=13 obl=0 oll=0 omem=0 tot-mem=61469 events=r cmd=debug user=default redir=-1
argv[0]: 'debug'
argv[1]: 'segfault'
------ MODULES INFO OUTPUT ------
------ FAST MEMORY TEST ------
22574:M 29 Aug 2022 07:18:56.809 # Bio thread for job type #0 terminated
22574:M 29 Aug 2022 07:18:56.810 # Bio thread for job type #1 terminated
22574:M 29 Aug 2022 07:18:56.810 # Bio thread for job type #2 terminated
*** Preparing to test memory region 7cb000 (2281472 bytes)
*** Preparing to test memory region 2902000 (266240 bytes)
*** Preparing to test memory region 2943000 (262144 bytes)
*** Preparing to test memory region 7f5098000000 (135168 bytes)
*** Preparing to test memory region 7f509efc6000 (33792000 bytes)
*** Preparing to test memory region 7f50a1000000 (2097152 bytes)
*** Preparing to test memory region 7f50a1478000 (8192 bytes)
*** Preparing to test memory region 7f50a1899000 (8192 bytes)
*** Preparing to test memory region 7f50a1cb4000 (24576 bytes)
*** Preparing to test memory region 7f50a232f000 (16384 bytes)
*** Preparing to test memory region 7f50a344f000 (12288 bytes)
*** Preparing to test memory region 7f50a3fe0000 (28672 bytes)
*** Preparing to test memory region 7f50a47a6000 (40960 bytes)
*** Preparing to test memory region 7f50a53ff000 (4096 bytes)
*** Preparing to test memory region 7f50a5400000 (158464999424 bytes)
*** Preparing to test memory region 7f758a9c5000 (10737418240 bytes)
*** Preparing to test memory region 7f780a9c6000 (33792000 bytes)
*** Preparing to test memory region 7f780ca00000 (6291456 bytes)
*** Preparing to test memory region 7f780d150000 (33792000 bytes)
*** Preparing to test memory region 7f7813200000 (8388608 bytes)
*** Preparing to test memory region 7f7813f27000 (20480 bytes)
*** Preparing to test memory region 7f7814144000 (16384 bytes)
*** Preparing to test memory region 7f78145e4000 (16384 bytes)
*** Preparing to test memory region 7f7815166000 (28672 bytes)
*** Preparing to test memory region 7f781519f000 (4096 bytes)
*** Preparing to test memory region 7f78151a0000 (4096 bytes)
*** Preparing to test memory region 7f78151a3000 (4096 bytes)
.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.
------ DUMPING CODE AROUND EIP ------
Symbol: debugCommand (base: 0x48a820)
Module: /home/utils/redis-6.2.6/bin/redis-server *:8888 (base 0x400000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=0x48a820 -D -b binary -m i386:x86-64 /tmp/dump.bin
22574:M 29 Aug 2022 07:27:22.512 # dump of function (hexdump of 552 bytes): 48895c24d048896c24d84889fb4c896424e04c896c24e84c897424f04c897c24f84881ec38110000448b6f484183fd020f8482010000488b6f50488b45084c8b6008be89ad56004c89e7e8a149faff85c00f8451010000be4b7c56004c89e7e88c49faff85c00f84e9110000be92ad56004c89e7e87749faff85c00f85870000004531e4837b480248c78424f8100000000000000f8f0e020000488b4508be92ad5600488b7808e84449faff83f8014c89e619ff83e703e8841dfbffbef0b856004889dfe87762fcff488b9c2408110000488bac24101100004c8ba424181100004c8bac24201100004c8bb424281100004c8bbc24301100004881c438110000c30f1f8000000000be9aad56004c89e7e8db48faff85c00f8464ffffffbee89a57004c89e7e8c648faff85c00f84ee020000beacad56004c89e7e8b148faff85c00f848c140000be598855004c89e7e89c48faff85c0750a4183fd030f840e020000be679458004c89e7e88148faff85c00f85890000004183fd030f857f000000488b4510488b7808e8926afbff6690488b35117b34004889dfe8c17bfcffe925ffffff0f1f4000c60425ffffffff78e914ffffff0f1f00488b6f50be7a7f5500488b45084c8b60084c89e7e81f48faff85c00f8569feffff488d7c2440bec0ca5600b94f000000f348a5488d7424404889dfe8285dfcffe9ccfeffff0f1f00bec8ad56004c89e7e8e347faff85c00f85cb0000004531e44183fd02c7442430010000000f8e8605
=== REDIS BUG REPORT END. Make sure to include from START to END. ===
Please report the crash by opening an issue on github:
http://github.com/redis/redis/issues
Suspect RAM error? Use redis-server --test-memory to verify it.
Comment From: oranagra
Ok, so the platform and build are ok, but there was something specific with that incident that made the process die before printing the stack trace. If you still have access to the kernel's syslog, maybe there's a hint there.
Comment From: ambarsarkar
All we see in /var/log/messages is
Aug 24 06:22:15
Note that the core-dump was not complete due to the default limit of only 1K. Is it possible that if the core-dump fails, the stack trace is not printed? The stack trace would definitely have been handy. Thanks
Comment From: oranagra
No, this is unrelated. First, redis catches the signal and prints its crash report, and then it re-throws the signal, which generates the core file. In this case I guess redis generated a sigbus while trying to print the stack trace. I have no idea why. Maybe @yossigo can think of something.
Comment From: yossigo
If I understand correctly, we're dealing with modules loaded from NFS which may have their text pages unavailable. I believe this can easily explain SIGBUS trying to access them.
Comment From: oranagra
@yossigo yes, but why would it SIGBUS again when we print the stack trace?
ohh, maybe calling backtrace_symbols_fd
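For context, a sketch of the glibc pair involved (execinfo.h): backtrace() only collects raw return addresses, while backtrace_symbols_fd() resolves them to names by consulting the mapped ELF data, so it could plausibly fault again if a module's backing file is gone:

#include <execinfo.h>
#include <unistd.h>

void dump_stack(void) {
    void *frames[64];
    int n = backtrace(frames, 64);   /* raw return addresses only */
    /* resolving names touches the mapped ELF images and may fault
     * if a backing file is no longer accessible */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
}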
Comment From: ambarsarkar
So do we now know why the stack trace was not printed, and can anything be done to make it print (handle the exception, etc.)? Also, will the behavior be any different with the latest 7.x versions? Thanks
Comment From: oranagra
Yes, when we print the stack trace, other than getting the addresses of the functions on the stack, we also need to translate them to symbols (function names), and that part probably crashed on SIGBUS too since the ELF is inaccessible.
In theory, we could catch that and resort to printing just the addresses, but it wouldn't be very useful, and considering this case is rare, i don't think we want to do that.
I think the outcome of this case would have been the same on any version of redis. I think we should close this case without any modification to redis.