I can reproduce this segfault on different hardware (so it's not a hardware issue). This is happening on a follower.
Redis 3.2.11 crashed by signal: 11
Crashed running the instruction at: 0x56360a13d174
Accessing address: 0x50
Failed assertion: <no assertion failed> (<no file>:0)
------ STACK TRACE ------
------ INFO OUTPUT ------
# Server
redis_version:3.2.11
redis_git_sha1:1cb6effd
redis_git_dirty:0
redis_build_id:b2733c6f5fd2d2e5
redis_mode:standalone
os:Linux 4.4.0-98-generic x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.8.4
process_id:9
run_id:c8ed4fb0b1c6dead412ca8a8f24eb51bc03aa983
tcp_port:6379
uptime_in_seconds:0
uptime_in_days:0
hz:10
lru_clock:738290
executable:/usr/bin/redis-server
config_file:/etc/redis/redis.conf
# Clients
connected_clients:0
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
# Memory
used_memory:427796888
used_memory_human:407.98M
used_memory_rss:0
used_memory_rss_human:0B
used_memory_peak:427796888
used_memory_peak_human:407.98M
total_system_memory:16040189952
total_system_memory_human:14.94G
used_memory_lua:54272
used_memory_lua_human:53.00K
maxmemory:7516192768
maxmemory_human:7.00G
maxmemory_policy:noeviction
mem_fragmentation_ratio:0.00
mem_allocator:jemalloc-4.0.3
# Persistence
loading:1
rdb_changes_since_last_save:2361054
rdb_bgsave_in_progress:0
rdb_last_save_time:1510687730
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
loading_start_time:1510687730
loading_total_bytes:410452310
loading_loaded_bytes:400706923
loading_loaded_perc:97.63
loading_eta_seconds:0
# Stats
total_connections_received:6
total_commands_processed:39368
instantaneous_ops_per_sec:0
total_net_input_bytes:888
total_net_output_bytes:354
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
evicted_keys:0
keyspace_hits:14285
keyspace_misses:14
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0
# Replication
role:slave
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:1
master_link_down_since_seconds:1510687730
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
# CPU
used_cpu_sys:0.35
used_cpu_user:8.88
used_cpu_sys_children:0.00
used_cpu_user_children:0.00
# Commandstats
cmdstat_auth:calls=6,usec=15,usec_per_call=2.50
# Cluster
cluster_enabled:0
# Keyspace
db0:keys=1171429,expires=21,avg_ttl=0
hash_init_value: 1510972735
------ CLIENT LIST OUTPUT ------
------ REGISTERS ------
RAX:0000000000000000 RBX:0000000000000000
RCX:0000015fbc01a222 RDX:0000000000000000
RDI:0000000000000000 RSI:00007fcafa5a88f0
RBP:00007fcafa5a88f0 RSP:00007ffe71eedf00
R8 :00007fcafa5a85a0 R9 :00000000fffffff9
R10:0000000000000000 R11:0000000000000001
R12:0000000000000000 R13:00007fcb11ab52a0
R14:0000000000000001 R15:0000000000000000
RIP:000056360a13d174 EFL:0000000000010206
CSGSFS:0000000000000033
(00007ffe71eedf0f) -> 0000000000000000
(00007ffe71eedf0e) -> 00007fcb157bc640
(00007ffe71eedf0d) -> 000056360a16304a
(00007ffe71eedf0c) -> 00007fcb11ab52b0
(00007ffe71eedf0b) -> 00007fcb157bc640
(00007ffe71eedf0a) -> 00007fcb11ab52a8
(00007ffe71eedf09) -> 000056360a13d33f
(00007ffe71eedf08) -> 00007fcb15616871
(00007ffe71eedf07) -> 0000000000000001
(00007ffe71eedf06) -> 00007fcb11ab52a0
(00007ffe71eedf05) -> 0000000000000000
(00007ffe71eedf04) -> 00007fcb157bc640
(00007ffe71eedf03) -> 0000000000000000
(00007ffe71eedf02) -> 00007fcafa5a88f0
(00007ffe71eedf01) -> 000056360a13ce10
(00007ffe71eedf00) -> 00007fcafa5a88f0
------ FAST MEMORY TEST ------
Bio thread for job type #0 terminated
Bio thread for job type #1 terminated
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.
------ DUMPING CODE AROUND EIP ------
Symbol: dictAddRaw (base: 0x56360a13d160)
Module: /usr/bin/redis-server *:6379 (base 0x56360a111000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=0x56360a13d160 -D -b binary -m i386:x86-64 /tmp/dump.bin
------
dump of function (hexdump of 148 bytes):
41574989ff415641554154554889f5534883ec1848837f50ff0f84960000008b
575885d2747a498b074889ef4d89fcff104189c6498d47404889442408458b6c
2420498b4424104521f54489ea488b1cd04885db7530e9ad0000000f1f440000
498b07488b40184885c0740d498b7f084889eeffd085c07515488b5b104885db
0f8482000000488b134839d575d24883c41831c0
Comment From: cyberdelia
I can also replicate it on different hardware and a different dataset.
Comment From: cyberdelia
It appears that the follower crashes, and when it restarts it finds a corrupted AOF, which leads to this segfault. The segfault seems to be caused by a BRPOPLPUSH command that attempts to block while the AOF is being replayed, which isn't allowed.
I've upgraded this standby to 4.0.8, and the issue is still present: the standby still receives an "invalid" AOF containing a blocking BRPOPLPUSH command:
Closing client that reached max query buffer length: id=680 addr=10.0.0.1:6379 fd=9 name= age=255 idle=0 flags=Mb db=0 sub=0 psub=0 multi=-1 qbuf=1073757808 qbuf-free=311296 obl=0 oll=0 omem=0 events=r cmd=brpoplpush (qbuf initial bytes: "*18\r\n$7\r\nevalsha\r\n$40\r\nbf0edc581619a27a39587a4a5acab6d636c5d607\r")
Connection with master lost.
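To confirm whether a follower's AOF contains the offending command, the file can be scanned directly. This is a minimal sketch, assuming the AOF lives at the usual location; the sample payload written here is hypothetical stand-in data in the RESP format the AOF uses, not the actual file from the crashed instance:

```python
# Sketch: a BRPOPLPUSH in the AOF is the anomaly described above —
# the master is expected to propagate the non-blocking effect instead.
# sample.aof is a hypothetical stand-in for /var/lib/redis/appendonly.aof.
sample = b"*4\r\n$10\r\nBRPOPLPUSH\r\n$3\r\nsrc\r\n$3\r\ndst\r\n$1\r\n0\r\n"
with open("sample.aof", "wb") as f:
    f.write(sample)

with open("sample.aof", "rb") as f:
    hits = f.read().count(b"BRPOPLPUSH")
print(hits)  # a non-zero count means a blocking command made it into the AOF
```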
Comment From: itamarhaber
That's a fine piece of tracing you've done @cyberdelia - sounds like a nasty issue indeed, should be easy to reproduce.
Comment From: cyberdelia
I found another follower suffering from a similar issue; both services use the bull library.
It looks like the Lua scripts are being executed on the follower, but not within the same "logical context" as on the primary. This seems to be confirmed by the fact that I haven't been able to reproduce the issue by issuing direct commands to Redis.