I've seen this happen three times on two different servers in the past 2 days: redis-server running on FreeBSD 11.2-RELEASE-p4 segfaults in pthread_getspecific+0xe12 at /lib/libthr.so.3. This is a 3-node cluster running a stable config and nothing has changed recently. (Redis was upgraded from 4.0.10 to 4.0.11 over a month ago)
Memory was tested both before the servers were installed and again after the first segfault and no errors were found
Output from the logfile:
567:M 02 Oct 16:32:25.485 * Background saving started by pid 86633
86633:C 02 Oct 16:33:27.931 * DB saved on disk
567:M 02 Oct 16:33:28.248 * Background saving terminated with success
567:M 02 Oct 16:34:51.284 # Accepting client connection: accept: Software caused connection abort
=== REDIS BUG REPORT START: Cut & paste starting from here ===
567:M 02 Oct 16:34:51.331 # Redis 4.0.11 crashed by signal: 11
567:M 02 Oct 16:34:51.331 # Accessing address: 0x4b1f5a578
567:M 02 Oct 16:34:51.331 # Failed assertion: <no assertion failed> (<no file>:0)
------ STACK TRACE ------
0x455365 <???> at /usr/local/bin/redis-server
0x80091e954 <pthread_sigmask+0x544> at /lib/libthr.so.3
0x80091deb2 <pthread_getspecific+0xe12> at /lib/libthr.so.3
------ INFO OUTPUT ------
# Server
redis_version:4.0.11
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:413940519fc745b5
redis_mode:standalone
os:FreeBSD 11.2-RELEASE-p4 amd64
arch_bits:64
multiplexing_api:kqueue
atomicvar_api:atomic-builtin
gcc_version:4.2.1
process_id:567
run_id:25b7e5c958af18fc4ab1637beda8320afb947bd0
tcp_port:6379
uptime_in_seconds:87427
uptime_in_days:1
hz:10
lru_clock:11796615
executable:/usr/local/bin/redis-server
config_file:/usr/local/etc/redis-rdproxy.conf
# Clients
connected_clients:7
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
# Memory
used_memory:5913523570
used_memory_human:5.51G
used_memory_rss:5935114151
used_memory_rss_human:5.53G
used_memory_peak:5965164756
used_memory_peak_human:5.56G
used_memory_peak_perc:99.13%
used_memory_overhead:13689057
used_memory_startup:372157
used_memory_dataset:5899834513
used_memory_dataset_perc:99.77%
total_system_memory:0
total_system_memory_human:0B
used_memory_lua:1624064
used_memory_lua_human:1.55M
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
mem_fragmentation_ratio:1.00
mem_allocator:libc
active_defrag_running:0
lazyfree_pending_objects:0
# Persistence
loading:0
rdb_changes_since_last_save:16066
rdb_bgsave_in_progress:0
rdb_last_save_time:1538523208
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:63
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
# Stats
total_connections_received:69164
total_commands_processed:92696476
instantaneous_ops_per_sec:889
total_net_input_bytes:43686303691
total_net_output_bytes:47428934485
instantaneous_input_kbps:2381.30
instantaneous_output_kbps:1253.77
rejected_connections:0
sync_full:0
sync_partial_ok:1
sync_partial_err:0
expired_keys:7034
expired_stale_perc:0.00
expired_time_cap_reached_count:0
evicted_keys:0
keyspace_hits:71550134
keyspace_misses:15398543
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:100088
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
# Replication
role:master
connected_slaves:1
slave0:ip=172.25.10.137,port=6379,state=online,offset=232872049413,lag=20
master_replid:1814bbc82cb258b6697fe27761647d869faff596
master_replid2:d31a8e1e79e8b81a1d830904e010f3f1011b4625
master_repl_offset:232873941448
second_repl_offset:222228922513
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:232872892873
repl_backlog_histlen:1048576
# CPU
used_cpu_sys:192.51
used_cpu_user:653.73
used_cpu_sys_children:676.00
used_cpu_user_children:13418.67
# Commandstats
cmdstat_hmget:calls=168351,usec=12747897,usec_per_call=75.72
cmdstat_lpop:calls=1299271,usec=6448942,usec_per_call=4.96
cmdstat_lrem:calls=144,usec=992,usec_per_call=6.89
cmdstat_hgetall:calls=85128729,usec=225637710,usec_per_call=2.65
cmdstat_smembers:calls=463584,usec=93071418,usec_per_call=200.76
cmdstat_info:calls=82606,usec=6679049,usec_per_call=80.85
cmdstat_hincrby:calls=5891,usec=39620,usec_per_call=6.73
cmdstat_llen:calls=39870,usec=48205,usec_per_call=1.21
cmdstat_sismember:calls=165,usec=696,usec_per_call=4.22
cmdstat_select:calls=28400,usec=24213,usec_per_call=0.85
cmdstat_ping:calls=620096,usec=653433,usec_per_call=1.05
cmdstat_exec:calls=15907,usec=58256977,usec_per_call=3662.35
cmdstat_rpop:calls=911,usec=5214,usec_per_call=5.72
cmdstat_lindex:calls=429,usec=1855,usec_per_call=4.32
cmdstat_set:calls=354866,usec=2054828,usec_per_call=5.79
cmdstat_multi:calls=11005,usec=2731,usec_per_call=0.25
cmdstat_replconf:calls=5903,usec=38585,usec_per_call=6.54
cmdstat_discard:calls=38436,usec=34983,usec_per_call=0.91
cmdstat_subscribe:calls=17,usec=148,usec_per_call=8.71
cmdstat_brpop:calls=283,usec=2062,usec_per_call=7.29
cmdstat_zrangebyscore:calls=29232,usec=319988,usec_per_call=10.95
cmdstat_setex:calls=623669,usec=6854749,usec_per_call=10.99
cmdstat_hscan:calls=5382,usec=60642,usec_per_call=11.27
cmdstat_hmset:calls=43581,usec=15822930,usec_per_call=363.07
cmdstat_zrem:calls=20027,usec=98241,usec_per_call=4.91
cmdstat_script:calls=11005,usec=114191,usec_per_call=10.38
cmdstat_sadd:calls=1598,usec=27192,usec_per_call=17.02
cmdstat_rpush:calls=1413737,usec=7929737,usec_per_call=5.61
cmdstat_hset:calls=245492,usec=3398743,usec_per_call=13.84
cmdstat_expire:calls=11030,usec=45021,usec_per_call=4.08
cmdstat_publish:calls=414343,usec=7548016,usec_per_call=18.22
cmdstat_lpush:calls=39654,usec=202856,usec_per_call=5.12
cmdstat_get:calls=848181,usec=2555230,usec_per_call=3.01
cmdstat_client:calls=19,usec=58,usec_per_call=3.05
cmdstat_exists:calls=1500,usec=2790,usec_per_call=1.86
cmdstat_blpop:calls=161853,usec=830965,usec_per_call=5.13
cmdstat_hdel:calls=244718,usec=2631936,usec_per_call=10.75
cmdstat_config:calls=2369,usec=52657,usec_per_call=22.23
cmdstat_srem:calls=1031,usec=11811,usec_per_call=11.46
cmdstat_hget:calls=264893,usec=1023427,usec_per_call=3.86
cmdstat_del:calls=37265,usec=1708277,usec_per_call=45.84
cmdstat_zadd:calls=10971,usec=162919,usec_per_call=14.85
cmdstat_ttl:calls=1,usec=3,usec_per_call=3.00
cmdstat_psync:calls=1,usec=74,usec_per_call=74.00
cmdstat_persist:calls=60,usec=63,usec_per_call=1.05
# Cluster
cluster_enabled:0
# Keyspace
db0:keys=172385,expires=78987,avg_ttl=4319859529
------ CLIENT LIST OUTPUT ------
id=32166 addr=172.25.10.137:51696 fd=19 name= age=6005 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=replconf
id=34850 addr=?:0 fd=245 name=sentinel-8baec0f4-cmd age=5772 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=publish
id=34852 addr=?:0 fd=254 name=sentinel-a159edb4-cmd age=5772 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=publish
id=34853 addr=?:0 fd=255 name=sentinel-8577eba9-cmd age=5772 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=publish
id=34854 addr=?:0 fd=256 name=sentinel-553705ae-cmd age=5772 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=publish
id=34856 addr=?:0 fd=258 name=sentinel-867f4664-cmd age=5772 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=publish
id=69167 addr=172.25.10.137:31380 fd=126 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL
id=69168 addr=172.25.10.137:24014 fd=139 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL
------ REGISTERS ------
567:M 02 Oct 16:34:51.536 # Dumping of registers not supported for this OS/arch
=== REDIS BUG REPORT END. Make sure to include from START to END. ===
Comment From: antirez
Hello, never saw Redis crashing with such stack trace on Linux, nor the stack trace here offers any hint... It's very odd. It never happened in the past using Redis 4.0.10? I can check the differences to get some idea, however we got no reports of issues so far in Redis 4. If possible please attach a gdb session to the Redis instance so that when it crashes we can get more info. Thanks.
Comment From: jgitlin-bt
Thanks @antirez! Yeah it is very weird. We have a total of 6 FreeBSD machines running Redis: 3 in Test (in a cluster, master-slave-backup using Sentinel) and 3 in Prod (in the same configuration) and we've only seen the machines in Test crashing, even though all 6 have the same exact config. We never saw this with any earlier version of Redis, including 4.0.10, but it may be something in our traffic on the Test cluster last week that was triggering it (that's a theory we've been discussing)
The stack trace provides no hints at all, and sent me down a rabbit hole looking into bio.c which was a red herring. After decided on Friday that was not the root cause I attached a gdb session to two of the instances and am awaiting the next crash... however we have stripped binaries so that may not be as helpful.
I can try building a version of redis with debug symbols and see if that provides further clues when used with gd.
Comment From: yoav-steinberg
@jgitlin-bt Since this was a while back and I see there's no new info I'm marking to be closed. Let me know if you do have any updates.