Active defrag doesn't seem to be working for me. How is active-defrag-threshold-lower calculated? To me, it looks like I have 63% fragmentation, so it should have kicked in already.

Docker version: 19.03.15, Redis version: 6.0.10, CPU: %Cpu(s): 1.9 us, 1.9 sy, 0.0 ni, 93.4 id, 0.0 wa, 0.4 hi, 2.4 si, 0.0 st

# Server
redis_version:6.0.10
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:1c3aed2f9fd88f03
redis_mode:standalone
os:Linux 4.18.0-240.22.1.el8_3.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:8.3.0
process_id:1
run_id:d1100600778231eb699f188ce152654a40efbb60
tcp_port:6379
uptime_in_seconds:2676707
uptime_in_days:30
hz:20
configured_hz:10
lru_clock:6296401
executable:/data/redis-server
config_file:
io_threads_active:0

# Clients
connected_clients:2902
client_recent_max_input_buffer:8
client_recent_max_output_buffer:102568
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0

# Memory
used_memory:8567435384
used_memory_human:7.98G
used_memory_rss:13998944256
used_memory_rss_human:13.04G
used_memory_peak:10737927120
used_memory_peak_human:10.00G
used_memory_peak_perc:79.79%
used_memory_overhead:101763800
used_memory_startup:803368
used_memory_dataset:8465671584
used_memory_dataset_perc:98.82%
allocator_allocated:8567778488
allocator_active:9028083712
allocator_resident:9236955136
total_system_memory:16320516096
total_system_memory_human:15.20G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:8589934592
maxmemory_human:8.00G
maxmemory_policy:allkeys-lru
allocator_frag_ratio:1.05
allocator_frag_bytes:460305224
allocator_rss_ratio:1.02
allocator_rss_bytes:208871424
rss_overhead_ratio:1.52
rss_overhead_bytes:4761989120
mem_fragmentation_ratio:1.63
mem_fragmentation_bytes:5431592632
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:59485176
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0

# Persistence
loading:0
rdb_changes_since_last_save:1866918297
rdb_bgsave_in_progress:0
rdb_last_save_time:1631051531
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:5
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:774475776
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:4
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:727506944
module_fork_in_progress:0
module_fork_last_cow_size:0

# Stats
total_connections_received:83145726
total_commands_processed:26493574303
instantaneous_ops_per_sec:9369
total_net_input_bytes:899652434091
total_net_output_bytes:7516153290277
instantaneous_input_kbps:323.75
instantaneous_output_kbps:2494.37
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:12218859
expired_stale_perc:0.08
expired_time_cap_reached_count:8433
expire_cycle_cpu_milliseconds:890998
evicted_keys:3372607
keyspace_hits:8827586696
keyspace_misses:895067103
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:8526
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:48123844
active_defrag_misses:188694909
active_defrag_key_hits:393366
active_defrag_key_misses:188081
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_reads_processed:25757770131
total_writes_processed:25765389090
io_threaded_reads_processed:0
io_threaded_writes_processed:0

# Replication
role:master
connected_slaves:0
master_replid:b16e77a71b7506891401d913d141b820657ea7bc
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:171504.742032
used_cpu_user:115794.652233
used_cpu_sys_children:7.736326
used_cpu_user_children:89.917927

# Modules

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=313062,expires=307340,avg_ttl=441379894
db2:keys=59032,expires=25073,avg_ttl=152364505
db3:keys=37563,expires=37563,avg_ttl=22448200
db4:keys=33146,expires=33145,avg_ttl=590817604
db6:keys=6443,expires=6442,avg_ttl=906700
db7:keys=2610,expires=2610,avg_ttl=75354010
db10:keys=16516,expires=16516,avg_ttl=22448200

defrag config:

127.0.0.1:6379> config get *defrag*
 1) "activedefrag"
 2) "yes"
 3) "active-defrag-cycle-min"
 4) "1"
 5) "active-defrag-cycle-max"
 6) "25"
 7) "active-defrag-threshold-lower"
 8) "10"
 9) "active-defrag-threshold-upper"
10) "100"
11) "active-defrag-max-scan-fields"
12) "1000"
13) "active-defrag-ignore-bytes"
14) "104857600"
127.0.0.1:6379>
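
For reference, the 63% above comes straight from the INFO fields (quick arithmetic in Python; I'm assuming mem_fragmentation_ratio is roughly RSS over used memory):

used_memory     = 8567435384
used_memory_rss = 13998944256
print(used_memory_rss / used_memory)   # ~1.63, which I read as "63% fragmentation"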

Comment From: oranagra

The actual "fragmentation" is very low, something else is eating your memory, but that's not fragmentation.

allocator_frag_ratio:1.05
rss_overhead_ratio:1.52
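
To put numbers on that, here is a rough Python sketch of how the INFO fields above relate and what the defrag trigger actually looks at (variable names are mine, and the trigger condition only approximates Redis 6's logic):

# How the memory picture above decomposes, and what active defrag checks.
allocator_allocated = 8567778488    # bytes Redis asked jemalloc for
allocator_active    = 9028083712    # pages jemalloc keeps for those allocations
allocator_resident  = 9236955136    # what jemalloc itself holds resident
used_memory_rss     = 13998944256   # what the OS charges the whole process

# Allocator-level fragmentation -- the only thing active defrag can fix.
frag_bytes = allocator_active - allocator_allocated              # ~460 MB
frag_pct   = (allocator_active / allocator_allocated - 1) * 100  # ~5.4%

# Approximate trigger: both conditions must hold before defrag starts.
threshold_lower_pct = 10            # active-defrag-threshold-lower
ignore_bytes        = 104857600     # active-defrag-ignore-bytes (100 MB)
print("defrag should run:", frag_pct >= threshold_lower_pct and frag_bytes >= ignore_bytes)
# -> False: ~5.4% is below the 10% lower threshold, so defrag stays idle.

# The big gap is outside the allocator, so defrag cannot reclaim it.
print(used_memory_rss - allocator_resident)   # ~4.8 GB of RSS jemalloc doesn't account for
print(used_memory_rss / allocator_resident)   # ~1.52 -> rss_overhead_ratio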

Please post a copy of MEMORY malloc-stats and also /proc/<pid>/smaps and we'll try to figure it out.

Comment From: seqwait

Sure, here you go: malloc-stat.txt info_all.txt smaps.txt

Comment From: oranagra

Looks like there's a lot of retained memory in the allocator. I think it should have been shown in allocator_resident, but it's not. Anyway, please try MEMORY PURGE to see if it helps.
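
A quick way to see whether PURGE gives anything back to the OS is to compare used_memory_rss before and after, e.g. with redis-py (just a sketch; host, port and auth are assumptions to adjust for your setup):

import time
import redis   # redis-py

r = redis.Redis(host="127.0.0.1", port=6379)

before = r.info("memory")["used_memory_rss"]
r.memory_purge()      # MEMORY PURGE: ask jemalloc to return dirty/unused pages to the OS
time.sleep(1)         # used_memory_rss is refreshed periodically by serverCron
after = r.info("memory")["used_memory_rss"]

print(f"used_memory_rss: {before} -> {after} ({(before - after) / 2**20:.1f} MiB released)")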

Comment From: seqwait

retained

It will block the Redis service (which is serving online traffic). Any other idea?

Comment From: seqwait

Looks like there's a lot of retained memory in the allocator. I think it should have been shown in allocator_resident, but it's not. Anyway, please try MEMORY PURGE to see if it helps.

It doesn't work. Running MEMORY PURGE immediately returns OK, but free shows the memory is still not released.

Comment From: seqwait

memory doctor output:

127.0.0.1:6379> memory purge
OK
127.0.0.1:6379> memory doctor
Sam, I detected a few issues in this Redis instance memory implants:

  • High total RSS: This instance has a memory fragmentation and RSS overhead greater than 1.4 (this means that the Resident Set Size of the Redis process is much larger than the sum of the logical allocations Redis performed). This problem is usually due either to a large peak memory (check if there is a peak memory entry above in the report) or may result from a workload that causes the allocator to fragment memory a lot. If the problem is a large peak memory, then there is no issue. Otherwise, make sure you are using the Jemalloc allocator and not the default libc malloc. Note: The currently used allocator is "jemalloc-5.1.0".

  • High process RSS overhead: This instance has non-allocator RSS memory overhead is greater than 1.1 (this means that the Resident Set Size of the Redis process is much larger than the RSS the allocator holds). This problem may be due to Lua scripts or Modules.

I'm here to keep you safe, Sam. I want to help you.

Comment From: oranagra

@jasone can you please help explain this: Jemalloc reports mapped and resident of some 9GB, but smaps shows some 15GB and 12GB. Also, why is retained so large? Doesn't it mean mapped but not resident? (I see only about 3GB.)

Here's the relevant portion of the files attached above (I added commas for readability):

Allocated: 8,527,135,304, active: 8,984,121,344, metadata: 196,143,016 (n_thp 0), resident: 9,186,025,472, mapped: 9,220,657,152, retained: 7,641,493,504

smaps:

7f9b9fe00000-7f9f60a00000 rw-p 00000000 00:00 0 
Size:           15,740,928 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:            12,901,276 kB

Comment From: jasone

mapped + retained is the closest analogue to the smaps Size, so the virtual memory stats are in reasonable agreement. I would expect resident and Rss to be much closer than they are. Perhaps retained pages were touched after being madvised away. I don't see evidence in the logs that muzzy memory could by itself account for the difference, but internal fragmentation of transparent huge pages could potentially be part of the explanation as well.

@interwq / @davidtgoldblatt, do you see anything I'm missing? Some of the stats jemalloc reports are new since I last dove into the code...
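
For what it's worth, the quoted figures line up on the virtual-memory side but not on the resident side (quick arithmetic; note the smaps entry above covers only that one large anonymous mapping):

mapped, retained, resident  = 9_220_657_152, 7_641_493_504, 9_186_025_472   # jemalloc, bytes
smaps_size_kb, smaps_rss_kb = 15_740_928, 12_901_276                        # smaps, kB

print((mapped + retained) / 1024)  # ~16.5M kB vs smaps Size ~15.7M kB: same ballpark
print(resident / 1024)             # ~9.0M kB vs smaps Rss ~12.9M kB: roughly 3.7 GiB unexplained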

Comment From: oranagra

@jasone thank you. As far as I know, transparent huge pages shouldn't be used.

@seqwait please look at the logs and make sure you don't see this warning:

WARNING you have Transparent Huge Pages (THP) support enabled in your kernel.

and also please post the result of cat /sys/kernel/mm/transparent_hugepage/enabled
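
If it's easier to grab from a script, the same check can be done like this (Linux only; the bracketed word in that sysfs file is the mode currently in effect):

# Print the active Transparent Huge Pages mode.
with open("/sys/kernel/mm/transparent_hugepage/enabled") as f:
    modes = f.read().split()                      # e.g. ['[always]', 'madvise', 'never']
active = next(m.strip("[]") for m in modes if m.startswith("["))
print("THP mode:", active)                        # Redis warns at startup when this is enabled; [always] is the problematic setting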

Comment From: seqwait

Here is cat /sys/kernel/mm/transparent_hugepage/enabled before and after the change:

Before:
[always] madvise never

After:
always madvise [never]

OK, thanks very much! I will try it.

Comment From: davidtgoldblatt

Yeah, THP [always] is something we don't have a great story on in 5.1 (although I think it's fixable via configuration in dev). (Sorry, I don't have a ton of time to dig deeper -- on paternity leave right now and knee-deep in dirty diapers; maybe @interwq or @lapenkov could double-check.)

Comment From: interwq

Agreed that the [always] option on THP is most likely the root cause. Nothing else in the stats jumps out at me. The retained bytes are indeed a bit high but not out of line. Updating to 5.2 should improve that -- there were a few changes to reduce VM fragmentation / metadata usage, such as the max_active_fit option.

Comment From: oranagra

Thank you, jemalloc team. We plan to have Redis 7.0 use jemalloc 5.2.1.

@seqwait can you confirm that changing to never solved the problem?

Comment From: seqwait

I'm not sure. I didn't restart Redis. @oranagra, here is the current data: info_all.txt malloc-stat.txt smaps.txt doctor.txt

Comment From: oranagra

I think you have to restart in order to verify.

Comment From: seqwait

I think you have to restart in order to verify.

Yes, I have restarted Redis, and it looks normal now. I will keep observing and report back. Thank you very much!