@oranagra hi, I'm running redis 4.0.12 in CentOS Linux release 7.4.1708 (Core). I have enabled active-defrag but it doesn't seem to do anything. can you help me to find out why active-defrag doesn't run?

The following is some example output:

info memory

used_memory:9823369872
used_memory_human:9.15G
used_memory_rss:27904212992
used_memory_rss_human:25.99G
used_memory_peak:27399710848
used_memory_peak_human:25.52G
used_memory_peak_perc:35.85%
used_memory_overhead:4061865096
used_memory_startup:3066944
used_memory_dataset:5761504776
used_memory_dataset_perc:58.67%
total_system_memory:404141654016
total_system_memory_human:376.39G
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:32212254720
maxmemory_human:30.00G
maxmemory_policy:volatile-lru
mem_fragmentation_ratio:2.84
mem_allocator:jemalloc-4.0.3
active_defrag_running:75
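For reference, the `mem_fragmentation_ratio` reported above is simply `used_memory_rss` divided by `used_memory`. A quick sanity check using the numbers from this dump:

```python
# Sanity-check mem_fragmentation_ratio from the INFO memory dump above.
used_memory = 9823369872        # bytes tracked by the allocator
used_memory_rss = 27904212992   # resident bytes as seen by the OS

ratio = used_memory_rss / used_memory
print(f"mem_fragmentation_ratio: {ratio:.2f}")  # 2.84, matching the dump
```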

config

 1) "active-defrag-threshold-lower"
 2) "1"
 3) "active-defrag-threshold-upper"
 4) "100"
 5) "active-defrag-ignore-bytes"
 6) "1048576"
 7) "active-defrag-cycle-min"
 8) "25"
 9) "active-defrag-cycle-max"
10) "75"
11) "activedefrag"
12) "yes"
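For anyone following along, these settings can be read back and tuned at runtime with redis-cli; CONFIG GET takes a glob-style pattern (the port 11101 is the one from the prompt later in this report):

```shell
# Read all defrag-related settings in one call (CONFIG GET accepts a glob pattern).
redis-cli -p 11101 CONFIG GET 'active*'

# Enable / tune active defrag at runtime, no restart needed.
redis-cli -p 11101 CONFIG SET activedefrag yes
redis-cli -p 11101 CONFIG SET active-defrag-cycle-max 75
```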

debug log

59047:S 12 Nov 16:51:37.134 . allocated=9824517792, active=27259707392, resident=27925131264, frag=177% (184% rss), frag_bytes=17435189600 (18100613472% rss)
59047:S 12 Nov 16:51:38.876 . allocated=9824527664, active=27259756544, resident=27925131264, frag=177% (184% rss), frag_bytes=17435228880 (18100603600% rss)
59047:S 12 Nov 16:51:40.619 . allocated=9824527664, active=27259756544, resident=27925131264, frag=177% (184% rss), frag_bytes=17435228880 (18100603600% rss)
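The figures in these debug lines appear internally consistent: `frag` is `active / allocated - 1`, `frag_bytes` is the difference, and the "rss" variants use `resident` instead of `active`. Verifying against the first line:

```python
# Verify the defrag debug-log figures from the first log line above.
allocated = 9824517792
active = 27259707392
resident = 27925131264

frag_bytes = active - allocated
frag_pct = frag_bytes / allocated * 100
rss_bytes = resident - allocated
rss_pct = rss_bytes / allocated * 100

print(frag_bytes, f"{frag_pct:.0f}%")  # 17435189600 177%
print(rss_bytes, f"{rss_pct:.0f}%")    # 18100613472 184%
```

(The log labels the rss byte count with a `%` sign, but it is really a byte count.)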

cat proc smaps info

01825000-01846000 rw-p 00000000 00:00 0 [heap]
Size: 132 kB
Rss: 56 kB
Pss: 56 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 56 kB
Referenced: 56 kB
Anonymous: 56 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
VmFlags: rd wr mr mw me ac sd

7f6b4ae00000-7f71e3c00000 rw-p 00000000 00:00 0
Size: 27670528 kB
Rss: 27243196 kB
Pss: 27243196 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 27243196 kB
Referenced: 27243196 kB
Anonymous: 27243196 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
VmFlags: rd wr mr mw me ac sd

memory malloc-stats

127.0.0.1:11101> MEMORY MALLOC-STATS
___ Begin jemalloc statistics ___
Version: 4.0.3-0-ge9192eacf8935e29fc62fddc2701f7942b1cc02c
Assertions disabled
Run-time option settings:
  opt.abort: false
  opt.lg_chunk: 21
  opt.dss: "secondary"
  opt.narenas: 128
  opt.lg_dirty_mult: 3 (arenas.lg_dirty_mult: 3)
  opt.stats_print: false
  opt.junk: "false"
  opt.quarantine: 0
  opt.redzone: false
  opt.zero: false
  opt.tcache: true
  opt.lg_tcache_max: 15
CPUs: 32
Arenas: 128
Pointer size: 8
Quantum size: 8
Page size: 4096
Min active:dirty page ratio per arena: 8:1
Maximum thread-cached size class: 32768
Chunk size: 2097152 (2^21)
Allocated: 9824657832, active: 27259772928, metadata: 666136320, resident: 27925217280, mapped: 28343009280
Current active ceiling: 27260878848

Comment From: oranagra

Hi. This Redis is a bit old, and the memory metrics in INFO are a bit lacking, but the prints you got from the log file indicate that you indeed have about 177% fragmentation overhead. The defragger is indeed running, attempting to consume the maximum CPU time it can (75%), so I wonder why it's not able to do anything. Can you please post your full INFO output (specifically, the INFO stats section contains some hits and misses counters)? Please specify if you're using any modules, and if not, what data types compose the majority of the keyspace / memory. The fragments you posted from malloc-stats and smaps are insufficient; I'm not sure there's any useful info there, but please upload full versions.

Comment From: liuhuang9492

Hi, thanks for the reply.

info

Server

redis_version:4.0.12
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:f16c83d04220d86d
redis_mode:cluster
os:Linux 3.10.0-693.el7.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:4.8.5
process_id:227841
run_id:323d03ca48befda5744f1939d2ae1addff1943b9
tcp_port:11109
uptime_in_seconds:6036069
uptime_in_days:69
hz:30
lru_clock:9556280
executable:/usr/local/bin/redis-server
config_file:/data/cachecloud/conf/redis-cluster-11109.conf

Clients

connected_clients:3
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

Memory

used_memory:9289824264
used_memory_human:8.65G
used_memory_rss:27104083968
used_memory_rss_human:25.24G
used_memory_peak:26334656816
used_memory_peak_human:24.53G
used_memory_peak_perc:35.28%
used_memory_overhead:3940330866
used_memory_startup:3066944
used_memory_dataset:5349493398
used_memory_dataset_perc:57.60%
total_system_memory:404141654016
total_system_memory_human:376.39G
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:32212254720
maxmemory_human:30.00G
maxmemory_policy:volatile-lru
mem_fragmentation_ratio:2.92
mem_allocator:jemalloc-4.0.3
active_defrag_running:0
lazyfree_pending_objects:0

Persistence

loading:0
rdb_changes_since_last_save:848678652
rdb_bgsave_in_progress:0
rdb_last_save_time:1630910163
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0

Stats

total_connections_received:731815
total_commands_processed:788436833
instantaneous_ops_per_sec:68
total_net_input_bytes:99087275715
total_net_output_bytes:5382267761
instantaneous_input_kbps:7.34
instantaneous_output_kbps:0.05
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0

Replication

role:slave
master_host:10.192.70.19
master_port:11109
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:98854529501
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:954e2fbeac32e1f394d8238e5ab64e6881dc9cd9
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:98854529501
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:150000000
repl_backlog_first_byte_offset:98704529502
repl_backlog_histlen:150000000

CPU

used_cpu_sys:4648.14
used_cpu_user:9626.80
used_cpu_sys_children:0.00
used_cpu_user_children:0.00

Cluster

cluster_enabled:1

Keyspace

db0:keys=25387087,expires=25268386,avg_ttl=0


Comment From: liuhuang9492

For security reasons we cannot upload files, so we can only paste them in sections. I'm sorry.

memory malloc-stats

___ Begin jemalloc statistics ___
Version: 4.0.3-0-ge9192eacf8935e29fc62fddc2701f7942b1cc02c
Assertions disabled
Run-time option settings:
  opt.abort: false
  opt.lg_chunk: 21
  opt.dss: "secondary"
  opt.narenas: 128
  opt.lg_dirty_mult: 3 (arenas.lg_dirty_mult: 3)
  opt.stats_print: false
  opt.junk: "false"
  opt.quarantine: 0
  opt.redzone: false
  opt.zero: false
  opt.tcache: true
  opt.lg_tcache_max: 15
CPUs: 32
Arenas: 128
Pointer size: 8
Quantum size: 8
Page size: 4096
Min active:dirty page ratio per arena: 8:1
Maximum thread-cached size class: 32768
Chunk size: 2097152 (2^21)
Allocated: 9291064432, active: 26374455296, metadata: 638340864, resident: 27239452672, mapped: 27248295936
Current active ceiling: 26375880704

arenas[0]:
assigned threads: 1
dss allocation precedence: secondary
min active:dirty page ratio: 8:1
dirty pages: 6439076:55627 active:dirty, 0 sweeps, 0 madvises, 0 purged
            allocated      nmalloc      ndalloc    nrequests
small:     6971634800   1198014435   1034532974   7654824481
large:        4173824    125484451    125484438    125859333
huge:      2315255808           21           18           21
total:     9291064432   1323498907   1160017430   7780683835
active:   26374455296
mapped:   27242004480
metadata: mapped: 632905728, allocated: 1203456
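One thing that stands out in these arena stats: purging has apparently never run (0 sweeps, 0 madvises, 0 purged). That is consistent with the dirty-page counters, though: with `lg_dirty_mult: 3`, jemalloc only purges an arena when its active:dirty page ratio drops below 8:1, and this arena is far above that threshold:

```python
# Check whether jemalloc's purge threshold was ever crossed for arenas[0].
active_pages = 6439076   # from "dirty pages: 6439076:55627 active:dirty"
dirty_pages = 55627
min_ratio = 8            # "Min active:dirty page ratio per arena: 8:1"

ratio = active_pages / dirty_pages
print(f"active:dirty = {ratio:.0f}:1")  # ~116:1, well above the 8:1 threshold
print("purge needed:", ratio < min_ratio)
```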

Comment From: oranagra

@liuhuang9492 in the previous posts, you showed active_defrag_running:75 but in this one it's 0. and since active_defrag_hits and active_defrag_misses are also 0, i suppose this is a restarted server, or one on which active_defrag_running was never used. since i do see high fragmentation, i suppose it was simply not enabled (yet?). another possible explanation is that someone called CONFIG RESETSTAT.

Regarding MEMORY MALLOC-STATS, this info is still not sufficient; I want to look at the bins table it prints below that summary.

Comment From: liuhuang9492

Sorry, by the time of the first upload the service had been restarted. I found the same information on another master node. The following are info.txt and memory-malloc-stats.txt.

Comment From: oranagra

so i see that active defrag does do some work:

active_defrag_hits:9499179
active_defrag_misses:2476504029
active_defrag_key_hits:7723606
active_defrag_key_misses:606727513

and that the majority of the fragmented memory is in the 24 bytes and 8 bytes bins:

bins:           size ind    allocated      nmalloc      ndalloc    nrequests      curregs      curruns regs pgs  util       nfills     nflushes      newruns       reruns
                   8   0    203054736    194185991    168804149  18212176289     25381842       174736  512   1 0.283     16412476     21625000       200695     18029823
                  16   1     30022752    182879832    181003410  30015496983      1876422         9796  256   1 0.748     40636604     16117678        20241     43744741
                  24   2   1343671608    455671497    399685180  34180562901     55986317       357417  512   3 0.30
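The `util` column in that table is `curregs / (curruns × regs)`; from the same row one can also estimate how much memory the partially-used runs of the 24-byte bin are pinning. A rough back-of-the-envelope, taking the numbers from the 24-byte row above:

```python
# Estimate waste in the 24-byte bin from the jemalloc bins table above.
size, regs, curregs, curruns = 24, 512, 55986317, 357417

total_regs = curruns * regs               # region slots across all current runs
util = curregs / total_regs               # fraction of slots actually occupied
wasted = (total_regs - curregs) * size    # bytes in empty slots of live runs

print(f"util: {util:.3f}")                # 0.306, matching the table
print(f"wasted: {wasted / 2**30:.2f} GiB")
```

So this one bin alone accounts for roughly 2.8 GiB of sparsely-used runs, which lines up with the high fragmentation reported.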

maybe this is the cluster slots to keys mapping which we don't currently defrag. @madolson can you think of a way to prove this theory?
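The counters quoted above also suggest the defragger is scanning a lot but moving very little; the hit ratios make that concrete:

```python
# Defrag efficiency from the INFO stats counters quoted above.
hits, misses = 9499179, 2476504029
key_hits, key_misses = 7723606, 606727513

alloc_hit_rate = hits / (hits + misses)
key_hit_rate = key_hits / (key_hits + key_misses)

print(f"allocation hit rate: {alloc_hit_rate:.2%}")  # 0.38%
print(f"key hit rate:        {key_hit_rate:.2%}")    # 1.26%
```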

Comment From: oranagra

Ohh, just realized that in Redis 7.0 (due to #9356) we're now actually defragging the slot-to-keys data structure (since it's part of the dictEntry).

Comment From: zuiderkwast

Isn't this easy to detect by running defrag in a cluster test case? Just to prove it with versions before and after the slot-to-key rewrite..

Comment From: oranagra

i guess you can do that (if you have spare time)

Comment From: liuhuang9492

What is the cluster slots to keys mapping? Does it mean the more keys, the more memory used?

Comment From: zuiderkwast

> What is the cluster slots to keys mapping?

It's a structure to lookup all the keys in a cluster slot.

> Does it mean the more keys, the more memory used?

Yes. In current unstable, two extra pointers are used for every key when cluster mode is enabled. Before #9356 (i.e. in all released versions of Redis) it is even more, and additional memory is also used for keys with long names.
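To put a rough number on that for this instance (~25.4M keys, per the Keyspace section earlier in the thread), assuming two 8-byte pointers per key as described for current unstable (pre-#9356 versions use more):

```python
# Rough lower bound on slot-to-keys overhead for this instance.
# Assumes 2 extra pointers of 8 bytes per key (current unstable layout);
# the Redis 4.0 rax-based mapping would use more.
keys = 25387087          # db0 key count from the INFO Keyspace section
overhead = keys * 2 * 8  # bytes

print(f"~{overhead / 2**20:.0f} MiB")  # ~387 MiB just for the mapping
```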

Comment From: oranagra

IIRC, Redis 4.0 used a rax for this, and I'm not sure how likely it is that a rax node will use 8-24 byte allocations, or whether it makes sense that these are the cause of the fragmentation. It seems reasonable, I guess... it has a 4-byte header, then one or more consecutive unique chars, possibly with pointers... I can't think of a way to prove it without being able to reproduce it and then test the code in the unstable branch.