@oranagra hi, I'm running redis 4.0.12 on CentOS Linux release 7.4.1708 (Core). I have enabled active-defrag but it doesn't seem to do anything. Can you help me find out why active-defrag doesn't run?
The following is some example output:
info memory
used_memory:9823369872
used_memory_human:9.15G
used_memory_rss:27904212992
used_memory_rss_human:25.99G
used_memory_peak:27399710848
used_memory_peak_human:25.52G
used_memory_peak_perc:35.85%
used_memory_overhead:4061865096
used_memory_startup:3066944
used_memory_dataset:5761504776
used_memory_dataset_perc:58.67%
total_system_memory:404141654016
total_system_memory_human:376.39G
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:32212254720
maxmemory_human:30.00G
maxmemory_policy:volatile-lru
mem_fragmentation_ratio:2.84
mem_allocator:jemalloc-4.0.3
active_defrag_running:75
config
 1) "active-defrag-threshold-lower"
 2) "1"
 3) "active-defrag-threshold-upper"
 4) "100"
 5) "active-defrag-ignore-bytes"
 6) "1048576"
 7) "active-defrag-cycle-min"
 8) "25"
 9) "active-defrag-cycle-max"
10) "75"
11) "activedefrag"
12) "yes"
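For reference, the same settings written as a redis.conf fragment (values mirror the CONFIG GET output above; this is just the config restated, not a recommendation):

```conf
# Enable active defragmentation (requires a Redis build with the bundled jemalloc)
activedefrag yes
# Minimum amount of fragmentation waste before defrag starts (1048576 bytes)
active-defrag-ignore-bytes 1mb
# Fragmentation percentage at which defrag starts / reaches maximum effort
active-defrag-threshold-lower 1
active-defrag-threshold-upper 100
# Minimal / maximal CPU percentage the defragger may consume
active-defrag-cycle-min 25
active-defrag-cycle-max 75
```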
debug log
59047:S 12 Nov 16:51:37.134 . allocated=9824517792, active=27259707392, resident=27925131264, frag=177% (184% rss), frag_bytes=17435189600 (18100613472% rss)
59047:S 12 Nov 16:51:38.876 . allocated=9824527664, active=27259756544, resident=27925131264, frag=177% (184% rss), frag_bytes=17435228880 (18100603600% rss)
59047:S 12 Nov 16:51:40.619 . allocated=9824527664, active=27259756544, resident=27925131264, frag=177% (184% rss), frag_bytes=17435228880 (18100603600% rss)
cat proc smaps info
01825000-01846000 rw-p 00000000 00:00 0    [heap]
Size:                132 kB
Rss:                  56 kB
Pss:                  56 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        56 kB
Referenced:           56 kB
Anonymous:            56 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac sd
7f6b4ae00000-7f71e3c00000 rw-p 00000000 00:00 0
Size:           27670528 kB
Rss:            27243196 kB
Pss:            27243196 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:  27243196 kB
Referenced:     27243196 kB
Anonymous:      27243196 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac sd
memory malloc-stats
127.0.0.1:11101> MEMORY MALLOC-STATS
___ Begin jemalloc statistics ___
Version: 4.0.3-0-ge9192eacf8935e29fc62fddc2701f7942b1cc02c
Assertions disabled
Run-time option settings:
  opt.abort: false
  opt.lg_chunk: 21
  opt.dss: "secondary"
  opt.narenas: 128
  opt.lg_dirty_mult: 3 (arenas.lg_dirty_mult: 3)
  opt.stats_print: false
  opt.junk: "false"
  opt.quarantine: 0
  opt.redzone: false
  opt.zero: false
  opt.tcache: true
  opt.lg_tcache_max: 15
CPUs: 32
Arenas: 128
Pointer size: 8
Quantum size: 8
Page size: 4096
Min active:dirty page ratio per arena: 8:1
Maximum thread-cached size class: 32768
Chunk size: 2097152 (2^21)
Allocated: 9824657832, active: 27259772928, metadata: 666136320, resident: 27925217280, mapped: 28343009280
Current active ceiling: 27260878848
Comment From: oranagra
Hi, this redis is a bit old, and the memory metrics in INFO are a bit lacking, but the prints you got from the log file indicate that you do indeed have about 177% fragmentation overhead. The defragger is in fact running, attempting to consume the maximum CPU time it can (75%), so I wonder why it isn't able to do anything. Can you please post your full INFO output (specifically, INFO stats contains the defrag hits and misses)? Please specify whether you're using any modules, and if not, which data types compose the majority of the keyspace / memory. The fragments you posted from malloc-stats and smaps are insufficient; I'm not sure there's any useful info in there, but please upload the full versions.
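As a sanity check, the 177% (and 184% rss) figures can be reproduced from the allocated/active/resident values in the first defrag debug-log line above. A quick sketch of that arithmetic (values copied from the log):

```python
# Values taken from the first defrag debug log line above
allocated = 9_824_517_792   # bytes Redis actually requested from jemalloc
active    = 27_259_707_392  # bytes in pages jemalloc considers in use
resident  = 27_925_131_264  # bytes of physical memory (RSS)

# Fragmentation overhead: wasted space relative to what was allocated
frag_bytes = active - allocated
frag_pct = frag_bytes * 100 / allocated

# The "(... rss)" variant uses resident memory instead of active pages
rss_frag_bytes = resident - allocated
rss_frag_pct = rss_frag_bytes * 100 / allocated

print(f"frag={frag_pct:.0f}%, frag_bytes={frag_bytes} ({rss_frag_pct:.0f}% rss)")
```

This reproduces frag=177%, frag_bytes=17435189600 and the 184% rss figure from the log line, which also shows that the huge number printed as "18100613472% rss" is really a byte count, not a percentage.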
Comment From: liuhuang9492
Hi, thanks for the reply.
info
Server
redis_version:4.0.12
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:f16c83d04220d86d
redis_mode:cluster
os:Linux 3.10.0-693.el7.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:4.8.5
process_id:227841
run_id:323d03ca48befda5744f1939d2ae1addff1943b9
tcp_port:11109
uptime_in_seconds:6036069
uptime_in_days:69
hz:30
lru_clock:9556280
executable:/usr/local/bin/redis-server
config_file:/data/cachecloud/conf/redis-cluster-11109.conf
Clients
connected_clients:3
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
Memory
used_memory:9289824264
used_memory_human:8.65G
used_memory_rss:27104083968
used_memory_rss_human:25.24G
used_memory_peak:26334656816
used_memory_peak_human:24.53G
used_memory_peak_perc:35.28%
used_memory_overhead:3940330866
used_memory_startup:3066944
used_memory_dataset:5349493398
used_memory_dataset_perc:57.60%
total_system_memory:404141654016
total_system_memory_human:376.39G
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:32212254720
maxmemory_human:30.00G
maxmemory_policy:volatile-lru
mem_fragmentation_ratio:2.92
mem_allocator:jemalloc-4.0.3
active_defrag_running:0
lazyfree_pending_objects:0
Persistence
loading:0
rdb_changes_since_last_save:848678652
rdb_bgsave_in_progress:0
rdb_last_save_time:1630910163
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
Stats
total_connections_received:731815
total_commands_processed:788436833
instantaneous_ops_per_sec:68
total_net_input_bytes:99087275715
total_net_output_bytes:5382267761
instantaneous_input_kbps:7.34
instantaneous_output_kbps:0.05
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
Replication
role:slave
master_host:10.192.70.19
master_port:11109
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:98854529501
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:954e2fbeac32e1f394d8238e5ab64e6881dc9cd9
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:98854529501
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:150000000
repl_backlog_first_byte_offset:98704529502
repl_backlog_histlen:150000000
CPU
used_cpu_sys:4648.14
used_cpu_user:9626.80
used_cpu_sys_children:0.00
used_cpu_user_children:0.00
Cluster
cluster_enabled:1
Keyspace
db0:keys=25387087,expires=25268386,avg_ttl=0
Comment From: liuhuang9492
For security reasons, we cannot upload files, so we can only post them in sections. I'm sorry.
memory malloc-stats
___ Begin jemalloc statistics ___
Version: 4.0.3-0-ge9192eacf8935e29fc62fddc2701f7942b1cc02c
Assertions disabled
Run-time option settings:
  opt.abort: false
  opt.lg_chunk: 21
  opt.dss: "secondary"
  opt.narenas: 128
  opt.lg_dirty_mult: 3 (arenas.lg_dirty_mult: 3)
  opt.stats_print: false
  opt.junk: "false"
  opt.quarantine: 0
  opt.redzone: false
  opt.zero: false
  opt.tcache: true
  opt.lg_tcache_max: 15
CPUs: 32
Arenas: 128
Pointer size: 8
Quantum size: 8
Page size: 4096
Min active:dirty page ratio per arena: 8:1
Maximum thread-cached size class: 32768
Chunk size: 2097152 (2^21)
Allocated: 9291064432, active: 26374455296, metadata: 638340864, resident: 27239452672, mapped: 27248295936
Current active ceiling: 26375880704
arenas[0]:
assigned threads: 1
dss allocation precedence: secondary
min active:dirty page ratio: 8:1
dirty pages: 6439076:55627 active:dirty, 0 sweeps, 0 madvises, 0 purged
             allocated      nmalloc      ndalloc    nrequests
small:      6971634800   1198014435   1034532974   7654824481
large:         4173824    125484451    125484438    125859333
huge:       2315255808           21           18           21
total:      9291064432   1323498907   1160017430   7780683835
active:    26374455296
mapped:    27242004480
metadata: mapped: 632905728, allocated: 1203456
Comment From: oranagra
@liuhuang9492 in the previous posts you showed active_defrag_running:75, but in this one it's 0.
And since active_defrag_hits and active_defrag_misses are also 0, I suppose this is a restarted server, or one on which active defrag was never enabled.
Since I do see high fragmentation, I suppose it was simply not enabled (yet?).
Another possible explanation is that someone called CONFIG RESETSTAT.
Regarding MEMORY MALLOC-STATS, this info is still not sufficient; I want to look at the bins table it prints below that summary.
Comment From: liuhuang9492
Sorry, the service had been restarted before the first upload. I found the configuration information of the other master node. The following: info.txt, memory-malloc-stats.txt
Comment From: oranagra
So I see that active defrag does do some work:
active_defrag_hits:9499179
active_defrag_misses:2476504029
active_defrag_key_hits:7723606
active_defrag_key_misses:606727513
and that the majority of the fragmented memory is in the 24 bytes and 8 bytes bins:
bins: size ind allocated nmalloc ndalloc nrequests curregs curruns regs pgs util nfills nflushes newruns reruns
8 0 203054736 194185991 168804149 18212176289 25381842 174736 512 1 0.283 16412476 21625000 200695 18029823
16 1 30022752 182879832 181003410 30015496983 1876422 9796 256 1 0.748 40636604 16117678 20241 43744741
24 2 1343671608 455671497 399685180 34180562901 55986317 357417 512 3 0.30
Maybe this is the cluster slots-to-keys mapping, which we don't currently defrag. @madolson can you think of a way to prove this theory?
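The degree of waste in that 24-byte bin can be checked directly from the table row above: curregs (live regions) against the capacity of the runs jemalloc is still holding (curruns * regs). A quick sketch with those numbers:

```python
# 24-byte bin row from the jemalloc bins table above
size      = 24           # region size in bytes
allocated = 1_343_671_608
curregs   = 55_986_317   # live regions in this bin
curruns   = 357_417      # runs currently held for this bin
regs      = 512          # regions per run

capacity   = curruns * regs        # regions the held runs could hold
util       = curregs / capacity    # matches the 'util' column (~0.30)
held_bytes = capacity * size       # memory pinned by these runs

# 'allocated' for a bin is simply live regions times region size
assert curregs * size == allocated

print(f"util={util:.3f}, live={allocated / 2**30:.2f} GiB, held={held_bytes / 2**30:.2f} GiB")
```

So these runs pin roughly 4.1 GiB of pages while holding only about 1.25 GiB of live 24-byte allocations, i.e. most of the fragmentation sits in that one bin.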
Comment From: oranagra
Ohh, just realized that in redis 7.0 (due to #9356) we're now actually defragging the slot-to-keys data structure (since it's part of the dictEntry).
Comment From: zuiderkwast
Isn't this easy to detect by running defrag in a cluster test case? Just to prove it, run with versions from before and after the slot-to-keys rewrite...
Comment From: oranagra
I guess you can do that (if you have spare time).
Comment From: liuhuang9492
What is the cluster slots-to-keys mapping? Does it mean that the more keys there are, the more memory is used?
Comment From: zuiderkwast
What is the cluster slots-to-keys mapping?
It's a structure to lookup all the keys in a cluster slot.
Does it mean that the more keys there are, the more memory is used?
Yes. In the current unstable branch, two extra pointers are used for every key when cluster mode is enabled. Before #9356 (i.e. in all released versions of Redis) it is even more, and additional memory is also used for keys with long names.
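A rough back-of-the-envelope with the keyspace shown earlier (db0:keys=25387087): assuming the two-extra-8-byte-pointers-per-key layout of current unstable (the pre-#9356 rax version allocates per-key nodes keyed by slot plus key name, so its footprint and churn are larger), the fixed per-key cost alone works out to roughly 400 MB. A minimal sketch, with the per-key cost as the stated assumption:

```python
# Keyspace size from the INFO output above
keys = 25_387_087

# Assumption: two extra 8-byte pointers per key (post-#9356, cluster mode)
per_key_bytes = 2 * 8
overhead = keys * per_key_bytes

print(f"slots-to-keys pointer overhead: ~{overhead / 2**20:.0f} MiB")
```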
Comment From: oranagra
IIRC, redis 4.0 used a rax, and I'm not sure how likely it is that a rax node will use 8-24 byte allocations, or whether it makes sense that these are the cause of the fragmentation. It seems reasonable, I guess... it has a 4 byte header, and then one or more consecutive unique chars, possibly with pointers... I can't think of a way to prove it without being able to reproduce it and then test the code in the unstable branch.