https://github.com/jemalloc/jemalloc/issues/1098#issuecomment-1588962477

A few things that can be improved:

1. don't attempt to defrag (wasting CPU cycles) if the fragmentation in small bins doesn't cross the threshold (ignore the active pages used by large bins)
2. consider adding INFO metrics for retained and muzzy memory (stats.arenas.<i>.pmuzzy), see the sketch below
3. and a used-memory breakdown of large vs. small bins
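
For item 2, here's a minimal, hypothetical sketch (not existing Redis code) of how those counters could be read through jemalloc's mallctl interface; it assumes a jemalloc 5.x build with the "je_" prefix (as Redis's bundled jemalloc uses) and MALLCTL_ARENAS_ALL to aggregate across arenas:

```c
/* Hypothetical sketch: read retained and dirty/muzzy page counters from
 * jemalloc via mallctl so they could be exposed as INFO fields.
 * Assumes jemalloc 5.x built with the "je_" prefix; with a vanilla build
 * drop the prefix. */
#include <stdio.h>
#include <stdint.h>
#include <jemalloc/jemalloc.h>

static size_t read_szstat(const char *name) {
    size_t val = 0, sz = sizeof(val);
    if (je_mallctl(name, &val, &sz, NULL, 0) != 0) return 0;
    return val;
}

int main(void) {
    /* jemalloc caches its stats; bump the epoch to refresh them first. */
    uint64_t epoch = 1;
    size_t esz = sizeof(epoch);
    je_mallctl("epoch", &epoch, &esz, &epoch, esz);

    char name[128];
    /* MALLCTL_ARENAS_ALL aggregates the per-arena stats over all arenas. */
    snprintf(name, sizeof(name), "stats.arenas.%d.pdirty", MALLCTL_ARENAS_ALL);
    size_t pdirty = read_szstat(name);
    snprintf(name, sizeof(name), "stats.arenas.%d.pmuzzy", MALLCTL_ARENAS_ALL);
    size_t pmuzzy = read_szstat(name);
    size_t retained = read_szstat("stats.retained");

    /* pdirty/pmuzzy are page counts; stats.retained is in bytes. */
    printf("pdirty=%zu pages, pmuzzy=%zu pages, retained=%zu bytes\n",
           pdirty, pmuzzy, retained);
    return 0;
}
```

The epoch bump is needed because jemalloc snapshots its statistics; without refreshing it, the counters can be stale.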

Comment From: zuiderkwast

This looks like your notes to yourself. :) Do you need the rest of us to get involved?

I see #12315. Didn't that already get rid of fragmentation on large bins?

Comment From: oranagra

right, this was a "draft" note in the 8.0 project and i promoted it to an issue so that it can be assigned or discussed.

what's fixed in #12315 does reduce some cases of "fragmentation" in large bins, but i was worried that there could be others, and wanted to somehow measure only the fragmentation inside small bins to trigger defrag. i don't remember how i wanted to gather that metric; i see that it doesn't exist in mallctl, but we do have it (per bin) in malloc_stats (so in theory we can add an API to expose it to redis)

Comment From: sundb

@zuiderkwast did you start? if not, i'd like to start.

Comment From: zuiderkwast

@sundb please go ahead! I didn't start.

Comment From: sundb

@oranagra excluding large bins from the calculation actually results in a higher frag rate; when there are far more large bins than small bins, the frag rate will approach zero. I inserted 1000 strings of 100k into Redis: the fragmentation rate was 0% before excluding the large bins, but after excluding them it was 130%.

Comment From: oranagra

right. but that's indeed the fragmentation ratio in the area we can defrag. on the other hand, if that memory consumption is negligible compared to what we consume, maybe we shouldn't bother to defrag it.

the reason i suggested this change was that i wanted to hide any memory overheads in large bins (or other non-defraggable overheads), i.e. anything that's not defraggable. so maybe we can use a different formula that achieves that, without the result being amplified by the fact that most memory is in large bins.

e.g. we can sum all the memory wasted in small bins (i.e. an explicit calculation of small-bin fragmentation), and then divide that by the total memory usage, rather than by the small-bin memory usage.
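
Something along these lines, as a rough hypothetical sketch (not a concrete proposal): the aggregated number indeed doesn't exist in mallctl, but if the per-bin counters that mallctl does expose in jemalloc 5.x (curslabs/curregs) are usable, the small-bin waste could be summed manually. Same je_ prefix / MALLCTL_ARENAS_ALL assumptions as the earlier sketch.

```c
/* Hypothetical sketch: sum the memory wasted inside small bins and report it
 * as a percentage of total allocated memory, using jemalloc 5.x per-bin
 * mallctl counters. Not existing Redis code. */
#include <stdio.h>
#include <stdint.h>
#include <jemalloc/jemalloc.h>

int main(void) {
    /* Refresh jemalloc's cached stats. */
    uint64_t epoch = 1;
    size_t esz = sizeof(epoch);
    je_mallctl("epoch", &epoch, &esz, &epoch, esz);

    unsigned nbins;
    size_t sz = sizeof(nbins);
    je_mallctl("arenas.nbins", &nbins, &sz, NULL, 0);

    size_t small_wasted = 0;
    char name[128];
    for (unsigned j = 0; j < nbins; j++) {
        size_t reg_size;          /* size class of this bin */
        uint32_t nregs;           /* regions per slab */
        size_t curslabs, curregs; /* current slabs / allocated regions */

        sz = sizeof(reg_size);
        snprintf(name, sizeof(name), "arenas.bin.%u.size", j);
        je_mallctl(name, &reg_size, &sz, NULL, 0);

        sz = sizeof(nregs);
        snprintf(name, sizeof(name), "arenas.bin.%u.nregs", j);
        je_mallctl(name, &nregs, &sz, NULL, 0);

        sz = sizeof(curslabs);
        snprintf(name, sizeof(name), "stats.arenas.%d.bins.%u.curslabs",
                 MALLCTL_ARENAS_ALL, j);
        je_mallctl(name, &curslabs, &sz, NULL, 0);

        sz = sizeof(curregs);
        snprintf(name, sizeof(name), "stats.arenas.%d.bins.%u.curregs",
                 MALLCTL_ARENAS_ALL, j);
        je_mallctl(name, &curregs, &sz, NULL, 0);

        /* Regions reserved by slabs but not handed out = waste in this bin. */
        small_wasted += (curslabs * nregs - curregs) * reg_size;
    }

    size_t allocated;
    sz = sizeof(allocated);
    je_mallctl("stats.allocated", &allocated, &sz, NULL, 0);

    /* Suggested metric: small-bin waste as a portion of total usage. */
    double frag_pct = allocated ? (double)small_wasted * 100.0 / allocated : 0;
    printf("small-bin wasted=%zu bytes, allocated=%zu bytes, frag=%.2f%%\n",
           small_wasted, allocated, frag_pct);
    return 0;
}
```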

WDYT?

Comment From: sundb

@oranagra why is the formula for fragmentation percent ((float)resident / allocated)*100 - 100, instead of (active - allocated) / active?

Comment From: oranagra

you mean why isn't it (active-allocated)*100/allocated (which gives the same result as the one in the code), instead of the formula you suggested? the one that uses resident isn't actually used (it's just a print). and also, yours results in a scale of 0..1, not 0..100.

the other difference is that the one in the code measures the fragmentation overhead as a portion of the allocated memory (dividing by allocated), and the one you gave would give the fragmentation overhead as a portion of the total active memory. e.g. if we have 200gb active and 150gb allocated, is the fragmentation 33% or 25%? (the current code considers it to be 33%, i.e. 50 out of 150).
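
To make the comparison concrete, a tiny throwaway snippet (the variable names are ad-hoc, not the actual Redis ones) showing the two readings of the same 200gb/150gb example:

```c
/* Illustration only: compare the percentage formula used in the code with
 * the suggested ratio, using the 200gb active / 150gb allocated example. */
#include <stdio.h>

int main(void) {
    double active = 200.0, allocated = 150.0; /* in GB */

    /* Overhead as a portion of allocated, on a 0..100 scale. */
    double frag_pct_code  = (active / allocated) * 100.0 - 100.0;     /* 33.3 */
    double frag_pct_equiv = (active - allocated) * 100.0 / allocated; /* same */

    /* Suggested variant: overhead as a portion of active, 0..1 scale. */
    double frag_ratio_suggested = (active - allocated) / active;      /* 0.25 */

    printf("code: %.1f%%  equivalent: %.1f%%  suggested: %.2f\n",
           frag_pct_code, frag_pct_equiv, frag_ratio_suggested);
    return 0;
}
```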

am i missing anything?