I think this warning is causing some unnecessary grief:

30723:M 22 Mar 15:17:24.191 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.

RHEL-alike distros/installs use the madvise option by default, which requires applications to opt in to using hugepages explicitly:

$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never

Recommending that users disable even madvise-mode hugepages is probably not for the best, as hugepages can be quite useful for some highly concurrent workloads. It may be more reasonable to reword the warning to suggest:

echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
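Under madvise, "opting in" means an application calls madvise(2) with MADV_HUGEPAGE on a specific mapping. A minimal sketch in C, assuming Linux; `map_with_thp_hint` is a hypothetical helper for illustration, not something Redis or jemalloc provides. It treats the hint as best effort, since kernels built without THP support reject it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS and MADV_* under strict -std modes */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map an anonymous, private region and hint that it
 * should be backed by transparent huge pages. In "madvise" mode only regions
 * marked like this are eligible; in "never" mode no huge pages are used.
 * Returns NULL if mmap fails. */
void *map_with_thp_hint(size_t len) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
#ifdef MADV_HUGEPAGE
    /* Best effort: kernels without THP support reject this with EINVAL. */
    (void)madvise(p, len, MADV_HUGEPAGE);
#endif
    return p;
}
```

Applications that never make this call are unaffected by the madvise setting.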

Comment From: badboy

Changing the check should be as easy as this (though I don't know how the allocator handles huge pages internally):

diff --git i/src/latency.c w/src/latency.c
index 9e9f1f13..af6e75b4 100644
--- i/src/latency.c
+++ w/src/latency.c
@@ -71,7 +71,7 @@ int THPIsEnabled(void) {
         return 0;
     }
     fclose(fp);
-    return (strstr(buf,"[never]") == NULL) ? 1 : 0;
+    return strstr(buf,"[always]") ? 1 : 0;
 }
 #endif
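A standalone sketch of the same check, extracting the bracketed (active) token instead of searching for substrings, might look like this. The function names are hypothetical and this is not the code in src/latency.c; like the patch, it treats only `always` as enabled.

```c
#include <stdio.h>
#include <string.h>

/* Copy the active (bracketed) mode out of a sysfs THP line such as
 * "always [madvise] never\n". Returns 0 on success, -1 if there is no
 * bracketed token or it does not fit in out. */
int thp_active_mode(const char *line, char *out, size_t outlen) {
    const char *open = strchr(line, '[');
    if (open == NULL)
        return -1;
    const char *close = strchr(open + 1, ']');
    if (close == NULL)
        return -1;
    size_t n = (size_t)(close - open - 1);
    if (n + 1 > outlen)
        return -1;
    memcpy(out, open + 1, n);
    out[n] = '\0';
    return 0;
}

/* Mirrors the patched logic: only "always" counts as THP being enabled. */
int thp_always_enabled(const char *line) {
    char mode[16];
    return thp_active_mode(line, mode, sizeof(mode)) == 0 &&
           strcmp(mode, "always") == 0;
}

/* Read the active mode from sysfs; returns -1 if the file is unavailable. */
int read_thp_mode(char *out, size_t outlen) {
    char buf[256];
    FILE *fp = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    if (fp == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return thp_active_mode(buf, out, outlen);
}
```

With this, the default RHEL line `always [madvise] never` would no longer trigger the warning.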

Comment From: antirez

@fdr, @badboy thanks. Before merging this, I wonder if we are sure that jemalloc does not use madvise to force huge page usage in some of the versions we use, or in some future versions. I have not checked yet; do you have any info on that? Thanks.

Comment From: fdr

It can do that, and I think that is the default; people reported some problems with it a couple of years ago: https://github.com/jemalloc/jemalloc/issues/524. One option is to bundle jemalloc without hugepage support, as it is optional. The other is to let it use hugepages. The question is, are the marginal risks worth the reduction in TLB misses?

Comment From: 007

Can this be updated given the information in https://github.com/jemalloc/jemalloc/pull/1134?

Comment From: wknapik

@antirez this issue has been open for 20 months. If this was a child, it would be speaking by now.

This is about a system-wide setting (in Kubernetes, it's effectively cluster-wide) that affects all applications on that system (or cluster), so it's not something users should change unless they really have to. The default, madvise, is ideal, because it leaves the choice to each application. If this is safe, there should be no warning (#4001). If it's not, then it's a bug that needs to be addressed.

Right now the message is telling people to make a potentially harmful change to their systems/clusters on the off chance that it will make redis faster.

Comment From: gjcarneiro

How about using MADV_NOHUGEPAGE? jemalloc, which Redis depends on, seems to be aware of it, but Redis probably needs to enable some jemalloc option to use it?
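For reference, MADV_NOHUGEPAGE is applied per mapping with madvise(2). A minimal sketch, assuming Linux and a page-aligned region such as one returned by mmap; `exclude_from_thp` is a hypothetical helper name. As I read the jemalloc manual, its `thp:never` setting applies this same advice to the mappings jemalloc creates, so an application would not normally need to call it by hand.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MADV_NOHUGEPAGE under strict -std modes */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: ask the kernel never to back [addr, addr + len) with
 * transparent huge pages, even when THP is globally set to "always".
 * addr must be page-aligned. Returns 0 on success, otherwise an errno value
 * (e.g. EINVAL on kernels built without THP support). */
int exclude_from_thp(void *addr, size_t len) {
#ifdef MADV_NOHUGEPAGE
    return madvise(addr, len, MADV_NOHUGEPAGE) == 0 ? 0 : errno;
#else
    (void)addr;
    (void)len;
    return ENOSYS; /* headers do not define the advice at all */
#endif
}
```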

Comment From: 007

Happy 2nd birthday #3895 - sorry I'm a couple weeks late! 🎉

Comment From: mohag

jemalloc seems to have a setting to disable the use of hugepages: https://github.com/antirez/redis/blob/fc0c9c8097a5b2bc8728bec9cfee26817a702f09/deps/jemalloc/ChangeLog#L21

I do not want to disable a setting on an entire k8s cluster for one app (or set up separate hosts for one app), especially if the OS provides a way for the apps that benefit from it to enable it explicitly.

Comment From: gjcarneiro

That is a good find, @mohag: http://jemalloc.net/jemalloc.3.html#tuning

The string specified via --with-malloc-conf, the string pointed to by the global variable malloc_conf, the “name” of the file referenced by the symbolic link named /etc/malloc.conf, and the value of the environment variable MALLOC_CONF, will be interpreted, in that order, from left to right as options. Note that malloc_conf may be read before main() is entered, so the declaration of malloc_conf should specify an initializer that contains the final value to be read by jemalloc. --with-malloc-conf and malloc_conf are compile-time mechanisms, whereas /etc/malloc.conf and MALLOC_CONF can be safely set any time prior to program invocation.

So, if I read correctly, we should be able to get the behaviour we want just by defining the following environment variable: export MALLOC_CONF=thp:never. Without even recompiling redis!

(someone should try it, I don't have time this week)
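The compile-time variant from the quoted documentation amounts to a global with an initializer, sketched below for a program linked against an unprefixed jemalloc. Assumption: in a build that uses a symbol prefix, as Redis's bundled copy does with `je_`, the variable name would carry that prefix too. `env_requests_thp_never` is a hypothetical helper that only checks whether the runtime `MALLOC_CONF` override mentions `thp:never`; it does not ask jemalloc what it actually parsed.

```c
#include <stdlib.h>
#include <string.h>

/* Compile-time jemalloc options (unprefixed build assumed). jemalloc may read
 * this before main() runs, so the initializer must hold the final value.
 * In a program not linked against jemalloc it is just an unused global. */
const char *malloc_conf = "thp:never";

/* Hypothetical helper: does the MALLOC_CONF environment variable mention
 * thp:never? This only inspects the string, not jemalloc's parsed state. */
int env_requests_thp_never(void) {
    const char *env = getenv("MALLOC_CONF");
    return env != NULL && strstr(env, "thp:never") != NULL;
}
```

The environment variable has the practical advantage of needing no rebuild, which is the point made above.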

Comment From: srolija

They seem to have fixed an issue with that environment variable in https://github.com/jemalloc/jemalloc/pull/1704. That should allow export MALLOC_CONF=thp:never to override THP at the process level even when it is globally configured to always. So, from the perspective of memory leaks, the warning could be updated to prompt the user to add that variable when THP is configured to always, and not be displayed otherwise.

As far as I understand from this thread and the comments on https://github.com/antirez/redis/pull/4001, the only remaining concern is that merely configuring madvise might cause some latency issues even though the Redis process does not use THP.

But from my reading of the kernel documentation on THP, the latency is incurred during memory allocation, when a new huge page is being added and synchronous reclaim is done:

always - means that an application requesting THP will stall on allocation failure and directly reclaim pages and compact memory in an effort to allocate a THP immediately. This may be desirable for virtual machines that benefit heavily from THP use and are willing to delay the VM start to utilise them.

On the other hand:

madvise - will enter direct reclaim like always but only for regions that are have used madvise(MADV_HUGEPAGE). This is the default behaviour.

So when using madvise with no segments marked for huge pages, or when using always with MADV_NOHUGEPAGE applied via the environment variable above, there shouldn't be latency problems from direct reclaim, since it would never be triggered for those segments. And since these segments wouldn't have THP enabled, they wouldn't end up being compacted when other processes try to allocate huge pages.

With all this in mind, is there any reason not to remove the warning when using madvise? I can see why it would be shown for always, but as pointed out, even then Redis could be configured to work properly, so it might make sense to update the message in that case.
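The proposal in this comment boils down to a small decision rule. A sketch, assuming the kernel mode has already been parsed from the sysfs file and jemalloc's `opt.thp` value is known (or NULL when unknown); `should_warn_about_thp` is a hypothetical name, and the rule encodes only this comment's suggestion, not any agreed Redis behavior.

```c
#include <string.h>

/* kernel_mode:  active value of /sys/kernel/mm/transparent_hugepage/enabled
 * jemalloc_thp: jemalloc's opt.thp ("default", "always", "never"),
 *               or NULL if it could not be determined.
 * Warn only when the kernel makes every mapping eligible ("always") and the
 * allocator has not opted out with thp:never. */
int should_warn_about_thp(const char *kernel_mode, const char *jemalloc_thp) {
    if (strcmp(kernel_mode, "always") != 0)
        return 0; /* madvise or never: nothing is forced on Redis */
    return jemalloc_thp == NULL || strcmp(jemalloc_thp, "never") != 0;
}
```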

Comment From: akvadrako

Any update on this?

Comment From: oranagra

@jasone can you please help us settle this issue safely? The question is whether jemalloc (either 5.1 and up, or any older version that may be provided by the distro) will ever use madvise to enable THP without being explicitly configured to do so. Alternatively, maybe we can test that at runtime (by looking at opt.thp)? I'd rather have your confirmation than make a mistake about it. Thanks.

Comment From: jasone

@oranagra, no precise promises re: what jemalloc may do at some point in the future, but it's safe to assume that jemalloc won't do anything that yanks the rug out from under applications. Automatically enabling THP with synchronous huge page allocation would be a pretty awful default. (Aside: I worry that huge pages may eventually be the death of non-moving memory allocation.) With regard to redis recommending

echo never >/sys/kernel/mm/transparent_hugepage/enabled

that's definitely a poor recommendation. The kernel default (madvise) is the right choice, even for a system dedicated to running redis. With regard to jemalloc configuration, I think the default (thp=default) is correct for redis, unless it is proactively defragmenting malloc()ed memory. I haven't kept track of redis's behavior, but do recall a couple different proposals for periodically iterating through memory and evacuating sparsely utilized jemalloc slabs. If redis is configured to do this, then configuring jemalloc with thp=always may be beneficial (but better might be to control THP utilization more precisely via custom arenas). Only then does it become relevant how the kernel is configured to defragment physical memory. The following setting might be useful for reducing latency, but I have no direct experience using it; caveat emptor.

echo defer >/sys/kernel/mm/transparent_hugepage/defrag

[Linux docs re: THP]

Comment From: oranagra

@jasone thank you. Redis does indeed have a mechanism to actively defrag its allocations; it uses a patch in jemalloc to determine which allocations to move: https://github.com/jemalloc/jemalloc/issues/566. This mechanism is not enabled by default, though.

Anyway, just to be certain: you're confirming that when the kernel THP setting is madvise, the current and past versions of jemalloc will not utilize THP unless Redis explicitly changes opt.thp from default to always? Is there a risk that the distro will enable that for us, in which case it's better to test opt.thp at runtime?
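Testing `opt.thp` at runtime would go through jemalloc's `mallctl()` interface. A minimal sketch, assuming jemalloc 5.1 or later (where I believe `opt.thp` was introduced) and an ELF toolchain that supports weak symbols; `jemalloc_opt_thp` is a hypothetical helper. In Redis's bundled build the call would be spelled `je_mallctl` because of the symbol prefix.

```c
#include <stddef.h>
#include <string.h>

/* jemalloc's control interface, declared weak so this file still links when
 * jemalloc is absent; the symbol then resolves to NULL (GCC/Clang on ELF). */
extern int mallctl(const char *name, void *oldp, size_t *oldlenp,
                   void *newp, size_t newlen) __attribute__((weak));

/* Hypothetical helper: return jemalloc's opt.thp ("default", "always" or
 * "never"), or NULL if jemalloc is not linked in or the lookup fails
 * (e.g. a jemalloc too old to know the option). */
const char *jemalloc_opt_thp(void) {
    const char *thp = NULL;
    size_t sz = sizeof(thp);
    if (mallctl == NULL)
        return NULL;
    if (mallctl("opt.thp", &thp, &sz, NULL, 0) != 0)
        return NULL;
    return thp;
}
```

A NULL result could reasonably be treated as "unknown" by whatever decides whether to print the warning.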

Comment From: jasone

Correct, jemalloc has not defaulted to MADV_HUGEPAGE, and a distro would be terribly mistaken to package jemalloc with a different default given the current hardware/software landscape. It's unlikely that redis will encounter inadvertent misconfiguration in the wild.