This is a weird one.
After upgrading to 3.2.7, redis just hangs, using up all the CPU on one core; nothing is output to the log file, the rdb file isn't updated, and all connections fail.
Since this only started happening with 3.2.7, and the biggest change there was Jemalloc, I reverted 27e29f4fe61d822eb23d948bcb72db76c4c887e5 and tried it again, and it worked fine. So it appears the Jemalloc upgrade is causing the hang.
Unfortunately I've not found an easy way of replicating this except by using production data.
I'm not sure what to do to debug further so I'm raising this issue.
Comment From: antirez
Thanks @rwky, were you coming from 3.2.6?
Comment From: antirez
p.s. also please report your glibc version if possible. Thank you.
Comment From: rwky
Yep, straight upgrade from 3.2.6, which was working solidly.
glibc is ldd (Debian GLIBC 2.19-18+deb8u7) 2.19
Comment From: davidtgoldblatt
Two ideas:
- Can you grab some stack traces to see where the spinning is happening?
- Can you try building jemalloc with --enable-debug?
Comment From: rwky
Will do. It'll be a little while before it acts up again; it may be tomorrow before I can grab the details.
Comment From: rwky
We got lucky, it happened before I went to bed. Attached is the redis log after sending it SIGSEGV, hopefully it's useful. redis.txt
Comment From: antirez
@rwky thanks a lot. Based on your new observations, are you still confident that the problem only happens with Jemalloc 4.4.0? It may be safe to release Redis 3.2.8 with the commit reverted at this point... Thanks.
Comment From: rwky
Yep, since I reverted that commit it works fine, so it's something Jemalloc related, and it's 100% repeatable. I'm just not sure exactly what is triggering it, something in our production workload.
Comment From: antirez
Thank you, I think it's better to release 3.2.8 ASAP.
Comment From: antirez
@davidtgoldblatt in case you did not notice, we have a stack trace thanks to @rwky:
------ STACK TRACE ------
EIP:
/usr/local/bin/redis-server 127.0.0.1:6379(je_spin_adaptive+0x22)[0x4e3552]
Backtrace:
/usr/local/bin/redis-server 127.0.0.1:6379(logStackTrace+0x29)[0x4623a9]
/usr/local/bin/redis-server 127.0.0.1:6379(sigsegvHandler+0xa6)[0x462a46]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f5ffbb60890]
/usr/local/bin/redis-server 127.0.0.1:6379(je_spin_adaptive+0x22)[0x4e3552]
/usr/local/bin/redis-server 127.0.0.1:6379(je_chunk_alloc_dss+0x1d8)[0x4c16f8]
/usr/local/bin/redis-server 127.0.0.1:6379(je_chunk_alloc_wrapper+0x948)[0x4c0a18]
/usr/local/bin/redis-server 127.0.0.1:6379(je_arena_chunk_ralloc_huge_expand+0x263)[0x4b7bd3]
/usr/local/bin/redis-server 127.0.0.1:6379[0x4d6ff0]
/usr/local/bin/redis-server 127.0.0.1:6379(je_huge_ralloc_no_move+0x314)[0x4d7804]
/usr/local/bin/redis-server 127.0.0.1:6379(je_huge_ralloc+0x5c)[0x4d7b8c]
/usr/local/bin/redis-server 127.0.0.1:6379(je_realloc+0xb2)[0x4ae072]
/usr/local/bin/redis-server 127.0.0.1:6379(zrealloc+0x26)[0x431f76]
/usr/local/bin/redis-server 127.0.0.1:6379(sdsMakeRoomFor+0x2bd)[0x42f91d]
/usr/local/bin/redis-server 127.0.0.1:6379(readQueryFromClient+0xae)[0x43ab0e]
/usr/local/bin/redis-server 127.0.0.1:6379(aeProcessEvents+0x133)[0x425463]
/usr/local/bin/redis-server 127.0.0.1:6379(aeMain+0x2b)[0x4257ab]
/usr/local/bin/redis-server 127.0.0.1:6379(main+0x40b)[0x42285b]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f5ffb7c7b45]
/usr/local/bin/redis-server 127.0.0.1:6379[0x4229fe]
Comment From: davidtgoldblatt
Thanks, I had missed that. I'll take a look sometime tomorrow.
Comment From: jasone
This is likely due to the bug fixed here on the dev branch. jemalloc issue #618 is tracking the backport which will be part of the 4.5.0 release.
Comment From: rwky
If @jasone's comment describes the offending issue, then once 4.5.0 is released, if someone wants to create a redis branch with 4.5.0 in it, I'm happy to test it.
Comment From: davidtgoldblatt
I've had some trouble replicating this at the commit before the suspected fix. But given that 4.5 is coming out soon, I'm inclined not to spend much time on it so long as you don't mind trying it out after the fix. I'll ping this issue once it's released?
Comment From: rwky
Sounds good. I had trouble replicating it without throwing real data at it; I don't know what exactly triggers it, I just know reverting jemalloc fixed it. Ping me once the fix is released and I'll check it out.
Comment From: Venorcis
Jemalloc 4.5.0 is out, which probably fixes this https://github.com/jemalloc/jemalloc/releases/tag/4.5.0
Comment From: antirez
Thanks @Venorcis, kindly asking @jasone to confirm the issue is fixed, since from reading the changelog it's not obvious which item in the list it is, probably the locking-order one, but I want to be sure before upgrading. However, I'll let a few more weeks pass without any patch release.
Comment From: jasone
Yes, I think the issue is fixed (Fix chunk_alloc_dss() regression.). The stack trace above is consistent with test failures we experienced once @davidtgoldblatt implemented CI testing for FreeBSD.
Comment From: antirez
Thanks, we will upgrade 4.0 RC asap and the stable release with some delay. If we notice anything strange I'll write a note. Thank you!
Comment From: rwky
@antirez if you want to create a 3.x branch with Jemalloc 4.5.0, I'll test if it works before you release it as a stable release.
Comment From: siyangy
@jasone We hit this exact same problem after upgrading to jemalloc 4.6.0 (4.5.0 as well) when reallocating a large chunk of memory.
Stacktrace:
Program received signal SIGINT, Interrupt.
0x00000000006dfa9a in je_chunk_alloc_dss ()
(gdb) where
#0  0x00000000006dfa9a in je_chunk_alloc_dss ()
#1  0x00000000006df170 in je_chunk_alloc_wrapper ()
#2  0x00000000006d273a in je_arena_chunk_ralloc_huge_expand ()
#3  0x00000000006f22e5 in huge_ralloc_no_move_expand ()
#4  0x00000000006f282b in je_huge_ralloc_no_move.part ()
#5  0x00000000006f43e9 in je_huge_ralloc ()
#6  0x00000000006d77c4 in je_arena_ralloc ()
#7  0x00000000006be00f in je_realloc ()
#8  0x00000000006b9bff in zrealloc ()
#9  0x00000000006b9455 in sdsMakeRoomFor ()
Comment From: jasone
@siyangy, can you please specify precisely which jemalloc revision(s) you're testing with? There is no 4.6.0 release, so I want to make sure we're talking about a version that has this fix.
Assuming you're testing with at least 4.5.0, it would be really helpful to get a printout of the primary variables that impact the loop logic in chunk_alloc_dss(), namely size, alignment, max_cur, dss_next, and dss_prev.
Comment From: siyangy
Sorry, we tried 4.3.1, 4.4.0 and 4.5.0, hitting the same thing, and we have verified that the fix is included in 4.5.0.
Here are the variables you asked for:
SIZE 2097152 ALIGNMENT 2097152 MAX_CUR 0x4022 DSS_NEXT 0x206 DSS_PREV 0xe39cd348198558bf
SIZE 2097152 ALIGNMENT 2097152 MAX_CUR 0x4022 DSS_NEXT 0x206 DSS_PREV 0xe39a00b5d8d90cbf
SIZE 2097152 ALIGNMENT 2097152 MAX_CUR 0x4022 DSS_NEXT 0x206 DSS_PREV 0xe396eed526b144bf
SIZE 2097152 ALIGNMENT 2097152 MAX_CUR 0x4022 DSS_NEXT 0x206 DSS_PREV 0xe389f58522d98cbf
SIZE 2097152 ALIGNMENT 2097152 MAX_CUR 0x4022 DSS_NEXT 0x206 DSS_PREV 0xe388069b823d53bf
One thing worth noting is that it stops printing, instead of being stuck in the while loop (we put the print within the while(true) loop), after a few iterations while the program hangs. During our previous debugging we found that it ends up calling zrealloc recursively (a nested zrealloc within zrealloc).
Comment From: jasone
Wow, is that dss_prev value for real, or is there perhaps an argument missing from the printf (or is the printf placed prior to the initialization)?
Comment From: jasone
I'm having a hard time seeing how the max_cur and dss_next values could be correct either, assuming the backtrace from yesterday corresponds to when these numbers were collected. The backtrace is for in-place expanding huge reallocation, and that would require at least one chunk to have already been allocated from dss for us to get into the core of chunk_alloc_dss() (max_cur would be NULL, and we'd bail out to label_oom).
Comment From: siyangy
Okay, so we use this to print:

    printf("SIZE %lu ALIGNMENT %lu MAX_CUR %p DSS_NEXT %p DSS_PREV %p\n",
        size, alignment, max_cur, dss_next, dss_prev);

However, it is placed at the very beginning of the while(true) loop, where dss_prev is not initialized. We put it there because we assumed that it went into an infinite loop within this while(true). When we move this line to after dss_prev is initialized, nothing is printed out.
Comment From: siyangy
@jasone After some more debugging I finally found where the code gets stuck: there's a spin_adaptive in the function chunk_dss_max_update, so that's why we didn't see valid dss_prev and dss_next values, it actually never gets out of the spin in chunk_dss_max_update. For some reason our stacktrace doesn't show the spin_adaptive call. The comment before spin_adaptive says 'Another thread optimistically updated dss_max. Wait for it to finish.' Apparently dss_max is not updated as expected.
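To illustrate the shape of what we think we're stuck in, here is a simplified sketch of the generic "optimistically publish, then spin until the updater finishes" idiom. It is not jemalloc's actual chunk_dss_max_update code, just the pattern implied by the quoted comment: if the updater's final store never lands, every reader spins forever, which is what the hang looks like from the outside.
```
/* Simplified illustration only -- NOT jemalloc's actual code. */
#include <stdatomic.h>

#define UPDATE_IN_PROGRESS ((void *)1)  /* sentinel published by the updater */

static _Atomic(void *) dss_max;         /* shared upper bound, as described above */

/* Reader: wait until no update is in flight, then use the value. */
static void *read_dss_max(void) {
    void *max;
    while ((max = atomic_load(&dss_max)) == UPDATE_IN_PROGRESS) {
        /* jemalloc calls spin_adaptive() here; plain busy-wait in this sketch */
    }
    return max;
}

/* Updater: publish the sentinel, do the real work, publish the new value.
 * If the final store is never reached, readers above never leave the loop. */
static void update_dss_max(void *new_max) {
    atomic_store(&dss_max, UPDATE_IN_PROGRESS);
    /* ... e.g. move the program break with sbrk() and compute new_max ... */
    atomic_store(&dss_max, new_max);
}
```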
Comment From: antirez
Just found this in the CI test, happening with Jemalloc 4.0.3 after a server restart. Not sure if it's related, but it looks like a Jemalloc deadlock at first glance. It's worth noting that it happened immediately after the Redis server was restarted, so basically there is very little allocated at this time.
```
(gdb) bt
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007fd7b109d67f in _L_lock_1081 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007fd7b109d5f8 in __GI___pthread_mutex_lock (mutex=0x7fd7b0a006b0)
    at ../nptl/pthread_mutex_lock.c:134
#3  0x00000000004eb280 in je_malloc_mutex_lock (mutex=0x7fd7b0a006b0)
    at include/jemalloc/internal/mutex.h:85
#4  je_tcache_bin_flush_small (tsd=<optimized out>, tcache=<optimized out>,
    tbin=0x7fd7b080d040, binind=<optimized out>, rem=100) at src/tcache.c:115
#5  0x00000000004bef71 in je_tcache_dalloc_small (binind=<optimized out>,
    ptr=<optimized out>, tcache=<optimized out>, tsd=<optimized out>)
    at include/jemalloc/internal/tcache.h:376
#6  je_arena_dalloc (tcache=<optimized out>, ptr=<optimized out>, tsd=<optimized out>)
    at include/jemalloc/internal/arena.h:1271
#7  je_idalloctm (is_metadata=<optimized out>, tcache=<optimized out>,
    ptr=<optimized out>, tsd=<optimized out>)
    at include/jemalloc/internal/jemalloc_internal.h:1005
#8  je_iqalloc (tcache=<optimized out>, ptr=<optimized out>, tsd=<optimized out>)
    at include/jemalloc/internal/jemalloc_internal.h:1029
#9  ifree (tsd=<optimized out>, tcache=<optimized out>, ptr=<optimized out>)
    at src/jemalloc.c:1745
#10 je_free (ptr=0x7fd7b0227a70) at src/jemalloc.c:1839
#11 0x00000000004320dd in sdsfreesplitres (tokens=tokens@entry=0x7fd7b02c3f00,
    count=<optimized out>) at sds.c:851
#12 0x000000000046df16 in clusterLoadConfig (filename=<optimized out>)
    at cluster.c:269
#13 0x000000000046fb06 in clusterInit () at cluster.c:440
#14 0x000000000042f600 in initServer () at server.c:1911
#15 0x0000000000423473 in main (argc=<optimized out>, argv=0x7ffcd4163438)
    at server.c:3772
```
Comment From: bhuvanl
Observed a similar stack, server stuck (one of the threads spinning with 100% CPU), while running redis-server version Redis 3.9.102 (00000000/0) 64 bit.
(gdb) info stack
#0  0x00007ffff77411dd in __pthread_mutex_lock_full () from /lib64/libpthread.so.0
#1  0x00000000004bfdbd in je_malloc_mutex_lock (arena=<optimized out>, chunk=0x7fff2e400000, ptr=0x7fff2e4aee80) at include/jemalloc/internal/mutex.h:85
#2  je_arena_dalloc_large (arena=<optimized out>, chunk=0x7fff2e400000, ptr=0x7fff2e4aee80) at src/arena.c:2602
#3  0x000000000042e7fa in sdsRemoveFreeSpace (s=0x7fff2e4aee85 "") at sds.c:265
#4  0x00000000004295ca in clientsCronResizeQueryBuffer (c=0x7fffd43c6000) at server.c:823
#5  0x000000000042b19f in clientsCron () at server.c:863
#6  0x000000000042b34e in serverCron (eventLoop=<optimized out>, id=<optimized out>, clientData=<optimized out>) at server.c:1016
#7  0x00000000004234fd in processTimeEvents (eventLoop=0x7ffff0e360a0, flags=3) at ae.c:322
#8  aeProcessEvents (eventLoop=0x7ffff0e360a0, flags=3) at ae.c:423
#9  0x000000000042368b in aeMain (eventLoop=0x7ffff0e360a0) at ae.c:455
#10 0x000000000042be2b in main (argc=<optimized out>, argv=0x7fffffffe438) at server.c:3739
Comment From: antirez
@bhuvanl 3.9.102 is using the Jemalloc that is known to have issues; RC3 downgraded to Jemalloc 4.0.3. Strangely enough, however, I got the hang above with 4.0.3 as well in the unstable branch. It never happened in the past AFAIK; not sure if maybe the make distclean was not performed correctly by the CI, or what else to think.
Comment From: bhuvanl
@antirez Seems like using RC3 resolved my issue, thanks for quick response.
Comment From: jasone
I've been poking at this issue off and on for the past few days, and none of the scenarios I can think of that blame jemalloc seem plausible:
- jemalloc isn't bootstrapped.
- Some code outside jemalloc is adjusting the break by calling sbrk() with a negative argument (see https://github.com/jemalloc/jemalloc/issues/802).
Note that as configured by redis, jemalloc does not resort to sbrk() unless mmap() fails. I can't quite put together a sequence of invalid calls that would corrupt jemalloc's data structures such that we'd see such behavior, but it is certainly worth verifying that these are valid attempts to use dss. We can potentially do this two different ways with relatively little effort:
- Configure jemalloc with --enable-debug, in which case assertions will almost certainly detect misuses that could lead to the failure mode.
- Configure jemalloc with --with-malloc-conf=dss:disabled, which completely shuts down use of sbrk(). This may just mask the issue, but it could also cause an easier-to-diagnose alternative failure mode.
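As an aside, the same option can also be baked into the application through jemalloc's malloc_conf global rather than a configure flag, which avoids reconfiguring jemalloc itself. A minimal sketch, assuming the je_ symbol prefix Redis builds jemalloc with also applies to malloc_conf (worth double-checking against the bundled jemalloc headers):
```
/* In any C file linked into redis-server: tell the bundled jemalloc never
 * to fall back to sbrk()-based allocation. The symbol name assumes the
 * --with-jemalloc-prefix=je_ build that Redis uses. */
const char *je_malloc_conf = "dss:disabled";
```
This should have the same effect as the --with-malloc-conf configure option above, just set from the application side.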
If I had a live debugger session for the failure, it would probably be pretty quick work to figure out what's going on, but I'm out of ideas based on the incomplete evidence gathered so far.
Regarding the two stack traces recently posted with je_malloc_mutex_lock() in them, those failures look consistent with application-induced corruption, but they may well be completely unrelated to the older je_chunk_alloc_dss() issue.
Comment From: antirez
Hello @jasone, thank you very much for the help with this issue, I realize that it is possible that hangs are potentially Redis bugs and not Jemalloc bugs, so your willingness to analyze the issue nonetheless is very welcomed here.
I want to make sure I understand what you said here in order to start looking for bugs myself, so here are a few questions I hope you could answer:
- I understand that the original issue here was fixed in Jemalloc 4.5.0, is that correct? The OP @rwky apparently was able to toggle the problem on/off just by reverting or re-enabling the commit with the Jemalloc upgrade, and your initial comment says that the deadlock looks consistent with the bug that was fixed.
- However, for the successive hang reports (except the last two), you said they should not be due to Jemalloc but rather to bugs inside Redis and/or something external calling sbrk(). Could this sbrk() call just be due to Redis being linked with libraries that use malloc() from libc, so that there is a mix, in the same process, of calls to libc malloc and jemalloc?
- The other hangs that look like corruption bugs in Redis: at least for the one I reported, that is plausible. I'll investigate, but I remember a recent fix there; maybe we are double-freeing or similar.
- Does the `--enable-debug` option have a serious speed penalty, or is it something I could enable in Redis normally?
Thank you. Btw, if my CI triggers the same bug again, I'll leave the process running and provide you with SSH access in case you want to perform a live debugging session.
Comment From: jasone
- (1) As far as I can tell, the original issue was fixed in jemalloc 4.5.0, and although it's possible additional issues remain, I cannot come up with any plausible explanations for how jemalloc could be messing up.
- (2) If some other code besides jemalloc is concurrently calling sbrk(), it could in theory open up a race condition, but I don't think that's possible in (always single-threaded?) redis. If the system allocator is somehow being called in some cases, that would certainly cause serious problems.
- (4) `--enable-debug` isn't ideal for performance-sensitive production deployment, but as long as you specify an optimization flag to the compiler, it's fast enough for most development uses. This is similar to how development versions of FreeBSD are built, and Facebook uses `--enable-debug` for most of its non-ASAN development/test builds.
Comment From: antirez
Thanks @jasone, this is starting to get interesting:
- Ok thanks. For Redis 4.0, would you kindly suggest a version that is currently the safest? I went back to 4.0.3, but perhaps given that old code is safe in certain regards but also has its issues for lack of updates, maybe 4.6.0 is a better pick?
- Redis was never totally single threaded, but now it is even less so... We definitely have malloc/free calls inside threads with Jemalloc. And... in the main thread, we have code that calls the system allocator (that is, the Lua interpreter, and potentially other stuff). So perhaps we should move Lua to use Jemalloc to avoid this problem? (A rough sketch of what that could look like is at the end of this comment.)
- Ok, it looks like at least in the `unstable` branch it makes sense for us to go for the debug mode.
Thank you
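For reference, a minimal sketch of what routing Lua through the same allocator could look like. The wrapper and the createLuaInterpreter() helper are illustrative names, not existing Redis code; it assumes the standard lua_Alloc contract from the Lua 5.1 manual:
```
/* Sketch: make Lua allocate through Redis's zmalloc layer (and therefore
 * jemalloc) instead of the libc allocator. Illustrative only. */
#include <stddef.h>
#include "lua.h"
#include "zmalloc.h"

static void *zmallocLuaAlloc(void *ud, void *ptr, size_t osize, size_t nsize) {
    (void)ud; (void)osize;  /* unused */
    if (nsize == 0) {       /* Lua asks us to free the block */
        zfree(ptr);
        return NULL;
    }
    /* zrealloc() behaves like zmalloc() when ptr is NULL. Note that the
     * zmalloc layer aborts on out-of-memory instead of returning NULL,
     * which changes Lua's OOM behavior compared to the default allocator. */
    return zrealloc(ptr, nsize);
}

/* Hypothetical helper: instead of luaL_newstate(), which uses realloc()/free(). */
lua_State *createLuaInterpreter(void) {
    return lua_newstate(zmallocLuaAlloc, NULL);
}
```
A side effect is that Lua's allocations would then also show up in zmalloc's used_memory accounting.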
Comment From: antirez
To clarify about threads, this is what we do in other threads:
- Reclaiming of objects in the background (when the `UNLINK` command is used), so imagine a busy loop calling jemalloc free() (a condensed, illustrative sketch follows at the end of this comment).
- Slow system calls. However, threads use a message-passing strategy to communicate instead of using mutexes, so the workers implementing the background syscalls will free() the messages after receiving them.
In the future we are going to use threading a lot more: I've an experimental patch implementing threaded I/O, and hopefully this will become real sooner or later. Also, Redis Modules use threading already and will use it more as well. So the threading story is getting more complex, but that's another story; this is just to clarify what we have / what we'll get.
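To make it concrete why the allocator sees concurrent calls even though Redis is "mostly single threaded", here is a condensed, purely illustrative sketch of the background-reclaim shape. The names are made up and the real code passes jobs over its own internal queue; a mutex-protected list is used here only to keep the sketch short:
```
/* Illustrative sketch of a background-free worker: the main thread hands
 * objects over a job queue, and the worker is essentially a loop calling
 * the allocator's free(). Not Redis's actual lazy-free/bio implementation. */
#include <stddef.h>
#include <pthread.h>
#include "zmalloc.h"

typedef struct bgJob { void *ptr; struct bgJob *next; } bgJob;

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t queue_cond = PTHREAD_COND_INITIALIZER;
static bgJob *queue_head = NULL;

/* Main thread (e.g. on UNLINK): hand ownership of ptr to the worker. */
void bgFreeEnqueue(void *ptr) {
    bgJob *job = zmalloc(sizeof(*job));
    job->ptr = ptr;
    pthread_mutex_lock(&queue_lock);
    job->next = queue_head;
    queue_head = job;
    pthread_cond_signal(&queue_cond);
    pthread_mutex_unlock(&queue_lock);
}

/* Worker thread: frees objects concurrently with the main thread's own
 * jemalloc activity, which is where the allocator has to cope with
 * multiple threads. */
void *bgFreeWorker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&queue_lock);
        while (queue_head == NULL)
            pthread_cond_wait(&queue_cond, &queue_lock);
        bgJob *job = queue_head;
        queue_head = job->next;
        pthread_mutex_unlock(&queue_lock);
        zfree(job->ptr);
        zfree(job);
    }
    return NULL;
}
```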
Comment From: jasone
- Suggested current jemalloc version: 4.5.0 (would suggest stable-4 if Windows were a target).
- Re: allocator to use in Lua, I'd definitely suggest using the same allocator as for the rest of redis, in order to eliminate the possibility of erroneous mixed allocator use. I don't know anything about how redis utilizes Lua; the risks may be very high/low depending on details.
Comment From: antirez
Understood, thank you @jasone.
Comment From: xhochy
Note that we also see the problem reported in https://github.com/antirez/redis/issues/3799#issuecomment-295890322 in Apache Arrow using jemalloc 4.5.0: https://issues.apache.org/jira/browse/ARROW-1282
Building with --with-malloc-conf=dss:disabled avoids the hanging issue.
I started to build up an environment to reproduce this but probably won't be able to continue the work for two weeks. Any suggestion for what to look out for would be very helpful.
Comment From: filipecosta90
@xhochy following up on arrow https://issues.apache.org/jira/browse/ARROW-1282 I can see that you provided a fix for jemalloc > 4.x, as in https://github.com/jemalloc/jemalloc/pull/1005, correct? @redis/core-team I propose to close this if the above is correct.
Comment From: oranagra
I think we can close this right away. IIUC, the issue started when we upgraded to a new jemalloc (which made some assumptions on sbrk that were later improved?), but since then we upgraded again several times (AFAIK the current version we're using doesn't use sbrk at all anymore).
The problem was probably caused by the fact that our Lua lib uses the libc allocator while the rest of redis uses jemalloc; that's still true, but hopefully it doesn't cause any serious issues anymore. (We can change that if we want, but there are some disadvantages, like causing a false sense of fragmentation that the defragger won't be able to fix.)
It could be that some distros are building redis to use an external allocator (not the one embedded into redis), and who knows which jemalloc version is being used, so it could still be 4.4.0. But also, even if we change something in the next redis version, these changes will not make a difference, since these users may also still be using redis 3.2.