In func raxRemove, it calls raxLowWalk with a stack to save parents.
But when oom occurs, raxStackPush will not be able to grow in size and save new node, and there is no handling of return value in raxLowWalk:
if (ts) raxStackPush(ts,h); /* Save stack of parent nodes. */
Therefore, when function raxRemove calls raxStackPop after oom, an incomplete stack may be obtained. This will omit processing of some nodes and may cause a memory leak.
if (h->size == 0) {
debugf("Key deleted in node without children. Cleanup needed.\n");
raxNode *child = NULL;
while(h != rax->head) {
child = h;
debugf("Freeing child %p [%.*s] key:%d\n", (void*)child,
(int)child->size, (char*)child->data, child->iskey);
rax_free(child);
/
rax->numnodes--;
h = raxStackPop(&ts);
/* If this node has more then one child, or actually holds
* a key, stop here. */
if (h->iskey || (!h->iscompr && h->size != 1)) break;
}
What makes me more confused is that there is a judgment about oom in the code later to avoid using an incomplete stack.
/* Don't try node compression if our nodes pointers stack is not
* complete because of OOM while executing raxLowWalk() */
if (trycompress && ts.oom) trycompress = 0;
Comment From: sundb
In func raxRemove, it calls raxLowWalk with a stack to save parents. But when oom occurs, raxStackPush will not be able to grow in size and save new node, and there is no handling of return value in raxLowWalk:
if (ts) raxStackPush(ts,h); /* Save stack of parent nodes. */Therefore, when function raxRemove calls raxStackPop after oom, an incomplete stack may be obtained. This will omit processing of some nodes and may cause a memory leak.
there's no memory leak, it's just that when a node is deleted, some nodes might not be cleaned up, i.e. a node doesn't have any children anymore, but it's still there.
btw, this only affects users using rax as dependance, but not redis, because redis oom is taken over by zmalloc_default_oom.
What makes me more confused is that there is a judgment about oom in the code later to avoid using an incomplete stack.
/* Don't try node compression if our nodes pointers stack is not * complete because of OOM while executing raxLowWalk() */ if (trycompress && ts.oom) trycompress = 0;
Because ts is incomplete, the parent-child relationship is actually unclear, so it can't be compressed.
Comment From: wudilun123
Assume that the current tree structure is a->b->c->d->e, and you want to delete the string "abcde." After executing this code, b c d e is cleaned up normally:
if (h->size == 0) {
debugf("Key deleted in node without children. Cleanup needed.\n");
raxNode *child = NULL;
while(h != rax->head) {
child = h;
debugf("Freeing child %p [%.*s] key:%d\n", (void*)child,
(int)child->size, (char*)child->data, child->iskey);
rax_free(child);
rax->numnodes--;
h = raxStackPop(&ts);
/* If this node has more then one child, or actually holds
* a key, stop here. */
if (h->iskey || (!h->iscompr && h->size != 1)) break;
}
But if you get an incomplete stack after oom, only a & b in the stack. In this case, b &e will be cleaned up after the above code executed, c & d are still there and we lose the reference to node c (it has no chance to free).
Although it doesn't affect the use of redis, I think a possible solution is to move the oom judgment to the beginning of raxRemove, like:
int raxRemove(rax *rax, unsigned char *s, size_t len, void **old) {
...
size_t i = raxLowWalk(rax,s,len,&h,NULL,&splitpos,&ts);
if (ts.oom || i != len || (h->iscompr && splitpos != 0) || !h->iskey) {
raxStackFree(&ts);
return 0;
}
Comment From: sundb
@wudilun123 If the structure is a->b->c->d->e, then the abc and abcd nodes exist and are not deleted after deleting abcde. If they don't have a key, the structure would be head -> abcd
Comment From: wudilun123
Thank you for your reply~ If the structure likes this, abc is a key but ab is not:
(gdb) call raxShow($r)
[a] -> [bd]
`-(b) [c] -> []=(nil)
`-(d) []=(nil)
Now call raxRemove to delete node abc, assume that we get an incomplete stack without node ab. Func raxStackPop will return node a after freeing node abc, which results in a mismatch between parent and child. After that, we call raxRemoveChild and get stuck in a dead loop, because parent node a doesn't have child node abc:
/* 2. Search the child pointer to remove inside the array of children
* pointers. */
while(1) {
raxNode *aux;
memcpy(&aux,c,sizeof(aux));
if (aux == child) break;
c++;
e++;
}
Comment From: sundb
@wudilun123 you're right, in theory it's possible. maybe you can make a unittest to test it if you want. btw, the default static stack size if RAX_STACK_STATIC_ITEMS(32), so to test it we need to fake a rax that go deeper than 32, then manually randomize to make it go OOM.