Hi, we are working on a race detection tool and we believe that we have found a potential race for this project. Although this race does not seem to be crucial to the everyday function of Redis, we thought it would be better to report the race just to be sure. This race was found on commit eacccd2ac at the time of writing.
Description of the Bug:
There appears to be a race condition on the server variable that is created in line 79:
79| struct redisServer server; /* Server global state */
The variable is accessed in line 656 in the aof.c file and line 351 in the aof.c file.
Thread 1 (redis/src/aof.c):
654| * positive reply about the operation performed. */
655| if (server.aof_state == AOF_ON)
>656| server.aof_buf = sdscatlen(server.aof_buf,buf,sdslen(buf));
657|
658| /* If a background append only file rewriting is in progress we want to
Stacktrace for Thread 1:
pthread_create [bio.c:121]
bioProcessBackgroundJobs [bio.c:121]
_serverPanic [bio.c:227]
printCrashReport [debug.c:997]
logCurrentClient [debug.c:1840]
decrRefCount [debug.c:1617]
freeModuleObject [object.c:369]
beforeSleep [object.c:341]
handleClientsWithPendingReadsUsingThreads [server.c:2365]
readQueryFromClient [networking.c:3656]
processInputBuffer [networking.c:2179]
processGopherRequest [networking.c:2069]
lookupKeyRead [gopher.c:53]
lookupKeyReadWithFlags [db.c:152]
expireIfNeeded [db.c:108]
checkClientPauseTimeoutAndReturnIfPaused [db.c:1529]
unpauseClients [networking.c:3313]
unblockClient [networking.c:3299]
unblockClientFromModule [blocked.c:184]
moduleFreeContext [module.c:5048]
moduleHandlePropagationAfterCommandCallback [module.c:637]
propagate [module.c:625]
feedAppendOnlyFile [server.c:3525]
Thread 2 (redis/src/aof.c):
349| mstime_t latency;
350|
>351| if (sdslen(server.aof_buf) == 0) {
352| /* Check if we need to do fsync even the aof buffer is empty,
353| * because previously in AOF_FSYNC_EVERYSEC mode, fsync is
Stacktrace for Thread 2:
pthread_create [bio.c:121]
bioProcessBackgroundJobs [bio.c:121]
_serverPanic [bio.c:227]
printCrashReport [debug.c:997]
logCurrentClient [debug.c:1840]
decrRefCount [debug.c:1617]
freeModuleObject [object.c:369]
beforeSleep [object.c:341]
flushAppendOnlyFile [server.c:2442]
Note: This race only seems to occur if the server were to crash which makes it an unlikely race to occur. However, we thought it would be better to report the race rather than to ignore it.
Here is the full race report generated by our tool:
Comment From: yossigo
@josefm9 I think this is a false positive, logCurrentClient() is never expected to handle objects of type OBJ_MODULE so we should never end in freeModuleObject().
Comment From: oranagra
i agree that a race inside a crash report is not a real concern. if the fix was trivial we can fix it, if not, i don't think we should worry, unless the noise of seeing false could mask real issues.
furthermore, i don't understand how freeModuleObject can call beforeSleep?
in any case, as Yossi mentioned, the call to decRefCount in logCurrentClient releases the argc objects, these would never be anything other than strings (not module types etc).