Redis [BUG] Redis doesn't check the return code of fsync, potentially corrupting keys/values and silently returning them.

According to researchers Anthony Rebello, Yuvraj Patel, Ramnatthan Alagappan, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau of the University of Wisconsin-Madison, Redis doesn't check the return code of the fsync system call.

They state that, under certain conditions, this may lead to corrupted keys or values being returned without any error.

Link to the paper: https://www.usenix.org/system/files/atc20-rebello.pdf

Comment From: madolson

Yeah, when fsync fails it usually means you have no idea what has happened underneath on the system. For RDBs, we should be aborting the snapshot and trying again. I'm less sure what we should do about AOFs though; that seems to be where we are generally ignoring the result of the fsync calls.
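
A minimal sketch of the "abort the snapshot" idea for the RDB case, assuming a plain POSIX environment. This is not Redis code: `save_snapshot` and the `.tmp` suffix are hypothetical names chosen for illustration. The snapshot is written to a temporary file, the result of `fsync` is checked, and on any failure the temporary file is removed so the previous snapshot stays untouched and the caller can try again.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper, not taken from Redis: write `len` bytes of `buf`
 * to "<path>.tmp", fsync it, and only then rename it over `path`.
 * Returns 0 on success, -1 on any failure (the temp file is removed). */
int save_snapshot(const char *path, const char *buf, size_t len) {
    assert(path != NULL && buf != NULL);

    char tmp[1024];
    int n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    if (n < 0 || n >= (int)sizeof(tmp)) return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) return -1;

    size_t off = 0;
    while (off < len) {
        ssize_t written = write(fd, buf + off, len - off);
        if (written == -1) {
            if (errno == EINTR) continue; /* interrupted, retry the write */
            goto fail;
        }
        off += (size_t)written;
    }

    /* The point of this issue: the return value of fsync is checked. */
    if (fsync(fd) == -1) goto fail;

    if (close(fd) == -1) {
        fd = -1; /* the descriptor is gone either way */
        goto fail;
    }
    fd = -1;

    /* Publish the snapshot only after it was flushed successfully. */
    if (rename(tmp, path) == -1) goto fail;
    return 0;

fail:
    if (fd != -1) close(fd);
    unlink(tmp); /* abort: discard the partial snapshot */
    return -1;
}
```

A caller would treat a `-1` return as "snapshot aborted" and schedule another attempt, rather than assuming the file on disk is valid.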

Comment From: chemist777

I think we should crash Redis (by default) on any fsync failure, as Postgres does.

Comment From: itamarhaber

Hello @whateverpal

Thanks for sharing this.

Comment From: oranagra

I'm not sure we want to crash. It may be better to set aof_last_write_status and/or lastbgsave_status to C_ERR, which will cause Redis to stop accepting write commands. Users may still be able to salvage their data.
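
A rough sketch of that proposal, using a toy model rather than the real server code. Only the names `aof_last_write_status`, `lastbgsave_status`, and `C_ERR` come from this thread; the struct `toy_server` and both functions are hypothetical, and the choice not to clear the flag after a later successful fsync is an assumption of this sketch, not something decided here.

```c
#include <assert.h>
#include <unistd.h>

#define C_OK 0
#define C_ERR -1

/* Toy stand-in for the server state. The struct is hypothetical; only
 * the two status field names are taken from the discussion. */
struct toy_server {
    int aof_fd;                /* descriptor of the append-only file */
    int aof_last_write_status; /* C_OK or C_ERR */
    int lastbgsave_status;     /* C_OK or C_ERR */
};

/* Flush the AOF and record a failure instead of ignoring it.
 * Assumption of this sketch: the error flag is NOT cleared by a later
 * successful fsync, since that alone does not show that the earlier
 * data reached the disk. */
int toy_aof_fsync(struct toy_server *s) {
    assert(s != NULL);
    if (fsync(s->aof_fd) == -1) {
        s->aof_last_write_status = C_ERR;
        return C_ERR;
    }
    return C_OK;
}

/* Write commands are refused while either persistence status is in
 * error; reads can still be served from memory. */
int toy_writes_allowed(const struct toy_server *s) {
    assert(s != NULL);
    return s->aof_last_write_status == C_OK &&
           s->lastbgsave_status == C_OK;
}
```

With this shape the process keeps running and the in-memory data stays readable, which is what leaves users a chance to salvage it.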

Comment From: madolson

I agree that we shouldn't crash. In Postgres the authoritative state is on disk, so an fsync failure means you don't know what happened and you should reload the log. In Redis, the state is in memory, so we should be able to continue. I don't know much about AOF; we could also consider starting a new file.

Comment From: yossigo

I agree with @oranagra and @madolson, we should treat a failed fsync as a write error leaving the currently opened file in an undetermined state, but it may be recoverable.