Redis [NEW] AOF persistence - Nineya|java/go/python

The problem/use-case that the feature addresses

In fields such as finance and e-commerce, using Redis for high throughput carries a risk: even with appendfsync set to always, AOF persistence cannot guarantee that no data is lost. We would like an additional option that ensures every successfully executed command is persisted.

Description of the feature

With the new flush-to-disk option enabled for AOF, every command that returns success is guaranteed to have been flushed to disk.

Comment From: sundb

You can set appendfsync always to fsync after every write to the append-only log. However, this still cannot guarantee that no data is lost, for example on hardware failure or power loss.
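
For reference, a minimal sketch of the redis.conf settings this suggestion corresponds to. Both directives are standard Redis configuration; the comment lines are explanatory only.

```
# Enable the append-only file.
appendonly yes
# fsync on every write to the AOF (slowest, safest policy).
appendfsync always
```

The same policy can also be applied to a running server with CONFIG SET appendfsync always.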

Comment From: BarackYoung

Yeah, so I suggest adding an option to ensure no data is lost.

Comment From: sundb

@BarackYoung No database can guarantee that it will never lose data, only that it will lose as little as possible. As I suggested, use appendfsync always, which writes to the AOF file after each command is executed. We can still lose the last command if the server shuts down due to a power outage.

Comment From: BarackYoung

I mean that with MySQL or Kafka, once a command or SQL statement finishes and returns success, the data is guaranteed not to be lost because it has already been written to disk correctly. So I'm wondering whether Redis can support such a feature.

Comment From: sundb

If the premise is that the command or SQL statement has finished and returned success, then appendfsync always already guarantees that the data has been written to disk correctly. Without this premise, MySQL and Kafka cannot guarantee against data loss either: any physical failure between command execution and the disk write can result in data loss.

Comment From: BarackYoung

I don't think so. MySQL can guarantee that data won't be lost by using its redo log: if the SQL statement succeeds and the machine then fails, the data will be rewritten to disk correctly. But Redis with appendfsync always doesn't sync data to disk after every command, only once per event loop iteration. If a command finishes and returns success but the server loses power before that event loop iteration finishes, the business logic believes the operation succeeded and carries on with the next task, such as delivery, and an inconsistency results. So there is no guarantee against data loss even though the command finished and returned success, and we have to rely on other components such as Kafka and MySQL to guarantee that data is not lost.

So I think we could add a new appendfsync option that makes sure the data is synced to disk correctly by the time the command returns success. Sequential disk writes, as in Kafka, have very high throughput, so I think this is worth considering.

Comment From: sundb

AFAIK, Redis writes to the AOF before replying to clients; see the following code:

void beforeSleep(struct aeEventLoop *eventLoop) {
    /* ... */
    handleClientsWithPendingReadsUsingThreads(); /* read and process commands */
    /* ... */

    if (server.aof_state == AOF_ON || server.aof_state == AOF_WAIT_REWRITE)
        flushAppendOnlyFile(0); /* write and fsync the AOF if AOF is enabled */

    handleClientsWithPendingWritesUsingThreads(); /* write replies to clients */
    /* ... */
}
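
The ordering in this excerpt (append, fsync, then reply) can be sketched outside Redis as a small POSIX C helper. This is a minimal illustration under stated assumptions, not Redis's implementation: `write_all` and `persist_then_ack` are hypothetical names, and the caller is assumed to send replies only after `persist_then_ack` returns 0.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes to fd, retrying on short
 * writes and EINTR. Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) continue; /* interrupted, retry */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: append one batch of commands to the log file and
 * fsync it. Returns 0 only once the data has been handed to stable
 * storage, so a caller that replies only after a 0 return never
 * acknowledges a write that was not synced. Returns -1 on any failure. */
int persist_then_ack(int aof_fd, const char *batch, size_t len) {
    if (write_all(aof_fd, batch, len) != 0) return -1;
    if (fsync(aof_fd) != 0) return -1;
    return 0;
}
```

A caller would buffer the commands executed in one loop iteration, call `persist_then_ack` once for the whole batch, and send the queued replies only if it returns 0. This mirrors the single write and single fsync per batch seen in the excerpt.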

Comment From: godjoem

Hi @sundb, can I understand it this way: if I turn on AOF and set appendfsync always, Redis writes every write operation to disk before returning the result to the client, so data from all successfully executed operations will not be lost even if there is a sudden power outage?

Comment From: sundb

@godjoem Yes. You can also read the docs at https://redis.io/docs/latest/operate/oss_and_stack/management/persistence/

appendfsync always: fsync every time new commands are appended to the AOF. Very very slow, very safe. Note that the commands are appended to the AOF after a batch of commands from multiple clients or a pipeline are executed, so it means a single write and a single fsync (before sending the replies).

Comment From: godjoem

Thank you @sundb for the quick response. One more question: what's the best way to test the performance impact of enabling appendfsync always using redis-benchmark? I ran the test on a Redis Sentinel, both with and without appendfsync always enabled:

    redis-benchmark -h 10.3.80.156 -p 6386 --threads 4 -c 100 -n 100000 --csv

The results:

1. appendfsync always enabled:

"test","rps","avg_latency_ms","min_latency_ms","p50_latency_ms","p95_latency_ms","p99_latency_ms","max_latency_ms"
"PING_INLINE","26385.22","2.967","0.288","2.559","6.535","14.991","56.767"
"PING_MBULK","26378.27","3.109","0.352","2.463","9.191","14.495","32.159"
"SET","26546.32","3.125","0.760","2.607","6.327","11.567","24.047"
"GET","28328.61","2.845","0.296","2.087","8.583","14.663","36.991"
"INCR","30627.87","2.564","0.784","2.415","4.255","5.999","27.599"
"LPUSH","28409.09","2.764","0.712","2.511","4.887","7.479","16.575"
"RPUSH","22055.58","3.662","0.632","2.463","5.903","12.943","691.199"
"LPOP","28425.24","2.722","0.736","2.503","4.695","6.751","20.415"
"RPOP","28352.71","2.763","0.680","2.567","4.639","5.799","11.375"
"SADD","32701.11","2.499","0.304","1.663","7.455","16.255","39.391"
"HSET","24894.20","3.210","0.688","2.959","5.231","7.479","21.231"
"SPOP","32840.72","2.334","0.320","1.431","8.471","20.975","58.495"
"ZADD","32808.40","2.549","0.320","2.079","5.367","13.887","46.655"
"ZPOPMIN","32701.11","2.222","0.336","1.127","10.599","22.927","45.119"
"LPUSH (needed to benchmark LRANGE)","30646.64","2.583","0.728","2.447","4.063","5.415","24.447"
"LRANGE_100 (first 100 elements)","20631.32","3.318","0.352","1.631","12.383","19.695","55.007"
"LRANGE_300 (first 300 elements)","12297.10","4.589","0.472","2.199","14.199","24.383","43.935"
"LRANGE_500 (first 500 elements)","8664.76","5.694","0.488","2.791","21.487","27.679","265.215"
"LRANGE_600 (first 600 elements)","7652.87","6.603","0.496","3.359","17.199","26.511","62.303"
"MSET (10 keys)","28392.96","2.881","0.904","2.791","4.367","5.895","12.479"
"XADD","20968.76","4.163","0.768","2.695","4.903","6.599","1265.663"

2. appendfsync always disabled:

"test","rps","avg_latency_ms","min_latency_ms","p50_latency_ms","p95_latency_ms","p99_latency_ms","max_latency_ms"
"PING_INLINE","27987.69","2.503","0.344","1.671","5.727","20.111","53.279"
"PING_MBULK","28216.71","2.462","0.288","1.671","4.671","23.055","57.343"
"SET","30515.72","2.658","0.344","2.151","5.831","12.551","40.031"
"GET","32626.43","1.941","0.312","0.975","11.863","22.623","47.103"
"INCR","32776.14","2.273","0.304","1.447","11.087","14.887","39.423"
"LPUSH","28097.78","2.282","0.336","1.103","12.127","16.719","35.039"
"RPUSH","32701.11","2.297","0.312","1.519","7.935","18.511","40.767"
"LPOP","30147.72","2.586","0.328","1.927","5.839","12.887","35.743"
"RPOP","26295.03","2.801","0.312","1.935","10.655","16.447","36.799"
"SADD","32905.56","2.130","0.280","1.103","11.607","13.391","38.783"
"HSET","32883.92","2.282","0.344","1.591","7.599","13.759","33.791"
"SPOP","35727.05","1.849","0.296","0.903","11.759","22.639","45.471"
"ZADD","32840.72","1.973","0.344","0.975","11.943","18.367","34.367"
"ZPOPMIN","33057.85","1.920","0.288","0.935","11.783","14.295","30.239"
"LPUSH (needed to benchmark LRANGE)","27964.21","2.599","0.376","1.743","5.215","22.975","53.535"
"LRANGE_100 (first 100 elements)","23004.37","2.538","0.400","1.247","12.407","22.783","45.727"
"LRANGE_300 (first 300 elements)","12602.39","4.919","0.416","2.447","13.831","24.463","67.967"
"LRANGE_500 (first 500 elements)","9121.59","6.623","0.408","4.127","15.535","25.439","92.479"
"LRANGE_600 (first 600 elements)","7996.80","7.480","0.456","5.911","16.463","25.791","60.863"
"MSET (10 keys)","28288.54","2.769","0.408","2.487","4.911","12.783","32.543"
"XADD","28121.48","2.565","0.328","1.631","11.815","17.503","33.535"

Do you think this testing is valid? It appears that enabling appendfsync always does impact performance to some extent.
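
To put a number on "to some extent", the relative throughput change can be computed from the rps columns of the two tables above. The sketch below is only an illustration: `rps_drop_percent` and `print_write_command_drops` are hypothetical names, the figures are copied from the single runs reported here, and with one run per setting some of the difference is likely noise (LPUSH, for example, comes out slightly faster with always enabled).

```c
#include <stdio.h>

/* Hypothetical helper: relative throughput drop, in percent, going from
 * baseline_rps (always disabled) to measured_rps (always enabled).
 * A negative result means the measured run was faster. */
double rps_drop_percent(double baseline_rps, double measured_rps) {
    return (baseline_rps - measured_rps) / baseline_rps * 100.0;
}

/* Print the drop for a few write commands, using the rps values
 * from the two result tables in this comment. */
void print_write_command_drops(void) {
    printf("SET   %6.2f%%\n", rps_drop_percent(30515.72, 26546.32));
    printf("HSET  %6.2f%%\n", rps_drop_percent(32883.92, 24894.20));
    printf("XADD  %6.2f%%\n", rps_drop_percent(28121.48, 20968.76));
    printf("LPUSH %6.2f%%\n", rps_drop_percent(28097.78, 28409.09));
}
```

On these numbers SET drops by roughly 13% and HSET and XADD by roughly a quarter, while read-only commands such as GET never append to the AOF, so differences there mostly reflect run-to-run variation.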

Comment From: sundb

@godjoem You can also add -P <pipe num> to pipeline requests and reduce network overhead, which can give more accurate results.

Comment From: godjoem

@sundb Got it, appreciate your kind help.

Comment From: BarackYoung

Yes, that's exactly how it works. I see in the code that Redis flushes data to disk on every event loop iteration and replies in the next one. So if you received a success reply, the data was already flushed in the previous event loop iteration.

(Sent by email in reply to JoeM, Thu, Aug 29, 2024, 10:56 AM; subject: Re: [redis/redis] [NEW] AOF persistence (Issue #13186))

Comment From: godjoem

Noted with thanks @BarackYoung

Comment From: oomgomgxx

AFAIK, Redis writes to the AOF before replying to clients; see the following code:

```c
void beforeSleep(struct aeEventLoop *eventLoop) {
    /* ... */
    handleClientsWithPendingReadsUsingThreads(); /* read and process commands */
    /* ... */

    if (server.aof_state == AOF_ON || server.aof_state == AOF_WAIT_REWRITE)
        flushAppendOnlyFile(0); /* write and fsync the AOF if AOF is enabled */

    handleClientsWithPendingWritesUsingThreads(); /* write replies to clients */
    /* ... */
}
```

Can this "return means persistence success" semantics be guaranteed in Redis Cluster mode? As far as I know, Redis Cluster uses the Gossip protocol.

Comment From: sundb

Cluster mode can still guarantee it; Gossip is not used for data sync.