We have been seeing an issue in production after power outages: Redis does not come back online because the AOF file is corrupted, even with aof-load-truncated yes set in the config. The startup log:
11833:M 23 Jul 07:50:58.426 * <ft> Initialized thread pool!
11833:M 23 Jul 07:50:58.426 * Module 'ft' loaded from /solink/run/redisearch.so
11833:M 23 Jul 07:50:58.426 * Reading RDB preamble from AOF file...
11833:M 23 Jul 07:50:58.426 * Reading the remaining AOF tail...
11833:M 23 Jul 07:50:59.099 # Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>
This is the output of redis-check-aof:
The AOF appears to start with an RDB preamble.
Checking the RDB preamble to start:
[offset 0] Checking RDB file appendonly.aof
[offset 26] AUX FIELD redis-ver = '4.0.9'
[offset 40] AUX FIELD redis-bits = '64'
[offset 52] AUX FIELD ctime = '1532140131'
[offset 67] AUX FIELD used-mem = '992032'
[offset 83] AUX FIELD aof-preamble = '1'
[offset 85] Selecting DB ID 0
[offset 488] Checksum OK
[offset 488] \o/ RDB looks OK! \o/
[info] 2 keys read
[info] 0 expires
[info] 0 already expired
RDB preamble is OK, proceeding with AOF tail...
0x 119fad2: Expected prefix '*', got: '
AOF analyzed: size=18480421, ok_up_to=18479826, diff=595
AOF is not valid. Use the --fix option to try fixing it.
The issue was originally reported here: https://github.com/RedisLabsModules/RediSearch/issues/394#issuecomment-408207387
Comment From: antirez
Hello, this usually happens with specific Linux kernel filesystem settings. If the mount options regarding data safety of ext4 or other filesystems are not strict enough, the metadata of the file could be updated before the file page itself is flushed to disk. This results in zero-padding of the file.
For your convenience, here is the relevant part of the kernel documentation:
data=journal      All data are committed into the journal prior to being written into the main file system. Enabling this mode will disable delayed allocation and O_DIRECT support.
data=ordered (*)  All data are forced directly out to the main file system prior to its metadata being committed to the journal.
data=writeback    Data ordering is not preserved, data may be written into the main file system after its metadata has been committed to the journal.
Comment From: antirez
For people reading this thread, note that in the original issue the user reported that the file was filled with zeroes at the end.
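To check whether a damaged AOF shows the same symptom before running --fix, one can count the zero bytes at the end of the file. The following is a minimal sketch, not part of Redis; the function name `count_trailing_zero_bytes` and the chunk size are hypothetical choices made for illustration.

```python
import os


def count_trailing_zero_bytes(path: str, chunk_size: int = 64 * 1024) -> int:
    """Count consecutive 0x00 bytes at the end of a file, reading backwards."""
    total = 0
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        while pos > 0:
            read_size = min(chunk_size, pos)
            pos -= read_size
            f.seek(pos)
            chunk = f.read(read_size)
            # Strip only the zero bytes at the end of this chunk.
            stripped = chunk.rstrip(b"\x00")
            total += len(chunk) - len(stripped)
            if stripped:
                # A non-zero byte was found, so the zero run ends here.
                break
    return total
```

If the result is close to the diff value reported by redis-check-aof, the tail is most likely zero padding rather than some other kind of corruption.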
Comment From: antirez
It would be good if @JefStat could confirm whether the filesystem was mounted with data=writeback.
Comment From: JefStat
The filesystem is mounted with data=ordered:
/dev/mapper/cachedev1 /share/CACHEDEV1_DATA ext4 rw,relatime,(null),noacl,stripe=256,data=ordered,jqfmt=vfsv0,usrjquota=aquota.user 0 0
/dev/md13 /mnt/ext ext4 rw,relatime,nodelalloc,data=ordered 0 0
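For anyone who wants to check this on their own hosts, the data= mode can be read from mount lines like the ones above. This is a minimal sketch that assumes the /proc/mounts field layout (device, mount point, filesystem type, options); the helper names `ext4_data_mode` and `mode_for_path` are hypothetical. A result of None only means that no data= option is listed on that line, and it says nothing about the mode actually in effect.

```python
from typing import Optional


def ext4_data_mode(mount_line: str) -> Optional[str]:
    """Return the data= mode from one mount line, or None if it is not listed."""
    fields = mount_line.split()
    # Assumed field order: device, mount point, fs type, options, dump, pass.
    if len(fields) < 4 or fields[2] != "ext4":
        return None
    for option in fields[3].split(","):
        if option.startswith("data="):
            return option[len("data="):]
    return None


def mode_for_path(mount_point: str, mounts_file: str = "/proc/mounts") -> Optional[str]:
    """Look up the data= mode for an exact mount point in a mounts file."""
    with open(mounts_file) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2 and fields[1] == mount_point:
                return ext4_data_mode(line)
    return None


line = "/dev/md13 /mnt/ext ext4 rw,relatime,nodelalloc,data=ordered 0 0"
print(ext4_data_mode(line))  # prints "ordered"
```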
Comment From: dolfandringa
I am running into this issue too: after a power failure the AOF file is padded with zeroes. I read the docs pasted by @antirez, but it's unclear to me which mode I should use to prevent this. The filesystem is currently mounted with data=ordered, and I am using aof-load-truncated yes. Should I use data=journal?
Comment From: cwbusacker
@antirez we are running into this problem as well in the Docker container redis:7.0.4-alpine3.16. Our redis.conf:
bind 0.0.0.0
appendonly yes
appendfsync everysec
auto-aof-rewrite-percentage 10
auto-aof-rewrite-min-size 64mb
notify-keyspace-events Eg$
What is the current workaround? Like @dolfandringa, it’s unclear to me which mode to change to and how to change it.
Comment From: itaispiegel
We're also facing this problem, and we also have data=ordered set on our filesystem. Is there any fix/workaround for this?
I wonder why there isn't an option to run redis-check-aof --fix on startup, so that no manual intervention is needed during the process.
Maybe it would be a good idea to add a configuration option for this?
It would be important, though, to document the danger of setting it - that data might be lost.
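Until such an option exists, the same effect can be approximated outside Redis with a wrapper that runs before the server starts. The following is a minimal sketch, not an official procedure: the helper names are hypothetical, it assumes redis-check-aof is on the PATH, and it assumes the tool asks for confirmation on stdin before truncating, which is why "y" is supplied. Which file to pass depends on the AOF layout of the Redis version in use. The backup is taken first because the fix may discard data.

```python
import shutil
import subprocess
import time


def backup_aof(aof_path: str) -> str:
    """Copy the AOF next to itself with a timestamp suffix and return the copy's path."""
    backup_path = f"{aof_path}.{int(time.time())}.bak"
    shutil.copy2(aof_path, backup_path)
    return backup_path


def fix_command(aof_path: str, check_tool: str = "redis-check-aof") -> list:
    """Build the command line for the fix step."""
    return [check_tool, "--fix", aof_path]


def backup_and_fix(aof_path: str, check_tool: str = "redis-check-aof"):
    """Back up the AOF, then run the fix. Returns (backup_path, exit_code)."""
    backup_path = backup_aof(aof_path)
    # Assumption: the tool prompts before truncating, so answer "y" on stdin.
    result = subprocess.run(
        fix_command(aof_path, check_tool),
        input="y\n",
        text=True,
        capture_output=True,
    )
    # Print the tool's report so that any data loss is visible in the logs.
    print(result.stdout)
    return backup_path, result.returncode
```

A startup script would call `backup_and_fix` with the AOF path and start redis-server only if the exit code is 0, keeping the backup in case the truncated tail needs to be inspected later.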
Comment From: sundb
@itaispiegel we already have the --fix option for redis-check-aof.
Comment From: itaispiegel
> @itaispiegel we already have the --fix option for redis-check-aof.
I know, and as I said - I think there should be an option to run it on startup, to allow the database to start in case of corruption without needing human intervention.
Comment From: sundb
IMHO, an automatic fix doesn't mean that data will not be lost; users are likely to ignore the output and think everything is OK.
Comment From: itaispiegel
> IMHO, an automatic fix doesn't mean that data will not be lost; users are likely to ignore the output and think everything is OK.
That's why it should be documented well, so that anyone who sets this is aware that they might lose data. Of course, the ideal solution would be to prevent these corruptions in the first place, but as I understand it we probably don't know how to prevent them.