Small aggregate values such as lists and sets use an internal representation that is already serialized and architecture-independent. However, when we rewrite the append-only file using the BGREWRITEAOF command, we deserialize these values into many single-value operations. For instance, a three-element list will be rendered inside the rewritten AOF as:

    RPUSH mylist value1
    RPUSH mylist value2
    RPUSH mylist value3

This is a lot of wasted work, both for the AOF creation and for the AOF reloading.
Instead we should put serialized values into the AOF directly, using some special command, possibly valid only in the context of AOF loading. This would greatly improve AOF performance.
This technique is already used in the creation of RDB files.
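To make the contrast concrete, here is a toy Python sketch of the two rewrite strategies. The AOFLOAD command name and the serialize stand-in are purely hypothetical, invented for illustration; neither is actual Redis syntax.

```python
# Toy comparison of the two AOF rewrite strategies discussed above.
# "AOFLOAD" is a hypothetical command name and serialize() is a stand-in
# for Redis's internal serialization format.

def rewrite_per_element(key, values):
    """Current behavior: one command per element of the aggregate."""
    return [f"RPUSH {key} {v}" for v in values]

def rewrite_serialized(key, values, serialize=repr):
    """Proposed behavior: a single command carrying the serialized value."""
    return [f"AOFLOAD {key} {serialize(values)}"]
```

A three-element list then costs three commands under the first scheme and one under the second, which is where the savings in rewrite and reload work would come from.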
An alternative to the proposal above would be to change the AOF format, allowing the initial section of the AOF to be an RDB file itself, so that the AOF rewrite operation could just dump the RDB followed by the sequence of accumulated commands. This is a valid alternative, but with the side effect that the AOF is no longer easy to process.
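Since stand-alone RDB files begin with the "REDIS" magic bytes, a loader could tell such a hybrid AOF apart from a plain one by inspecting its first bytes. A minimal Python sketch, assuming the preamble reuses the standard RDB header:

```python
def has_rdb_preamble(path):
    """Return True if the file begins with an RDB-style header.

    Assumes the hybrid AOF reuses the "REDIS" magic bytes that open a
    stand-alone RDB file; a plain AOF starts with a RESP command instead
    (typically '*', the array type marker).
    """
    with open(path, "rb") as f:
        return f.read(5) == b"REDIS"
```

A loader would then parse the RDB preamble first and replay the commands that follow it, falling back to plain command replay when the magic is absent.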
This definitely needs more thought, but it is an interesting optimization.
Comment From: hampus
An alternative to the alternative would be to make the RDB dumping and AOF cooperate more closely. I've been thinking a little about that (before I found this). It could have several advantages.
First, a short overview of what I mean. We could stop doing AOF rewrites entirely and use RDB files instead. When it's time to rewrite the AOF we would instead start a new AOF file (keeping the old one) and at the same time start a background save. The RDB would contain the name of the new AOF file, so once the RDB has been saved completely we can restore the data simply by loading the latest RDB and then reading from the AOF file that it points to. As there can be several AOF files at the same time with this method, we need to number them in some way and make sure that we continue reading all the newer ones too (i.e. those with a higher number). While a background save is still in progress, we can therefore still load our data by starting from an older RDB instead.
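The recovery logic sketched above could look like this in Python. The dump-&lt;n&gt;.rdb / appendonly-&lt;n&gt;.aof naming is purely hypothetical; the only real requirement from the scheme is that the files are numbered and that each RDB names the AOF started alongside it.

```python
import re

def recovery_plan(filenames):
    """Pick the newest complete RDB and the AOF files to replay after it.

    Hypothetical naming scheme: dump-<n>.rdb is the background save started
    together with appendonly-<n>.aof, so after loading dump-<n>.rdb we must
    replay appendonly-<n>.aof and every higher-numbered AOF, in order.
    """
    rdbs = sorted(int(m.group(1)) for f in filenames
                  if (m := re.fullmatch(r"dump-(\d+)\.rdb", f)))
    aofs = sorted(int(m.group(1)) for f in filenames
                  if (m := re.fullmatch(r"appendonly-(\d+)\.aof", f)))
    if not rdbs:
        # No complete RDB yet: replay every AOF from the start.
        return None, [f"appendonly-{n}.aof" for n in aofs]
    start = rdbs[-1]
    return (f"dump-{start}.rdb",
            [f"appendonly-{n}.aof" for n in aofs if n >= start])
```

If the newest background save is still running, its RDB simply isn't on disk yet, so the plan naturally falls back to the previous RDB plus one extra AOF to replay, matching the behavior described above.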
When appendonly mode is first enabled we would also save an RDB and at the same time start writing a new AOF with changes. Whenever we start a background save we could always start a new AOF at the same time (why not?), so they would be closely linked. We could safely remove old files when they are no longer needed (or allow the user to do something with them, like archiving them somewhere).
With just a little additional work we could also support point-in-time recovery (PITR) like e.g. PostgreSQL. We just need to store a timestamp in the AOF each time we flush it (at most one per second), have a timestamp in each RDB, and then make it possible to load up to a certain point (select the correct RDB if there are several, and read the AOFs up to that point). If you keep old AOFs and RDBs you would then always be able to restore the data at any point in time :)
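A sketch of that selection logic, assuming each RDB carries a timestamp and the AOF is annotated with a timestamp at most once per second (all names here are illustrative, not real Redis structures):

```python
def pick_rdb(rdb_timestamps, target):
    """Return the newest RDB timestamp not later than the target time."""
    candidates = [t for t in rdb_timestamps if t <= target]
    return max(candidates) if candidates else None

def replay_until(aof_entries, target):
    """Replay (timestamp, command) pairs, stopping past the target time."""
    return [cmd for ts, cmd in aof_entries if ts <= target]
```

Restoring to time T would then mean loading the RDB chosen by pick_rdb and replaying its AOF chain through replay_until.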
It would also be good because there would then be only a single algorithm to optimize for background saves, slave syncs, and AOF "rewrites", so everything would benefit from improvements and it would (perhaps) be easier to maintain. For this to be as useful as possible, I think it's important to make it easy to archive old AOFs and RDBs and perhaps even implement PITR as described above (PITR would be awesome! See e.g. PostgreSQL for ideas on both).
The biggest problem would be to make it easy to upgrade to this new system. That should be solvable, though.
Hope that was pretty clear. An interesting idea, at least! What do you think? The right direction for Redis persistence or better to keep them separate?
(Perhaps it would be better to post this to the ML, but it's so very related to this. Would be awesome to get the issues to the mailing list too again. Might be possible some way.)
Cheers, Hampus
Comment From: oranagra
This issue is describing preamble rdb, which already exists in recent versions.