Hi,

We would like to propose making RDB compression in Redis pluggable and modular, instead of the current hard-coded LZF implementation. Compression would sit behind a well-defined interface, so new compression schemes and algorithms could be added without further changes to the Redis core. The change would not only preserve the current behavior but also make Redis ready to accept future compression schemes.
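
To make the idea concrete, here is a minimal sketch of what such an interface could look like; every name and signature below is a hypothetical illustration, not an existing Redis API:

```c
/* A minimal sketch of a pluggable RDB compressor interface. All names
 * and signatures here are hypothetical illustrations, not Redis APIs. */
typedef struct rdbCompressor {
    const char *name;      /* e.g. "lzf", "lz4" */
    unsigned char rdb_id;  /* byte identifying the codec in the RDB stream */
    /* Both return the number of bytes written, or 0 on failure
     * (e.g. output buffer too small or data incompressible). */
    size_t (*compress)(const void *in, size_t in_len,
                       void *out, size_t out_len);
    size_t (*decompress)(const void *in, size_t in_len,
                         void *out, size_t out_len);
} rdbCompressor;

/* Hypothetical registration hook a module could call at load time. */
int rdbRegisterCompressor(const rdbCompressor *codec);
```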

Big Data databases like Cassandra and HBase have already adopted a similar concept for their compression features.

Please let me know what you think.

Thanks! Fatima

Comment From: hpatro

I think this will be a good feature addition.

LZ4 compression seems to provide a similar compression ratio with a much faster compress/decompress rate. https://github.com/lz4/lz4
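
For reference, LZ4's one-shot API from lz4.h is small; here is a minimal sketch of how a caller might use it (the wrapper function is my own illustration, not Redis code):

```c
#include <stdlib.h>
#include "lz4.h"  /* https://github.com/lz4/lz4 */

/* Compress `len` bytes of `buf` into a newly allocated buffer.
 * Returns the compressed length and stores the buffer in *out, or
 * returns 0 (with *out == NULL) if compression failed or didn't help. */
int try_lz4_compress(const char *buf, int len, char **out) {
    *out = NULL;
    int bound = LZ4_compressBound(len);   /* worst-case output size */
    if (bound <= 0) return 0;
    char *dst = malloc(bound);
    if (dst == NULL) return 0;
    int clen = LZ4_compress_default(buf, dst, len, bound);
    if (clen <= 0 || clen >= len) {       /* incompressible; store raw */
        free(dst);
        return 0;
    }
    *out = dst;
    return clen;
}
```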

WDYT @oranagra @madolson ?

Comment From: oranagra

we can certainly do that, maybe as part of a bigger effort to make other things extensible (like disk / file access, same as we did for sockets). but please note a few things:

1. redis currently uses compression both for the rdb format (compressing while saving) and for holding some compressed data in RAM (quicklist).
2. changing the compression format of the rdb file (by loading a module) would obviously make the rdb files incompatible with other redis instances (that don't have that module).
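
For context on point 2: RDB today marks LZF-compressed strings with a dedicated encoding value (`RDB_ENC_LZF` in rdb.h), so an alternative codec would need a new encoding value, which is exactly what older instances cannot parse. A sketch (the LZ4 value is an assumption, not part of Redis):

```c
/* Existing RDB string encodings (see rdb.h in the Redis source). */
#define RDB_ENC_INT8  0   /* 8-bit signed integer */
#define RDB_ENC_INT16 1   /* 16-bit signed integer */
#define RDB_ENC_INT32 2   /* 32-bit signed integer */
#define RDB_ENC_LZF   3   /* string compressed with LZF */

/* A hypothetical value for an alternative compressor. Any RDB file
 * containing it would be unreadable by instances without that codec. */
#define RDB_ENC_LZ4   4   /* illustration only, not part of Redis */
```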

Comment From: madolson

I think it's probably worth adopting LZ4 into the project. It's rather small and provides some small performance and density improvements. As part of that work, we could look into making the compression extensible?

Comment From: hpatro

@oranagra

> changing the compression format of the rdb file (by loading a module) would obviously make the rdb files incompatible with other redis instances (that don't have that module).

Yeah, I was thinking about the same. This feature/compression type should be enabled by an admin only after the entire cluster has been upgraded to a version that supports it. The other thought I had, applicable only to replication, is to fall back to LZF compression on receiving an error about the new compression method.
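
To illustrate the fallback idea, a rough sketch (all names here are hypothetical, not existing Redis code):

```c
/* Hypothetical codec selection on the primary: prefer the newer codec
 * only if the replica advertised support for it, and fall back to LZF
 * (understood by every version) if the replica reports a decode error. */
typedef enum { COMP_LZF, COMP_LZ4 } compMethod;

compMethod pickReplicationCompression(int replica_supports_lz4,
                                      int got_decode_error) {
    if (got_decode_error) return COMP_LZF;   /* replica rejected LZ4 */
    return replica_supports_lz4 ? COMP_LZ4 : COMP_LZF;
}
```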

@madolson

> It's rather small and provides some small performance and density improvements.

I did try integrating LZ4 with Redis (10 MB additional package size for the lz4 subproject at commit ecf92d0897587c0f641df9db83c910fd236cb18a). The Redis binary size increases from 2,671,296 to 2,772,440 bytes (roughly 100 KB).

The performance improvement was really good with dummy redis-benchmark data.

Data generation (1M SET commands with 20 KB values over a random keyspace):

```
src/redis-benchmark -t set -n 1000000 -q -d 20000 -r 1000000
```

Performance:

| Compression Type | BGSAVE time (sec) | RDB size (MB) | Load time (sec) |
|------------------|-------------------|---------------|-----------------|
| LZ4              | 1089              | 8612.88       | 18.918          |
| LZF              | 1251              | 12405.38      | 32.34           |
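
That works out to roughly a 13% faster BGSAVE, a ~31% smaller RDB file, and a ~41% faster load with LZ4.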

This experiment was done on my Mac (2.6 GHz 6-core Intel Core i7), so the numbers may be noisy, but they give a ballpark of the improvements we could achieve.

Comment From: madolson

@oranagra @yossigo Do you guys have any concerns about importing LZ4 into the project? It's both faster and compresses better than LZF, so it seems basically free, apart from the extra code complexity of keeping backwards compatibility with LZF. We can independently evaluate adding more extensibility as we do the work.

Comment From: oranagra

I'm okay with changing the compression to a better one and having an RDB format change, but I don't wanna have it configurable and I don't wanna keep changing it again and again, so we need to switch only if we think the one we use now is outdated and the new one is here to stay. Also, it looks like the LZ4 code base is quite large.

Comment From: madolson

> I'm okay with changing the compression to a better one and having an RDB format change, but I don't wanna have it configurable and I don't wanna keep changing it again and again, so we need to switch only if we think the one we use now is outdated and the new one is here to stay.

I agree.

> Also, it looks like the LZ4 code base is quite large.

I believe we just need the .c/.h file, https://github.com/lz4/lz4/blob/dev/lib/lz4.c.

Comment From: oranagra

yes, still some 3,500 lines compared to the ~500 we have today, but i guess it's still ok.