The problem

In some cases hash field names (and values) are the same from one hash to another.

For example:

HMSET user:1 name John age 13
HMSET user:2 name James age 25
HMSET user:3 name William age 48

In Redis, the field names are stored separately for each hash, which takes more memory.

Feature

The idea is to have a global dictionary (like a set); when a hash field is written, the field name can point to an entry in this dictionary.

This feature would be enabled in the Redis configuration file (disabled by default).

When enabled, some new commands could be used:

DADD <value> [<value>]... adds values to the dictionary.

DREM <value> [<value>]... removes values from the dictionary. A value can be removed only if no data structure points to it.

DSTATS [<value>]... returns the number of data structures pointing to each value.
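To make the intended semantics concrete, here is a minimal Python sketch of the proposed dictionary as a reference-counted string-interning table. The class and method names (`FieldDictionary`, `point_to`) are hypothetical stand-ins for the proposed DADD/DREM/DSTATS behaviour, not anything in Redis:

```python
class FieldDictionary:
    """Toy model of the proposed global field-name dictionary.

    Hypothetical semantics: values are interned once, data structures
    hold references to them, and removal is refused while any
    reference remains."""

    def __init__(self):
        # value -> number of data structures pointing to it
        self.refs = {}

    def dadd(self, *values):
        # DADD: register values; new entries start with zero references
        for v in values:
            self.refs.setdefault(v, 0)

    def drem(self, *values):
        # DREM: remove values, but only if nothing points to them
        for v in values:
            if self.refs.get(v, 0) == 0:
                self.refs.pop(v, None)

    def dstats(self, *values):
        # DSTATS: how many things point to each value
        return {v: self.refs.get(v, 0) for v in values}

    def point_to(self, value):
        # Called when a hash field is written using an interned name
        self.refs[value] = self.refs.get(value, 0) + 1


d = FieldDictionary()
d.dadd("name", "age")
d.point_to("name")  # e.g. HMSET user:1 name John ...
d.point_to("name")  # e.g. HMSET user:2 name James ...
print(d.dstats("name", "age"))  # {'name': 2, 'age': 0}
d.drem("name")  # refused: "name" is still referenced
print("name" in d.refs)  # True
```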

Comment From: zuiderkwast

Small hashes are stored compactly in memory. A short string like "name" or "age" only has one byte of overhead to store its size, so 5 bytes for "name" and 4 bytes for "age". A small integer key or value only takes 1-2 bytes.

Storing a pointer takes 8 bytes on a 64-bit architecture or 4 bytes on a 32-bit architecture. In addition, an extra bit (or byte) would need to be stored to indicate that it's a pointer rather than an in-place string. Thus, it's better to store the short strings directly than to store a pointer.
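The arithmetic above is easy to check. This is just illustrative byte-counting under the stated assumptions (1-byte length header for short in-place strings, 8-byte pointer, 1 tag byte), not an exact model of Redis internals:

```python
def inplace_cost(field: str) -> int:
    # ziplist-style short string: 1-byte length header + the bytes
    return 1 + len(field.encode())

POINTER = 8  # bytes for a pointer on a 64-bit architecture
TAG = 1      # at least one extra bit/byte to mark "this is a pointer"

for field in ("name", "age"):
    print(field, inplace_cost(field), "vs", POINTER + TAG)
# "name": 5 bytes in place vs 9 bytes for a tagged pointer
# "age":  4 bytes in place vs 9 bytes for a tagged pointer
```

So for short field names the pointer scheme costs roughly twice as much per occurrence, before even counting the shared dictionary itself.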

If you agree, I think this issue can be closed.

If you're interested in this very fascinating encoding scheme, take a look at src/t_hash.c and src/ziplist.c.

Comment From: oranagra

in addition to what @zuiderkwast said, i want to note that we would not want to add a specific API (command) or even a configuration for this. redis should aim to do the right thing by default, and in some cases offer a tunable threshold as a compromise between memory and CPU (like hash-max-ziplist-entries).
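For context, the tunables mentioned here live in redis.conf; the values below are the long-standing defaults (in Redis 7 these settings were renamed to hash-max-listpack-entries / hash-max-listpack-value):

```
# A hash stays in the compact ziplist encoding while it is under both
# thresholds; past either one it converts to a real hash table
# (faster for big hashes, but using more memory).
hash-max-ziplist-entries 128
hash-max-ziplist-value 64
```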

I think the best thing would be for the application to optimize these repetitions into shorter strings or IDs (like "name"=1, "age"=2). This avoids the extra lookup that this issue suggests, the hashes will still be able to get ziplist encoded, and it even reduces network usage.