Currently both keys and values in the main dictionary are sitting behind pointers (let's talk about general case and forget about other union types for a moment), wasting 16 bytes of memory per entry. Here is existing dict entry layout (minus metadata, that will be gone once https://github.com/redis/redis/issues/10589 is landed):

struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next; 
};

In case of a main dictionary, key is always an sds string and value is a redisObject that holds a pointer to an actual value or contains embedded sds value in case if it's short enough (see https://github.com/redis/redis/blob/unstable/src/object.c#L121-L121):

struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:24; 
    int refcount;
    void *ptr;
};

The proposal is to move towards an entry format with embedded keys and values inside of a byte array (we can embed next pointer too later, but I suggest we keep it out of scope for now for simplicity):

struct embeddedDictEntry {
    struct dictEntry *next; 
    unsigned char data[];
};

This approach isn't something radically new, and in fact we have similar entry structure in the raxNode already.

Proposed data array layout (simplified):

[Entry metadata|sds key|embedded robject]

Some challenges and considerations: * Keys are immutable, once embedded they will never change after entry creation. * Dict functions for key/value access should not change their semantics (except setKey will not be needed/used in the main dict). * Value mutability presents certain challenges, especially for embedded strings, but they are solvable with entry reallocation. * We want to support additional optional fixed sized object metadata for various extensions. * We should decide if there should be a size limit up to which keys/values should be embedded. * Entry creation process will have to be slightly changed to avoid redundant robject copying.

Thanks to @zuiderkwast's changes https://github.com/redis/redis/commit/c84248b5d2fba04e5056bad32a5d29a5fc906e8c and https://github.com/redis/redis/commit/f3f6f7c0d66f136146a912e06c8fbe31ecfbc977 dictEntry is now abstracted away and direct access is eliminated, so we can easily add a new entry type for an entry with embedded data, reducing scope of this change purely to the main dictionary (other data structures, like hashes and sets can be kept out of scope for now, and can be tackled later if needed).

On top of memory savings, entries with both keys and values embedded should reduce amount of random memory access and have some performance benefits.

Any feedback is welcome.

Note: Initial discussion on this topic has been had in https://github.com/redis/redis/issues/10802 and first step has been implemented as part of solution to https://github.com/redis/redis/issues/10589.

Comment From: zuiderkwast

I have already experimented with embedding the key in dictEntry in #9464.

The drawback is that it breaks the dict abstraction. Remember that dict is used for many other things than the keyspace.

I think a better approach is the one outlined by @madolson in #10802 under "One possible alternative" which puts the key, value, expire and other metadata inside the robj struct and uses a rax for the expire feature (seems to be inspired by the SegCache paper), but I want to suggest the following modifications:

  • ~~Use an open addressing scheme~~
  • ~~Promote the Redis object to be the dictEntry~~
  • Use the "no-value" dict feature that we already have for sets since #11595. This eliminates the dictEntry when there is no collission. Instead, the hash table points directly to the key, which is the redisObject in this case.

Comment From: judeng

I am very much looking forward to such a change, which will save us a lot of cost. Some additional questions: 1. We need to consider saving the expiration time in metadata, and maintaining two different dictentries in the db will be very complicated and expensive 2. Metadata can consider the next pointer as an option. Our rehash factor is 1, and most buckets will not collide. The 8-byte next pointer is too expensive 3. Will the shared object no longer be used?This will bring about a simplification of the eviction 4. val may be updated frequently, which I suspect that may cause performance degradation

Comment From: vitarb

@zuiderkwast I think we can do it without breaking any abstractions, except slight change in the contract, as ownership model changes, consider this draft PR that I've created https://github.com/redis/redis/pull/12221 for key embedding, it's slightly different from what you've done before. Let me know what you think.

@judeng, let me try to answer your questions to the best of my ability:

We need to consider saving the expiration time in metadata, and maintaining two different dictentries in the db will be very complicated and expensive

I thought about this, and it can be a good incremental improvement, although I think it would have to remain configurable feature, as there are use cases that will be degraded if we embed TTLs (for example if only small fraction of entries have expiration date, then sampling performance will be worse), but at the same time it can be beneficial for users that heavily rely on expiration and have it for majority of items. I suggest we keep this out of scope for this change for now.

2. Metadata can consider the next pointer as an option. Our rehash factor is 1, and most buckets will not collide. The 8-byte next pointer is too expensive

I totally aggree, we can save next pointer in the metadata and just have a flag indicating that it has been embedded saving 8 bytes for all entries that don't have a pointer. Key insight here is that we know that entry needs to have a next pointer at the time of creation (bucket empty/not empty pre-insert). One key challenge is solving rehashing, which would probably require a fair share of reallocs.

3. Will the shared object no longer be used?This will bring about a simplification of the eviction

Are you talking about shared sds that is used as a key in the expiry dictionary? Please clarify.

4. val may be updated frequently, which I suspect that may cause performance degradation

This is only going to be a concern for values that are embedded into robject. I would probaly keep similar behavior as today and only embed small values, while referencing bigger values. Hence real benefit from value embedding would come from robject embedding into data array and saving one pointer. We should consider different options too.

Comment From: madolson

I thought about this, and it can be a good incremental improvement.

I'm all for incremental feature delivery, but the design should not be incremental. We need to understand the long term solution, and whether or not we are going through decisions that would be difficult to unwind.