Redis [QUESTION] is element-tot-len itself's length included in the total entry length in listpack?

Here is the documentation of listpack.

From the documentation, the structure of a listpack entry is as following

<encoding-type><element-data><element-tot-len>
|                                            |
+--------------------------------------------+
            (This is an element)

The following content is from the documentation

Let's take for example a very simple entry encoding the string "hello":

"\x45hello" -- The string "hello"

But the documentation make a mistake, from this issue, we can know the entry hello encoding should be

\x85hello

The following content is from the documentation:

The raw entry is 6 bytes: the encoding byte followed by the raw data. In order for the entry to be complete, we need to add the entry length field at the end, that in this case is just the byte "06". So the final complete entry will be:

"\x45hello\x06" -- A complete entry representing "hello"

As I said above, \x45 should be \x85, so the total entry of hello should be \x85hello\x06，but I think this is not right too.

The problem is, if "element-tot-len itself's length included in the total entry length in listpack"?

I think it should be yes, but if it is yes, let's take a look of the structure:

encoding-type: 1Byte
element-data: 5Bytes(hello) + '\0' = 6Bytes
element-tot-len: 1Byte (7bit can represent up to 127)

The total entry length should be: 1 Byte encoding-type + 6 Bytes element-data + 1 Byte element-tot-len = 8 Bytes.

So the final complete entry should be

\x85hello\x08

Am I wrong anywhere or the documentation make a mistake?

Comment From: oranagra

/* If 'p' points to an element of the listpack, calling lpPrev() will return
 * the pointer to the previous element (the one on the left), or NULL if 'p'
 * already pointed to the first element of the listpack. */
unsigned char *lpPrev(unsigned char *lp, unsigned char *p) {
    assert(p);
    if (p-lp == LP_HDR_SIZE) return NULL;
    p--; /* Seek the first backlen byte of the last element. */
    uint64_t prevlen = lpDecodeBacklen(p);
    prevlen += lpEncodeBacklen(NULL,prevlen);
    p -= prevlen-1; /* Seek the first byte of the previous entry. */
    lpAssertValidEntry(lp, lpBytes(lp), p);
    return p;
}

looks like the bytes used to store the backlen (element-tot-len) are NOT included in the backlen itseof. hence prevlen += lpEncodeBacklen(NULL,prevlen);

i'm sorry if i missed something in your post (didn't read it carefully), if there are contradictions in the documentation, i'll always assume the code is correct (it passed tests, and the docs did not).

btw, i'm curious, what are you building? i noticed there's currently a vacuum around parsing rdb files because none of the existing tools (e.g. RIOT and redis-rdb-cli, etc) are capable of parsing listpacks.

we do have a plan to add a librdb.so that will be bundled inside redis some day, but maybe some tools will benefit from a java or python based implementation.

Comment From: sundb

@xiebruce element-data: 5Bytes(hello) + '\0' = 6Bytes isn't correct, listpack does not save an extra \0.

Comment From: xiebruce

c /* If 'p' points to an element of the listpack, calling lpPrev() will return * the pointer to the previous element (the one on the left), or NULL if 'p' * already pointed to the first element of the listpack. */ unsigned char *lpPrev(unsigned char *lp, unsigned char *p) { assert(p); if (p-lp == LP_HDR_SIZE) return NULL; p--; /* Seek the first backlen byte of the last element. */ uint64_t prevlen = lpDecodeBacklen(p); prevlen += lpEncodeBacklen(NULL,prevlen); p -= prevlen-1; /* Seek the first byte of the previous entry. */ lpAssertValidEntry(lp, lpBytes(lp), p); return p; }

looks like the bytes used to store the backlen (element-tot-len) are NOT included in the backlen itseof. hence prevlen += lpEncodeBacklen(NULL,prevlen);

i'm sorry if i missed something in your post (didn't read it carefully), if there are contradictions in the documentation, i'll always assume the code is correct (it passed tests, and the docs did not).

btw, i'm curious, what are you building? i noticed there's currently a vacuum around parsing rdb files because none of the existing tools (e.g. RIOT and redis-rdb-cli, etc) are capable of parsing listpacks.

we do have a plan to add a librdb.so that will be bundled inside redis some day, but maybe some tools will benefit from a java or python based implementation.

From the lpPrev() function, you are right, element-tot-len is the length of encoding-type plus the length of element-data. Actually I'm not working on anything, I just learning🤣

But what I don't understand is, it seems this line uint64_t prevlen = lpDecodeBacklen(p) use the first backlen byte as the backlen(element-tot-len), but backlen(element-tot-len) can be more than 1 byte, it can be up to 5 bytes.

let's assume p points to entry2, now we want to retrieve the previous entry, that is entry1. As I said above, element-tot-len can be up to 5 bytes, so we assume it is 5 bytes, p-- means p points to the last byte(or the "first byte" on the right side) of element-tot-len of entry1 right? Redis [QUESTION] is element-tot-len itself's length included in the total entry length in listpack?

But these 3 lines seems treat the last byte (or the "first byte" on the right side) of element-tot-len as the total element-tot-len, the left 4 bytes seems not being used. so I don't understand, I'm not familiar with C, I don't know if I'm wrong somewhere?

p--; /* Seek the first backlen byte of the last element. */
uint64_t prevlen = lpDecodeBacklen(p);
prevlen += lpEncodeBacklen(NULL,prevlen);

Comment From: xiebruce

@xiebruce element-data: 5Bytes(hello) + '\0' = 6Bytes isn't correct, listpack does not save an extra \0.

oh, I thought it has an extra \0 just as ziplist did, thank you for reminding me.

Comment From: sundb

@xiebruce You need to understand first how backlen is stored, it is dynamic(at most 5 bytes), we determine the actual number of bytes it occupies by determining the highest bit. When < 127, it is only 1 byte(0XXXXXXX) When > 127 and < (2^14 - 1), it will be 2 bytes(0XXXXXXX 1XXXXXXX)

oh, I thought it has an extra \0 just as ziplist did, thank you for reminding me.

ziplist also doesn't store `\0`.

Comment From: xiebruce

@xiebruce You need to understand first how backlen is stored, it is dynamic(at most 5 bytes), we determine the actual number of bytes it occupies by determining the highest bit. When < 127, it is only 1 byte(0XXXXXXX) When > 127 and < (2^14 - 1), it will be 2 bytes(0XXXXXXX 1XXXXXXX)

oh, I thought it has an extra \0 just as ziplist did, thank you for reminding me.

ziplist also doesn't store `\0`.

Actually I know how backlen is stored, it uses 7bit of every byte, because every msb is used as a signal to point out if there are more bytes on the left side. - 1 byte, 7bit, 2^7-1 = 127 (0-127) - 2 byte, 7x2=14bit, 2^14-1 = 16383(128 - 16383) - 3 byte, 7x3=21bit, 2^21-1 = 2,097,151 (16384 - 2,097,151) - 4 byte, 7x4=28bit, 2^28-1 = 268,435,455 (2,097,152 - 268,435,455) - 5 byte, 7x5=35bit, 2^35-1 = 34,359,738,367 (268,435,456 - 34,359,738,367)

Now I understand, this function lpDecodeBacklen(p) will calculate the actual bytes that the backlen occupies. Thank you so much!