HyperLogLog currently stores its data in a string, which causes several problems:
1. Nothing prevents mixing HyperLogLog and string commands on the same key.
For example, SETBIT and SETRANGE can corrupt HyperLogLog data.
2. PFCOUNT, a read-only command, has to be propagated, which is very odd.
Making a new data type for HyperLogLog would sort out all of the mess above, but at the same time it is a breaking change:
1. HyperLogLog and string commands could no longer be mixed on the same key (not a big deal, since mixing them is incorrect usage anyway).
2. HyperLogLogs stored as strings would need to be converted to the new data type, and ordinary strings filtered out, when reading them from an RDB.
Comment From: sundb
My concern is how we handle AOF rewrite after adding a HyperLogLog data type: we could no longer use SET, unless we add a command like pfload.
Comment From: azuredream
Hi @soloestoy, please let me know if you are still interested in addressing this. Going forward, more and more data may need to be protected from SETBIT and SETRANGE. It's a tricky issue. I can come up with some solutions, but each has side effects.
Solution 1:
Add a new flag, bit_protected, to robj. SETBIT and SETRANGE would need to check this flag before editing.
Pros:
1. HLL relies heavily on SDS operations. We would not need to rewrite that logic.
2. Data other than HLL can also take advantage of it.
Cons:
1. Every robj becomes larger.
2. It requires changes in multiple commands.
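To make Solution 1 concrete, here is a minimal sketch of the flag check. It does not use the real robj or the real SETBIT implementation: `demo_obj`, `demo_setbit`, and the fixed-size buffer are hypothetical stand-ins, and only the `bit_protected` name comes from the proposal above.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for robj (NOT the real Redis struct). */
typedef struct demo_obj {
    unsigned bit_protected : 1; /* assumed new flag proposed in Solution 1 */
    size_t len;                 /* number of valid bytes in buf */
    unsigned char buf[64];      /* payload, standing in for the SDS string */
} demo_obj;

/* Hypothetical SETBIT-like helper.
 * Returns 0 on success, -1 if the object is protected or the
 * offset is out of range (this sketch never grows the buffer). */
int demo_setbit(demo_obj *o, size_t bitoffset, int on) {
    if (o->bit_protected) return -1;   /* the check every bit command would need */
    size_t byte = bitoffset >> 3;
    if (byte >= o->len) return -1;
    /* Bit 0 is the most significant bit of the first byte. */
    unsigned char mask = (unsigned char)(1u << (7 - (bitoffset & 7)));
    if (on) o->buf[byte] |= mask;
    else    o->buf[byte] &= (unsigned char)~mask;
    return 0;
}
```

An object created by the HLL code would set `bit_protected` once, and every string command that edits bytes in place would need the same early return, which is the "changes in multiple commands" cost listed above.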
Solution 2:
A straightforward way is to add a new OBJ_TYPE. SETBIT would then fail, because the bit commands already check the type, as in the snippet below. However, it would affect the whole type design.
robj *lookupStringForBitCommand(client *c, uint64_t maxbit, int *dirty) {
...
if (checkType(c,o,OBJ_STRING)) return NULL;
Solution 3:
Detect the magic string "HYLL", or another special code, at the beginning of the string. That would break the guarantee that users can store anything in an OBJ_STRING.
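A rough sketch of the check Solution 3 would add, to show where the guarantee breaks. The helper name `demo_looks_like_hll` and the macros are hypothetical. Only the "HYLL" prefix is taken from the discussion.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_HLL_MAGIC     "HYLL" /* magic prefix mentioned above */
#define DEMO_HLL_MAGIC_LEN 4

/* Hypothetical helper: returns 1 if the buffer starts with the magic
 * bytes, 0 otherwise. It inspects only the prefix, so it cannot tell
 * a real HLL from a user string that happens to start with "HYLL". */
int demo_looks_like_hll(const void *data, size_t len) {
    return len >= DEMO_HLL_MAGIC_LEN &&
           memcmp(data, DEMO_HLL_MAGIC, DEMO_HLL_MAGIC_LEN) == 0;
}
```

With this check in front of SETBIT and SETRANGE, an ordinary value such as "HYLLo world" set by a user would also be rejected, which is exactly the false positive this solution has to accept.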