Hi~ I am wandering why HLL_P is 14 in HyperLogLog. I know that "The greater is P, the smaller the error". Buy why not use 13 or 15? Is there any advantage in implemention or something else?

Thanks!

Comment From: itamarhaber

Hi @yangzhen0512

This, I believe, was a conscious choice that reflects the tradeoff between space and accuracy, keeping in mind the expected capacity of the HLLs in Redis. Lowering P to 13 would save space, but the standard error would increase to ~1.15%. Increasing P to 15 would reduce the standard error to ~0.055%, but would double the memory footprint of the dense representing (from 12KB to about 24KB). As a compromise, 14 appears sane IMO :)

It is possible to conceive of an HLL implementation that accepts "P" as an argument and provisions the data structure accordingly. Redis' spirit has usually been against pre-provisioning data structures - creating them by demand instead - and therefore, HLL's parameters are hard-coded. As an example to the contrary, Redis Stack' Bloom Filter (and most other sketches in it) has the BF.RESERVE command that allows pre-provisioning a non-default capacity.

More background on Redis' HLL implementation may be gleaned from this inaugural post.

Comment From: yangzhen0512

Thanks for your answer.

Comment From: enjoy-binbin

Increasing P to 15 would reduce the standard error to ~0.055%

i think this should be 0.57%, right? @itamarhaber