I'm trying to implement a bitmap index using - among other things - the BITOP NOT operation but I'm getting unexpected results:
redis> FLUSHALL
OK
redis> SETBIT "a" 1 1
(integer) 0
redis> BITCOUNT "a"
(integer) 1
redis> BITOP NOT "b" "a"
(integer) 1
redis> BITCOUNT "b"
(integer) 7
I somehow expected the last command to return 1, taking into account only the bits that had already been set.
If this is by design, is there any way to circumvent this other than applying BITOP AND to the result? I would hate to have to do it because there will be literally millions of bits set for a single key.
Thank you so much for your help.
Comment From: mattsta
I think this falls under expected behavior.
All BITOP operations happen at byte-level increments, so even though you set 01, Redis is operating on value 01000000.
For bit (and normal string) operations, Redis stores the length of the underlying allocated string at the byte level. For this use case of setting only two bits, Redis has no way to store the exact bit length, so you get byte-sized return values. This also means Redis stores 01, 010, 0100, ... 01000000 as the same representation internally and can't tell whether you want 10 or 10111111 returned.
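To make the byte-level behavior concrete, here is a minimal Python sketch that imitates the three commands from the report on a plain `bytes` value. It does not talk to Redis; the helper names `setbit`, `bitop_not` and `bitcount` are hypothetical stand-ins, and the only assumption carried over from Redis is that bit offset 0 is the most significant bit of the first byte.

```python
def setbit(value: bytes, offset: int, bit: int) -> bytes:
    """Imitate SETBIT: offset 0 is the most significant bit of byte 0."""
    buf = bytearray(value)
    byte_index, bit_index = divmod(offset, 8)
    # Grow the string in whole bytes, zero-padded, as a byte-length string must.
    if byte_index >= len(buf):
        buf.extend(b"\x00" * (byte_index + 1 - len(buf)))
    mask = 0x80 >> bit_index
    if bit:
        buf[byte_index] |= mask
    else:
        buf[byte_index] &= ~mask & 0xFF
    return bytes(buf)


def bitop_not(value: bytes) -> bytes:
    """Imitate BITOP NOT: invert every bit of every byte, padding included."""
    return bytes(b ^ 0xFF for b in value)


def bitcount(value: bytes) -> int:
    """Imitate BITCOUNT: count the set bits across the whole string."""
    return sum(bin(b).count("1") for b in value)


a = setbit(b"", 1, 1)   # one byte: 01000000
b = bitop_not(a)        # one byte: 10111111

print(format(a[0], "08b"), bitcount(a))
print(format(b[0], "08b"), bitcount(b))
```

The six padding bits that were never touched by SETBIT are inverted along with the two "real" bits, which is where the count of 7 comes from.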
Comment From: bilus
Thanks for the info!
I gathered as much from looking at the source. I was wondering: when a bit is set, you need to update the size of the string, is that right? Wouldn't it be possible to store a mask for the last byte along with the size? That wastes one byte per key, though, so it may not be an option.
Any thoughts?
Comment From: yoav-steinberg
This makes some sense. But adding extra data to each key is too much overhead for what's probably a very rare use case (there's always the option to handle this with a script/module). BTW, all we really need are 3 extra bits to tell us how many bits of the last byte are in use.
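As a rough illustration of the 3-bit idea, the Python sketch below inverts a byte string and then clears the unused padding bits of the last byte. This is not something Redis does; `not_with_tail` and its `tail_bits` parameter are hypothetical, and the sketch assumes `tail_bits` holds the number of used bits in the final byte (0 to 7, with 0 meaning the whole byte is in use).

```python
def not_with_tail(value: bytes, tail_bits: int) -> bytes:
    """Invert all bits, then zero the padding bits of the last byte.

    tail_bits is a hypothetical 3-bit field: the number of bits in use in
    the final byte, counted from the most significant bit. 0 means all 8.
    """
    if not 0 <= tail_bits <= 7:
        raise ValueError("tail_bits must fit in 3 bits (0..7)")
    out = bytearray(b ^ 0xFF for b in value)
    if out and tail_bits:
        # Keep only the top tail_bits bits of the last byte.
        keep = (0xFF << (8 - tail_bits)) & 0xFF
        out[-1] &= keep
    return bytes(out)


a = b"\x40"                      # 01000000, of which only the first 2 bits are "real"
b = not_with_tail(a, 2)          # 10000000 after masking the 6 padding bits
count = sum(bin(x).count("1") for x in b)

print(format(b[0], "08b"), count)
```

With the tail length known, the result for the example in the report has a single bit set, which is the value the original poster expected. The same masking could be done client-side or in a script by tracking the bit length separately from the key.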
Another way to handle this is to make all bit operations use a new bit-array data type instead of piggybacking on string types or just use strings with extra data (like we do with hyperloglog commands or zsets with geo commands).
I'm closing this since it's old and there is probably no real requirement here.