Redis: How to store an empty set?

Lately, I discovered that Redis is by design not able to distinguish an empty set from a missing set. I agree that in some situations it makes total sense to remove the entry when the set becomes empty. The problem is that when we need to distinguish the two, it starts to get hacky.

Here's the situation: I have a fallback mechanism to retrieve the data from a DB if the key is not found in Redis. In some situations, there's no data to attach to the key, but I want to store this information in Redis to keep my service from falling back on the DB each time for no reason.

The two solutions I have in mind are pretty hacky:

1. Every time I push something to the set, I also include an empty/dummy object. This way, if I remove all the valid data from that key, Redis will still keep it, because the dummy object remains. Then, every time the app reads the data, it needs an extra step to remove that dummy object.

SADD friends:antirez "Me" "dummy"
(integer) 2
SADD friends:antirez "You" "dummy"
(integer) 1
SREM friends:antirez "Me"
(integer) 1
SREM friends:antirez "You"
(integer) 1
SMEMBERS friends:antirez
1) "dummy"

2. Every time I have to store an empty set, I simply store a boolean:

SET friends:antirez:empty "1"

So when a client consuming this cache asks for that key:

SMEMBERS friends:antirez
(empty list or set)

The key will be missing, so the client then has to confirm whether the key is really missing or the set is just empty by calling Redis again:

GET friends:antirez:empty
"1"

This solution is less hacky, but it still requires two calls per key to confirm if a set is empty or not.
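A minimal Python sketch of the two workarounds above, for illustration only. `TinyRedis` is a hypothetical in-memory stand-in (not a real client) that imitates just one Redis behaviour: a set key disappears once its last member is removed. The helper names, `SENTINEL` and `EMPTY_SUFFIX` are assumptions as well; the method names merely mirror the Redis commands they model.

```python
# Hypothetical stand-in and helpers; nothing here talks to a real Redis server.
SENTINEL = "dummy"        # assumed never to collide with a real member
EMPTY_SUFFIX = ":empty"   # twin-key naming taken from the example above


class TinyRedis:
    """In-memory imitation of the few commands used below."""

    def __init__(self):
        self.sets, self.strings = {}, {}

    def sadd(self, key, *members):
        self.sets.setdefault(key, set()).update(members)

    def srem(self, key, *members):
        remaining = self.sets.get(key, set()) - set(members)
        if remaining:
            self.sets[key] = remaining
        else:
            self.sets.pop(key, None)  # an emptied set is deleted, as in Redis

    def smembers(self, key):
        return set(self.sets.get(key, set()))

    def set(self, key, value):
        self.strings[key] = value

    def get(self, key):
        return self.strings.get(key)


# Workaround 1: keep a sentinel member in every set.
def add_with_sentinel(client, key, *members):
    client.sadd(key, SENTINEL, *members)


def read_with_sentinel(client, key):
    """Return None if the key is missing, else the real members (maybe empty)."""
    raw = client.smembers(key)
    return raw - {SENTINEL} if SENTINEL in raw else None


# Workaround 2: flag known-empty sets with a twin string key.
def mark_empty(client, key):
    client.set(key + EMPTY_SUFFIX, "1")


def read_with_twin_key(client, key):
    members = client.smembers(key)
    if members:
        return members  # one call is enough when the set has members
    if client.get(key + EMPTY_SUFFIX) is not None:
        return set()    # second call confirms "known to be empty"
    return None         # genuine cache miss: fall back to the DB
```

In the twin-key variant the marker would also have to be removed whenever members are later added, otherwise the two keys can drift apart.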

Why not introduce two new commands for the Set data type:

SADDMT friends:antirez
(integer) 0
SCARD friends:antirez
(integer) 1
SADDMT friends:antirez "new-feature"
(integer) 1
SCARD friends:antirez
(integer) 1
SREMMT friends:antirez "new-feature"
(integer) 1
SCARD friends:antirez
(integer) 0
SMEMBERS friends:antirez
<empty>

With these commands, it would be possible to store an <empty> set and if you remove the last object from a set, you could flag this set as <empty> instead of deleting the whole entry.

Comment From: itamarhaber

Hello @elenigen

I'm not sure that I understand the need but how about using a Hash instead of a Set? You can use a "special" field value, something random or even the empty string, to distinguish between real and empty things.

Comment From: elenigen

@itamarhaber Thanks for the reply, but I don't understand your suggestion, a set is a set and a hash is ... a hash, I mean, the whole point of having different data types is to use them depending on your needs. In my case, I need to store multiple unique values, so the problem here is, I want to be able to store an empty set (which in the current Redis design is impossible).

An empty set is totally different than having a missing entry, in a key-value structure:
A -> () is not the same as:
A -> null

... but in the Redis world, it's the same 😞 In some situations it makes sense, but it's not always true. Here's an old article about the problem: https://www.bennadel.com/blog/2965-redis-doesn-t-store-empty-sets-or-hashes-and-will-delete-empty-sets-and-hashes.htm

In our case, the whole point of using Redis is to avoid querying our DB, since it's much slower. Just like it was described in one of the comments from the article I shared:

if(Key does not exist) { hit db; cache results; }

If the db results are empty, it hits the db every time for the objects with empty result sets.

Comment From: jaamison

Interesting problem, @elenigen. I can certainly see the motivation for a means to distinguish an empty set vs. missing/null.

I see a couple issues with your proposal to allow explicit creation of empty sets. If I'm understanding you right, you want to be able to put a set key into 1 of 3 distinct states: 1) existent with members, 2) existent but empty, 3) null/missing/not a set. You propose some new commands that allow you to explicitly define an empty (but existent) set that, when queried, will return a result different than that of a missing/undefined set.

The problem with adding support for explicit empty sets is that Redis already supports empty sets: any nonexistent key is reported as an empty set when queried by a command in the set family (by design and for very good reason). In order to make your proposal work, the real feature that needs to be added is not empty sets, but keys with explicit null/undefined values. These would essentially be keys that, while "empty", are treated differently than other keys that are bona fide empty/missing. Adding this kind of construct to the type system would completely destroy the polymorphism and schema-less fluidity that we currently enjoy and rely on.

Also, maybe I misread your example, but a set with zero elements being reported as having a cardinality of 1 is fundamentally wrong.

Comment From: elenigen

@jaamison Thanks for your feedback. I think you understand the issue I have, and my solution might not be perfect, but I don't really see other solutions. I tried to find a design that is as backward compatible as possible, so I was thinking we don't really need a new data structure for this, because it would probably be too much work and the API would be confusing for clients. So instead of creating an "Emptyable" data structure, I just want to introduce a new family of operations that makes it possible to distinguish the two situations.

Actually, you are right, it was a typo. Here are some more details:

SMEMBERSMT friends:antirez
(nil)                                 <- indicating the key was not found
SCARDMT friends:antirez
(integer) -1                          <- could be (nil), but some Redis clients could break
SADDMT friends:antirez                <- adding an entry with an empty set and returning 0
(integer) 0
SCARD friends:antirez
(integer) 0                           <- the key exists, so the set really has a cardinality of 0
SADDMT friends:antirez "new-feature"
(integer) 1                           <- same behavior as the SADD command
SCARD friends:antirez
(integer) 1                           <- same behavior as the SCARD command
SREMMT friends:antirez "new-feature"  <- unlike SREM, this wouldn't delete the entry; the size would be 0
(integer) 1
SCARD friends:antirez
(integer) 0                           <- confirming the entry is still present
SMEMBERSMT friends:antirez
(empty list or set)                   <- same as SMEMBERS (not sure why it says list?!)

I was thinking the MT suffix is an easy way to remember the distinction, since it sounds like "empty" 💡😏

Of course the whole family of set-commands would have their counterpart: SPOPMT, SMOVEMT, ... In some way, the current implementation is not really consistent, since if the key doesn't exist SMEMBERS returns (empty list or set) while for other commands like SPOP or SRANDMEMBER the response is (nil). The MT-commands would fix that, since SPOPMT should probably return (nil) when the key doesn't exist, but (empty list or set) when it's empty.

Comment From: jaamison

What if you stored some other data type (like a string) at keys that have no record in your authoritative db, that way a query to a key that is known to be nonexistent returns a WRONGTYPE error, and a cache miss would return (empty list or set)?
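A rough Python sketch of that idea, under stated assumptions: `MiniRedis` and `WrongTypeError` are hypothetical stand-ins that only imitate the WRONGTYPE reply, and `lookup` is an illustrative helper name. With a real client the error type would differ (in redis-py, as far as I know, a WRONGTYPE reply surfaces as a `redis.exceptions.ResponseError`).

```python
# Hypothetical stand-ins; a string at the key means "the DB has no rows for it".
class WrongTypeError(Exception):
    """Imitates the WRONGTYPE error Redis returns for a type mismatch."""


class MiniRedis:
    def __init__(self):
        self.data = {}  # key -> set (a Redis set) or str (a Redis string)

    def set(self, key, value):
        self.data[key] = value

    def sadd(self, key, *members):
        current = self.data.setdefault(key, set())
        if not isinstance(current, set):
            raise WrongTypeError(key)
        current.update(members)

    def smembers(self, key):
        value = self.data.get(key, set())
        if not isinstance(value, set):
            raise WrongTypeError(key)
        return set(value)


def lookup(client, key):
    """Return ('hit', members), ('known-empty', set()) or ('miss', None)."""
    try:
        members = client.smembers(key)
    except WrongTypeError:
        return "known-empty", set()  # marker string found at the key
    if members:
        return "hit", members
    return "miss", None  # empty reply: key absent, go to the DB
```

The cost of this design is that a normal, expected state is signalled through an error path, which every caller then has to handle deliberately.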

Comment From: elenigen

It's a good idea, but it still sounds like a hack, and it's weird if you think about it ... an invalid key returns "empty" and a valid key returns an error?! It should clearly be the opposite. I'm also not sure how it would perform with the Spring Data client that I'm using to do pipelining on sets.

Thanks for the suggestion, but is my suggestion still being considered?

Comment From: nirname

I'd like this feature to be implemented too. Right now I'm forced to use some invalid values (say, 0 as primary key in some table, whereas it starts with 1) just to keep the key in place.

Comment From: rcmonitor

+1 for this feature

Comment From: sujayvenaik

+1 for this feature. 🙏

@itamarhaber @jaamison Are there any plans to fix this somehow?

Comment From: mfrsousa

+1 for this. I'm checking whether a key exists to decide whether to fetch data from the DB: if it exists, show the results; if not, go to the DB, retrieve the results, and store them in the set. The problem is when it finds no results in the DB and I still want to save the set as empty, so it will not go to the DB next time (until I delete the key).

Comment From: boriswexler

+1, same use case

Comment From: madolson

It seems like this is a big community ask, but I don't think we really have a solution yet that plays well with Redis. The solution put forward by @elenigen requires adding a lot of new commands, and will greatly complicate the schema. This problem is also generalizable to all the collection data types, so whatever solution exists, it should be generalizable.

If people are still following this, would love to understand the use case they are trying to solve with it.

Comment From: madolson

Related I guess, https://github.com/redis/redis/issues/7941

Comment From: sujayvenaik

@madolson I would be happy to expand on our use cases where we somehow circumvent this problem. Let me know if you want me to add to this thread or post/email it somewhere else.

There is even some discussion on this community thread: https://forum.redislabs.com/t/how-to-solve-the-problem-of-storing-empty-set/789

Comment From: itamarhaber

I'm afraid that at this point it would be impossible to change the behavior without serious breakage to existing applications. The crux of the matter is that although the "empty set" exists in its own right (at least according to Set Theory), Redis' nested data types can't be "empty" (with the exception of Streams, naturally). This Redis principle applies to Strings (can't be nil), Hashes, Lists, Sorted Sets and Sets.

I'm aware of two approaches for working around that:

- Using a custom null value, e.g. the empty string ("")
- Maintaining a "does-it-exist" index using twin keys or an uber data structure

I find the first simplest and near-perfect, but I'm also known to err on occasion. Pinging @yossigo for more insights.

Comment From: madolson

@sujayvenaik Posting it here would be great.

I think Itamar is probably right though. I don't think there is really an option here to solve the use case. We could potentially include a mode that returns nil on missing keys, but that doesn't really seem like the right option to me. If this problem isn't solvable on our side, it would be nice to close this issue and give everyone closure.

Comment From: Javabien

The two workarounds are still clunky in my opinion. For example, the "" string standing in for an empty set is not even the same type, so it would break on the client side: we would need to catch the error and detect that it's an intentional one. The other solution, with an extra index, is very bad in terms of performance, since you need a second hit to Redis to resolve your response, and you would risk data inconsistency. Plus, if you have a large number of empties, it would generate a big dataset to parse.

Letting the clients solve the issue means you will have so many people reinventing the wheel, and some bugs could be introduced.

Comment From: yossigo

@itamarhaber I think your solutions are the handy ones.

Another solution is to create a custom data type module, which should be fairly simple. In terms of performance/memory efficiency it will not be 100% the same but very close.

The only way I see this supported in Redis core is either as a new data type and set of commands (ouch), or abandoning the idea that keys have no metadata so clients can choose the flavor of the set.

Comment From: ciurlaro42

+1 for this feature

Comment From: adlerfaulkner

+1 for this feature

Comment From: myifeng

+1 for this feature

Comment From: ckane-r7

+1 for this feature

Comment From: rene-muehlboeck

+1 for this feature

Comment From: Little-Elephant

+1 for this feature

Comment From: dorukkicikoglu

+1 for this feature

Comment From: madolson

For all future visitors here: please don't +1 in the comments, since it's harder to aggregate for prioritization. Please just 👍 the top-level post. If you would like to leave a comment, please include your use case and why you can't solve it with a different mechanism. @dorukkicikoglu pinging you since you just commented.

Comment From: ciurlaro42

@madolson thanks for giving this the attention it deserves. I think that both @elenigen and @nirname explained the issue pretty extensively, and it points to a significant logical flaw that goes beyond the specific use cases.

I mean, is there really anything else needed to be added?

Quoting from previous comments:

I need to store multiple unique values, so the problem here is, I want to be able to store an empty set (which in the current Redis design is impossible).
An empty set is totally different than having a missing entry, in a key-value structure: A -> () is not the same as: A -> null ... but in the Redis world, it's the same 😞

And to solve this, people literally use hacky unintuitive solutions:

1. > The two solutions I have in mind are pretty hacky: [...]
2. > I'm forced to use some invalid values (say, 0 as primary key in some table, whereas it starts with 1) just to keep the key in place
3. > I'm aware of two approaches for working around that:
   > - Using a custom null value, e.g. the empty string ("")
   > - Maintaining a "does-it-exist" index using twin keys or an uber data structure

@Javabien is right when he says:

Letting the clients solve the issue means you will have so many people reinventing the wheel, and some bugs could be introduced.

Comment From: nmvk

Will take a look.

Comment From: duke-cliff

2023 already, and still no empty Set?

Comment From: ZeN220

+1 for this feature

Comment From: vinaykyellow

+1 for this feature

Comment From: madolson

@nmvk Any update?

Comment From: nmvk

@madolson Will get back by end of this week with more analysis.

Comment From: kosuke-zhang

+1 for this feature

Comment From: nmvk

Based on some discussion, this is the direction I am currently thinking about to support the empty-set use case.

Write commands

SADD key → The existing SADD would be modified to make members optional; invoking SADD with only a key would create an empty set.

SPOP key [count] MT → [Additional argument] A new optional argument to retain the set when no elements remain.

SREMMT key member [member ...] → [New command] Same as SREM, but would keep the empty set once all members are removed.

SMOVEMT, SINTERSTOREMT, SDIFFSTOREMT, and SUNIONSTOREMT would be implemented as follow-ups if needed. One can delete the empty set using the DEL command.

Read commands

SCARD would still return 0 for an empty set. To distinguish an empty set from a nonexistent key, one can use EXISTS (https://redis.io/commands/exists/) followed by SCARD. The same approach can be used for SISMEMBER and SMISMEMBER.

SMEMBERSMT key → [New command] Similar to SMEMBERS, but would return null when the key does not exist. This can be achieved with EXISTS + SMEMBERS, but would be supported for ease of use.
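The EXISTS + SMEMBERS combination could look roughly like this in Python. This is a sketch under assumptions: `ProposedSetStore` is a hypothetical in-memory stand-in that models the proposal (SADD with only a key creates an empty set), which current Redis does not allow, and `smembers_mt` is an illustrative helper name rather than an existing command or client method.

```python
# Hypothetical model of the proposed behaviour; not how Redis works today.
class ProposedSetStore:
    def __init__(self):
        self._sets = {}

    def sadd(self, key, *members):
        # Under the proposal, calling SADD with no members creates an empty set.
        self._sets.setdefault(key, set()).update(members)

    def exists(self, key):
        return int(key in self._sets)

    def smembers(self, key):
        return set(self._sets.get(key, set()))


def smembers_mt(store, key):
    """Return None when the key does not exist, else its (possibly empty) members."""
    if not store.exists(key):
        return None
    # Two separate calls: another client could delete the key in between.
    return store.smembers(key)
```

Against a real server the two calls would probably need to run atomically (for example inside MULTI/EXEC or a script) to avoid the gap noted in the comment.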

Comment From: nirname

@nmvk I feel like the suggested approach is not really consistent, although it preserves backward compatibility. The reason is that we are mixing arguments and commands:

SPOP key count MT # arg
SREMMT key member # new command

I know, this is due to the fact that some commands take the list of members while others do not. I am a little concerned about these differences.

Comment From: madolson

@itamarhaber Will you take a look at the suggestion proposed by @nmvk ?

Comment From: kosuke-zhang

Any conclusions?