The problem/use-case that the feature addresses

Add new data structure and commands to support approximate set membership tests. There are various literature like Bloom Filter, Cuckoo Filter to support the same. These are pretty popular data structure and are widely used for use cases like detecting weak passwords, malicious URLs, etc.

https://dl.acm.org/doi/10.1145/362686.362692 https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf

Description of the feature

  • The filter should start with a default configuration (capacity, number of hashes, etc) to keep the API simple and no pre-creation of filter should be required ideally.
  • The filter should be able to adapt dynamically to the number of items stored (https://www.sciencedirect.com/science/article/abs/pii/S0020019006003127)

New set of commands to support adding items, checking the membership, total count of items added.

  • Add Items
ASADD key item [item ...]
  • Check item exists
ASCHECK key item
  • Number of unique items added
ASCOUNT key
  • Remove items from a set (if we choose to use cuckoo filter)
ASREM key item [item ...]

Alternatives you've considered

Lua Scripts: https://github.com/erikdubbelboer/redis-lua-scaling-bloom-filter

Additional information

  • https://github.com/redis/redis/issues/1006
  • http://antirez.com/news/89

Comment From: hpatro

I think this will be a good addition to the core Redis and have a standardised implementation instead of various Lua scripts, custom implementation and different third party modules.

@redis/core-team What are your thoughts on this ?

Comment From: thallredis

We encourage you to review what already exists here: https://github.com/RedisBloom/RedisBloom

Comment From: madolson

@thallredis I see that module is not currently open source and not part of the main Redis project, do you have any thoughts about the possibility of pulling it in?

Comment From: thallredis

We aren't planning on pulling it in -- due to the license differences. But, clearly the investment has already been made there and any enhancements should be considered there first. The RSAL/SSPL license is clear in terms of its availability and intent for developers usage and adoption. It is also clear in terms of what it prevents for others.

Comment From: madolson

But, clearly the investment has already been made there and any enhancements should be considered there first.

Given the the redis bloom project is not open source, it still seems reasonable to consider this for the main Redis project though.

Comment From: hpatro

@thallredis Thanks for the response.

I've the same thought as @madolson here. RedisBloom isn't FOSS and my intent of filing this request was to build this feature in the core Redis project.

Comment From: thallredis

@madolson I appreciate your engagement and ongoing contributions as a member of the Redis core team. Your technical expertise and suggestions are always valued.

I would like to address the particular recommendation that you and @hpatro have made in this instance. As it stands, the feature in question has largely been made available (including the underlying source code) for developers, albeit under a license that may not align with the preferences of your employer.

We’re committed to the growth and improvement of the Redis core project and the extended modules that have been created, including the Bloom module. If there are specific enhancements that would benefit that module and the many developers who take advantage of it, we are more than happy to review and consider them. We will review this particular item in more detail, but as part of a proposed enhancement to the code that already exists and functions.

Our primary goal is to serve the open-source community and maintain the integrity of the Redis project. Thank you for your understanding and continued support.

Comment From: zuiderkwast

If we add a probabilistic filter implementation, as a bonus we can use it to implement #122.

Comment From: madolson

If we add a probabilistic filter implementation, as a bonus we can use it to implement https://github.com/redis/redis/issues/122.

We could just do that anyways. Bloom filters aren't that hard to write :). I will say I think we should close that though, I think sharded pubsub is the better solution.