As of now, --bigkeys scans the entire dataset and reports the largest key of each data type (by number of elements for nested types). However, it would be great if we could also find the keys that consume the most memory, i.e. those whose value part is large. I am sure most of the suggestions/answers here will be to use dataset analyzer/profiler tools to do exactly this. But as we know, almost all the tools out there do not report exact memory usage, since they work on serialized values, and many Redis DBaaS providers also block the commands needed to perform this operation. Another option, applicable to Redis 4.0 and above, is the MEMORY USAGE command, but one needs to script it to scan the entire dataset and collect the usage per key.
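
For illustration, a minimal shell sketch of such scripting, assuming Redis >= 4.0 and key names without spaces or newlines (memory_usage.csv is just an illustrative output name):

redis-cli --scan | while read -r key; do
  # MEMORY USAGE returns an estimate of the bytes used by the key and its value
  printf '%s,%s\n' "$key" "$(redis-cli MEMORY USAGE "$key")"
done > memory_usage.csv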

Since --bigkeys already implements this kind of scan-and-report flow, it would be nice if a user could simply use redis-cli to do it. It would probably also be a good idea to keep it backwards compatible. Please share your views, suggestions, and criticism on this matter, and upvote if you also wish for this to become part of the product/project.

Comment From: itamarhaber

Hello @VikramMoule

I was actually playing with "improving" --bigkeys and thinking of making it use MEMORY USAGE (if version >= 4) to report the top-n keys or something - is that what you're suggesting?

Comment From: oranagra

My 2 cents: I think it should be optional; there's still value in finding keys with lots of elements (the commands' complexity thing).

Comment From: itamarhaber

@oranagra - I got stuck finding a good name for the switch (I didn't plan on deprecating the current one) ;)

Comment From: VikramMoule

@itamarhaber Hi, that is one part of what I am expecting. If something similar could be done for Redis < 4.0, that would be a great addition as well, but we can start with Redis >= 4.0.

@oranagra Indeed. I didn't intend to knock out the current behavior.

Comment From: itamarhaber

@VikramMoule

Please help by adding your expectations about the behavior.

P.S. Now I remember why I stopped working on this - first we need to support custom data types in the CLI (#5175), and perhaps also let modules accept the sample count (#4177).
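
For context, MEMORY USAGE already accepts a sample count for nested values (SAMPLES defaults to 5; SAMPLES 0 measures all elements for an exact but slower figure). A sketch, with myhash as a hypothetical key name:

$ redis-cli MEMORY USAGE myhash SAMPLES 0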

Comment From: itamarhaber

/cc @artix75

Comment From: VikramMoule

@itamarhaber Apologies for the delay in getting back to you. I think the expected behavior can be similar to that of the existing data profiling tools out there. So if someone runs redis-cli --bigmem (--bigmem was actually suggested by Itamar), the expectation is that we SCAN the entire dataset and report the memory size of each key in a CSV file. Something like:


$ redis-cli --bigmem profiled_data.csv
# Scanning the entire keyspace to profile the data. 
# You can use -i 0.1 to sleep 0.1 sec per 100 SCAN commands (not usually needed).

-------- summary -------

Sampled 3000 keys in the keyspace and results stored in profiled_data.csv

The CSV can simply contain the key name and the size in bytes; the user can then sort the data as needed. I hope this is what you expected.
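
For illustration, the resulting CSV could look like this (key names and sizes are made up):

key,size_in_bytes
user:1001:profile,18432
queue:jobs,524288
session:abc123,2048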

Comment From: itamarhaber

This can be closed as resolved via #5856.

Comment From: houstonheat

@itamarhaber May I ask you to check from which version redis-cli (or redis-server) supports --bigkeys for streams? I couldn't find this info from the merge commit, sorry :(

Comment From: itamarhaber

@houstonheat no worries - this was introduced to redis-cli via https://github.com/redis/redis/commit/a8921c062dcd0a0faeb0da83b335f49663502853, which is part of Redis 6.