As of now, --bigkeys scans the entire dataset and reports the largest key of each data type (measured by number of elements for nested data types). However, it would be great if we could also get the keys that consume a lot of memory, i.e. where the value part is large. I am sure most of the suggestions/answers here will be to use dataset analyzer/profiler tools for exactly this. But as we know, almost all the tools out there fail to report exact memory usage because they work on serialized values, and many Redis DBaaS providers block the commands needed to perform this operation. The next option, applicable to Redis 4.0 and above, is the MEMORY USAGE command, but one needs to script it so that the entire dataset is scanned and the usage per key is collected.
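For context, a minimal sketch of that scripting approach (assuming Redis >= 4.0 on a local instance with redis-cli's default host/port/database; keys with unusual whitespace are not handled):

# Walk the keyspace with SCAN (via redis-cli --scan), ask MEMORY USAGE
# for each key, then sort by the reported byte count and show the top 20.
# Note: this makes one round trip per key, so it is slow on big keyspaces.
redis-cli --scan | while read -r key; do
    printf '%s\t%s\n' "$(redis-cli MEMORY USAGE "$key")" "$key"
done | sort -rn | head -n 20

The per-key round trips make this painful on large datasets, which is part of why built-in support would be nicer.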
Since --bigkeys already implements this kind of thing (scan and report), it would be nice if a user could simply use redis-cli to do it. It may also be a good idea to keep it backwards compatible. Please share your views, suggestions, and criticism on this matter, and upvote if you also wish for this to be part of the product/project.
Comment From: itamarhaber
Hello @VikramMoule
I was actually playing with "improving" --bigkeys and thinking of making it use MEMORY USAGE (if version >= 4) to report the top-n keys or something - is that what you're suggesting?
Comment From: oranagra
My 2 cents: I think it should be optional - there's still value in finding keys with lots of elements (the commands' complexity thing).
Comment From: itamarhaber
@oranagra - I got stuck finding a good name for the switch (didn't plan on deprecating the current one) ;)
Comment From: VikramMoule
@itamarhaber Hi, that is in part what I am expecting. If something similar could be done for Redis < 4.0, that would be a great addition as well, but we can start with Redis >= 4.0.
@oranagra Indeed. I didn't intend to knock out the current behavior.
Comment From: itamarhaber
@VikramMoule
Please help by adding your expectations about the behavior.
P.S. now I remembered why I stopped working on this - first we need to support custom data types in the CLI (#5175) and also perhaps let modules accept the sample count (#4177).
Comment From: itamarhaber
/cc @artix75
Comment From: VikramMoule
@itamarhaber Apologies for the delay in getting back to you. I think the expected behavior could be similar to that of the existing data profiling tools out there. So if someone runs redis-cli --bigmem:
$ redis-cli --bigmem profiled_data.csv
# Scanning the entire keyspace to profile the data.
# You can use -i 0.1 to sleep 0.1 sec per 100 SCAN commands (not usually needed).
-------- summary -------
Sampled 3000 keys in the keyspace and results stored in profiled_data.csv
The CSV can simply have the key name and the size in bytes; the user can then sort the data as needed. Hope this is what you expected.
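To illustrate (the two-column key,bytes layout is just an assumption about the proposed format, not anything decided here), such a file could be sorted with standard tools:

# Assuming a hypothetical "key,bytes" CSV, list the 10 largest keys.
# (Keys containing commas would need proper CSV quoting; this is only a sketch.)
sort -t, -k2,2 -rn profiled_data.csv | head -n 10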
Comment From: itamarhaber
This can be closed as resolved via #5856
Comment From: houstonheat
@itamarhaber May I ask you to check from which version redis-cli (or redis-server) supports --bigkeys for streams? Couldn't find this info from the merge commit, sorry :(
Comment From: itamarhaber
@houstonheat no worries - this was introduced to redis-cli via https://github.com/redis/redis/commit/a8921c062dcd0a0faeb0da83b335f49663502853, which is part of Redis 6.
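If you ever need to verify this kind of thing yourself, one general approach (plain git, nothing Redis-specific) is to ask which release tags contain the commit:

# In a clone of https://github.com/redis/redis, list all tags containing
# the commit; the lowest version tag is the first release that shipped it.
git tag --contains a8921c062dcd0a0faeb0da83b335f49663502853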