Redis cluster based on master-slave mode, using the default configuration. The issue is that used_memory_human is much bigger than what --memkeys accounts for: used_memory_human is 3.95G, but --memkeys adds up to about 1.1G. When I start a new Redis instance loading the dump.rdb of this instance, its used_memory_human is about 1.1G.
/data $ redis-cli info memory
used_memory:4246101096
used_memory_human:3.95G
used_memory_rss:4105056256
used_memory_rss_human:3.82G
used_memory_peak:4382009240
used_memory_peak_human:4.08G
used_memory_peak_perc:96.90%
used_memory_overhead:14911288
used_memory_startup:792376
used_memory_dataset:4231189808
used_memory_dataset_perc:99.67%
allocator_allocated:4246205976
allocator_active:4253106176
allocator_resident:4279185408
total_system_memory:67387084800
total_system_memory_human:62.76G
used_memory_lua:40960
used_memory_lua_human:40.00K
used_memory_scripts:216
used_memory_scripts_human:216B
number_of_cached_scripts:1
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.00
allocator_frag_bytes:6900200
allocator_rss_ratio:1.01
allocator_rss_bytes:26079232
rss_overhead_ratio:0.96
rss_overhead_bytes:-174129152
mem_fragmentation_ratio:0.97
mem_fragmentation_bytes:-141023832
mem_not_counted_for_evict:0
mem_replication_backlog:1048576
mem_clients_slaves:0
mem_clients_normal:66616
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
/data $ redis-cli --memkeys
-------- summary -------
Sampled 184705 keys in the keyspace!
Total key length in bytes is 7602320 (avg len 41.16)
Biggest hash found 'UCS_ATTRIBUTE_CACHE_KEY_10001' has 212830867 bytes
Biggest string found 'UcAuthorization::10016' has 20635 bytes
0 lists with 0 bytes (00.00% of keys, avg size 0.00)
151 hashs with 1045001209 bytes (00.08% of keys, avg size 6920537.81)
184554 strings with 78384857 bytes (99.92% of keys, avg size 424.73)
0 streams with 0 bytes (00.00% of keys, avg size 0.00)
0 sets with 0 bytes (00.00% of keys, avg size 0.00)
0 zsets with 0 bytes (00.00% of keys, avg size 0.00)
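For reference, a minimal sketch of how the dump.rdb comparison mentioned above can be done (the directory names and port are illustrative assumptions, not taken from this setup):

# copy the RDB of the suspect instance into an empty data dir and load it in a fresh instance
mkdir -p /tmp/rdb-check
cp /data/dump.rdb /tmp/rdb-check/dump.rdb
redis-server --dir /tmp/rdb-check --port 6380 --daemonize yes
redis-cli -p 6380 INFO memory | grep '^used_memory_human'   # ~1.1G here, vs 3.95G on the original node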
Comment From: oranagra
@foxyriver which version of redis are you using? do both the master and replica show the same usage?
Comment From: foxyriver
@oranagra The version of Redis is 5.0.7. Yes, both the master and the replica show the same usage.
Comment From: oranagra
@foxyriver i don't yet have any clue as to what this could be. you said that restarting redis from the rdb file doesn't re-create the unexplained memory usage, and that the replica suffers from it too, so indeed this seems like a leak. can you tell if, after such a restart from the rdb file, the leak comes back after a while? maybe you can post your INFO COMMANDSTATS?
Comment From: VonAlex
@oranagra I also encountered this problem. Take a healthy cluster with 1400 nodes (700 masters, 700 replicas), shut the Redis instances down one by one, and then start them one by one; you will then find abnormal memory usage in INFO MEMORY.
$ redis-cli INFO MEMORY | grep used_memory
used_memory:2228397400
used_memory_human:2.08G
used_memory_rss:1769455616
used_memory_rss_human:1.65G
used_memory_peak:2252016608
used_memory_peak_human:2.10G
used_memory_peak_perc:98.95%
used_memory_overhead:9204032
used_memory_startup:8088840
used_memory_dataset:2219193368
used_memory_dataset_perc:99.95%
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
$ redis-cli -h INFO COMMANDSTATS
# Commandstats
cmdstat_psync:calls=1,usec=112372,usec_per_call=112372.00
cmdstat_replconf:calls=4476,usec=3952,usec_per_call=0.88
cmdstat_dbsize:calls=1,usec=1,usec_per_call=1.00
cmdstat_slowlog:calls=452,usec=2314,usec_per_call=5.12
cmdstat_cluster:calls=589,usec=276967,usec_per_call=470.23
cmdstat_scan:calls=1,usec=21,usec_per_call=21.00
cmdstat_config:calls=2788,usec=27082,usec_per_call=9.71
cmdstat_ping:calls=716,usec=270,usec_per_call=0.38
cmdstat_info:calls=1629,usec=66415,usec_per_call=40.77
$ redis-cli --memkeys
# Scanning the entire keyspace to find biggest keys as well as
# average sizes per key type. You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands (not usually needed).
-------- summary -------
Sampled 0 keys in the keyspace!
Total key length in bytes is 0 (avg len 0.00)
0 hashs with 0 bytes (00.00% of keys, avg size 0.00)
0 lists with 0 bytes (00.00% of keys, avg size 0.00)
0 strings with 0 bytes (00.00% of keys, avg size 0.00)
0 streams with 0 bytes (00.00% of keys, avg size 0.00)
0 sets with 0 bytes (00.00% of keys, avg size 0.00)
0 zsets with 0 bytes (00.00% of keys, avg size 0.00)
Comment From: oranagra
@VonAlex so you have a node that's completely empty (no keys), and yet it consumes about 2GB of memory? which version are you using? i suppose this is somehow related to the high number of nodes in the cluster. @madolson please take a look.
Comment From: VonAlex
@oranagra Yes, it is an empty cluster with no write ops. Here is the version info:
$ ./redis-server --version
Redis server v=5.0.8 sha=xxxxxx malloc=jemalloc-5.1.0 bits=64 build=xxxxx
Comment From: VonAlex
Not all nodes consume about 2GB of memory; some consume only 120MB+.
Comment From: madolson
Is there anything that connects the nodes with 2GB of memory? Do the primaries have 2GB while the replicas only have 120MB for example?
Redis does consume memory proportional to the size of the cluster, but I would expect it to be on the order of 10MBs max, not GBs. I did a quick glance through the clusterNode structure and didn't see anything that would grow substantially. I know the cluster connections aren't tracked in any type of overhead, and they can silently consume a lot of memory.
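As a quick way to see how much of the usage Redis itself attributes to tracked structures, one can compare the relevant INFO fields against the total (a sketch; the grep pattern is illustrative, and the interpretation assumes an empty keyspace as described above):

$ redis-cli -h <host> -p <port> INFO memory | grep -E '^used_memory:|^used_memory_overhead:|^used_memory_dataset:'
# with zero keys, used_memory_dataset should be close to zero; a multi-GB value here
# means the memory is not attributed to any tracked overhead (e.g. cluster bus links)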
Comment From: VonAlex
@madolson It is a truly empty cluster, with no external connections. The cluster has 700 primary nodes; 130 of them consumed memory in the GB range in my test. Within one group, the primary and its replica are never abnormal at the same time.
Some statistics look like this (1st column is the primary node name, 2nd column is the primary node's used memory, 3rd column is the replica's used memory):
e559d1014607c05d3ba48a618c20c7822ca7561c used_memory_human:1.61G used_memory_human:128.28M
e7d5294c9670e04bac67e2c9034cf699147fca9a used_memory_human:1.52G used_memory_human:128.29M
ea6c10077324886060b29d7343bf5bcc6ae27b3d used_memory_human:1.04G used_memory_human:126.35M
eb1da780556f687dbd1b6c845bab0c624a4c2748 used_memory_human:1.90G used_memory_human:128.94M
130af011da6a974735abb196983eb80de149c096 used_memory_human:127.31M used_memory_human:1.53G
868b136ef46d8b17f83a2b936f1fe6d5c740f627 used_memory_human:127.31M used_memory_human:1.76G
f475bdc6e00f902ccbbf9627c08d14afa77b8f61 used_memory_human:129.04M used_memory_human:1.75G
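For reference, a minimal shell sketch of how per-node numbers like these can be collected (the loop over CLUSTER NODES output, the seed host/port placeholders, and the field parsing are assumptions, not the exact script used here):

# list every node's id and used_memory_human; assumes all nodes are reachable without auth
redis-cli -h <seed-host> -p <seed-port> CLUSTER NODES | while read -r id addr rest; do
  hostport=${addr%@*}                       # strip the cluster-bus port suffix
  host=${hostport%:*}; port=${hostport##*:}
  mem=$(redis-cli -h "$host" -p "$port" INFO memory | grep '^used_memory_human' | tr -d '\r')
  echo "$id $mem"
done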
Comment From: oranagra
@VonAlex please post a copy of MEMORY malloc-stats from one of the big nodes, maybe we'll be able to figure out what's eating the memory if we know the size of allocations.
Comment From: VonAlex
@oranagra This is a node that consumed 2.03G of memory; the output of MEMORY malloc-stats is in the following file: stats.txt
Comment From: oranagra
i see the majority of the memory is used for really big allocations (over 10kb). can you please post CLIENT LIST?
besides that, i would suggest trying to upgrade to a recent version of redis (6.2); maybe this problem is solved there. it also contains some fixes and improvements to the memory reporting code (tracking memory used for the client argv array and such).
Comment From: VonAlex
The CLIENT LIST looks good, as follows:
$ redis-cli CLIENT LIST
id=3 addr=xxxxxxx fd=16 name= age=83963 idle=1 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=replconf
id=16803 addr=xxxxx fd=2469 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=26 qbuf-free=32742 obl=0 oll=0 omem=0 events=r cmd=client
And I will try redis 6.2 later today.
Comment From: VonAlex
@oranagra I tested it with Redis 6.2, and the problem is still there.
$ ./redis-server -v
Redis server v=6.2.0 sha=445aa844:1 malloc=jemalloc-5.1.0 bits=64 build=eee64f88c9ead0f1
$ redis-cli INFO MEMORY
# Memory
used_memory:2067044424
used_memory_human:1.93G
used_memory_rss:1460105216
used_memory_rss_human:1.36G
used_memory_peak:2071810784
used_memory_peak_human:1.93G
used_memory_peak_perc:99.77%
used_memory_overhead:9175488
used_memory_startup:8106392
used_memory_dataset:2057868936
used_memory_dataset_perc:99.95%
allocator_allocated:2067176104
allocator_active:2079092736
allocator_resident:2095464448
total_system_memory:540421885952
total_system_memory_human:503.31G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:6000000000
maxmemory_human:5.59G
maxmemory_policy:volatile-lru
allocator_frag_ratio:1.01
allocator_frag_bytes:11916632
allocator_rss_ratio:1.01
allocator_rss_bytes:16371712
rss_overhead_ratio:0.70
rss_overhead_bytes:-635359232
mem_fragmentation_ratio:0.71
mem_fragmentation_bytes:-606877216
mem_not_counted_for_evict:4
mem_replication_backlog:1048576
mem_clients_slaves:20512
mem_clients_normal:0
mem_aof_buffer:8
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
$ redis-cli client list
id=3 addr=xxxx laddr=xxx fd=1349 name= age=409 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20512 events=r cmd=replconf user=default redir=-1
id=89 addr=127.0.0.1:60516 laddr=127.0.0.1:5081 fd=2337 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=26 qbuf-free=40928 argv-mem=10 obl=0 oll=0 omem=0 tot-mem=61466 events=r cmd=client user=default redir=-1
and the MEMORY malloc-stats:
stats.txt
Comment From: oranagra
i'm clueless. maybe you can teach us how to reproduce this?
Comment From: VonAlex
I reproduce it like this:
1) Use redis-cli --cluster create [ip:port].. to create a cluster with 1400 nodes.
2) Use redis-cli --cluster check to make sure the cluster is good.
3) Shut down all of them.
4) Start those 1400 nodes up again one by one.
Following the steps above, you will find some nodes with abnormal memory usage (a rough script sketch of this procedure follows).
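As a sketch of that procedure (not the exact script used; the port range, per-node config layout, and localhost-only setup are assumptions for illustration):

# assumptions: 1400 config files under ./node-<port>/redis.conf with cluster-enabled yes,
# ports 7000..8399, all instances on localhost, no auth
PORTS=$(seq 7000 8399)

# 1) start all instances and create the cluster with one replica per master
for p in $PORTS; do redis-server ./node-$p/redis.conf --port $p --daemonize yes; done
redis-cli --cluster create $(for p in $PORTS; do echo -n "127.0.0.1:$p "; done) --cluster-replicas 1 --cluster-yes

# 2) verify the cluster is healthy
redis-cli --cluster check 127.0.0.1:7000

# 3) shut all nodes down, one by one
for p in $PORTS; do redis-cli -p $p SHUTDOWN NOSAVE; done

# 4) start them again one by one, then look for nodes with abnormal used_memory
for p in $PORTS; do redis-server ./node-$p/redis.conf --port $p --daemonize yes; done
for p in $PORTS; do echo -n "$p "; redis-cli -p $p INFO memory | grep '^used_memory_human'; done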
Comment From: VonAlex
@oranagra Hi... have you made any progress on the problem? I tried, but found nothing.
Comment From: oranagra
sorry. i'm currently too busy with other things. i suppose it would help if you created a script for the above reproduction scenario, so it would be easier to reproduce and debug.
Comment From: aradz44
@VonAlex Hi, can you please add more details on how to reproduce this scenario? Maybe attach the script. I'm trying to reproduce and see the issue, with no success.
Comment From: aradz44
I tried to reproduce this bug for almost a week... I could not create a 1400-node cluster that works fine; my computer couldn't handle the load. I did succeed with up to 800 nodes, but didn't find the memory leak. I was trying with version 6.2.5 and with the last commit of the unstable branch.
Comment From: VonAlex
> @VonAlex Hi, can you please add more details on how to reproduce this scenario? Maybe attach the script. I'm trying to reproduce and see the issue, with no success.
Sorry, I hadn't seen the message; the problem was solved in https://github.com/redis/redis/pull/9255
Comment From: oranagra
great, thanks for letting us know.