Describe the bug

rss_overhead_ratio and rss_overhead_bytes are too large

To reproduce

nothing

Expected behavior

After running the Redis server for a long time, rss_overhead_ratio climbs to 2.5. I don't use Redis modules; I do use Lua scripts, but used_memory_lua is very small, and trying to purge manually had no effect.
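For illustration, the manual check described above would look something like this (a sketch; assumes a local instance on the default port and that "purge manually" means the MEMORY PURGE command):

  $ redis-cli INFO memory | grep -E 'rss_overhead_ratio|used_memory_lua:'
  $ redis-cli MEMORY PURGE   # asks jemalloc to release dirty pages; had no visible effect here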

Additional information

The redis-server version is 5.0.3, the allocator is jemalloc-5.1.0, the module list is empty, and the operating system is CentOS 8.5.2111 (running in Docker).

Comment From: sundb

@GXhua Can you give the output of INFO ALL?

Comment From: GXhua

yep @sundb

Server

redis_version:5.0.3 redis_git_sha1:00000000 redis_git_dirty:0 redis_build_id:9529b692c0384fb7 redis_mode:standalone os:Linux 4.18.0-348.7.1.el8_5.x86_64 x86_64 arch_bits:64 multiplexing_api:epoll atomicvar_api:atomic-builtin gcc_version:8.4.1 process_id:176 run_id:e71e37102da5536baeb0f4e3223fd1ff20a6a1e0 tcp_port:6379 uptime_in_seconds:675381 uptime_in_days:7 hz:10 configured_hz:10 lru_clock:10695354 executable:/www/redis-server config_file:/etc/redis.conf

Clients

connected_clients:939 client_recent_max_input_buffer:2 client_recent_max_output_buffer:0 blocked_clients:37

Memory

used_memory:811458960
used_memory_human:773.87M
used_memory_rss:2884251648
used_memory_rss_human:2.69G
used_memory_peak:1852270704
used_memory_peak_human:1.73G
used_memory_peak_perc:43.81%
used_memory_overhead:347462540
used_memory_startup:790944
used_memory_dataset:463996420
used_memory_dataset_perc:57.24%
allocator_allocated:811600184
allocator_active:1679548416
allocator_resident:1722388480
total_system_memory:15996276736
total_system_memory_human:14.90G
used_memory_lua:46080
used_memory_lua_human:45.00K
used_memory_scripts:3640
used_memory_scripts_human:3.55K
number_of_cached_scripts:3
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:2.07
allocator_frag_bytes:867948232
allocator_rss_ratio:1.03
allocator_rss_bytes:42840064
rss_overhead_ratio:1.67
rss_overhead_bytes:1161863168
mem_fragmentation_ratio:3.55
mem_fragmentation_bytes:2072833544
mem_not_counted_for_evict:3058
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:16774602
mem_aof_buffer:3058
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0

Persistence

loading:0 rdb_changes_since_last_save:543086721 rdb_bgsave_in_progress:0 rdb_last_save_time:1654187141 rdb_last_bgsave_status:ok rdb_last_bgsave_time_sec:-1 rdb_current_bgsave_time_sec:-1 rdb_last_cow_size:0 aof_enabled:1 aof_rewrite_in_progress:0 aof_rewrite_scheduled:0 aof_last_rewrite_time_sec:15 aof_current_rewrite_time_sec:-1 aof_last_bgrewrite_status:ok aof_last_write_status:ok aof_last_cow_size:207405056 aof_current_size:2178250808 aof_base_size:553110495 aof_pending_rewrite:0 aof_buffer_length:0 aof_rewrite_buffer_length:0 aof_pending_bio_fsync:0 aof_delayed_fsync:4

Stats

total_connections_received:2054 total_commands_processed:616121255 instantaneous_ops_per_sec:57 total_net_input_bytes:118779044540 total_net_output_bytes:92062458527 instantaneous_input_kbps:4.12 instantaneous_output_kbps:1.65 rejected_connections:0 sync_full:0 sync_partial_ok:0 sync_partial_err:0 expired_keys:19493331 expired_stale_perc:12.86 expired_time_cap_reached_count:0 evicted_keys:0 keyspace_hits:210278315 keyspace_misses:96202156 pubsub_channels:0 pubsub_patterns:0 latest_fork_usec:39939 migrate_cached_sockets:0 slave_expires_tracked_keys:0 active_defrag_hits:0 active_defrag_misses:0 active_defrag_key_hits:0 active_defrag_key_misses:0

Replication

role:master connected_slaves:0 master_replid:8d77aa658d80eac45342fb971da678336e6fcaa4 master_replid2:0000000000000000000000000000000000000000 master_repl_offset:0 second_repl_offset:-1 repl_backlog_active:0 repl_backlog_size:1048576 repl_backlog_first_byte_offset:0 repl_backlog_histlen:0

CPU

used_cpu_sys:5399.507151 used_cpu_user:5100.244009 used_cpu_sys_children:16.460214 used_cpu_user_children:263.096006

Cluster

cluster_enabled:0

Keyspace

db0:keys=3057421,expires=3057322,avg_ttl=166974030

Comment From: sundb

It looks fine from the info you gave. Was Redis in the middle of an RDB save (rdb_bgsave_in_progress is 1) when rss_overhead_ratio was 2.5?
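For reference, one quick way to check that at the moment the ratio spikes (assuming redis-cli can reach the instance):

  $ redis-cli INFO persistence | grep -E 'rdb_bgsave_in_progress|aof_rewrite_in_progress'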

Comment From: GXhua

No, I don't use RDB, only AOF, and rss_overhead_ratio keeps growing:

info memory

Memory

used_memory:649064240
used_memory_human:619.00M
used_memory_rss:3669032960
used_memory_rss_human:3.42G
used_memory_peak:1852270704
used_memory_peak_human:1.73G
used_memory_peak_perc:35.04%
used_memory_overhead:300430690
used_memory_startup:790944
used_memory_dataset:348633550
used_memory_dataset_perc:53.78%
allocator_allocated:649175312
allocator_active:1451159552
allocator_resident:1493581824
total_system_memory:15996276736
total_system_memory_human:14.90G
used_memory_lua:60416
used_memory_lua_human:59.00K
used_memory_scripts:3640
used_memory_scripts_human:3.55K
number_of_cached_scripts:3
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:2.24
allocator_frag_bytes:801984240
allocator_rss_ratio:1.03
allocator_rss_bytes:42422272
rss_overhead_ratio:2.46
rss_overhead_bytes:2175451136
mem_fragmentation_ratio:5.65
mem_fragmentation_bytes:3020091600
mem_not_counted_for_evict:1196
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:17593902
mem_aof_buffer:1196
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0

Comment From: GXhua

I have to restart the Redis server every day.

Comment From: sundb

@GXhua Please give the full INFO output when the issue occurs.

Comment From: GXhua

Okay @sundb

127.0.0.1:6379> info

Server

redis_version:5.0.3 redis_git_sha1:00000000 redis_git_dirty:0 redis_build_id:9529b692c0384fb7 redis_mode:standalone os:Linux 4.18.0-348.7.1.el8_5.x86_64 x86_64 arch_bits:64 multiplexing_api:epoll atomicvar_api:atomic-builtin gcc_version:8.4.1 process_id:176 run_id:e71e37102da5536baeb0f4e3223fd1ff20a6a1e0 tcp_port:6379 uptime_in_seconds:738623 uptime_in_days:8 hz:10 configured_hz:10 lru_clock:10758596 executable:/www/redis-server config_file:/etc/redis.conf

Clients

connected_clients:941 client_recent_max_input_buffer:53431 client_recent_max_output_buffer:0 blocked_clients:38

Memory

used_memory:806553144
used_memory_human:769.19M
used_memory_rss:3446640640
used_memory_rss_human:3.21G
used_memory_peak:1852270704
used_memory_peak_human:1.73G
used_memory_peak_perc:43.54%
used_memory_overhead:346023529
used_memory_startup:790944
used_memory_dataset:460529615
used_memory_dataset_perc:57.15%
allocator_allocated:808668232
allocator_active:1524396032
allocator_resident:1573720064
total_system_memory:15996276736
total_system_memory_human:14.90G
used_memory_lua:55296
used_memory_lua_human:54.00K
used_memory_scripts:3640
used_memory_scripts_human:3.55K
number_of_cached_scripts:3
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.89
allocator_frag_bytes:715727800
allocator_rss_ratio:1.03
allocator_rss_bytes:49324032
rss_overhead_ratio:2.19
rss_overhead_bytes:1872920576
mem_fragmentation_ratio:4.27
mem_fragmentation_bytes:2638609872
mem_not_counted_for_evict:1152
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:25193273
mem_aof_buffer:1152
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0

Persistence

loading:0 rdb_changes_since_last_save:590381275 rdb_bgsave_in_progress:0 rdb_last_save_time:1654187141 rdb_last_bgsave_status:ok rdb_last_bgsave_time_sec:-1 rdb_current_bgsave_time_sec:-1 rdb_last_cow_size:0 aof_enabled:1 aof_rewrite_in_progress:0 aof_rewrite_scheduled:0 aof_last_rewrite_time_sec:16 aof_current_rewrite_time_sec:-1 aof_last_bgrewrite_status:ok aof_last_write_status:ok aof_last_cow_size:233431040 aof_current_size:2760664518 aof_base_size:637881572 aof_pending_rewrite:0 aof_buffer_length:0 aof_rewrite_buffer_length:0 aof_pending_bio_fsync:0 aof_delayed_fsync:4

Stats

total_connections_received:2090 total_commands_processed:668659920 instantaneous_ops_per_sec:2605 total_net_input_bytes:129157120948 total_net_output_bytes:100134149391 instantaneous_input_kbps:564.46 instantaneous_output_kbps:506.61 rejected_connections:0 sync_full:0 sync_partial_ok:0 sync_partial_err:0 expired_keys:21501976 expired_stale_perc:22.27 expired_time_cap_reached_count:0 evicted_keys:0 keyspace_hits:228764424 keyspace_misses:103548191 pubsub_channels:0 pubsub_patterns:0 latest_fork_usec:37762 migrate_cached_sockets:0 slave_expires_tracked_keys:0 active_defrag_hits:0 active_defrag_misses:0 active_defrag_key_hits:0 active_defrag_key_misses:0

Replication

role:master connected_slaves:0 master_replid:8d77aa658d80eac45342fb971da678336e6fcaa4 master_replid2:0000000000000000000000000000000000000000 master_repl_offset:0 second_repl_offset:-1 repl_backlog_active:0 repl_backlog_size:1048576 repl_backlog_first_byte_offset:0 repl_backlog_histlen:0

CPU

used_cpu_sys:5896.909812 used_cpu_user:5561.648567 used_cpu_sys_children:18.171040 used_cpu_user_children:289.305433

Cluster

cluster_enabled:0

Keyspace

db0:keys=2903426,expires=2903323,avg_ttl=165625007

Comment From: sundb

@GXhua Can you tell me which Docker image you are using? I'd like to try to reproduce this issue locally.

Comment From: GXhua

docker pull 15811413647/messagebase:20220223 @sundb

Comment From: sundb

@GXhua I may have reproduced your problem. Do you have THP turned off? You can confirm by checking whether Redis starts with the following warning, or by running cat /sys/kernel/mm/transparent_hugepage/enabled.

# WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
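For reference, checking the current THP mode looks like this; the bracketed entry is the active one (the output below is a typical example, not taken from this host):

  $ cat /sys/kernel/mm/transparent_hugepage/enabled
  [always] madvise never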

Comment From: GXhua

Yes! I checked redis.log and found this:

176:M 03 Jun 2022 00:25:41.191 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.

Comment From: GXhua

I echoed never to /sys/kernel/mm/transparent_hugepage/enabled, and I'm still watching rss_overhead_ratio rise. Thanks very much! @sundb
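A sketch of how one might verify the setting and keep an eye on the ratio afterwards (the 60-second interval is arbitrary):

  $ cat /sys/kernel/mm/transparent_hugepage/enabled
  always madvise [never]
  $ watch -n 60 "redis-cli INFO memory | grep -E 'rss_overhead_ratio|mem_fragmentation_ratio'"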

Comment From: sundb

@GXhua Why does rss_overhead_ratio still increase even after using never? That should avoid it, since the problem is that when THP is turned on, write operations may cause memory bursts in the forked processes.

Comment From: oranagra

A few things to note:
1. The THP check in version 5.0 was wrong: it warned when the setting was anything other than never, but in fact the default of madvise is completely safe. So please just make sure it's not set to always on your system.
2. Fork and CoW don't increase the RSS of the parent process. What they cause is that memory shared between the two processes becomes private, so the RSS of each of them remains the same while the total free memory of the system decreases.
3. Looking at your metrics, it seems that the majority of the waste is due to fragmentation.

total:
 used_memory:          811,458,960
 used_memory_rss:    2,884,251,648
breakdown:
  allocator_allocated:  811,600,184
  allocator_active:   1,679,548,416
  allocator_resident: 1,722,388,480

total:
  mem_fragmentation_ratio: 3.55
breakdown:
  allocator_frag_ratio:    2.07  (defraggable via active-defrag)
  allocator_rss_ratio:     1.03  (could maybe be released with MEMORY PURGE)
  rss_overhead_ratio:      1.67  (other overhead in the process not directly tied to jemalloc)
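For context, these ratios line up with simple quotients of the byte figures in the first INFO dump (give or take small internal adjustments Redis makes):

  allocator_frag_ratio    = allocator_active   / allocator_allocated = 1,679,548,416 / 811,600,184   ≈ 2.07
  allocator_rss_ratio     = allocator_resident / allocator_active    = 1,722,388,480 / 1,679,548,416 ≈ 1.03
  rss_overhead_ratio      = used_memory_rss    / allocator_resident  = 2,884,251,648 / 1,722,388,480 ≈ 1.67
  mem_fragmentation_ratio = used_memory_rss    / used_memory         = 2,884,251,648 / 811,458,960   ≈ 3.55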

Note the rss_overhead_ratio line above; I don't know where that overhead comes from. Maybe we can get further info by looking at /proc/<pid>/smaps
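A rough way to do that, summing the resident and transparent-huge-page figures across the process mappings (a sketch; assumes a single redis-server process):

  $ pid=$(pidof redis-server)
  $ awk '/^Rss:/ {rss+=$2} /^AnonHugePages:/ {thp+=$2} END {print "Rss:", rss, "kB", "AnonHugePages:", thp, "kB"}' /proc/$pid/smaps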

Regarding fragmentation, it could easily accumulate due to your workload pattern, switching between small allocations and large ones. We can check whether enabling active-defrag helps, and maybe we can also get some info on where the fragmentation occurs by looking at MEMORY MALLOC-STATS
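For illustration, both checks can be done from redis-cli (active defrag requires a jemalloc build, and enabling it here is only an experiment, not a recommendation):

  $ redis-cli CONFIG SET activedefrag yes
  $ redis-cli INFO memory | grep -E 'active_defrag_running|allocator_frag_(ratio|bytes)'
  $ redis-cli MEMORY MALLOC-STATS | head -n 40   # jemalloc's own internal statistics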

Just one last note: the defragger code in 5.0 can cause latency spikes and freezes if you have huge list-type keys.

Comment From: sundb

@oranagra But allocator_resident is the maximum amount of physical memory mapped by jemalloc, and I can't think of any case where used_memory_rss would be 2.5 times larger than allocator_resident. During my reproduction, after turning THP off, allocator_resident was basically stable at 1, but after I turned it on, allocator_resident reached 2.x.

Comment From: oranagra

I can think of some ways (which are unlikely, but still possible):
* Lua (allocated from libc malloc, not from jemalloc) - in this case we can see that's not the issue.
* Modules - not the issue in this case.
* Some other unexpected issue due to LD_PRELOAD.

In any case, CoW doesn't increase RSS, so that's not it. It could still be related to THP, but AFAIK only if it's set to always, so the warning Redis prints at startup is not an indication.

Comment From: sundb

@oranagra Yes, I am having this issue after setting it to always.

Comment From: GXhua

rss_overhead_ratio has stopped rising; it works. Thank you very much @sundb @oranagra

Comment From: oranagra

OK, so the problem was fragmentation due to THP (not CoW), right? Was /sys/kernel/mm/transparent_hugepage/enabled set to always before?

Comment From: sundb

@oranagra Not due to CoW. Here are a few scenarios from my local tests:
1. When I turn on THP, the Redis process uses 24M of memory, but when I turn it off, it drops to 12M. This is reasonable, because the pages requested by jemalloc will be larger.
2. When I run redis-benchmark with THP turned on, allocator_resident drops from almost 24M to 16M, causing allocator_rss_ratio to become 2.x, but when redis-benchmark finishes, allocator_resident goes back to 24M, which is what I can't understand.
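For illustration, a local check along those lines might look like this (a sketch only; toggling THP needs root, and the benchmark parameters are arbitrary):

  $ echo always | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
  $ redis-cli INFO memory | grep -E 'used_memory_rss:|allocator_(allocated|active|resident):'
  $ redis-benchmark -q -n 1000000 -r 100000 -t set,get
  $ redis-cli INFO memory | grep -E 'allocator_resident:|allocator_rss_ratio:'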

BTW, this may also be a Docker issue; when I test 5.0.3 without Docker, everything works fine.

Comment From: oranagra

I don't know exactly how THP affects things, but it doesn't concern me (at the moment). What bothers me is knowing whether it was with THP set to always or madvise, and whether that was the default; i.e. if madvise (which I think is the default) causes this issue, we need to modify Redis.

Comment From: sundb

@oranagra CentOS Stream 8 uses always by default.

Comment From: oranagra

For the record, starting from Redis 6.2, Redis will attempt to automatically disable THP for its own process only, without depending on or warning about the global system configuration. See #7381