redis-5.0.12
slave
used_memory:11674502872 used_memory_human:10.87G used_memory_rss:11916976128 used_memory_rss_human:11.10G used_memory_peak:11674565992 used_memory_peak_human:10.87G used_memory_peak_perc:100.00% used_memory_overhead:42736840 used_memory_startup:1449864 used_memory_dataset:11631766032 used_memory_dataset_perc:99.65% allocator_allocated:11674516248 allocator_active:11675152384 allocator_resident:11921485824 total_system_memory:17179869184 total_system_memory_human:16.00G used_memory_lua:37888 used_memory_lua_human:37.00K used_memory_scripts:0 used_memory_scripts_human:0B number_of_cached_scripts:0 maxmemory:12884901888 maxmemory_human:12.00G maxmemory_policy:volatile-lru allocator_frag_ratio:1.00 allocator_frag_bytes:636136 allocator_rss_ratio:1.02 allocator_rss_bytes:246333440 rss_overhead_ratio:1.00 rss_overhead_bytes:-4509696 mem_fragmentation_ratio:1.02 mem_fragmentation_bytes:242514280 mem_not_counted_for_evict:0 mem_replication_backlog:10485760 mem_clients_slaves:0 mem_clients_normal:66616 mem_aof_buffer:0 mem_allocator:jemalloc-5.1.0 active_defrag_running:0 lazyfree_pending_objects:0
master
used_memory:11062165000 used_memory_human:10.30G used_memory_rss:11444056064 used_memory_rss_human:10.66G used_memory_peak:11076346928 used_memory_peak_human:10.32G used_memory_peak_perc:99.87% used_memory_overhead:43858826 used_memory_startup:1449864 used_memory_dataset:11018306174 used_memory_dataset_perc:99.62% allocator_allocated:11062146648 allocator_active:11177414656 allocator_resident:11448696832 total_system_memory:17179869184 total_system_memory_human:16.00G used_memory_lua:37888 used_memory_lua_human:37.00K used_memory_scripts:0 used_memory_scripts_human:0B number_of_cached_scripts:0 maxmemory:12884901888 maxmemory_human:12.00G maxmemory_policy:volatile-lru allocator_frag_ratio:1.01 allocator_frag_bytes:115268008 allocator_rss_ratio:1.02 allocator_rss_bytes:271282176 rss_overhead_ratio:1.00 rss_overhead_bytes:-4640768 mem_fragmentation_ratio:1.03 mem_fragmentation_bytes:382010944 mem_not_counted_for_evict:0 mem_replication_backlog:10485760 mem_clients_slaves:16922 mem_clients_normal:1171712 mem_aof_buffer:0 mem_allocator:jemalloc-5.1.0 active_defrag_running:0 lazyfree_pending_objects:0
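For reference, the gap being reported can be computed directly from the two `used_memory` values in the dumps above:

```python
# Difference between the slave's and master's used_memory, taken from
# the two INFO Memory dumps above.
slave_used = 11674502872
master_used = 11062165000
delta = slave_used - master_used
print(f"{delta} bytes = {delta / 2**20:.2f} MiB")
# -> 612337872 bytes = 583.97 MiB
```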
@yossigo @oranagra @madolson
Sir, please help me
Comment From: sundb
@klin111 I would like to confirm the following points:
1. Whether master and slave have been synchronized completely; you can confirm this by checking that slave_repl_offset and master_repl_offset in INFO ALL are equal.
2. Do master and slave use the same config? In particular, configurations like hash-max-listpack-entries, list-max-listpack-size, set-max-intset-entries, and similar.
3. Perhaps you can provide more info through INFO ALL.
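The offset comparison in point 1 can be scripted by parsing INFO output. A minimal sketch; the INFO snippets below are illustrative stand-ins for what `redis-cli INFO replication` would return from each node:

```python
def parse_info(text):
    """Parse Redis INFO output (key:value lines) into a dict,
    skipping blank lines and '#' section headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields

# Stand-in captures; in practice these come from redis-cli.
master_info = """# Replication
role:master
master_repl_offset:505862572674
"""
slave_info = """# Replication
role:slave
slave_repl_offset:505862572674
"""

master_offset = int(parse_info(master_info)["master_repl_offset"])
slave_offset = int(parse_info(slave_info)["slave_repl_offset"])
print("in sync" if master_offset == slave_offset else "lagging")
# -> in sync
```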
Comment From: oranagra
A few more things you can look into (in case it's not obvious from the INFO output):
1. Randomly check the OBJECT ENCODING and MEMORY USAGE of some keys to see if they're similar.
2. Compare the differences in MEMORY MALLOC-STATS; maybe it can teach us something in case we can't find any differences in any of the above.
3. Upgrade; it could solve some bugs, but also note that MEMORY USAGE in that version isn't reporting the actual usage correctly for some types.
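The per-key check can be automated once the data is collected from each node (e.g. via OBJECT ENCODING and MEMORY USAGE through any Redis client). The sketch below shows only the comparison step; the sample key data is hypothetical:

```python
def diff_keys(master, replica, usage_tolerance=0.10):
    """Report keys whose encoding differs or whose memory usage
    diverges by more than `usage_tolerance` between two nodes.
    Each input maps key -> (encoding, memory_usage_bytes)."""
    mismatches = []
    for key, (enc_m, mem_m) in master.items():
        if key not in replica:
            mismatches.append((key, "missing on replica"))
            continue
        enc_r, mem_r = replica[key]
        if enc_m != enc_r:
            mismatches.append((key, f"encoding {enc_m} vs {enc_r}"))
        elif abs(mem_m - mem_r) / mem_m > usage_tolerance:
            mismatches.append((key, f"memory {mem_m} vs {mem_r}"))
    return mismatches

# Hypothetical sample: one hash takes much more space on the replica.
master_keys = {"h:1": ("hashtable", 1_000_000), "h:2": ("ziplist", 200)}
replica_keys = {"h:1": ("hashtable", 1_600_000), "h:2": ("ziplist", 200)}
print(diff_keys(master_keys, replica_keys))
# -> [('h:1', 'memory 1000000 vs 1600000')]
```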
Comment From: klin111
> @klin111 I would like to confirm the following points:
> 1. whether master and slave have been synchronized completely, you can confirm whether slave_repl_offset and master_repl_offset in INFO ALL are equal.
> 2. Do master and slave use the same config? Particular configurations like hash-max-listpack-entries, list-max-listpack-size, set-max-intset-entries, and similar.
> 3. Perhaps you can provide more info through INFO ALL.

thank you @sundb
1. Master and slave are fully synchronized; slave_repl_offset and master_repl_offset in INFO ALL are equal.
2. Special configuration:

```
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
stream-node-max-bytes 4096
stream-node-max-entries 100
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
```

The master and slave configuration files are the same.
3. INFO ALL:
slave
Server:
redis_version:5.0.12 redis_git_sha1:0 redis_git_dirty:0 redis_build_id:q23w4rq6e6accxxx redis_mode:cluster os:Linux 5.10.0-957.27.2.el7.x86_64 x86_64 arch_bits:64 multiplexing_api:epoll atomicvar_api:atomic-builtin gcc_version:4.8.5 process_id:306 run_id:90ui786t3cd40a1170aca41c414f1c2318dd2xxx tcp_port:6666 uptime_in_seconds:110709 uptime_in_days:1 hz:10 configured_hz:10 lru_clock:10738468 executable:/redis/redis-5.0.12/bin/redis-server config_file:/redis/conf/redis-cluster-9736.conf
Clients: connected_clients:2 client_recent_max_input_buffer:2 client_recent_max_output_buffer:0 blocked_clients:0
Memory: used_memory:11661178336 used_memory_human:10.86G used_memory_rss:11928752128 used_memory_rss_human:11.11G used_memory_peak:11677479408 used_memory_peak_human:10.88G used_memory_peak_perc:0.9986 used_memory_overhead:42751200 used_memory_startup:1449864 used_memory_dataset:11618427136 used_memory_dataset_perc:0.9965 allocator_allocated:11661193496 allocator_active:11685838848 allocator_resident:11937169408 total_system_memory:17179869184 total_system_memory_human:16.00G used_memory_lua:37888 used_memory_lua_human:37.00K used_memory_scripts:0 used_memory_scripts_human:0B number_of_cached_scripts:0 maxmemory:12884901888 maxmemory_human:12.00G maxmemory_policy:volatile-lru allocator_frag_ratio:1 allocator_frag_bytes:24645352 allocator_rss_ratio:1.02 allocator_rss_bytes:251330560 rss_overhead_ratio:1 rss_overhead_bytes:-8417280 mem_fragmentation_ratio:1.02 mem_fragmentation_bytes:267616080 mem_not_counted_for_evict:0 mem_replication_backlog:10485760 mem_clients_slaves:0 mem_clients_normal:66616 mem_aof_buffer:0 mem_allocator:jemalloc-5.1.0 active_defrag_running:0 lazyfree_pending_objects:0
Persistence: loading:0 rdb_changes_since_last_save:15472063 rdb_bgsave_in_progress:0 rdb_last_save_time:1688349359 rdb_last_bgsave_status:ok rdb_last_bgsave_time_sec:-1 rdb_current_bgsave_time_sec:-1 rdb_last_cow_size:0 aof_enabled:0 aof_rewrite_in_progress:0 aof_rewrite_scheduled:0 aof_last_rewrite_time_sec:-1 aof_current_rewrite_time_sec:-1 aof_last_bgrewrite_status:ok aof_last_write_status:ok aof_last_cow_size:0
Stats: total_connections_received:7456 total_commands_processed:15501705 instantaneous_ops_per_sec:0 total_net_input_bytes:6776127688 total_net_output_bytes:25372153 instantaneous_input_kbps:0 instantaneous_output_kbps:0.03 rejected_connections:0 sync_full:0 sync_partial_ok:0 sync_partial_err:0 expired_keys:0 expired_stale_perc:0 expired_time_cap_reached_count:0 evicted_keys:0 keyspace_hits:0 keyspace_misses:0 pubsub_channels:0 pubsub_patterns:0 latest_fork_usec:0 migrate_cached_sockets:0 slave_expires_tracked_keys:0 active_defrag_hits:0 active_defrag_misses:0 active_defrag_key_hits:0 active_defrag_key_misses:0
Replication: role:slave master_host:192.168.1.6 master_port:6666 master_link_status:up master_last_io_seconds_ago:3 master_sync_in_progress:0 slave_repl_offset:505862536870 slave_priority:100 slave_read_only:1 connected_slaves:0 master_replid:1q2w3a45qw1c667273f74017bd1b490f97b5dd64 master_replid2:0 master_repl_offset:505862536870 second_repl_offset:-1 repl_backlog_active:1 repl_backlog_size:10000000 repl_backlog_first_byte_offset:505852536871 repl_backlog_histlen:10000000
CPU: used_cpu_sys:316.20177 used_cpu_user:446.584073 used_cpu_sys_children:0 used_cpu_user_children:0
Commandstats: cmdstat_slowlog:calls=7360,usec=98145,usec_per_call=13.33 cmdstat_ping:calls=11046,usec=4824,usec_per_call=0.44 cmdstat_config:calls=3700,usec=76317,usec_per_call=20.63 cmdstat_role:calls=1,usec=272,usec_per_call=272.00 cmdstat_select:calls=1,usec=4,usec_per_call=4.00 cmdstat_client:calls=2,usec=43,usec_per_call=21.50 cmdstat_cluster:calls=3700,usec=389207,usec_per_call=105.19 cmdstat_command:calls=2,usec=2737,usec_per_call=1368.50 cmdstat_dbsize:calls=1,usec=3,usec_per_call=3.00 cmdstat_memory:calls=4,usec=989,usec_per_call=247.25 cmdstat_hmset:calls=15472063,usec=90811447,usec_per_call=5.87 cmdstat_info:calls=3825,usec=416960,usec_per_call=109.01
Cluster: cluster_enabled:1
Keyspace: db0:keys=559999,expires=0,avg_ttl=0
master
Server:
redis_version:5.0.12 redis_git_sha1:0 redis_git_dirty:0 redis_build_id:q23w4rq6e6accxxx redis_mode:cluster os:Linux 5.10.0-957.27.2.el7.x86_64 x86_64 arch_bits:64 multiplexing_api:epoll atomicvar_api:atomic-builtin gcc_version:4.8.5 process_id:2066 run_id:986t34e5eb58c2ab1fa8ab3daff13c5470b8bxxx tcp_port:6666 uptime_in_seconds:21253950 uptime_in_days:245 hz:10 configured_hz:10 lru_clock:10738780 executable:/redis/redis-5.0.12/bin/redis-server config_file:/redis/conf/redis-cluster-9736.conf
Clients: connected_clients:46 client_recent_max_input_buffer:2 client_recent_max_output_buffer:0 blocked_clients:0
Memory: used_memory:11061316392 used_memory_human:10.30G used_memory_rss:11452391424 used_memory_rss_human:10.67G used_memory_peak:11076346928 used_memory_peak_human:10.32G used_memory_peak_perc:0.9986 used_memory_overhead:43643746 used_memory_startup:1449864 used_memory_dataset:11017672646 used_memory_dataset_perc:0.9962 allocator_allocated:11061701512 allocator_active:11185254400 allocator_resident:11461353472 total_system_memory:17179869184 total_system_memory_human:16.00G used_memory_lua:37888 used_memory_lua_human:37.00K used_memory_scripts:0 used_memory_scripts_human:0B number_of_cached_scripts:0 maxmemory:12884901888 maxmemory_human:12.00G maxmemory_policy:volatile-lru allocator_frag_ratio:1.01 allocator_frag_bytes:123552888 allocator_rss_ratio:1.02 allocator_rss_bytes:276099072 rss_overhead_ratio:1 rss_overhead_bytes:-8962048 mem_fragmentation_ratio:1.04 mem_fragmentation_bytes:390891656 mem_not_counted_for_evict:0 mem_replication_backlog:10485760 mem_clients_slaves:49694 mem_clients_normal:909500 mem_aof_buffer:0 mem_allocator:jemalloc-5.1.0 active_defrag_running:0 lazyfree_pending_objects:
Persistence: loading:0 rdb_changes_since_last_save:15472514 rdb_bgsave_in_progress:0 rdb_last_save_time:1688349498 rdb_last_bgsave_status:ok rdb_last_bgsave_time_sec:126 rdb_current_bgsave_time_sec:-1 rdb_last_cow_size:510795776 aof_enabled:0 aof_rewrite_in_progress:0 aof_rewrite_scheduled:0 aof_last_rewrite_time_sec:-1 aof_current_rewrite_time_sec:-1 aof_last_bgrewrite_status:ok aof_last_write_status:ok aof_last_cow_size:0
Stats: total_connections_received:1448734 total_commands_processed:3916065741 instantaneous_ops_per_sec:37 total_net_input_bytes:1936505477657 total_net_output_bytes:1747515489466 instantaneous_input_kbps:83.5 instantaneous_output_kbps:132.47 rejected_connections:0 sync_full:2 sync_partial_ok:1 sync_partial_err:2 expired_keys:0 expired_stale_perc:0 expired_time_cap_reached_count:0 evicted_keys:0 keyspace_hits:307983688 keyspace_misses:1206819 pubsub_channels:0 pubsub_patterns:0 latest_fork_usec:326993 migrate_cached_sockets:0 slave_expires_tracked_keys:0 active_defrag_hits:0 active_defrag_misses:0 active_defrag_key_hits:0 active_defrag_key_misses:0
Replication: role:master connected_slaves:1 slave0:ip=192.168.1.9,port=6666,state=online,offset=505862572596,lag=0 master_replid:1q2w3a45qw1c667273f74017bd1b490f97b5dd64 master_replid2:0 master_repl_offset:505862572674 second_repl_offset:-1 repl_backlog_active:1 repl_backlog_size:10000000 repl_backlog_first_byte_offset:505852572675 repl_backlog_histlen:10000000
CPU: used_cpu_sys:148122.931638 used_cpu_user:142004.190394 used_cpu_sys_children:25.132197 used_cpu_user_children:96.867451
Commandstats: cmdstat_migrate:calls=24578,usec=53674786,usec_per_call=2183.85 cmdstat_slowlog:calls=1390648,usec=468395576,usec_per_call=336.82 cmdstat_dbsize:calls=1,usec=2,usec_per_call=2.00 cmdstat_hdel:calls=791,usec=6775,usec_per_call=8.57 cmdstat_config:calls=640094,usec=14817773,usec_per_call=23.15 cmdstat_info:calls=716934,usec=93677178,usec_per_call=130.66 cmdstat_command:calls=2,usec=6592,usec_per_call=3296.00 cmdstat_ping:calls=259,usec=171,usec_per_call=0.66 cmdstat_hmset:calls=3582434058,usec=21846202609,usec_per_call=6.10 cmdstat_scan:calls=132312,usec=2275971,usec_per_call=17.20 cmdstat_hmget:calls=305214177,usec=36356005414,usec_per_call=119.12 cmdstat_client:calls=256,usec=25635,usec_per_call=100.14 cmdstat_hget:calls=3730726,usec=48492378,usec_per_call=13.00 cmdstat_replconf:calls=21056259,usec=32084708,usec_per_call=1.52 cmdstat_auth:calls=9758,usec=9844,usec_per_call=1.01 cmdstat_psync:calls=3,usec=329172,usec_per_call=109724.00 cmdstat_cluster:calls=714885,usec=61717961,usec_per_call=86.33
Cluster: cluster_enabled:1
Keyspace: db0:keys=559999,expires=0,avg_ttl=0
Comment From: klin111
> few more things you can look into (in case it's not obvious from INFO output):
> 1. randomly check the OBJECT ENCODING and MEMORY USAGE of some keys to see if they're similar.
> 2. compare the differences in MEMORY MALLOC-STATS; maybe it can teach us something in case we can't find any differences in any of the above.
> 3. upgrade, it could solve some bugs, but also note that MEMORY USAGE in that version isn't reporting the actual usage correctly for some types.
thank you @oranagra
- Randomly checked several keys; their underlying type and size are consistent.
- MEMORY MALLOC-STATS gives too much data and I don't know how to analyze it. What data should I focus on?
- We will not upgrade for the time being. Which version is recommended to upgrade to?
Comment From: oranagra
looking at the info you provided, i don't see such a big difference (master uses 10.30GB and slave uses 10.86GB).
as expected, there's a different number of clients connected to each, which results in a bigger used_memory_overhead on the master (by some 900KB).
considering the difference is relatively small, i doubt we'll be able to spot anything in MALLOC-STATS. I'd suggest upgrading to the latest in 7.0; there are a ton of bug fixes and optimizations that were applied since 5.0.
Comment From: klin111
@oranagra
There is a difference of 572MB between the used_memory of the master instance and the slave instance. What causes the difference?
MEMORY MALLOC-STATS has too much content; can you point out the important parts?
Comment From: oranagra
yes, i saw all that, and i commented that it's not a huge difference (500MB out of 10GB). in buggy scenarios, i've seen much more (like 200%).
in any case, i don't know how to find the cause for this; this old version doesn't expose any other information, and it is also somewhat likely that the problem was already solved anyway. all i can do is suggest an upgrade.
Comment From: klin111
@oranagra
In which version is this bug fixed?
Which minor version of version 7 can be used in production?
Comment From: sundb
@klin111 It's hard to tell from the available information whether this is due to some bug. Do you still see the difference after a restart? It is recommended to upgrade to 6.2.12; 7.0.11 is also a choice.
Comment From: oranagra
the above is inaccurate or even incorrect.
traditionally, the master is the one keeping the backlog and other replication overheads, which we can see in used_memory_overhead.
but also, since PSYNC2 (redis 4.0), the slave keeps that backlog too (but not the slave buffers).
the argument about rehashing is valid, but at least in this case, not for the main dict, whose overhead is also included in used_memory_overhead, which is similar on the master and slave. maybe it's a big dict inside some key (hash, set, or zset), but that should be visible with MEMORY USAGE.
Comment From: 631086083
> the above is inaccurate or even incorrect. traditionally, the master is the one keeping the backlog and other replication overheads, which we can see in used_memory_overhead. but also, since PSYNC2 (redis 4.0), the slave keeps that backlog too (but not the slave buffers). the argument about rehashing is valid, but at least in this case, not for the main dict, whose overhead is also included in used_memory_overhead, which is similar on the master and slave. maybe it's a big dict inside some key (hash, set, or zset), but that should be visible with MEMORY USAGE
Sorry for the wrong explanation. Judging from the current information, most of the keys in this Redis cluster are hash structures, and it is very likely that the underlying hash table of some keys is stuck mid-rehash: incremental rehash only makes progress when a key is accessed, so a dict in that state keeps both of its hash tables allocated and uses some extra memory. What about adding a special task on the slave to rehash the key?
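For context on why a dict stuck mid-rehash costs extra memory: Redis's dict keeps two bucket arrays alive while an incremental rehash is in progress. A rough back-of-the-envelope sketch, assuming 8-byte bucket pointers and power-of-two table sizes as in Redis's dict.c (the growth step and entry counts here are simplifications, not exact accounting):

```python
def next_power_of_two(n):
    """Smallest power of two >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

def bucket_bytes(entries, mid_rehash):
    """Approximate bucket-array memory for a dict holding `entries`
    items. Mid-rehash, both the old table and the new (doubled)
    table are allocated at once."""
    POINTER = 8  # bytes per bucket slot on 64-bit builds
    old_table = next_power_of_two(entries)
    if not mid_rehash:
        return old_table * POINTER
    new_table = old_table * 2  # a grow step doubles the table
    return (old_table + new_table) * POINTER

settled = bucket_bytes(1_000_000, mid_rehash=False)
stuck = bucket_bytes(1_000_000, mid_rehash=True)
print(settled, stuck)  # the stuck dict holds 3x the bucket memory
```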
Comment From: oranagra
> What about adding a special task on the slave to rehash the key?
that's possible. not sure how common it is for a key to grow crossing the rehash limit and then become completely read-only.
let's start by trying to prove this theory.
we can use DEBUG HTSTATS-KEY <key> to try to compare the dict HT size of keys on the master and slave.
just be careful not to run it on hashes that are really big (it could hang for a while).
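A sketch of how that comparison could be scripted once the HTSTATS output is captured from both nodes. The sample outputs below are illustrative, assuming lines of the form `table size: <n>` as printed by Redis's dict stats:

```python
import re

def table_sizes(htstats_output):
    """Extract the 'table size' values from DEBUG HTSTATS-KEY output.
    A dict that is mid-rehash reports two tables; a settled dict
    reports one."""
    return [int(m) for m in re.findall(r"table size: (\d+)", htstats_output)]

# Illustrative captures: the replica's dict is mid-rehash (two tables).
master_out = ("Hash table 0 stats (main hash table):\n"
              " table size: 1048576\n number of elements: 900000\n")
replica_out = ("Hash table 0 stats (main hash table):\n"
               " table size: 524288\n"
               "Hash table 1 stats (rehashing target):\n"
               " table size: 1048576\n")

print(table_sizes(master_out))   # -> [1048576]
print(table_sizes(replica_out))  # -> [524288, 1048576]
```

If the replica consistently reports two tables for the same keys where the master reports one, that would support the stuck-mid-rehash theory above.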