Crash report

We are switching from Redis 6.0.15 on Debian 10 (Buster) to Redis 7.0.11 on Debian 12 (Bookworm).

On the previous deployment there were no issues with Redis, but on the new deployment it crashes with OOM errors even though the server configuration is identical.

7151:M 29 Nov 2023 23:55:04.068 # Out Of Memory allocating 574036 bytes!


=== REDIS BUG REPORT START: Cut & paste starting from here ===
7151:M 29 Nov 2023 23:55:04.068 # ------------------------------------------------
7151:M 29 Nov 2023 23:55:04.068 # !!! Software Failure. Press left mouse button to continue
7151:M 29 Nov 2023 23:55:04.068 # Guru Meditation: Redis aborting for OUT OF MEMORY. Allocating 574036 bytes! #server.c:6655

------ STACK TRACE ------

Backtrace:
/usr/bin/redis-server 0.0.0.0:6379(redisOutOfMemoryHandler+0x2a)[0x55715a6ab3fa]
/usr/bin/redis-server 0.0.0.0:6379(zmalloc_usable+0x6b)[0x55715a6bcbab]
/usr/bin/redis-server 0.0.0.0:6379(_addReplyProtoToList+0xa8)[0x55715a6c7b68]
/usr/bin/redis-server 0.0.0.0:6379(addReplyBulk+0x3b)[0x55715a6ca4ab]
/usr/bin/redis-server 0.0.0.0:6379(getGenericCommand+0x4a)[0x55715a6eb8ca]
/usr/bin/redis-server 0.0.0.0:6379(call+0xdb)[0x55715a6af65b]
/usr/bin/redis-server 0.0.0.0:6379(processCommand+0x98d)[0x55715a6b0c2d]
/usr/bin/redis-server 0.0.0.0:6379(processInputBuffer+0xe6)[0x55715a6c9c16]
/usr/bin/redis-server 0.0.0.0:6379(readQueryFromClient+0x2e8)[0x55715a6cda28]
/usr/bin/redis-server 0.0.0.0:6379(+0x13bb84)[0x55715a77cb84]
/usr/bin/redis-server 0.0.0.0:6379(+0x64ed8)[0x55715a6a5ed8]
/usr/bin/redis-server 0.0.0.0:6379(aeMain+0x1d)[0x55715a6a69cd]
/usr/bin/redis-server 0.0.0.0:6379(main+0x316)[0x55715a6a21f6]
/lib/x86_64-linux-gnu/libc.so.6(+0x271ca)[0x7f766d8461ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f766d846285]
/usr/bin/redis-server 0.0.0.0:6379(_start+0x21)[0x55715a6a2911]

------ INFO OUTPUT ------
# Server
redis_version:7.0.11
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:c4e7f6bf175a885b
redis_mode:standalone
os:Linux 6.1.0-10-amd64 x86_64
arch_bits:64
monotonic_clock:POSIX clock_gettime
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:12.2.0
process_id:7151
process_supervised:systemd
run_id:35be2626b03a76440be1847155aad032268bd2e3
tcp_port:6379
server_time_usec:1701298504068720
uptime_in_seconds:1431057
uptime_in_days:16
hz:10
configured_hz:10
lru_clock:6799688
executable:/usr/bin/redis-server
config_file:/etc/redis/redis.conf
io_threads_active:0

# Clients
connected_clients:600
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:262144
client_recent_max_output_buffer:262168
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0

# Memory
used_memory:4979538144
used_memory_human:4.64G
used_memory_rss:5996359680
used_memory_rss_human:5.58G
used_memory_peak:4986502056
used_memory_peak_human:4.64G
used_memory_peak_perc:99.86%
used_memory_overhead:354421960
used_memory_startup:876112
used_memory_dataset:4625116184
used_memory_dataset_perc:92.90%
allocator_allocated:4979424552
allocator_active:5932032000
allocator_resident:6010408960
total_system_memory:33656152064
total_system_memory_human:31.34G
used_memory_lua:33792
used_memory_vm_eval:33792
used_memory_lua_human:33.00K
used_memory_scripts_eval:632
number_of_cached_scripts:3
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:32768
used_memory_vm_total:66560
used_memory_vm_total_human:65.00K
used_memory_functions:200
used_memory_scripts:832
used_memory_scripts_human:832B
maxmemory:24000000000
maxmemory_human:22.35G
maxmemory_policy:allkeys-lru
allocator_frag_ratio:1.19
allocator_frag_bytes:952607448
allocator_rss_ratio:1.01
allocator_rss_bytes:78376960
rss_overhead_ratio:1.00
rss_overhead_bytes:-14049280
mem_fragmentation_ratio:1.20
mem_fragmentation_bytes:1017480792
mem_not_counted_for_evict:0
mem_replication_backlog:256002696
mem_total_replication_buffers:256690008
mem_clients_slaves:690008
mem_clients_normal:9761912
mem_cluster_links:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.3.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0

# Persistence
loading:0
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:302739226
rdb_bgsave_in_progress:0
rdb_last_save_time:1699867447
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:9
rdb_current_bgsave_time_sec:-1
rdb_saves:0
rdb_last_cow_size:220016640
rdb_last_load_keys_expired:0
rdb_last_load_keys_loaded:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_rewrites:0
aof_rewrites_consecutive_failures:0
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0

# Stats
total_connections_received:27180229
total_commands_processed:596587748
instantaneous_ops_per_sec:6058
total_net_input_bytes:101215260900
total_net_output_bytes:519257332480
total_net_repl_input_bytes:0
total_net_repl_output_bytes:89559888291
instantaneous_input_kbps:751.46
instantaneous_output_kbps:4133.02
instantaneous_input_repl_kbps:0.00
instantaneous_output_repl_kbps:650.12
rejected_connections:0
sync_full:1
sync_partial_ok:1
sync_partial_err:1
expired_keys:24387269
expired_stale_perc:8.46
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:73852
evicted_keys:0
evicted_clients:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:143287823
keyspace_misses:15679239
pubsub_channels:1
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:348508
total_forks:1
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
total_active_defrag_time:0
current_active_defrag_time:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:2
dump_payload_sanitizations:0
total_reads_processed:515772612
total_writes_processed:767784112
io_threaded_reads_processed:0
io_threaded_writes_processed:0
reply_buffer_shrinks:2626424
reply_buffer_expands:640859

# Replication
role:master
connected_slaves:1
slave0:ip=192.168.4.111,port=6379,state=online,offset=121637812764,lag=1
master_failover_state:no-failover
master_replid:2115cc938dde60196a7d7143336eed2a0e4e69a7
master_replid2:d6b633bc756ded7e6d3fcbc1ad61c7b2b294527f
master_repl_offset:121638329551
second_repl_offset:32425214122
repl_backlog_active:1
repl_backlog_size:256000000
repl_backlog_first_byte_offset:121382325306
repl_backlog_histlen:256004246

# CPU
used_cpu_sys:11178.968550
used_cpu_user:4580.419313
used_cpu_sys_children:2.621047
used_cpu_user_children:6.088434
used_cpu_sys_main_thread:11163.071556
used_cpu_user_main_thread:4579.290099

# Modules

# Commandstats
cmdstat_mget:calls=493818,usec=1757723,usec_per_call=3.56,rejected_calls=0,failed_calls=0
cmdstat_del:calls=56145938,usec=43757068,usec_per_call=0.78,rejected_calls=0,failed_calls=0
cmdstat_llen:calls=1,usec=1,usec_per_call=1.00,rejected_calls=0,failed_calls=0
cmdstat_client|setname:calls=6,usec=11,usec_per_call=1.83,rejected_calls=0,failed_calls=0
cmdstat_hset:calls=19613457,usec=28966922,usec_per_call=1.48,rejected_calls=0,failed_calls=0
cmdstat_object|encoding:calls=1,usec=1,usec_per_call=1.00,rejected_calls=0,failed_calls=0
cmdstat_role:calls=27143446,usec=103856356,usec_per_call=3.83,rejected_calls=0,failed_calls=0
cmdstat_eval:calls=54237403,usec=488279286,usec_per_call=9.00,rejected_calls=0,failed_calls=0
cmdstat_info:calls=588656,usec=71363104,usec_per_call=121.23,rejected_calls=0,failed_calls=0
cmdstat_hgetall:calls=196601,usec=288106753,usec_per_call=1465.44,rejected_calls=0,failed_calls=0
cmdstat_command|docs:calls=3,usec=2313,usec_per_call=771.00,rejected_calls=0,failed_calls=0
cmdstat_scard:calls=2,usec=2,usec_per_call=1.00,rejected_calls=0,failed_calls=0
cmdstat_replconf:calls=1424447,usec=1911989,usec_per_call=1.34,rejected_calls=0,failed_calls=0
cmdstat_config|set:calls=1,usec=12,usec_per_call=12.00,rejected_calls=0,failed_calls=0
cmdstat_config|get:calls=9,usec=51,usec_per_call=5.67,rejected_calls=0,failed_calls=0
cmdstat_config|rewrite:calls=1,usec=5842,usec_per_call=5842.00,rejected_calls=0,failed_calls=0
cmdstat_subscribe:calls=3,usec=3,usec_per_call=1.00,rejected_calls=0,failed_calls=0
cmdstat_quit:calls=16050,usec=3644,usec_per_call=0.23,rejected_calls=0,failed_calls=0
cmdstat_dbsize:calls=16,usec=5,usec_per_call=0.31,rejected_calls=0,failed_calls=0
cmdstat_exists:calls=1286663,usec=1486324,usec_per_call=1.16,rejected_calls=0,failed_calls=0
cmdstat_hdel:calls=4796,usec=777434,usec_per_call=162.10,rejected_calls=0,failed_calls=0
cmdstat_set:calls=159266386,usec=379552754,usec_per_call=2.38,rejected_calls=0,failed_calls=0
cmdstat_mset:calls=91172,usec=218643,usec_per_call=2.40,rejected_calls=0,failed_calls=0
cmdstat_setex:calls=12387317,usec=45826111,usec_per_call=3.70,rejected_calls=0,failed_calls=0
cmdstat_scan:calls=133154,usec=1613383,usec_per_call=12.12,rejected_calls=0,failed_calls=0
cmdstat_publish:calls=2106359,usec=11292558,usec_per_call=5.36,rejected_calls=0,failed_calls=0
cmdstat_hmget:calls=19784689,usec=27919585,usec_per_call=1.41,rejected_calls=0,failed_calls=0
cmdstat_expire:calls=73486266,usec=130860480,usec_per_call=1.78,rejected_calls=0,failed_calls=0
cmdstat_sadd:calls=81926,usec=151734,usec_per_call=1.85,rejected_calls=0,failed_calls=0
cmdstat_hget:calls=74,usec=135,usec_per_call=1.82,rejected_calls=0,failed_calls=0
cmdstat_get:calls=134897220,usec=259584551,usec_per_call=1.92,rejected_calls=0,failed_calls=0
cmdstat_type:calls=248748,usec=536221,usec_per_call=2.16,rejected_calls=0,failed_calls=0
cmdstat_rpush:calls=515458,usec=3271857,usec_per_call=6.35,rejected_calls=0,failed_calls=0
cmdstat_select:calls=27148061,usec=11259862,usec_per_call=0.41,rejected_calls=0,failed_calls=0
cmdstat_lpop:calls=1138341,usec=4996212,usec_per_call=4.39,rejected_calls=0,failed_calls=0
cmdstat_hlen:calls=2,usec=1,usec_per_call=0.50,rejected_calls=0,failed_calls=0
cmdstat_keys:calls=1,usec=2,usec_per_call=2.00,rejected_calls=0,failed_calls=0
cmdstat_ttl:calls=1,usec=1,usec_per_call=1.00,rejected_calls=0,failed_calls=0
cmdstat_ping:calls=4151253,usec=4002631,usec_per_call=0.96,rejected_calls=0,failed_calls=0
cmdstat_psync:calls=2,usec=132,usec_per_call=66.00,rejected_calls=0,failed_calls=0

# Errorstats
errorstat_ERR:count=2

# Latencystats
latency_percentiles_usec_mget:p50=3.007,p99=16.063,p99.9=38.143
latency_percentiles_usec_del:p50=1.003,p99=5.023,p99.9=17.023
latency_percentiles_usec_llen:p50=1.003,p99=1.003,p99.9=1.003
latency_percentiles_usec_client|setname:p50=2.007,p99=2.007,p99.9=2.007
latency_percentiles_usec_hset:p50=1.003,p99=8.031,p99.9=26.111
latency_percentiles_usec_object|encoding:p50=1.003,p99=1.003,p99.9=1.003
latency_percentiles_usec_role:p50=3.007,p99=17.023,p99.9=43.007
latency_percentiles_usec_eval:p50=7.007,p99=47.103,p99.9=99.327
latency_percentiles_usec_info:p50=56.063,p99=1318.911,p99.9=2572.287
latency_percentiles_usec_hgetall:p50=1286.143,p99=3342.335,p99.9=4030.463
latency_percentiles_usec_command|docs:p50=794.623,p99=815.103,p99.9=815.103
latency_percentiles_usec_scard:p50=1.003,p99=1.003,p99.9=1.003
latency_percentiles_usec_replconf:p50=1.003,p99=8.031,p99.9=13.055
latency_percentiles_usec_config|set:p50=12.031,p99=12.031,p99.9=12.031
latency_percentiles_usec_config|get:p50=4.015,p99=20.095,p99.9=20.095
latency_percentiles_usec_config|rewrite:p50=5865.471,p99=5865.471,p99.9=5865.471
latency_percentiles_usec_subscribe:p50=0.001,p99=3.007,p99.9=3.007
latency_percentiles_usec_quit:p50=0.001,p99=1.003,p99.9=2.007
latency_percentiles_usec_dbsize:p50=0.001,p99=2.007,p99.9=2.007
latency_percentiles_usec_exists:p50=1.003,p99=7.007,p99.9=20.095
latency_percentiles_usec_hdel:p50=118.271,p99=655.359,p99.9=872.447
latency_percentiles_usec_set:p50=2.007,p99=12.031,p99.9=33.023
latency_percentiles_usec_mset:p50=2.007,p99=12.031,p99.9=34.047
latency_percentiles_usec_setex:p50=3.007,p99=16.063,p99.9=40.191
latency_percentiles_usec_scan:p50=7.007,p99=31.103,p99.9=1581.055
latency_percentiles_usec_publish:p50=5.023,p99=16.063,p99.9=28.031
latency_percentiles_usec_hmget:p50=1.003,p99=8.031,p99.9=33.023
latency_percentiles_usec_expire:p50=1.003,p99=10.047,p99.9=27.007
latency_percentiles_usec_sadd:p50=2.007,p99=9.023,p99.9=25.087
latency_percentiles_usec_hget:p50=1.003,p99=7.007,p99.9=8.031
latency_percentiles_usec_get:p50=1.003,p99=11.007,p99.9=33.023
latency_percentiles_usec_type:p50=2.007,p99=9.023,p99.9=25.087
latency_percentiles_usec_rpush:p50=5.023,p99=29.055,p99.9=108.031
latency_percentiles_usec_select:p50=0.001,p99=3.007,p99.9=14.015
latency_percentiles_usec_lpop:p50=1.003,p99=30.079,p99.9=70.143
latency_percentiles_usec_hlen:p50=0.001,p99=1.003,p99.9=1.003
latency_percentiles_usec_keys:p50=2.007,p99=2.007,p99.9=2.007
latency_percentiles_usec_ttl:p50=1.003,p99=1.003,p99.9=1.003
latency_percentiles_usec_ping:p50=1.003,p99=2.007,p99.9=16.063
latency_percentiles_usec_psync:p50=61.183,p99=71.167,p99.9=71.167

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=400,expires=393,avg_ttl=261284231
db1:keys=18249,expires=18249,avg_ttl=4158556
db2:keys=246047,expires=246047,avg_ttl=681110
db3:keys=478922,expires=478922,avg_ttl=130206241
db4:keys=3,expires=0,avg_ttl=0
db5:keys=556,expires=542,avg_ttl=52180218
db6:keys=214699,expires=214699,avg_ttl=681619

Additional information: the previous deployment usually sat at around 15G of used memory.

The issue above happened when the VM had 23.8G of free memory. Changes made to the new deployment's VM after looking through other Redis issues: vm.overcommit_memory=1 and vm.max_map_count=262144.
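
For reference, the settings above can be applied persistently along these lines (a sketch; the drop-in file name is just an example, not necessarily what the deployment automation uses):

echo 'vm.overcommit_memory = 1' | sudo tee -a /etc/sysctl.d/99-redis.conf
echo 'vm.max_map_count = 262144' | sudo tee -a /etc/sysctl.d/99-redis.conf
sudo sysctl --system                              # reload all sysctl drop-in files
sysctl vm.overcommit_memory vm.max_map_count      # confirm the values now in effect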

  1. OS distribution and version - Debian 12 (Bookworm)
  2. 4 CPUs, 32G RAM, no swap

Comment From: madolson

7151:M 29 Nov 2023 23:55:04.068 # Guru Meditation: Redis aborting for OUT OF MEMORY. Allocating 574036 bytes! #server.c:6655

This message is pretty straightforward to interpret. The memory allocator returned NULL back to us, so we interpreted that as out of memory. Can you validate that you don't get the exception when moving to 7.0.11 while on the same OS version?

Comment From: radsed

I can try it, but we use Debian packages with our automation for the installation, and the only official Debian package for Bookworm is 7.0.11; Buster is also EOL already, which is why we are trying to move directly to Bookworm. I did a little digging: on the older OS version with 6.0.15, Redis uses 144 maps, while on the new OS version it reached the limit of 262144. Can this be caused by the memory allocator, since it changed from jemalloc-5.1.0 to jemalloc-5.3.0?
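
For reference, a rough way to compare the live mapping count against the configured limit (a sketch; it assumes a single redis-server process, so pidof returns one PID):

wc -l < /proc/$(pidof redis-server)/maps    # current number of memory mappings
sysctl -n vm.max_map_count                  # per-process mapping limit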

Comment From: oranagra

so you're saying that even with vm.overcommit_memory=1 the issue still happens? does Redis print any warning about THP on startup?

Comment From: radsed

Yes, vm.overcommit_memory is set to 1 and the issue still happens. No warning is displayed on startup.

Comment From: oranagra

> I can try it, but we use Debian packages with our automation for the installation, and the only official Debian package for Bookworm is 7.0.11; Buster is also EOL already, which is why we are trying to move directly to Bookworm. I did a little digging: on the older OS version with 6.0.15, Redis uses 144 maps, while on the new OS version it reached the limit of 262144. Can this be caused by the memory allocator, since it changed from jemalloc-5.1.0 to jemalloc-5.3.0?

@jasone can you comment on the above? is it expected that the newer jemalloc will behave that way? is there any info we can extract to learn more about it?

AFAIK the only relevant previous discussion about it in redis is this one: https://github.com/redis/redis/issues/10319#issuecomment-1047739925, leading to #11357

Comment From: jasone

There are three possibilities I can think of:

  • opt.retain has been manually disabled.
  • There's a leak in the application.
  • There's a bug in jemalloc.

Comment From: oranagra

none of these seems likely.

@radsed maybe you can dig up some info before it crashes (when we have far more than 144 maps, but still not enough to panic):

  • the contents of /proc/<pid>/maps
  • the output of the MEMORY MALLOC-STATS command
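
Something along these lines should capture both before the next crash (a sketch; it assumes redis-cli can reach the instance locally and that pidof returns the right PID):

pid=$(pidof redis-server)                            # redis-server PID
cp /proc/$pid/maps maps-$(date +%s).txt              # snapshot the memory mappings
wc -l /proc/$pid/maps                                # current mapping count
redis-cli MEMORY MALLOC-STATS > malloc_stats.txt     # jemalloc stats dump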

Comment From: radsed

Here is the map count:

cat /proc/3163582/maps | wc -l
18640

I also have the malloc stats output attached: malloc_stats.txt

Hopefully this is sufficient?

Comment From: oranagra

@jasone can you check if you see something suspicious in that file?

Comment From: judeng

> Here is the map count:
>
> cat /proc/3163582/maps | wc -l
> 18640

Is 3163582 the PID of redis-server? I think the maps output is suspicious; 18640 seems too big. It's weird, and I guess the stack or heap regions may be split into too many fragments.
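
One way to check that guess is to group the entries in /proc/<pid>/maps by permission and size; a rough sketch (GNU awk is assumed for the hex arithmetic, and the PID is the one from the output above):

# count mappings per permission string (rw-p, r-xp, ...)
awk '{count[$2]++} END {for (p in count) print count[p], p}' /proc/3163582/maps | sort -rn

# bucket the anonymous rw-p mappings by size in KiB to spot fragmentation
gawk '$2 == "rw-p" && $6 == "" {
        split($1, a, "-")
        size = (strtonum("0x" a[2]) - strtonum("0x" a[1])) / 1024
        bucket[size]++
      }
      END {for (s in bucket) print bucket[s], s " KiB"}' /proc/3163582/maps | sort -rn | head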

Comment From: radsed

Yes this is the PID of redis-server:

> ps -ef|grep redis
redis        666       1  0 Nov13 ?        01:24:19 /usr/bin/redis-sentinel
redis    3163582       1  0 Nov30 ?        00:23:24 /usr/bin/redis-server

Comment From: jasone

[...]
  opt.retain: true
[...]
Allocated: 260290168, active: 333275136, metadata: 50658240 (n_thp 0), resident: 383844352, mapped: 391995392, retained: 5549760512

I don't see anything obviously wrong in the jemalloc stats; assuming correct jemalloc internals, even my most pessimistic upper-bound estimates would place the number of maps at under 60,000. But that's a much higher upper bound than I expected to come up with, because there's a guard page mechanism that was added to jemalloc in 2021 that I know very little about. If there's a bug in the guard page mechanism (renamed from guard to san during development), that could explain the map fragmentation, but I'd expect such a flaw to have been discovered and fixed long ago.

CC: @interwq, @Lapenkov

Comment From: interwq

Like Jason mentioned, this is certainly not expected. The sanitizer / guard page feature is off by default, so it shouldn't be a factor here. Were the stats gathered when the workload was finished / low? I'm asking because the retained VM size is more than 20x the total allocated bytes, which is too high unless there was a much higher memory peak previously.

@radsed can you also help get stats with jemalloc 5.1 running the same or a similar workload, ideally for a similar duration?

Comment From: radsed

The data provided is without any load. We found the issue and rolled back to the old servers, so the data is from the new servers while they still held data (between the crash and the rollback). Here are the map count and stats from an active server with jemalloc 5.1.0, Debian 10 and Redis 6.0.15:

> cat /proc/647/maps | wc -l
145

Malloc-stats: redis_6.txt

Comment From: radsed

We ran a test on our Beta environment, which also uses jemalloc 5.3.0, Redis 7.0.11 and Debian 12, and I tried to manually watch memory and maps usage. The map usage was very low (maybe around 260) during testing, but when we reached the memory limit for Redis and keys were evicted, I briefly saw more than 110000 maps used by the redis-server process, after which the count started to come down bit by bit. My assumption is that during eviction/expiration of keys the map usage goes up and then slowly comes down, but by the time other keys are expired it hasn't cleared completely, so it stacks up until it overflows.
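
For anyone repeating this kind of test, a watcher along these lines can track both numbers over time (a sketch; the 5-second interval and the pidof lookup are arbitrary choices):

pid=$(pidof redis-server)
while sleep 5; do
    maps=$(wc -l < /proc/$pid/maps)                     # mapping count
    rss=$(awk '/VmRSS/ {print $2}' /proc/$pid/status)   # resident set size in kB
    echo "$(date +%T) maps=$maps rss_kb=$rss"
done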

Comment From: interwq

@radsed the workloads behind the two stats dumps are rather different, so it's hard to draw any conclusions. A couple of things to check:

1) The older system has opt.thp: "not supported" while the new one supports it. Can you check the value of /sys/kernel/mm/transparent_hugepage/enabled on the new system, in case it defaults to using THP always?
2) One notable default config change between 5.1 and 5.3 is that muzzy decay is enabled by default in 5.1 but not in 5.3. I don't really expect it to affect the mapping count this much, but to test it you can try adding MALLOC_CONF=muzzy_decay_ms:10000 when launching the binary on the new system, and see if anything changes regarding the mappings.
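
Concretely, the two checks could look roughly like this (a sketch; the INFO output above shows the process is supervised by systemd, and the Debian unit name redis-server is assumed):

# 1) transparent hugepage setting; "always" would mean THP is used by default
cat /sys/kernel/mm/transparent_hugepage/enabled

# 2) re-enable muzzy decay for jemalloc 5.3 via an environment override
sudo systemctl edit redis-server       # in the drop-in, add:
#   [Service]
#   Environment=MALLOC_CONF=muzzy_decay_ms:10000
sudo systemctl restart redis-server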

> The map usage was very low (maybe around 260) during testing, but when we reached the memory limit for Redis and keys were evicted, I briefly saw more than 110000 maps used by the redis-server process, after which the count started to come down bit by bit.

Does the old system exhibit the same pattern as well, or does it stay low all the time?

Comment From: interwq

@oranagra BTW how widespread is this issue? Are there other users seeing the same problem? FWIW we haven't seen anything similar in our internal use cases. I'm trying to work out whether it's more likely to come from different allocator behavior or from OS / kernel level settings.

Comment From: oranagra

@interwq I haven't seen any similar report. I assume it's related to the system configuration in some way. @judeng any chance you can try the older redis on the newer OS or vice versa? assuming this debian package uses the jemalloc version that's bundled with redis (maybe you can try building both versions yourself)

Comment From: judeng

@oranagra I use WSL2, which can easily install Debian 12, but the kernel is 5.15. I'll try it.

Comment From: oranagra

@judeng maybe I tagged you by mistake and meant to tag @radsed in my previous message.

Comment From: radsed

@oranagra unfortunately, as mentioned, in our infrastructure we use automation to keep environments basically the same, and since we use the Redis packages that come from the Debian repositories, on Debian 12 we can only use Redis 7.0.11. @interwq the workload should be the same, as the stats from the new 7.0.11 servers were taken after the application rollback and the stats on 6.0.15 are from a normal run (the same application in both cases). The old system seems to always stay at low numbers, with no big peaks seen.

Comment From: hao-lee

Hello. We also encountered the same problem on redis-5.0.8. Does anyone have any new ideas about this problem?

Comment From: oranagra

@hao-lee please provide info on the problem you had and the system / configuration you're using, including:

  • crash log with INFO dump
  • do you see any warnings in the redis log file when redis starts?

Comment From: hao-lee

@oranagra Sure! Our system is running CentOS 8.2 with Linux kernel 4.18, and Redis is using jemalloc 5.1.0. Actually, our Redis process hasn't crashed and is still running normally. I just happened to notice that there are 11,097 lines in /proc/pid/maps. After searching on GitHub, I came across this issue.

We have two containers under the same request load, but only one of them is experiencing this problem.

Some lines in maps file:

fe9c0898000-fe9c1517000 rw-p 00000000 00:00 0
fe9c1517000-fe9c1518000 rw-p 00000000 00:00 0
fe9c1518000-fe9c2197000 rw-p 00000000 00:00 0
fe9c2197000-fe9c2198000 rw-p 00000000 00:00 0
fe9c2198000-fe9c2e17000 rw-p 00000000 00:00 0
fe9c2e17000-fe9c2e18000 rw-p 00000000 00:00 0
fe9c2e18000-fe9c3a97000 rw-p 00000000 00:00 0
fe9c3a97000-fe9c3a98000 rw-p 00000000 00:00 0
fe9c3a98000-fe9c4717000 rw-p 00000000 00:00 0
fe9c4717000-fe9c4719000 rw-p 00000000 00:00 0
fe9c4719000-fe9c5397000 rw-p 00000000 00:00 0
fe9c5397000-fe9c5399000 rw-p 00000000 00:00 0
fe9c5399000-fe9c6017000 rw-p 00000000 00:00 0
fe9c6017000-fe9c6019000 rw-p 00000000 00:00 0
fe9c6019000-fe9c6c98000 rw-p 00000000 00:00 0
fe9c6c98000-fe9c6c99000 rw-p 00000000 00:00 0
fe9c6c99000-fe9c7918000 rw-p 00000000 00:00 0
fe9c7918000-fe9c7919000 rw-p 00000000 00:00 0
fe9c7919000-fe9c8598000 rw-p 00000000 00:00 0
fe9c8598000-fe9c8599000 rw-p 00000000 00:00 0
fe9c8599000-fe9c9218000 rw-p 00000000 00:00 0
fe9c9218000-fe9c9219000 rw-p 00000000 00:00 0
fe9c9219000-fe9c9e98000 rw-p 00000000 00:00 0
fe9c9e98000-fe9c9e99000 rw-p 00000000 00:00 0
fe9c9e99000-fe9cab18000 rw-p 00000000 00:00 0
fe9cab18000-fe9cab1a000 rw-p 00000000 00:00 0
fe9cab1a000-fe9cb798000 rw-p 00000000 00:00 0
fe9cb798000-fe9cb79a000 rw-p 00000000 00:00 0
fe9cb79a000-fe9cc418000 rw-p 00000000 00:00 0
....

Comment From: oranagra

@hao-lee what about warnings in redis startup? maybe check for other ideas mentioned in https://github.com/redis/redis/issues/10234 too

Comment From: hao-lee

@oranagra Unfortunately, our server already has vm.overcommit_memory set to 1, and vm.max_map_count is 65,530, which should be sufficient. So our issue appears to be the same as this one.

I don't see any warnings in the Redis log.

3485:C 20 Feb 2025 14:09:43.006 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
3485:C 20 Feb 2025 14:09:43.006 # Redis version=5.0.8_4.10, bits=64, commit=7617957b, modified=0, pid=3485, just started
3485:C 20 Feb 2025 14:09:43.006 # Configuration loaded
3486:M 20 Feb 2025 14:09:43.013 # Not listening to IPv6: unsupproted
3486:M 20 Feb 2025 14:09:43.013 * No cluster configuration found, I'm 0752cf7c00e181fabde7aa1429cee8d026b7db69
3486:M 20 Feb 2025 14:09:43.018 # Not listening to IPv6: unsupproted
3486:M 20 Feb 2025 14:09:43.020 * Running mode=cluster, port=6002.
3486:M 20 Feb 2025 14:09:43.020 # Server initialized
3486:M 20 Feb 2025 14:09:43.022 * rdb.index file doesn't exist, will not recover data from aof
3486:M 20 Feb 2025 14:09:43.022 * Ready to accept connections
3486:M 20 Feb 2025 14:09:43.024 * Background saving started by pid 3498
3498:C 20 Feb 2025 14:09:43.024 * write rdb.index, rdb name: dump.rdb, aof name: appendonly-1740031783-1.aof, next opid: 1
3498:C 20 Feb 2025 14:09:43.025 * DB saved on disk file dump.rdb
3498:C 20 Feb 2025 14:09:43.026 * RDB: 0 MB of memory used by copy-on-write
3486:M 20 Feb 2025 14:09:43.127 * Background saving terminated with success

We have stopped processing the request load, and the mapping count is no longer increasing. I will continue investigating this issue. Any ideas or suggestions are welcome!