I have a Redis server whose memory usage is close to its maxmemory setting. It failed over yesterday, and I found that it evicted 2 million keys just before the failover. Do you know why Redis would suddenly evict so many keys?
Here are the Sentinel log and the Redis configuration.
Sentinel log:
[41115] 05 Mar 14:57:16.187 # +new-epoch 9
[41115] 05 Mar 14:57:16.187 # +vote-for-leader aa9178f405ff106d0f476263b7451aaa94cf4b92 9
[41115] 05 Mar 14:57:16.297 # +sdown master xxx6117 10.1.102.76 6117
[41115] 05 Mar 14:57:16.374 # +odown master xxx6117 10.1.102.76 6117 #quorum 3/2
[41115] 05 Mar 14:57:17.440 # +switch-master xxx6117 10.1.102.76 6117 10.1.102.77 6117
[41115] 05 Mar 14:57:17.441 * +slave slave 10.1.102.76:6117 10.1.102.76 6117 @ xxx6117 10.1.102.77 6117
[41115] 05 Mar 14:59:27.368 # +sdown slave 10.1.102.76:6117 10.1.102.76 6117 @ xxx6117 10.1.102.77 6117
[41115] 05 Mar 15:00:08.221 # -sdown slave 10.1.102.76:6117 10.1.102.76 6117 @ xxx6117 10.1.102.77 6117
Output of CONFIG GET *:
1) "dbfilename" 2) "dump6117.rdb"
3) "requirepass" 4) "xxxxx"
5) "masterauth" 6) "xxxxx"
7) "unixsocket" 8) ""
9) "logfile" 10) "/home/xxxx/redis/redis6117/redis.log"
11) "pidfile" 12) "/home/xxxx/redis/redis6117/redis.pid"
13) "maxmemory" 14) "9000000000"
15) "maxmemory-samples" 16) "3"
17) "timeout" 18) "0"
19) "tcp-keepalive" 20) "0"
21) "auto-aof-rewrite-percentage" 22) "100"
23) "auto-aof-rewrite-min-size" 24) "67108864"
25) "hash-max-ziplist-entries" 26) "512"
27) "hash-max-ziplist-value" 28) "512"
29) "list-max-ziplist-entries" 30) "512"
31) "list-max-ziplist-value" 32) "64"
33) "set-max-intset-entries" 34) "512"
35) "zset-max-ziplist-entries" 36) "128"
37) "zset-max-ziplist-value" 38) "64"
39) "lua-time-limit" 40) "5000"
41) "slowlog-log-slower-than" 42) "10000"
43) "slowlog-max-len" 44) "1024"
45) "port" 46) "6117"
47) "tcp-backlog" 48) "511"
49) "databases" 50) "16"
51) "repl-ping-slave-period" 52) "10"
53) "repl-timeout" 54) "60"
55) "repl-backlog-size" 56) "1048576"
57) "repl-backlog-ttl" 58) "3600"
59) "maxclients" 60) "4096"
61) "watchdog-period" 62) "0"
63) "slave-priority" 64) "10"
65) "min-slaves-to-write" 66) "0"
67) "min-slaves-max-lag" 68) "10"
69) "hz" 70) "10"
71) "no-appendfsync-on-rewrite" 72) "yes"
73) "slave-serve-stale-data" 74) "yes"
75) "slave-read-only" 76) "yes"
77) "stop-writes-on-bgsave-error" 78) "yes"
79) "daemonize" 80) "yes"
81) "rdbcompression" 82) "yes"
83) "rdbchecksum" 84) "yes"
85) "activerehashing" 86) "yes"
87) "repl-disable-tcp-nodelay" 88) "no"
89) "aof-rewrite-incremental-fsync" 90) "yes"
91) "appendonly" 92) "no"
93) "dir" 94) "/home/xxxx/redis/redis6117"
95) "maxmemory-policy" 96) "allkeys-lru"
97) "appendfsync" 98) "everysec"
99) "save" 100) ""
101) "loglevel" 102) "notice"
103) "client-output-buffer-limit" 104) "normal 0 0 0 slave 0 0 0 pubsub 33554432 8388608 60"
105) "unixsocketperm" 106) "0"
109) "notify-keyspace-events" 110) ""
Best Regards
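The configuration above pairs `maxmemory` 9000000000 (bytes) with `maxmemory-policy` allkeys-lru and `maxmemory-samples` 3. Redis's LRU is approximate: instead of keeping an exact LRU order, it samples a few keys and evicts the least recently used of them, repeating (at least in older versions) until memory usage is back under the limit. The following is a simplified Python model of that loop, not Redis's actual implementation: it ignores the eviction pool, per-key overhead and allocator effects, and `evict_until_under` and its data layout are hypothetical. It illustrates that the number of evictions is driven by how far memory is over the limit, so a sudden large gap produces a burst of evictions in a single pass.

```python
import random


def evict_until_under(keys, used_memory, maxmemory, samples=3, seed=0):
    """Toy model of allkeys-lru eviction with sampling (not Redis source).

    keys: dict mapping key name -> (last_access_time, size_in_bytes);
          evicted keys are removed from it in place.
    Returns (number_of_evicted_keys, used_memory_after_eviction).
    """
    rng = random.Random(seed)
    evicted = 0
    while used_memory > maxmemory and keys:
        # Pick up to `samples` random keys, mirroring maxmemory-samples.
        candidates = rng.sample(list(keys), min(samples, len(keys)))
        # Evict the candidate with the oldest access time (approximate LRU).
        victim = min(candidates, key=lambda k: keys[k][0])
        _, size = keys.pop(victim)
        used_memory -= size
        evicted += 1
    return evicted, used_memory


if __name__ == "__main__":
    # 1000 hypothetical keys of 100 bytes each: 100,000 bytes in use.
    data = {f"key:{i}": (i, 100) for i in range(1000)}
    # Lowering the limit to 50,000 bytes forces half the keys out in one pass.
    print(evict_until_under(data, used_memory=100_000, maxmemory=50_000))
    # prints (500, 50000)
```

Because only a few keys are sampled per eviction, a low `maxmemory-samples` value makes each individual eviction cheap but less accurate; it does not reduce how many keys must go when the gap is large.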
Comment From: antirez
Hello, when Sentinel is used the failover is orchestrated outside of Redis, so after the failover the data set is simply whatever the promoted slave contains. It therefore seems more likely that the causality runs the other way: some event evicted the 2 million keys and, at the same time, blocked the master instance, which then triggered the failover. For example, setting a lower maxmemory value via the CONFIG command could cause exactly this. I'm closing this issue for now, because the failover itself is just a configuration change and cannot evict keys; the root cause of your incident still needs to be understood. If you find further information, please open a new issue that outlines what happened and what the expected behavior would have been. Thank you.
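The scenario above can be made concrete with a rough estimate: the number of keys an eviction pass must remove is about the amount of memory over the limit divided by the average memory per key. The 500-byte average used below is a hypothetical figure, not something reported in this issue (a real value could be estimated from `used_memory` divided by the total key count). Under that assumption, 2 million evictions correspond to roughly 1 GB over the limit, for instance a limit lowered from 9000000000 to 8000000000 bytes, or a comparable jump in other memory counted against the limit.

```python
import math


def keys_to_evict(used_memory, maxmemory, avg_bytes_per_key):
    """Rough number of keys eviction must remove to get back under maxmemory."""
    over_limit = used_memory - maxmemory
    if over_limit <= 0:
        return 0  # already under the limit: nothing to evict
    return math.ceil(over_limit / avg_bytes_per_key)


def bytes_over_limit(evicted_keys, avg_bytes_per_key):
    """Inverse estimate: memory gap implied by a given number of evictions."""
    return evicted_keys * avg_bytes_per_key


if __name__ == "__main__":
    AVG = 500  # hypothetical average bytes per key
    print(bytes_over_limit(2_000_000, AVG))                   # 1000000000
    print(keys_to_evict(9_000_000_000, 8_000_000_000, AVG))   # 2000000
```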
Comment From: dumingyou
Actually, no manual command was issued (apart from the application's own commands). The only other client is a monitoring process that runs the INFO and CLIENT LIST commands every 30 seconds.
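Since the monitoring process already runs INFO every 30 seconds, it could also record the change in `evicted_keys` (Stats section) and `used_memory` (Memory section) between consecutive samples, which would pinpoint when an eviction burst started relative to the failover. A minimal sketch, assuming the raw INFO reply is available as a string; `parse_info` and `info_delta` are hypothetical helper names.

```python
def parse_info(text):
    """Parse the text reply of Redis INFO into a dict of field -> value."""
    fields = {}
    for line in text.splitlines():  # handles the \r\n line endings INFO uses
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, _, value = line.partition(":")
        try:
            fields[name] = int(value)
        except ValueError:
            fields[name] = value  # e.g. "used_memory_human:8.38G" stays a string
    return fields


def info_delta(prev_text, curr_text):
    """Change in evicted keys and used memory between two INFO samples."""
    prev, curr = parse_info(prev_text), parse_info(curr_text)
    return {
        "evicted_keys": curr["evicted_keys"] - prev["evicted_keys"],
        "used_memory": curr["used_memory"] - prev["used_memory"],
    }
```

Logging `info_delta` on every 30-second tick, and alerting when `evicted_keys` jumps, would give a timeline to line up against the Sentinel `+sdown` timestamp.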
Comment From: dumingyou
If a heavy command had been issued, it would have been recorded in the slowlog, but I can't find one there. The slowlog contains only some CLIENT LIST commands. Could those have caused the 2 million keys to be evicted?
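Besides the slowlog, the CLIENT LIST output the monitor already collects may be informative. The configuration sets `client-output-buffer-limit` for normal clients to `0 0 0`, i.e. no limit, and, as far as I understand Redis's memory accounting, normal clients' output buffers count toward the memory compared against `maxmemory`. A client whose output buffer suddenly grows could therefore push the server over the limit and trigger eviction. The sketch below sums the `omem` field (output buffer memory, in bytes) across clients and reports the largest one; it assumes the server's CLIENT LIST output includes `omem` and `addr`, and the helper names are hypothetical.

```python
def parse_client_list(text):
    """Turn CLIENT LIST output (one client per line, key=value fields) into dicts."""
    clients = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = {}
        for token in line.split():
            name, _, value = token.partition("=")
            fields[name] = value  # empty values such as "name=" become ""
        clients.append(fields)
    return clients


def output_buffer_usage(text):
    """Return (total omem bytes, (largest omem, addr of that client))."""
    sizes = [(int(c.get("omem", "0")), c.get("addr", "?"))
             for c in parse_client_list(text)]
    total = sum(size for size, _ in sizes)
    largest = max(sizes, default=(0, None))
    return total, largest


if __name__ == "__main__":
    sample = (
        "id=3 addr=10.0.0.1:5000 fd=6 name= age=10 idle=0 flags=N db=0 omem=0 cmd=info\n"
        "id=4 addr=10.0.0.2:5001 fd=7 name= age=5 idle=0 flags=N db=0 omem=1048576 cmd=get\n"
    )
    print(output_buffer_usage(sample))
    # prints (1048576, (1048576, '10.0.0.2:5001'))
```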
Comment From: haorenfsa
@dumingyou I found it out recently: see #7473