When the BGSAVE command is executed repeatedly, the Redis processes are occasionally in an abnormal state, such as Z (zombie) or D (uninterruptible sleep).
The BGSAVE script:
#!/bin/bash
# Send AUTH + BGSAVE to one Redis instance (host, port and password elided as xxxx).
function dataSave()
{
requirepass="xxxxxx"
INSTALL_DEST_DIR="/opt/redis"
cmd="bgsave"
result=`${INSTALL_DEST_DIR}/bin/redis-cli -cipherdir ${INSTALL_DEST_DIR}/cipher/ -h xxxx -p xxxx --no-auth-warning << EOF
AUTH '${requirepass}'
$cmd
EOF`
}
# Trigger a new BGSAVE every 10 seconds, forever.
function main()
{
while [ 1 ];do
dataSave
sleep 10
done
}
main
exit 0
The process-check commands:
local z_check=`ps -A -o stat,pid,ppid,user,cmd | grep -v grep | grep -e '^[Zz]' | grep 'redis-server\|redis-sentinel' | awk 'END{print NR}'`
local t_check=`ps -A -o stat,pid,ppid,user,cmd | grep -v grep | grep -e '^[T]' | grep 'redis-server\|redis-sentinel' | awk 'END{print NR}'`
local d_check=`ps -A -o stat,pid,ppid,user,cmd | grep -v grep | grep -e '^[D]' | grep 'redis-server\|redis-sentinel' | awk 'END{print NR}'`
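Each of the three commands above takes its own `ps` snapshot, so the Z, T and D counts can come from slightly different moments. A minimal sketch of counting all three from a single snapshot with `awk` follows. The function name `count_redis_states` is hypothetical. It assumes a procps `ps`, where each `-o field=` suppresses that column's header (the fields are passed as separate `-o` options because procps treats everything after `=` as the header text). Matching only on the command column (`$5`) avoids counting the `awk` process itself, so `grep -v grep` is not needed. Including `redis-rdb-bgsave` is an assumption: a running BGSAVE child may retitle itself to that name, whereas a zombie shows up as `[redis-server-...]`.

```shell
#!/bin/bash
# Hypothetical helper: count Redis processes whose STAT starts with Z, T or D.
# Input (stdin): lines of "stat pid ppid user cmd", e.g. from
#   ps -A -o stat= -o pid= -o ppid= -o user= -o cmd=
count_redis_states() {
    awk '
        # Match on the command column only, so this awk process never matches.
        # redis-rdb-bgsave: assumed title of a live BGSAVE child.
        $5 ~ /redis-server|redis-sentinel|redis-rdb-bgsave/ {
            s = substr($1, 1, 1)      # first STAT letter: R, S, D, Z, T, ...
            if (s == "Z" || s == "z") z++
            else if (s == "T") t++
            else if (s == "D") d++
        }
        # Uninitialized counters print as 0 with %d.
        END { printf "Z:%d T:%d D:%d\n", z, t, d }
    '
}

# One snapshot per check:
# ps -A -o stat= -o pid= -o ppid= -o user= -o cmd= | count_redis_states
```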
The log:
[ERROR][21-03-27 03:04:06 /opt/redis-service/health_liveness.sh] check_process_alarm:49 isSentinel: . process check, Z status: 1, T status: 0. [/ERROR]
[ERROR][21-03-27 05:32:21 /opt/redis-service/health_liveness.sh] check_process_alarm:49 isSentinel: . process check, Z status: 1, T status: 0. [/ERROR]
[ERROR][21-03-27 05:32:51 /opt/redis-service/health_liveness.sh] check_process_alarm:49 isSentinel: . process check, Z status: 1, T status: 0. [/ERROR]
Comment From: oranagra
@wangxieliang007 Which Redis version are you using, and on what OS?
Comment From: wangxieliang007
Redis 5.0.8 on CentOS x86.
Comment From: oranagra
I can't (easily) reproduce this. How long did you run that script? What is the configuration of that Redis instance, and how much data does it hold? Maybe you can improve the script to abort as soon as it finds an anomaly, and/or print the PID of the problematic child process, and then look at the Redis log around the time that child was created.
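The suggestion to stop at the first anomaly and print the offending PID could be sketched roughly as follows. The function names, the 3-second interval and the `--run` switch are hypothetical choices for illustration, not part of Redis or of the original scripts. It assumes a procps `ps` with separate `-o field=` options (no header line), and treats `redis-rdb-bgsave` as an assumed title of a live BGSAVE child.

```shell
#!/bin/bash
# Hypothetical helper: print the first Redis process whose STAT starts with
# Z, T or D. Input (stdin): "stat pid ppid user cmd" lines.
# Exit status: 0 if such a process was found, 1 otherwise.
first_redis_anomaly() {
    awk '
        $5 ~ /redis-server|redis-sentinel|redis-rdb-bgsave/ && $1 ~ /^[ZzTD]/ {
            print; found = 1; exit
        }
        END { exit !found }
    '
}

# Hypothetical watcher: poll every 3 seconds and stop at the first anomaly,
# reporting its PID and parent PID so the Redis log can be checked.
watch_redis_children() {
    local line stat pid ppid
    while true; do
        if line=$(ps -A -o stat= -o pid= -o ppid= -o user= -o cmd= | first_redis_anomaly); then
            read -r stat pid ppid _ <<< "$line"
            echo "$(date '+%F %T') anomaly: stat=${stat} pid=${pid} ppid=${ppid}"
            echo "Look for pid ${pid} in the Redis log."
            return 0
        fi
        sleep 3
    done
}

# Only start the loop when explicitly asked, so the functions can be sourced.
if [[ "${1:-}" == "--run" ]]; then
    watch_redis_children
fi
```

Saved to a file and started with `--run`, it loops until the first Z, T or D process appears and then exits, which leaves the matching moment easy to find in the Redis log.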
Comment From: wangxieliang007
Hi, this time I tested on Redis 5.0.11 (CentOS 7.5 x86), and it occurs every 20 minutes.
1. The BGSAVE script, which runs BGSAVE every 10 seconds:
#!/bin/bash
function dataSave()
{
cmd="bgsave"
result=`/usr1/w00347323/redis-5.0.11/src/redis-cli -h 127.0.0.1 -p 6379 << EOF
$cmd
EOF`
}
function main()
{
while [ 1 ];do
dataSave
sleep 10
done
}
main
exit 0
2. The check script, which checks every 3 seconds:
#!/bin/bash
function check_process()
{
while [ 1 ];do
sleep 3
local z_check=`ps -A -o stat,pid,ppid,user,cmd | grep -v grep | grep -e '^[Zz]' | grep 'redis-server\|redis-sentinel' | awk 'END{print NR}'`
local t_check=`ps -A -o stat,pid,ppid,user,cmd | grep -v grep | grep -e '^[T]' | grep 'redis-server\|redis-sentinel' | awk 'END{print NR}'`
local d_check=`ps -A -o stat,pid,ppid,user,cmd | grep -v grep | grep -e '^[D]' | grep 'redis-server\|redis-sentinel' | awk 'END{print NR}'`
local s_check=`ps -A -o stat,pid,ppid,user,cmd | grep -v grep | grep -e '^[S]' | grep 'redis-server\|redis-sentinel' | awk 'END{print NR}'`
local z_result=`ps -A -o stat,pid,ppid,user,cmd | grep -v grep | grep -e '^[Zz]' | grep 'redis-server\|redis-sentinel'`
local t_result=`ps -A -o stat,pid,ppid,user,cmd | grep -v grep | grep -e '^[T]' | grep 'redis-server\|redis-sentinel'`
local d_result=`ps -A -o stat,pid,ppid,user,cmd | grep -v grep | grep -e '^[D]' | grep 'redis-server\|redis-sentinel'`
local s_result=`ps -A -o stat,pid,ppid,user,cmd | grep -v grep | grep -e '^[S]' | grep 'redis-server\|redis-sentinel'`
if [[ ${z_check} -ne 0 || ${t_check} -ne 0 || ${d_check} -ne 0 ]]; then
echo "z_check:${z_check} t_check:${t_check} d_check:${d_check}" `date` >> health.log
echo "z_result:${z_result} t_result:${t_result} d_result:${d_result}" `date` >> health.log
fi
done
}
check_process
exit 0
3. The health.log output:
z_check:1 t_check:0 d_check:0 Tue Mar 30 10:21:06 CST 2021
z_result:Z 2573 66041 root [redis-server-5.]
The Redis log around the same time:
66041:M 30 Mar 2021 10:20:56.911 * Background saving terminated with success
66041:M 30 Mar 2021 10:21:06.855 * Background saving started by pid 2573
2573:C 30 Mar 2021 10:21:06.858 * DB saved on disk
2573:C 30 Mar 2021 10:21:06.858 * RDB: 4 MB of memory used by copy-on-write
66041:M 30 Mar 2021 10:21:06.933 * Background saving terminated with success
66041:M 30 Mar 2021 10:21:16.865 * Background saving started by pid 2911
2911:C 30 Mar 2021 10:21:16.868 * DB saved on disk
2911:C 30 Mar 2021 10:21:16.868 * RDB: 4 MB of memory used by copy-on-write
66041:M 30 Mar 2021 10:21:16.955 * Background saving terminated with success
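To line up a PID reported by the check script with the Redis log, one can pull out just the lines that belong to that BGSAVE child: the parent's `Background saving started by pid N` line, the child's own `N:C` lines, and the next `Background saving terminated` line from the parent. A minimal sketch, assuming the log format shown above; `bgsave_child_lines` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: print the Redis log lines that belong to one BGSAVE child.
# Usage: bgsave_child_lines PID < redis.log
bgsave_child_lines() {
    local pid=$1
    awk -v pid="$pid" '
        # Parent line announcing the fork; "$" anchors so 2573 does not match 25730.
        $0 ~ ("Background saving started by pid " pid "$") { started = 1; print; next }
        # Lines written by the child itself begin with "PID:C ".
        index($0, pid ":C ") == 1 { print; next }
        # First "terminated" line from the parent after the fork, then stop.
        started && /Background saving terminated/ { print; exit }
    '
}
```

For PID 2573 and the excerpt above, this keeps the fork line at 10:21:06.855, the two `2573:C` lines, and the parent's completion line at 10:21:06.933.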
Comment From: wangxieliang007
I found that the process in Z state is the BGSAVE child process (PID 2573), not the redis-server process itself (PID 66041). Does the abnormal status of the child process have any impact?
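A process in Z state has already exited and is only waiting for its parent to collect its exit status. In the log above, child 2573 wrote `DB saved on disk` at 10:21:06.858 and the parent logged `Background saving terminated with success` at 10:21:06.933, which suggests the child was collected well under a second after it exited; the check may simply have caught it inside that window. One way to tell a short-lived zombie from one that stays unreaped is to look again after a short delay. A minimal, Linux-only sketch reading `/proc/<pid>/status`; the function names and the 1-second default delay are assumptions.

```shell
#!/bin/bash
# Linux-only: print the one-letter state (R, S, D, Z, T, ...) of a PID,
# or nothing if the process no longer exists.
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# Hypothetical helper: classify a PID as "not-zombie", "transient"
# (gone or no longer a zombie after the delay) or "persistent"
# (still a zombie after the delay). PID reuse within the delay is ignored.
classify_zombie() {
    local pid=$1 delay=${2:-1}
    if [ "$(proc_state "$pid")" != "Z" ]; then
        echo "not-zombie"
        return
    fi
    sleep "$delay"
    if [ "$(proc_state "$pid")" = "Z" ]; then
        echo "persistent"
    else
        echo "transient"
    fi
}
```

Fed the PID printed by the check script (for example `classify_zombie 2573`), a `transient` result would point to a child that was reaped shortly afterwards, while `persistent` would mean the parent never collected it.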
Comment From: oranagra
IIRC there were some bugs in that area, but only for diskless replication (not for normal BGSAVE). If you can, I invite you to test Redis 6.2; if we see it's solved there, we can look at the diff.