The watchdog for a recent project has gone through four versions.
Version 1:
Check with ps -ef; if the program has died, start it again.
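A minimal sketch of the version-1 check, assuming the usual Linux ps -ef layout (UID PID PPID C STIME TTY TIME CMD). The function names proc_in_ps_output and check_once are hypothetical. The parser reads ps -ef text on stdin so it can be tried against canned output:

```shell
#!/bin/bash
# Version-1 style check (sketch): is a process with a given command name
# present in `ps -ef` output? Helper names here are hypothetical.

# Reads `ps -ef` text on stdin. Returns 0 if some row's command (8th column,
# with any leading directory stripped) equals $1 exactly, 1 otherwise.
# Comparing one column exactly means a "grep scp_platform" row never counts
# as a match, so no "grep -v grep" is needed.
proc_in_ps_output() {
  awk -v name="$1" '
    NR > 1 {
      n = split($8, parts, "/")
      if (parts[n] == name) found = 1
    }
    END { exit !found }
  '
}

# One watchdog pass: start the program in the background if it is absent.
check_once() {
  local program=$1
  if ! ps -ef | proc_in_ps_output "$program"; then
    echo "[$(date +'%Y-%m-%d %H:%M:%S')] $program not running, starting it"
    "$program" &
  fi
}
```

For example, ps -ef | proc_in_ps_output scp_platform && echo running. Matching the command column exactly avoids the grep -v grep trick used by the myps alias in env.sh, at the cost of missing programs started through an interpreter, where the 8th column would be sh or bash.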
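Version 2:
While it is running, the program sometimes stops listening on port 7905 without the process exiting, so checking whether the program has died is not enough; instead, the monitor checks whether anything is still listening on that port.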
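A sketch of the version-2 port check, assuming Linux net-tools netstat -an output (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). It deliberately differs from the scripts below in two ways: it also accepts tcp6 rows, and it succeeds when at least one listening socket matches instead of exactly one. port_listening_in and port_listening are hypothetical names:

```shell
#!/bin/bash
# Sketch: is anything in LISTEN state on a given TCP port?
# Reads `netstat -an`-style text on stdin. Column 4 is the local address;
# the part after its last ":" must equal the port exactly, so 7905 does not
# match 17905 or 79050. Rows whose protocol starts with "tcp" (tcp, tcp6)
# are considered; udp and unix-socket rows are ignored.
port_listening_in() {
  awk -v port="$1" '
    $1 ~ /^tcp/ && $NF == "LISTEN" {
      n = split($4, parts, ":")
      if (parts[n] == port) found = 1
    }
    END { exit !found }
  '
}

# Live check on this machine (requires the netstat command from net-tools).
port_listening() {
  netstat -an | port_listening_in "$1"
}
```

In a monitor loop this would read if port_listening 7905; then ...; else ...; fi. On systems without net-tools, ss -ltn reports the same sockets but with different columns, so the awk field numbers would need adjusting.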
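Version 3:
When port 7905 is no longer being listened on, first killall the old instance and then start it again. Every line written to the log file is prefixed with a timestamp; otherwise there is no way to tell when a restart happened.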
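A minimal sketch of the version-3 behaviour, assuming GNU date (for %N) and the killall command from psmisc. log_line and restart_program are hypothetical helper names; the timestamp format is the one the scripts below use:

```shell
#!/bin/bash
# Append one timestamped line to a log file (hypothetical helper).
# Format matches the article: [YYYY-mm-dd HH:MM:SS.nanoseconds] message
# %N requires GNU date.
log_line() {
  local file=$1
  shift
  printf '[%s] %s\n' "$(date +'%Y-%m-%d %H:%M:%S.%N')" "$*" >> "$file"
}

# Version-3 recovery step (hypothetical helper): kill the old instance
# first, then start a new one in the background, logging both steps.
restart_program() {
  local program=$1 logfile=$2
  log_line "$logfile" "killall $program now"
  killall "$program"
  log_line "$logfile" "restarting $program"
  "$program" &
}
```

Unlike the scripts below, which compute Current_Time once per loop iteration, log_line stamps each line at the moment it is written, so the kill and restart entries get their own times.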
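Version 4:
Run both the program and the monitor under nohup, so that they ignore the hangup signal and their output (including the monitor's echo messages) goes to a file instead of the terminal. This detaches them from the shell completely, so they really do keep running in the background after you exit that shell.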
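A sketch of the version-4 idea: start a command under nohup with stdout and stderr redirected to an explicit file, so it ignores the hangup signal sent when the login shell exits and never writes to the terminal. start_detached and DETACHED_PID are hypothetical names, and the log path in the usage line is only an example:

```shell
#!/bin/bash
# Start "$@" detached from the terminal (hypothetical helper).
# - nohup makes the command ignore SIGHUP
# - stdout and stderr are appended to the given log file
# - stdin comes from /dev/null so the job never reads from the terminal
# The PID of the background job is stored in DETACHED_PID.
start_detached() {
  local logfile=$1
  shift
  nohup "$@" >> "$logfile" 2>&1 < /dev/null &
  DETACHED_PID=$!
}
```

Usage, for example: start_detached /root/scp/log/monitor.out ./monitor.sh, then echo "$DETACHED_PID" to record the monitor's PID. Redirecting explicitly avoids nohup's default nohup.out, which otherwise lands in whatever directory the command happened to be started from.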
The old version:
#!/bin/sh
set +x
# load environment variables and aliases (WORK_DIR, PATH, kscp, ...)
source env.sh
PRMGRAM=scp_platform
FILE_NAME=scp_monitor.log
Current_Time=`date +"%Y-%m-%d %H:%M:%S.%N"`
echo "[${Current_Time}] monitor start...."
echo "[${Current_Time}] monitor start...." >> ${WORK_DIR}/log/${FILE_NAME}
port=7905
# number of IPv4 TCP sockets in LISTEN state on ${port}
TCPListeningnum=`netstat -an | grep ":$port " | awk '$1 == "tcp" && $NF == "LISTEN" {print $0}' | wc -l`
if [ $TCPListeningnum = 1 ]
then
    echo "[${Current_Time}] The $port is listening"
else
    echo "[${Current_Time}] The $port is not listening"
fi
while [ 1 ]
do
    Current_Time=`date +"%Y-%m-%d %H:%M:%S.%N"`
    TCPListeningnum=`netstat -an | grep ":$port " | awk '$1 == "tcp" && $NF == "LISTEN" {print $0}' | wc -l`
    if [ $TCPListeningnum = 1 ]
    then
        echo "[${Current_Time}] The ${port} is listening" >> ${WORK_DIR}/log/${FILE_NAME}
    else
        echo "[${Current_Time}] The ${port} is not listening" >> ${WORK_DIR}/log/${FILE_NAME}
        echo "[${Current_Time}] killall scp_platform now !" >> ${WORK_DIR}/log/${FILE_NAME}
        # kscp is the env.sh alias for "killall -9 scp_platform"; it only works
        # if this shell expands aliases in scripts
        kscp
        echo "[${Current_Time}] check ${PRMGRAM} quit, now restart ${PRMGRAM} ..." >> ${WORK_DIR}/log/${FILE_NAME}
        scp_platform &
    fi
    # give the program time to finish loading before checking again
    sleep 180
done
The new version:
start_monitor.sh  # runs the monitor in the background
#!/bin/bash
# start the monitor in the background, detached from the console
nohup ./monitor.sh &
monitor.sh  # the actual monitoring script
#!/bin/bash
set -x
# Source env.sh so its exports (WORK_DIR, PATH, LD_LIBRARY_PATH, ...) apply
# to this script; running it as a separate process would not affect this shell.
source ./env.sh
PRMGRAM=scp_platform
FILE_NAME=scp_monitor.log
Current_Time=`date +"%Y-%m-%d %H:%M:%S.%N"`
echo "[${Current_Time}] monitor start...."
echo "[${Current_Time}] monitor start...." >> ${WORK_DIR}/log/${FILE_NAME}
port=7905
# number of IPv4 TCP sockets in LISTEN state on ${port}
TCPListeningnum=`netstat -an | grep ":$port " | awk '$1 == "tcp" && $NF == "LISTEN" {print $0}' | wc -l`
if [ $TCPListeningnum = 1 ]
then
    echo "[${Current_Time}] The $port is listening"
else
    echo "[${Current_Time}] The $port is not listening"
fi
while [ 1 ]
do
    Current_Time=`date +"%Y-%m-%d %H:%M:%S.%N"`
    TCPListeningnum=`netstat -an | grep ":$port " | awk '$1 == "tcp" && $NF == "LISTEN" {print $0}' | wc -l`
    if [ $TCPListeningnum = 1 ]
    then
        echo "[${Current_Time}] The ${port} is listening" >> ${WORK_DIR}/log/${FILE_NAME}
    else
        echo "[${Current_Time}] The ${port} is not listening" >> ${WORK_DIR}/log/${FILE_NAME}
        echo "[${Current_Time}] killall scp_platform now !" >> ${WORK_DIR}/log/${FILE_NAME}
        # aliases such as kscp are not expanded in a non-interactive bash
        # script, so call killall directly
        killall scp_platform
        echo "[${Current_Time}] check ${PRMGRAM} quit, now restart ${PRMGRAM} ..." >> ${WORK_DIR}/log/${FILE_NAME}
        nohup scp_platform &
    fi
    # give the program time to finish loading before checking again
    sleep 180
done
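The sleep 180 is there because the program takes a while to load: until loading finishes it is not yet listening on port 7905, so checking any sooner would wrongly conclude that it is down and restart it.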
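The fixed sleep 180 assumes loading always finishes within three minutes. An alternative, sketched here as an assumption rather than what the author did, is to poll until the port is listening or a timeout expires. wait_until is a hypothetical helper that retries any command:

```shell
#!/bin/bash
# Retry a command every INTERVAL seconds until it succeeds or TIMEOUT
# seconds of sleeping have elapsed (hypothetical helper).
# Usage: wait_until TIMEOUT INTERVAL command [args...]
# Returns 0 as soon as the command succeeds, 1 on timeout.
# Note: only the sleeps are counted, not the time the command itself takes.
wait_until() {
  local timeout=$1 interval=$2 elapsed=0
  shift 2
  until "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}
```

For example, after restarting scp_platform the monitor could run wait_until 300 5 sh -c 'netstat -an | grep -q ":7905 .*LISTEN"' and log a failure if it returns 1, instead of assuming 180 seconds is always enough.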
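env.sh from the original version, which works without modification.
env.sh mainly sets environment variables and defines some custom variables and aliases: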
#!/bin/bash
export ROOT=/root/scp
export WORK_DIR=${ROOT}
export INCLUDE=${ROOT}/include
export OTL=${INCLUDE}/otl_mysql
export LD_LIBRARY_PATH=${ROOT}/lib:/usr/local/lib
export ACE_ROOT=${INCLUDE}
export ODBCINI=/usr/local/etc/odbc.ini
export ODBCSYSINI=/usr/local/etc
PATH=${PATH}:${ROOT}/bin
export PATH
odbcinst -j
alias wk='cd ${ROOT}'
alias bin='cd ${ROOT}/bin'
alias cfg='cd ${ROOT}/conf'
alias rmlog='rm -rf ${ROOT}/bin/log*.*; rm -rf ${ROOT}/log/*.*'
alias lis='netstat -an|grep -i 7905'
alias scp='${ROOT}/bin/scp_platform &'
alias moni='${ROOT}/bin/monitor.sh &'
alias myps='ps -fu root|grep -v grep|grep -i scp'
alias mymoni='ps -fu root|grep -v grep|grep -i moni'
alias kscp='killall -9 scp_platform'
alias kmoni='killall -9 monitor.sh'
isql
alias mynet='netstat -an | grep 7905'
ulimit -c unlimited
ulimit -n 65530
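Because env.sh only exports variables and defines aliases, it takes effect only when it is sourced into the current shell (source env.sh or . ./env.sh). Running it as a separate process, as nohup ./env.sh & would, changes only that child's environment. A small demonstration, using a throwaway file in place of the real env.sh (DEMO_WORK_DIR is a made-up variable):

```shell
#!/bin/bash
# Demonstrates why an environment script must be sourced, not executed.
# A temporary file stands in for env.sh; DEMO_WORK_DIR is made up.
demo_env=$(mktemp)
cat > "$demo_env" <<'EOF'
export DEMO_WORK_DIR=/root/scp
EOF

unset DEMO_WORK_DIR

# Executed as a child process: the export disappears with the child.
bash "$demo_env"
after_exec=${DEMO_WORK_DIR:-unset}

# Sourced: the export runs in this shell and persists.
. "$demo_env"
after_source=${DEMO_WORK_DIR:-unset}

echo "after executing: $after_exec"    # prints "after executing: unset"
echo "after sourcing:  $after_source"  # prints "after sourcing:  /root/scp"
rm -f "$demo_env"
```

The aliases in env.sh have a further catch: bash does not expand aliases in a non-interactive script unless shopt -s expand_aliases is set, so a bash monitor script has to call killall scp_platform directly rather than the kscp alias.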