  • Checking whether a service port is alive with Python, and sending alerts

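    Recently I noticed that the port of a Socket service in our company's test environment kept going down for no apparent reason, even though the service itself still showed as running. It looks like the service had hung.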

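    Even though it is only a test environment, it can't just be left like that, so I wrote a simple monitoring script overnight. Because the server runs Windows, the first version relied on the wmi module. The logic is as follows: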

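    1. Run the CMD command "net start" to get the services currently running on the system, and build a list of their names.

    2. Check whether the monitored service is in the list. If it is not, the service has stopped, so try to start it and send an alert email.

    3. Attempt a connect to the local Socket service port. If an exception is caught, try to restart the service and send an alert email.

    4. On each run the script loops through the steps above twice, 10 seconds apart, to make sure the service is healthy.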
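
    Step 3 can be sketched on its own. This is a minimal example assuming Python 3; `port_is_alive` is a hypothetical helper name, not a function from the script, and it uses `socket.create_connection` in place of the explicit `socket.socket()` plus `connect()` pair.

```python
import socket


def port_is_alive(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds in time.

    Hypothetical helper for illustration only.
    """
    try:
        # create_connection opens the socket, applies the timeout and connects;
        # the with-block closes the socket again as soon as we know it worked.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections and timeouts are both subclasses of OSError.
        return False
```

    Called as `port_is_alive('127.0.0.1', 20000)`, it answers the same question as the connect in step 3, and the socket is always closed whether or not the connect succeeds.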

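    While running it I hit a problem: Python is remarkably slow when it drives Windows through the wmi module. I don't know whether there is an alternative; if anyone knows a better approach, please point me to it.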
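
    One possible alternative, offered only as a sketch: ask `sc query` about a single service and read its state code. The function names are hypothetical, the code assumes Python 3.7 or later, and it assumes the `STATE` label is printed in English, which may not hold on a localized Windows. State code 4 is the Win32 `SERVICE_RUNNING` value.

```python
import re
import subprocess


def parse_sc_state(output):
    """Extract the numeric state code from `sc query` output, or None.

    Assumes a line such as 'STATE : 4  RUNNING' is present.
    """
    match = re.search(r'STATE\s*:\s*(\d+)', output)
    return int(match.group(1)) if match else None


def service_is_running(name):
    """Run `sc query <name>` (Windows only) and compare the state code to 4."""
    result = subprocess.run(['sc', 'query', name],
                            capture_output=True, text=True)
    return parse_sc_state(result.stdout) == 4
```

    Unlike "net start", `sc query` expects the service key name rather than the display name, so the two approaches may need different strings for the same service.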

    Update: the Windows CMD command "net start" now replaces the wmi module for getting the list of running service names.
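
    The "net start" approach boils down to splitting the command output into stripped lines and testing membership. A minimal sketch, assuming Python 3.7 or later; both function names are hypothetical, and the parsing is kept separate from the command call so it can be tried on any machine.

```python
import subprocess


def parse_net_start(output):
    """Turn `net start` output into a set of stripped, non-empty lines.

    The header and footer lines end up in the set as well; that is harmless
    for a simple membership test against a service display name.
    """
    return {line.strip() for line in output.splitlines() if line.strip()}


def running_services():
    """Run `net start` (Windows only) and return the names it lists."""
    result = subprocess.run(['net', 'start'], capture_output=True, text=True)
    return parse_net_start(result.stdout)
```

    A check then reads `'Service Name' in running_services()`. Note that "net start" lists display names, so the monitored name has to be the display name.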

    The source code:

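    #!/usr/bin/env python
    # The script shells out to "net start" and "sc", so it only runs on Windows.
    # wmi is no longer imported: "net start" replaced it.

    import os
    import time
    import socket
    import smtplib
    import logging
    from email.mime.text import MIMEText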
    
    
    def get_stop_service(designation):
        """Return True if the monitored service appears in the list of
        running services printed by "net start"; otherwise log an error
        and return False.
        """
        lines = os.popen('net start').readlines()
        running = [item.strip() for item in lines]
        if designation in running:
            return True
        logging.error('Service [%s] is down, try to restart the service.'
                      % designation)
        return False
    
    def monitor(sname):
        """Try to connect to port 20000 on the local machine.
        Return False if the connection attempt raises an exception.
        """
        s = socket.socket()
        s.settimeout(3)  # timeout in seconds
        host = ('127.0.0.1', 20000)
        try:  # Try to connect to the host
            s.connect(host)
        except socket.error as e:
            logging.warning('[%s] service connection failed: %s' % (sname, e))
            return False
        finally:
            s.close()  # Always release the socket
        return True
    
    
    def restart_service(rstname, conn, run):
        """First check whether the service is stopped;
        if so, start it directly.
        Then check whether it is hung (running but not accepting
        connections); if so, stop it and start it again.
        Return True if the service is healthy or the (re)start succeeded.
        """
        flag = False
        try:
            # `run` is the return value of get_stop_service(),
            # `conn` is the return value of monitor()
            if not run:
                ret = os.system('sc start "%s"' % rstname)
                if ret != 0:
                    raise Exception('[Errno %s]' % ret)
                flag = True
            elif not conn:
                # "sc stop" returns before the service has fully stopped,
                # so the start that follows can fail on a slow service.
                ret_stop = os.system('sc stop "%s"' % rstname)
                ret_start = os.system('sc start "%s"' % rstname)
                if ret_start != 0:
                    raise Exception('sc stop [Status code %s] '
                                    'sc start [Status code %s]' % (ret_stop, ret_start))
                flag = True
            else:
                logging.info('[%s] service is running normally' % rstname)
                return True
        except Exception as e:
            logging.warning('[%s] service restart failed: %s' % (rstname, e))
        # Return outside the except block so the success path reports True too
        return flag
    
    
    def send_mail(to_list, sub, contents):
        """
        Send an alert email. Return True if it was sent successfully.
        """
        mail_server = 'mail.stmp.com'  # SMTP server
        mail_user = 'YouAccount'  # Mail account
        mail_pass = 'Password'  # Password
        mail_postfix = 'smtp.com'  # Domain name

        me = 'Monitor alarm<%s@%s>' % (mail_user, mail_postfix)
        message = MIMEText(contents, _subtype='html', _charset='utf-8')

        message['Subject'] = sub
        message['From'] = me
        message['To'] = ', '.join(to_list)  # Header addresses are comma-separated

        flag = False  # Whether the mail was sent successfully
        try:
            s = smtplib.SMTP()
            s.connect(mail_server)
            s.login(mail_user, mail_pass)
            s.sendmail(me, to_list, message.as_string())
            s.close()
            flag = True
        except Exception as e:
            logging.warning('Send mail failed, exception: [%s].' % e)

        return flag
    
    
    def main(sname):
        """Take the name of the service to monitor and run the checks
        defined above in turn. Return True if the socket connection
        is healthy.
        Each run tests at most two times, 10 seconds apart.
        """
        retry = 2
        count = 0
        retval = False  # State of the socket connection
        while count < retry:
            ret = monitor(sname)
            if ret:  # If the socket connection is normal, return True
                retval = ret
                return retval
            is_running = get_stop_service(sname)
            restart_service(rstname=sname, conn=ret, run=is_running)

            host = socket.gethostname()
            address = socket.gethostbyname(host)
            mailto_list = ['mail@smtp.com', ]  # Alert contacts
            send_mail(mailto_list,
                      'Alarm',
                      ' <h4>Level: <u>ERROR</u><br> Host name: %s<br>'
                      ' IP Address: %s<br>'
                      ' Service name:</h4> <h5>%s</h5>'
                      % (host, address, sname))
            count += 1
            time.sleep(10)
        else:
            logging.error('[%s] service still unreachable after %d restart attempts'
                          % (sname, retry))

        return retval
    
    
    if __name__ == '__main__':

        # Raw string keeps the backslashes in the Windows path literal;
        # text mode 'a' is used because logging writes str, not bytes.
        logging.basicConfig(level=logging.INFO,
                            format='%(asctime)s %(levelname)s %(message)s',
                            datefmt='%Y/%m/%d %H:%M:%S',
                            filename=r'D:\logs\Monitor.log',
                            filemode='a')

        name = 'Service Name'
        response = main(name)
        if response:
            logging.info('The [%s] service connection is normal' % name)

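    The code above still has room for improvement: write multiple service names to a file, and have the program read that file and check each service in turn.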
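
    That improvement could look roughly like the sketch below. The file format (one service name per line, `#` for comments) and both function names are assumptions made for illustration; the post does not specify them.

```python
def load_service_names(path):
    """Read one service name per line, skipping blank lines and '#' comments.

    The file layout is an assumption for this sketch.
    """
    names = []
    with open(path, encoding='utf-8') as handle:
        for raw in handle:
            name = raw.strip()
            if name and not name.startswith('#'):
                names.append(name)
    return names


def check_all(path, check):
    """Call check(name) for every listed service; return the names that failed."""
    return [name for name in load_service_names(path) if not check(name)]
```

    With the script above, `check_all('services.txt', main)` would run the existing check for each name. Since `monitor()` hard-codes port 20000, each service would also need its own port in the file before this is useful for more than one Socket service.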

  • Original article: https://www.cnblogs.com/XuHoo/p/5888507.html