zoukankan      html  css  js  c++  java
  • Open-Falcon 监控系统监控 MySQL/Redis/MongoDB 状态监控

    背景:

    Open-Falcon 是小米运维部开源的一款互联网企业级监控系统解决方案,具体的安装和使用说明请见官网:http://open-falcon.org/,是一款比较全的监控。而且提供各种API,只需要把数据按照规定给出就能出图,以及报警、集群支持等等。

    监控:

    1) MySQL 收集信息脚本(mysql_monitor.py)

    复制代码
    #!/bin/env python
    # -*- encoding: utf-8 -*-
    
    from __future__ import division
    import MySQLdb
    import datetime
    import time
    import os
    import sys
    import fileinput
    import requests
    import json
    import re
    
    
    class MySQLMonitorInfo():
    
        def __init__(self,host,port,user,password):
            self.host     = host
            self.port     = port
            self.user     = user
            self.password = password
    
        def stat_info(self):
            try:
                m = MySQLdb.connect(host=self.host,user=self.user,passwd=self.password,port=self.port,charset='utf8')
                query = "SHOW GLOBAL STATUS"
                cursor = m.cursor()
                cursor.execute(query)
                Str_string = cursor.fetchall()
                Status_dict = {}
                for Str_key,Str_value in Str_string:
                    Status_dict[Str_key] = Str_value
                cursor.close()
                m.close()
                return Status_dict
    
            except Exception, e:
                print (datetime.datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
                print e
                Status_dict = {}
                return Status_dict 
    
        def engine_info(self):
            try:
                m = MySQLdb.connect(host=self.host,user=self.user,passwd=self.password,port=self.port,charset='utf8')
                _engine_regex = re.compile(ur'(History list length) ([0-9]+.?[0-9]*)
    ')
                query = "SHOW ENGINE INNODB STATUS"
                cursor = m.cursor()
                cursor.execute(query)
                Str_string = cursor.fetchone()
                a,b,c = Str_string
                cursor.close()
                m.close()
                return dict(_engine_regex.findall(c))
            except Exception, e:
                print (datetime.datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
                print e
                return dict(History_list_length=0)
    
    if __name__ == '__main__':
    
        open_falcon_api = 'http://192.168.200.86:1988/v1/push'
    
        db_list= []
        for line in fileinput.input():
            db_list.append(line.strip())
        for db_info in db_list:
    #        host,port,user,password,endpoint,metric = db_info.split(',')
            host,port,user,password,endpoint = db_info.split(',')
    
            timestamp = int(time.time())
            step      = 60
    #        tags      = "port=%s" %port
            tags      = ""
    
            conn = MySQLMonitorInfo(host,int(port),user,password)
            stat_info = conn.stat_info()
            engine_info = conn.engine_info()
    
            mysql_stat_list = []
            monitor_keys = [
                ('Com_select','COUNTER'),
                ('Qcache_hits','COUNTER'),
                ('Com_insert','COUNTER'),
                ('Com_update','COUNTER'),
                ('Com_delete','COUNTER'),
                ('Com_replace','COUNTER'),
                ('MySQL_QPS','COUNTER'),
                ('MySQL_TPS','COUNTER'),
                ('ReadWrite_ratio','GAUGE'),
                ('Innodb_buffer_pool_read_requests','COUNTER'),
                ('Innodb_buffer_pool_reads','COUNTER'),
                ('Innodb_buffer_read_hit_ratio','GAUGE'),
                ('Innodb_buffer_pool_pages_flushed','COUNTER'),
                ('Innodb_buffer_pool_pages_free','GAUGE'),
                ('Innodb_buffer_pool_pages_dirty','GAUGE'),
                ('Innodb_buffer_pool_pages_data','GAUGE'),
                ('Bytes_received','COUNTER'),
                ('Bytes_sent','COUNTER'),
                ('Innodb_rows_deleted','COUNTER'),
                ('Innodb_rows_inserted','COUNTER'),
                ('Innodb_rows_read','COUNTER'),
                ('Innodb_rows_updated','COUNTER'),
                ('Innodb_os_log_fsyncs','COUNTER'),
                ('Innodb_os_log_written','COUNTER'),
                ('Created_tmp_disk_tables','COUNTER'),
                ('Created_tmp_tables','COUNTER'),
                ('Connections','COUNTER'),
                ('Innodb_log_waits','COUNTER'),
                ('Slow_queries','COUNTER'),
                ('Binlog_cache_disk_use','COUNTER')
            ]
    
            for _key,falcon_type in monitor_keys:
                if _key == 'MySQL_QPS':
                    _value = int(stat_info.get('Com_select',0)) + int(stat_info.get('Qcache_hits',0))
                elif _key == 'MySQL_TPS':
                    _value = int(stat_info.get('Com_insert',0)) + int(stat_info.get('Com_update',0)) + int(stat_info.get('Com_delete',0)) + int(stat_info.get('Com_replace',0))
                elif _key == 'Innodb_buffer_read_hit_ratio':
                    try:
                        _value = round((int(stat_info.get('Innodb_buffer_pool_read_requests',0)) - int(stat_info.get('Innodb_buffer_pool_reads',0)))/int(stat_info.get('Innodb_buffer_pool_read_requests',0)) * 100,3)
                    except ZeroDivisionError:
                        _value = 0
                elif _key == 'ReadWrite_ratio':
                    try:
                        _value = round((int(stat_info.get('Com_select',0)) + int(stat_info.get('Qcache_hits',0)))/(int(stat_info.get('Com_insert',0)) + int(stat_info.get('Com_update',0)) + int(stat_info.get('Com_delete',0)) + int(stat_info.get('Com_replace',0))),2)
                    except ZeroDivisionError:
                        _value = 0            
                else:
                    _value = int(stat_info.get(_key,0))
    
                falcon_format = {
                        'Metric': '%s' % (_key),
                        'Endpoint': endpoint,
                        'Timestamp': timestamp,
                        'Step': step,
                        'Value': _value,
                        'CounterType': falcon_type,
                        'TAGS': tags
                    }
                mysql_stat_list.append(falcon_format)
    
            #_key : History list length
            for _key,_value in  engine_info.items():
                _key = "Undo_Log_Length"
                falcon_format = {
                        'Metric': '%s' % (_key),
                        'Endpoint': endpoint,
                        'Timestamp': timestamp,
                        'Step': step,
                        'Value': int(_value),
                        'CounterType': "GAUGE",
                        'TAGS': tags
                    }
                mysql_stat_list.append(falcon_format)
    
            print json.dumps(mysql_stat_list,sort_keys=True,indent=4)
            requests.post(open_falcon_api, data=json.dumps(mysql_stat_list))
    复制代码

    指标说明:收集指标里的COUNTER表示每秒执行次数,GAUGE表示直接输出值。

    指标 类型 说明
     Undo_Log_Length  GAUGE 未清除的Undo事务数
     Com_select  COUNTER  select/秒=QPS
     Com_insert  COUNTER  insert/秒
     Com_update  COUNTER  update/秒
     Com_delete  COUNTER  delete/秒
     Com_replace  COUNTER  replace/秒
     MySQL_QPS  COUNTER  QPS
     MySQL_TPS  COUNTER  TPS 
     ReadWrite_ratio  GAUGE  读写比例
     Innodb_buffer_pool_read_requests  COUNTER  innodb buffer pool 读次数/秒
     Innodb_buffer_pool_reads  COUNTER  Disk 读次数/秒
     Innodb_buffer_read_hit_ratio  GAUGE  innodb buffer pool 命中率
     Innodb_buffer_pool_pages_flushed  COUNTER  innodb buffer pool 刷写到磁盘的页数/秒
     Innodb_buffer_pool_pages_free  GAUGE  innodb buffer pool 空闲页的数量
     Innodb_buffer_pool_pages_dirty  GAUGE  innodb buffer pool 脏页的数量
     Innodb_buffer_pool_pages_data  GAUGE  innodb buffer pool 数据页的数量
     Bytes_received  COUNTER  接收字节数/秒
     Bytes_sent  COUNTER  发送字节数/秒
     Innodb_rows_deleted  COUNTER  innodb表删除的行数/秒
     Innodb_rows_inserted  COUNTER   innodb表插入的行数/秒
     Innodb_rows_read  COUNTER   innodb表读取的行数/秒
     Innodb_rows_updated   COUNTER   innodb表更新的行数/秒
     Innodb_os_log_fsyncs  COUNTER   Redo Log fsync次数/秒 
     Innodb_os_log_written  COUNTER   Redo Log 写入的字节数/秒
     Created_tmp_disk_tables  COUNTER   创建磁盘临时表的数量/秒
     Created_tmp_tables  COUNTER   创建内存临时表的数量/秒
     Connections  COUNTER   连接数/秒
     Innodb_log_waits  COUNTER   innodb log buffer不足等待的数量/秒
     Slow_queries  COUNTER   慢查询数/秒
     Binlog_cache_disk_use  COUNTER   Binlog Cache不足的数量/秒

    使用说明:读取配置到都数据库列表执行,配置文件格式如下(mysqldb_list.txt):

     IP,Port,User,Password,endpoint

    192.168.2.21,3306,root,123,mysql-21:3306
    192.168.2.88,3306,root,123,mysql-88:3306

    最后执行:

    python mysql_monitor.py mysqldb_list.txt 

    2) Redis 收集信息脚本(redis_monitor.py)

    复制代码
    #!/bin/env python
    #-*- coding:utf-8 -*-
    
    import json
    import time
    import re
    import redis
    import requests
    import fileinput
    import datetime
    
    class RedisMonitorInfo():
    
        def __init__(self,host,port,password):
            self.host     = host
            self.port     = port
            self.password = password
    
        def stat_info(self):
             try:
                r = redis.Redis(host=self.host, port=self.port, password=self.password)
                stat_info = r.info()
                return stat_info
             except Exception, e:
                print (datetime.datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
                print e
                return dict()
    
        def cmdstat_info(self):
            try:
                r = redis.Redis(host=self.host, port=self.port, password=self.password)
                cmdstat_info = r.info('Commandstats')
                return cmdstat_info
            except Exception, e:
                print (datetime.datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
                print e
                return dict()
    
    if __name__ == '__main__':
    
        open_falcon_api = 'http://192.168.200.86:1988/v1/push'
    
        db_list= []
        for line in fileinput.input():
            db_list.append(line.strip())
        for db_info in db_list:
    #        host,port,password,endpoint,metric = db_info.split(',')
            host,port,password,endpoint = db_info.split(',')
    
            timestamp = int(time.time())
            step      = 60
            falcon_type = 'COUNTER'
    #        tags      = "port=%s" %port
            tags      = ""
        
            conn = RedisMonitorInfo(host,port,password)
        
            #查看各个命令每秒执行次数
            redis_cmdstat_dict = {}
            redis_cmdstat_list = []
            cmdstat_info = conn.cmdstat_info()
            for cmdkey in cmdstat_info:
                redis_cmdstat_dict[cmdkey] = cmdstat_info[cmdkey]['calls']
            for _key,_value in redis_cmdstat_dict.items():
                falcon_format = {
                        'Metric': '%s' % (_key),
                        'Endpoint': endpoint,
                        'Timestamp': timestamp,
                        'Step': step,
                        'Value': int(_value),
                        'CounterType': falcon_type,
                        'TAGS': tags
                    }
                redis_cmdstat_list.append(falcon_format)
        
            #查看Redis各种状态,根据需要增删监控项,str的值需要转换成int
            redis_stat_list = []
            monitor_keys = [
                ('connected_clients','GAUGE'),
                ('blocked_clients','GAUGE'),
                ('used_memory','GAUGE'),
                ('used_memory_rss','GAUGE'),
                ('mem_fragmentation_ratio','GAUGE'),
                ('total_commands_processed','COUNTER'),
                ('rejected_connections','COUNTER'),
                ('expired_keys','COUNTER'),
                ('evicted_keys','COUNTER'),
                ('keyspace_hits','COUNTER'),
                ('keyspace_misses','COUNTER'),
                ('keyspace_hit_ratio','GAUGE'),
                ('keys_num','GAUGE'),
            ]
            stat_info = conn.stat_info()   
            for _key,falcon_type in monitor_keys:
                #计算命中率
                if _key == 'keyspace_hit_ratio':
                    try:
                        _value = round(float(stat_info.get('keyspace_hits',0))/(int(stat_info.get('keyspace_hits',0)) + int(stat_info.get('keyspace_misses',0))),4)*100
                    except ZeroDivisionError:
                        _value = 0
                #碎片率是浮点数
                elif _key == 'mem_fragmentation_ratio':
                    _value = float(stat_info.get(_key,0))
                #拿到key的数量
                elif _key == 'keys_num':
                    _value = 0 
                    for i in range(16):
                        _key = 'db'+str(i)
                        _num = stat_info.get(_key)
                        if _num:
                            _value += int(_num.get('keys'))
                    _key = 'keys_num'
                #其他的都采集成counter,int
                else:
                    try:
                        _value = int(stat_info[_key])
                    except:
                        continue
                falcon_format = {
                        'Metric': '%s' % (_key),
                        'Endpoint': endpoint,
                        'Timestamp': timestamp,
                        'Step': step,
                        'Value': _value,
                        'CounterType': falcon_type,
                        'TAGS': tags
                    }
                redis_stat_list.append(falcon_format)
        
            load_data = redis_stat_list+redis_cmdstat_list
            print json.dumps(load_data,sort_keys=True,indent=4)
            requests.post(open_falcon_api, data=json.dumps(load_data))
    复制代码

    指标说明:收集指标里的COUNTER表示每秒执行次数,GAUGE表示直接输出值。

    指标 类型 说明
     connected_clients  GAUGE 连接的客户端个数
     blocked_clients  GAUGE 被阻塞客户端的数量
     used_memory  GAUGE  Redis分配的内存的总量
     used_memory_rss  GAUGE  OS分配的内存的总量
     mem_fragmentation_ratio  GAUGE  内存碎片率,used_memory_rss/used_memory
     total_commands_processed  COUNTER  每秒执行的命令数,比较准确的QPS
     rejected_connections  COUNTER  被拒绝的连接数/秒
     expired_keys  COUNTER  过期KEY的数量/秒 
     evicted_keys  COUNTER  被驱逐KEY的数量/秒
     keyspace_hits  COUNTER  命中KEY的数量/秒
     keyspace_misses  COUNTER  未命中KEY的数量/秒
     keyspace_hit_ratio  GAUGE  KEY的命中率
     keys_num  GAUGE  KEY的数量
     cmd_*  COUNTER  各种名字都执行次数/秒

    使用说明:读取配置到都数据库列表执行,配置文件格式如下(redisdb_list.txt):

     IP,Port,Password,endpoint

    192.168.1.56,7021,zhoujy,redis-56:7021
    192.168.1.55,7021,zhoujy,redis-55:7021

    最后执行:

     python redis_monitor.py redisdb_list.txt

    3) MongoDB 收集信息脚本(mongodb_monitor.py)

    ...后续添加

    4)其他相关的监控(需要装上agent),比如下面的指标:

    告警项触发条件备注
    load.1min all(#3)>10 Redis服务器过载,处理能力下降
    cpu.idle all(#3)<10 CPU idle过低,处理能力下降
    df.bytes.free.percent all(#3)<20 磁盘可用空间百分比低于20%,影响从库RDB和AOF持久化
    mem.memfree.percent all(#3)<15 内存剩余低于15%,Redis有OOM killer和使用swap的风险
    mem.swapfree.percent all(#3)<80 使用20% swap,Redis性能下降或OOM风险
    net.if.out.bytes all(#3)>94371840 网络出口流量超90MB,影响Redis响应
    net.if.in.bytes all(#3)>94371840 网络入口流量超90MB,影响Redis响应
    disk.io.util all(#3)>90 磁盘IO可能存负载,影响从库持久化和阻塞写

     

    相关文档:

    https://github.com/iambocai/falcon-monit-scripts(redis monitor)

    https://github.com/ZhuoRoger/redismon(redis monitor)

    https://www.cnblogs.com/zhoujinyi/p/6645104.html

  • 相关阅读:
    开源项目
    [Accessibility] Missing contentDescription attribute on image [可取行]失踪contentDescription属性图像
    Android 布局 中实现适应屏幕大小及组件滚动
    EF 错误记录
    EasyUI 加载时需要显示和隐藏 panel(面板)内容破版问题
    IE 报表缩放后页面破版
    VS 2017 引入nuget 问题
    SSRS 报表显示页面 asp net session丢失或者找不到 asp net session has expired or could not be found()
    log4net 配置
    网站
  • 原文地址:https://www.cnblogs.com/jackyzm/p/9600496.html
Copyright © 2011-2022 走看看