zoukankan      html  css  js  c++  java
  • Open-Falcon 监控系统监控 MySQL/Redis/MongoDB 状态监控

    背景:

    Open-Falcon 是小米运维部开源的一款互联网企业级监控系统解决方案,具体的安装和使用说明请见官网:http://open-falcon.org/,是一款比较全的监控。而且提供各种API,只需要把数据按照规定给出就能出图,以及报警、集群支持等等。

    监控:

    1) MySQL 收集信息脚本(mysql_monitor.py)

    复制代码
    #!/bin/env python
    # -*- encoding: utf-8 -*-
    
    from __future__ import division
    import MySQLdb
    import datetime
    import time
    import os
    import sys
    import fileinput
    import requests
    import json
    import re
    
    
    class MySQLMonitorInfo():
    
        def __init__(self,host,port,user,password):
            self.host     = host
            self.port     = port
            self.user     = user
            self.password = password
    
        def stat_info(self):
            try:
                m = MySQLdb.connect(host=self.host,user=self.user,passwd=self.password,port=self.port,charset='utf8')
                query = "SHOW GLOBAL STATUS"
                cursor = m.cursor()
                cursor.execute(query)
                Str_string = cursor.fetchall()
                Status_dict = {}
                for Str_key,Str_value in Str_string:
                    Status_dict[Str_key] = Str_value
                cursor.close()
                m.close()
                return Status_dict
    
            except Exception, e:
                print (datetime.datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
                print e
                Status_dict = {}
                return Status_dict 
    
        def engine_info(self):
            try:
                m = MySQLdb.connect(host=self.host,user=self.user,passwd=self.password,port=self.port,charset='utf8')
                _engine_regex = re.compile(ur'(History list length) ([0-9]+.?[0-9]*)
    ')
                query = "SHOW ENGINE INNODB STATUS"
                cursor = m.cursor()
                cursor.execute(query)
                Str_string = cursor.fetchone()
                a,b,c = Str_string
                cursor.close()
                m.close()
                return dict(_engine_regex.findall(c))
            except Exception, e:
                print (datetime.datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
                print e
                return dict(History_list_length=0)
    
    if __name__ == '__main__':
    
        open_falcon_api = 'http://192.168.200.86:1988/v1/push'
    
        db_list= []
        for line in fileinput.input():
            db_list.append(line.strip())
        for db_info in db_list:
    #        host,port,user,password,endpoint,metric = db_info.split(',')
            host,port,user,password,endpoint = db_info.split(',')
    
            timestamp = int(time.time())
            step      = 60
    #        tags      = "port=%s" %port
            tags      = ""
    
            conn = MySQLMonitorInfo(host,int(port),user,password)
            stat_info = conn.stat_info()
            engine_info = conn.engine_info()
    
            mysql_stat_list = []
            monitor_keys = [
                ('Com_select','COUNTER'),
                ('Qcache_hits','COUNTER'),
                ('Com_insert','COUNTER'),
                ('Com_update','COUNTER'),
                ('Com_delete','COUNTER'),
                ('Com_replace','COUNTER'),
                ('MySQL_QPS','COUNTER'),
                ('MySQL_TPS','COUNTER'),
                ('ReadWrite_ratio','GAUGE'),
                ('Innodb_buffer_pool_read_requests','COUNTER'),
                ('Innodb_buffer_pool_reads','COUNTER'),
                ('Innodb_buffer_read_hit_ratio','GAUGE'),
                ('Innodb_buffer_pool_pages_flushed','COUNTER'),
                ('Innodb_buffer_pool_pages_free','GAUGE'),
                ('Innodb_buffer_pool_pages_dirty','GAUGE'),
                ('Innodb_buffer_pool_pages_data','GAUGE'),
                ('Bytes_received','COUNTER'),
                ('Bytes_sent','COUNTER'),
                ('Innodb_rows_deleted','COUNTER'),
                ('Innodb_rows_inserted','COUNTER'),
                ('Innodb_rows_read','COUNTER'),
                ('Innodb_rows_updated','COUNTER'),
                ('Innodb_os_log_fsyncs','COUNTER'),
                ('Innodb_os_log_written','COUNTER'),
                ('Created_tmp_disk_tables','COUNTER'),
                ('Created_tmp_tables','COUNTER'),
                ('Connections','COUNTER'),
                ('Innodb_log_waits','COUNTER'),
                ('Slow_queries','COUNTER'),
                ('Binlog_cache_disk_use','COUNTER')
            ]
    
            for _key,falcon_type in monitor_keys:
                if _key == 'MySQL_QPS':
                    _value = int(stat_info.get('Com_select',0)) + int(stat_info.get('Qcache_hits',0))
                elif _key == 'MySQL_TPS':
                    _value = int(stat_info.get('Com_insert',0)) + int(stat_info.get('Com_update',0)) + int(stat_info.get('Com_delete',0)) + int(stat_info.get('Com_replace',0))
                elif _key == 'Innodb_buffer_read_hit_ratio':
                    try:
                        _value = round((int(stat_info.get('Innodb_buffer_pool_read_requests',0)) - int(stat_info.get('Innodb_buffer_pool_reads',0)))/int(stat_info.get('Innodb_buffer_pool_read_requests',0)) * 100,3)
                    except ZeroDivisionError:
                        _value = 0
                elif _key == 'ReadWrite_ratio':
                    try:
                        _value = round((int(stat_info.get('Com_select',0)) + int(stat_info.get('Qcache_hits',0)))/(int(stat_info.get('Com_insert',0)) + int(stat_info.get('Com_update',0)) + int(stat_info.get('Com_delete',0)) + int(stat_info.get('Com_replace',0))),2)
                    except ZeroDivisionError:
                        _value = 0            
                else:
                    _value = int(stat_info.get(_key,0))
    
                falcon_format = {
                        'Metric': '%s' % (_key),
                        'Endpoint': endpoint,
                        'Timestamp': timestamp,
                        'Step': step,
                        'Value': _value,
                        'CounterType': falcon_type,
                        'TAGS': tags
                    }
                mysql_stat_list.append(falcon_format)
    
            #_key : History list length
            for _key,_value in  engine_info.items():
                _key = "Undo_Log_Length"
                falcon_format = {
                        'Metric': '%s' % (_key),
                        'Endpoint': endpoint,
                        'Timestamp': timestamp,
                        'Step': step,
                        'Value': int(_value),
                        'CounterType': "GAUGE",
                        'TAGS': tags
                    }
                mysql_stat_list.append(falcon_format)
    
            print json.dumps(mysql_stat_list,sort_keys=True,indent=4)
            requests.post(open_falcon_api, data=json.dumps(mysql_stat_list))
    复制代码

    指标说明:收集指标里的COUNTER表示每秒执行次数,GAUGE表示直接输出值。

    指标 类型 说明
     Undo_Log_Length  GAUGE 未清除的Undo事务数
     Com_select  COUNTER  select/秒=QPS
     Com_insert  COUNTER  insert/秒
     Com_update  COUNTER  update/秒
     Com_delete  COUNTER  delete/秒
     Com_replace  COUNTER  replace/秒
     MySQL_QPS  COUNTER  QPS
     MySQL_TPS  COUNTER  TPS 
     ReadWrite_ratio  GAUGE  读写比例
     Innodb_buffer_pool_read_requests  COUNTER  innodb buffer pool 读次数/秒
     Innodb_buffer_pool_reads  COUNTER  Disk 读次数/秒
     Innodb_buffer_read_hit_ratio  GAUGE  innodb buffer pool 命中率
     Innodb_buffer_pool_pages_flushed  COUNTER  innodb buffer pool 刷写到磁盘的页数/秒
     Innodb_buffer_pool_pages_free  GAUGE  innodb buffer pool 空闲页的数量
     Innodb_buffer_pool_pages_dirty  GAUGE  innodb buffer pool 脏页的数量
     Innodb_buffer_pool_pages_data  GAUGE  innodb buffer pool 数据页的数量
     Bytes_received  COUNTER  接收字节数/秒
     Bytes_sent  COUNTER  发送字节数/秒
     Innodb_rows_deleted  COUNTER  innodb表删除的行数/秒
     Innodb_rows_inserted  COUNTER   innodb表插入的行数/秒
     Innodb_rows_read  COUNTER   innodb表读取的行数/秒
     Innodb_rows_updated   COUNTER   innodb表更新的行数/秒
     Innodb_os_log_fsyncs  COUNTER   Redo Log fsync次数/秒 
     Innodb_os_log_written  COUNTER   Redo Log 写入的字节数/秒
     Created_tmp_disk_tables  COUNTER   创建磁盘临时表的数量/秒
     Created_tmp_tables  COUNTER   创建内存临时表的数量/秒
     Connections  COUNTER   连接数/秒
     Innodb_log_waits  COUNTER   innodb log buffer不足等待的数量/秒
     Slow_queries  COUNTER   慢查询数/秒
     Binlog_cache_disk_use  COUNTER   Binlog Cache不足的数量/秒

    使用说明:读取配置到都数据库列表执行,配置文件格式如下(mysqldb_list.txt):

     IP,Port,User,Password,endpoint

    192.168.2.21,3306,root,123,mysql-21:3306
    192.168.2.88,3306,root,123,mysql-88:3306

    最后执行:

    python mysql_monitor.py mysqldb_list.txt 

    2) Redis 收集信息脚本(redis_monitor.py)

    复制代码
    #!/bin/env python
    #-*- coding:utf-8 -*-
    
    import json
    import time
    import re
    import redis
    import requests
    import fileinput
    import datetime
    
    class RedisMonitorInfo():
    
        def __init__(self,host,port,password):
            self.host     = host
            self.port     = port
            self.password = password
    
        def stat_info(self):
             try:
                r = redis.Redis(host=self.host, port=self.port, password=self.password)
                stat_info = r.info()
                return stat_info
             except Exception, e:
                print (datetime.datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
                print e
                return dict()
    
        def cmdstat_info(self):
            try:
                r = redis.Redis(host=self.host, port=self.port, password=self.password)
                cmdstat_info = r.info('Commandstats')
                return cmdstat_info
            except Exception, e:
                print (datetime.datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
                print e
                return dict()
    
    if __name__ == '__main__':
    
        open_falcon_api = 'http://192.168.200.86:1988/v1/push'
    
        db_list= []
        for line in fileinput.input():
            db_list.append(line.strip())
        for db_info in db_list:
    #        host,port,password,endpoint,metric = db_info.split(',')
            host,port,password,endpoint = db_info.split(',')
    
            timestamp = int(time.time())
            step      = 60
            falcon_type = 'COUNTER'
    #        tags      = "port=%s" %port
            tags      = ""
        
            conn = RedisMonitorInfo(host,port,password)
        
            #查看各个命令每秒执行次数
            redis_cmdstat_dict = {}
            redis_cmdstat_list = []
            cmdstat_info = conn.cmdstat_info()
            for cmdkey in cmdstat_info:
                redis_cmdstat_dict[cmdkey] = cmdstat_info[cmdkey]['calls']
            for _key,_value in redis_cmdstat_dict.items():
                falcon_format = {
                        'Metric': '%s' % (_key),
                        'Endpoint': endpoint,
                        'Timestamp': timestamp,
                        'Step': step,
                        'Value': int(_value),
                        'CounterType': falcon_type,
                        'TAGS': tags
                    }
                redis_cmdstat_list.append(falcon_format)
        
            #查看Redis各种状态,根据需要增删监控项,str的值需要转换成int
            redis_stat_list = []
            monitor_keys = [
                ('connected_clients','GAUGE'),
                ('blocked_clients','GAUGE'),
                ('used_memory','GAUGE'),
                ('used_memory_rss','GAUGE'),
                ('mem_fragmentation_ratio','GAUGE'),
                ('total_commands_processed','COUNTER'),
                ('rejected_connections','COUNTER'),
                ('expired_keys','COUNTER'),
                ('evicted_keys','COUNTER'),
                ('keyspace_hits','COUNTER'),
                ('keyspace_misses','COUNTER'),
                ('keyspace_hit_ratio','GAUGE'),
                ('keys_num','GAUGE'),
            ]
            stat_info = conn.stat_info()   
            for _key,falcon_type in monitor_keys:
                #计算命中率
                if _key == 'keyspace_hit_ratio':
                    try:
                        _value = round(float(stat_info.get('keyspace_hits',0))/(int(stat_info.get('keyspace_hits',0)) + int(stat_info.get('keyspace_misses',0))),4)*100
                    except ZeroDivisionError:
                        _value = 0
                #碎片率是浮点数
                elif _key == 'mem_fragmentation_ratio':
                    _value = float(stat_info.get(_key,0))
                #拿到key的数量
                elif _key == 'keys_num':
                    _value = 0 
                    for i in range(16):
                        _key = 'db'+str(i)
                        _num = stat_info.get(_key)
                        if _num:
                            _value += int(_num.get('keys'))
                    _key = 'keys_num'
                #其他的都采集成counter,int
                else:
                    try:
                        _value = int(stat_info[_key])
                    except:
                        continue
                falcon_format = {
                        'Metric': '%s' % (_key),
                        'Endpoint': endpoint,
                        'Timestamp': timestamp,
                        'Step': step,
                        'Value': _value,
                        'CounterType': falcon_type,
                        'TAGS': tags
                    }
                redis_stat_list.append(falcon_format)
        
            load_data = redis_stat_list+redis_cmdstat_list
            print json.dumps(load_data,sort_keys=True,indent=4)
            requests.post(open_falcon_api, data=json.dumps(load_data))
    复制代码

    指标说明:收集指标里的COUNTER表示每秒执行次数,GAUGE表示直接输出值。

    指标 类型 说明
     connected_clients  GAUGE 连接的客户端个数
     blocked_clients  GAUGE 被阻塞客户端的数量
     used_memory  GAUGE  Redis分配的内存的总量
     used_memory_rss  GAUGE  OS分配的内存的总量
     mem_fragmentation_ratio  GAUGE  内存碎片率,used_memory_rss/used_memory
     total_commands_processed  COUNTER  每秒执行的命令数,比较准确的QPS
     rejected_connections  COUNTER  被拒绝的连接数/秒
     expired_keys  COUNTER  过期KEY的数量/秒 
     evicted_keys  COUNTER  被驱逐KEY的数量/秒
     keyspace_hits  COUNTER  命中KEY的数量/秒
     keyspace_misses  COUNTER  未命中KEY的数量/秒
     keyspace_hit_ratio  GAUGE  KEY的命中率
     keys_num  GAUGE  KEY的数量
     cmd_*  COUNTER  各种名字都执行次数/秒

    使用说明:读取配置到都数据库列表执行,配置文件格式如下(redisdb_list.txt):

     IP,Port,Password,endpoint

    192.168.1.56,7021,zhoujy,redis-56:7021
    192.168.1.55,7021,zhoujy,redis-55:7021

    最后执行:

     python redis_monitor.py redisdb_list.txt

    3) MongoDB 收集信息脚本(mongodb_monitor.py)

    ...后续添加

    4)其他相关的监控(需要装上agent),比如下面的指标:

    告警项触发条件备注
    load.1min all(#3)>10 Redis服务器过载,处理能力下降
    cpu.idle all(#3)<10 CPU idle过低,处理能力下降
    df.bytes.free.percent all(#3)<20 磁盘可用空间百分比低于20%,影响从库RDB和AOF持久化
    mem.memfree.percent all(#3)<15 内存剩余低于15%,Redis有OOM killer和使用swap的风险
    mem.swapfree.percent all(#3)<80 使用20% swap,Redis性能下降或OOM风险
    net.if.out.bytes all(#3)>94371840 网络出口流量超90MB,影响Redis响应
    net.if.in.bytes all(#3)>94371840 网络入口流量超90MB,影响Redis响应
    disk.io.util all(#3)>90 磁盘IO可能存负载,影响从库持久化和阻塞写

     

    相关文档:

    https://github.com/iambocai/falcon-monit-scripts(redis monitor)

    https://github.com/ZhuoRoger/redismon(redis monitor)

    https://www.cnblogs.com/zhoujinyi/p/6645104.html

  • 相关阅读:
    A1151 LCA in a Binary Tree (30分)
    A1150 Travelling Salesman Problem (25分)
    A1069 The Black Hole of Numbers (20分)
    A1149 Dangerous Goods Packaging (25分)
    A1148 Werewolf
    A1147 Heaps (30分)
    Ubuntu下git,gitlab团队协作
    如何查看JDK_API 2019.2.23
    linux_day1 (腾老师)2019年3月25日18:11:43(CentOs)
    jpa_缓存
  • 原文地址:https://www.cnblogs.com/jackyzm/p/9600496.html
Copyright © 2011-2022 走看看