  • ZooKeeper monitoring metrics

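    1. Use case

    Our current workloads rarely use ZooKeeper as a coordination service. However, we are adopting Codis as our Redis cluster solution, and Codis relies on ZooKeeper to store its configuration, so monitoring ZooKeeper properly matters as well.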

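    2. Key points for monitoring ZooKeeper

    System monitoring

    Memory usage    ZooKeeper should run entirely in memory and must never touch swap. The Java heap must be smaller than the available physical memory.

    Swap usage    Swapping degrades ZooKeeper performance; set vm.swappiness = 0.

    Network bandwidth    If ZooKeeper slows down, check bandwidth usage and packet loss. A typical ZooKeeper workload is about 20% writes and 80% reads.

    Disk usage    Watch the space used by the ZooKeeper data directory.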

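    Disk I/O    ZooKeeper writes snapshots asynchronously and its I/O volume is usually modest, but every update is forced to the transaction log before it is acknowledged, so write latency is sensitive to disk contention. If ZooKeeper shares a disk with other I/O-intensive services, watch disk I/O closely.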
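The swap check in the list above is easy to script. What follows is a minimal sketch, assuming a Linux host that exposes /proc/meminfo; the function names parse_meminfo and swap_used_kb and the SAMPLE values are illustrative and do not come from any tool mentioned in this article.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; the unit suffix is dropped.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(':')
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_used_kb(info):
    """Swap in use = SwapTotal - SwapFree (both in kB)."""
    return info.get('SwapTotal', 0) - info.get('SwapFree', 0)


# Hypothetical sample, used only when /proc/meminfo is not available.
SAMPLE = """MemTotal:        8010408 kB
MemFree:          512340 kB
SwapTotal:       2097148 kB
SwapFree:        2097148 kB
"""

if __name__ == '__main__':
    try:
        with open('/proc/meminfo') as f:
            text = f.read()
    except OSError:
        text = SAMPLE
    print('swap used (kB):', swap_used_kb(parse_meminfo(text)))
```

A non-zero result on a ZooKeeper host is worth an alert, since the server is expected to stay out of swap entirely.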

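    ZooKeeper monitoring

    zk_avg/min/max_latency    Time taken to respond to a client request. Alert when it exceeds 10 ticks.

    zk_outstanding_requests    Number of queued requests. It grows when ZooKeeper receives more requests than it can process; a suggested alert threshold is 10.

    zk_packets_received    Number of packets received from clients.

    zk_packets_sent    Number of packets sent to clients, mainly responses and notifications.

    zk_max_file_descriptor_count    Maximum number of file descriptors the process may open, controlled by ulimit.

    zk_open_file_descriptor_count    Number of open file descriptors. Alert when it exceeds 85% of the maximum.

    Mode    Role of the server: standalone when it is not part of an ensemble, otherwise follower or leader.

    zk_followers    Reported only by the leader: the number of followers in the ensemble. The normal value is the number of ensemble members minus 1.

    zk_pending_syncs    Reported only by the leader: the number of pending syncs.

    zk_znode_count    Number of znodes.

    zk_watch_count    Number of watches.

    Java Heap Size    Heap size of the ZooKeeper Java process.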
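The alert thresholds suggested in this list can be expressed as a small check over the mntr values. This is only a sketch: check_thresholds, ensemble_size and tick_ms are hypothetical names, the default tick_ms=2000 assumes tickTime=2000 in zoo.cfg, and comparing the average latency (rather than the maximum) is a choice made here, not something the list prescribes.

```python
def check_thresholds(stats, ensemble_size, tick_ms=2000):
    """Return a list of alert messages for one server's mntr values.

    stats: dict of mntr metrics with integer values where applicable.
    ensemble_size: number of servers in the ensemble (assumed known).
    tick_ms: tickTime in milliseconds (assumption: 2000).
    """
    alerts = []
    if stats.get('zk_avg_latency', 0) > 10 * tick_ms:
        alerts.append('average latency above 10 ticks')
    if stats.get('zk_outstanding_requests', 0) > 10:
        alerts.append('outstanding requests above 10')
    max_fd = stats.get('zk_max_file_descriptor_count', 0)
    open_fd = stats.get('zk_open_file_descriptor_count', 0)
    if max_fd and open_fd > 0.85 * max_fd:
        alerts.append('open file descriptors above 85% of the limit')
    # zk_followers is only reported by the leader.
    if stats.get('zk_server_state') == 'leader':
        if stats.get('zk_followers', 0) != ensemble_size - 1:
            alerts.append('follower count is not ensemble size minus 1')
    return alerts


if __name__ == '__main__':
    leader = {
        'zk_avg_latency': 0,
        'zk_outstanding_requests': 0,
        'zk_server_state': 'leader',
        'zk_open_file_descriptor_count': 29,
        'zk_max_file_descriptor_count': 102400,
        'zk_followers': 2,
    }
    print(check_thresholds(leader, ensemble_size=3))
```

In a Zabbix setup these rules would normally live in triggers rather than in the collector script; the function is just a compact way to read the thresholds.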

    # echo ruok|nc 127.0.0.1 2181
    imok
    
    
    # echo mntr|nc 127.0.0.1 2181
    zk_version	3.4.6-1569965, built on 02/20/2014 09:09 GMT
    zk_avg_latency	0
    zk_max_latency	0
    zk_min_latency	0
    zk_packets_received	11
    zk_packets_sent	10
    zk_num_alive_connections	1
    zk_outstanding_requests	0
    zk_server_state	leader
    zk_znode_count	17159
    zk_watch_count	0
    zk_ephemerals_count	1
    zk_approximate_data_size	6666471
    zk_open_file_descriptor_count	29
    zk_max_file_descriptor_count	102400
    zk_followers	2
    zk_synced_followers	2
    zk_pending_syncs	0
    
    
    # echo srvr|nc 127.0.0.1 2181
    Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
    Latency min/avg/max: 0/0/0
    Received: 26
    Sent: 25
    Connections: 1
    Outstanding: 0
    Zxid: 0x500000000
    Mode: leader
    Node count: 17159
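Unlike mntr, the srvr output uses "Name: value" lines, so it needs a slightly different parser. A minimal sketch, assuming the output format shown above; parse_srvr and latency_min_avg_max are illustrative names.

```python
def parse_srvr(text):
    """Parse 'srvr' output ("Name: value" lines) into a dict of strings."""
    result = {}
    for line in text.splitlines():
        # Split on the first colon only: the version line contains more colons.
        name, sep, value = line.partition(':')
        if sep:
            result[name.strip()] = value.strip()
    return result


def latency_min_avg_max(srvr):
    """Return the 'Latency min/avg/max' field as a tuple of three ints."""
    return tuple(int(part) for part in srvr['Latency min/avg/max'].split('/'))


if __name__ == '__main__':
    sample = (
        "Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT\n"
        "Latency min/avg/max: 0/0/0\n"
        "Received: 26\n"
        "Sent: 25\n"
        "Connections: 1\n"
        "Outstanding: 0\n"
        "Zxid: 0x500000000\n"
        "Mode: leader\n"
        "Node count: 17159\n"
    )
    info = parse_srvr(sample)
    print(info['Mode'], latency_min_avg_max(info))
```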

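    3. Writing the Zabbix script and configuration for monitoring ZooKeeper

    There are two ways to get these metrics into Zabbix. One is to have the Zabbix agent fetch each item individually, which works with both active and passive checks. The other is to push all the metrics in one run with zabbix_sender. We use the second approach here, which means the script cannot be written the way an agent check is, returning one item per invocation.

    The idea is to collect the metrics into a dictionary, then iterate over it and send each key:value pair with zabbix_sender's -k and -o options.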

    echo mntr|nc 127.0.0.1 2181

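    This command can be run through Python's subprocess module; alternatively, the socket module can connect to port 2181 and send the command directly. Either way, the mntr output must then be converted into a dictionary.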

    That is, data in this form

    zk_version	3.4.6-1569965, built on 02/20/2014 09:09 GMT
    zk_avg_latency	0
    zk_max_latency	0
    zk_min_latency	0
    zk_packets_received	91
    zk_packets_sent	90
    zk_num_alive_connections	1
    zk_outstanding_requests	0
    zk_server_state	follower
    zk_znode_count	17159
    zk_watch_count	0
    zk_ephemerals_count	1
    zk_approximate_data_size	6666471
    zk_open_file_descriptor_count	27
    zk_max_file_descriptor_count	102400

    must be converted into a dictionary like this

    {'zk_followers': 2, 'zk_outstanding_requests': 0, 'zk_approximate_data_size': 6666471, 'zk_packets_sent': 2089, 'zk_pending_syncs': 0, 'zk_avg_latency': 0, 'zk_version': '3.4.6-1569965, built on 02/20/2014 09:09 GMT', 'zk_watch_count': 2, 'zk_packets_received': 2090, 'zk_open_file_descriptor_count': 30, 'zk_server_ruok': 'imok', 'zk_server_state': 'leader', 'zk_synced_followers': 2, 'zk_max_latency': 28, 'zk_num_alive_connections': 2, 'zk_min_latency': 0, 'zk_ephemerals_count': 1, 'zk_znode_count': 17159, 'zk_max_file_descriptor_count': 102400}
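The subprocess route mentioned above, followed by the conversion into a dictionary, can be sketched as follows. This is an illustration under assumptions: mntr_via_nc, parse_mntr and the nc_cmd parameter are hypothetical names, an nc binary is assumed to be on the PATH, and some nc variants need an extra option to exit once stdin is closed.

```python
import subprocess


def mntr_via_nc(host='127.0.0.1', port=2181, nc_cmd=('nc',), timeout=5):
    """Run the equivalent of `echo mntr | nc host port` and return the output text."""
    proc = subprocess.run(
        list(nc_cmd) + [host, str(port)],
        input=b'mntr',            # written to nc's stdin, then stdin is closed
        stdout=subprocess.PIPE,
        timeout=timeout,
        check=True,               # raise if nc exits with a non-zero status
    )
    return proc.stdout.decode('utf-8', 'replace')


def parse_mntr(text):
    """Turn tab-separated mntr lines into a dict, converting numbers to int."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition('\t')
        key, value = key.strip(), value.strip()
        if not sep or not key:
            continue              # skip lines that are not "key<TAB>value"
        try:
            stats[key] = int(value)
        except ValueError:
            stats[key] = value    # e.g. zk_version, zk_server_state
    return stats


# Usage against a running server (not executed here):
#     stats = parse_mntr(mntr_via_nc('127.0.0.1', 2181))
```

Going through nc adds a dependency on an external binary, which is one reason the scripts below talk to the port with the socket module instead.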

    In the end, the data sent with zabbix_sender looks like the following.

    Here zookeeper.status[zk_version] is the name of the item key:

    zookeeper.status[zk_outstanding_requests]:0
    zookeeper.status[zk_approximate_data_size]:6666471
    zookeeper.status[zk_packets_sent]:48
    zookeeper.status[zk_avg_latency]:0
    zookeeper.status[zk_version]:3.4.6-1569965, built on 02/20/2014 09:09 GMT
    zookeeper.status[zk_watch_count]:0
    zookeeper.status[zk_packets_received]:49
    zookeeper.status[zk_open_file_descriptor_count]:27
    zookeeper.status[zk_server_ruok]:imok
    zookeeper.status[zk_server_state]:follower
    zookeeper.status[zk_max_latency]:0
    zookeeper.status[zk_num_alive_connections]:1
    zookeeper.status[zk_min_latency]:0
    zookeeper.status[zk_ephemerals_count]:1
    zookeeper.status[zk_znode_count]:17159
    zookeeper.status[zk_max_file_descriptor_count]:102400
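If all values should go out in a single zabbix_sender call rather than one call per key, the pairs above can be written in zabbix_sender's input-file format (one "<hostname> <key> <value>" entry per line). The sketch below rests on several assumptions to verify against the zabbix_sender manual for your version: that "-" as the hostname means "use the hostname from the agent configuration", that "-i -" reads entries from standard input, and that quoted values escape double quotes and backslashes with a backslash. The function names are illustrative; the paths are the ones used elsewhere in this article.

```python
import subprocess


def format_sender_lines(stats, host='-'):
    """Format a metrics dict as zabbix_sender input lines: <host> <key> "<value>"."""
    lines = []
    for name in sorted(stats):
        value = str(stats[name]).replace('\\', '\\\\').replace('"', '\\"')
        lines.append('%s zookeeper.status[%s] "%s"' % (host, name, value))
    return '\n'.join(lines) + '\n'


def send_batch(stats,
               zabbix_sender='/opt/app/zabbix/sbin/zabbix_sender',
               zabbix_conf='/opt/app/zabbix/conf/zabbix_agentd.conf'):
    """Send every metric in one zabbix_sender run (assumes '-i -' reads stdin)."""
    payload = format_sender_lines(stats).encode('utf-8')
    return subprocess.run([zabbix_sender, '-c', zabbix_conf, '-i', '-'],
                          input=payload).returncode


if __name__ == '__main__':
    print(format_sender_lines({'zk_server_state': 'follower',
                               'zk_avg_latency': 0}), end='')
```

Quoting every value keeps entries such as zk_version, which contains spaces, on a single line of the input.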

    A condensed version of the code:

    #!/usr/bin/env python3
    import socket

    # Send the 'mntr' four-letter word and read the reply until the
    # server closes the connection.
    s = socket.socket()
    s.connect(('localhost', 2181))
    s.sendall(b'mntr')
    chunks = []
    while True:
        chunk = s.recv(2048)
        if not chunk:
            break
        chunks.append(chunk)
    s.close()
    data_mntr = b''.join(chunks).decode()

    result = {}    # metric name -> value
    zresult = {}   # Zabbix item key -> value
    for line in data_mntr.splitlines():
        # Each line is "<key>\t<value>"; split once so the value stays intact.
        key, value = [part.strip() for part in line.split('\t', 1)]
        zkey = 'zookeeper.status' + '[' + key + ']'
        result[key] = value
        zresult[zkey] = value
    print(result)
    print('\n\n')
    print(zresult)
    # python3 test.py
    {'zk_version': '3.4.6-1569965, built on 02/20/2014 09:09 GMT', 'zk_avg_latency': '0', 'zk_max_latency': '0', 'zk_min_latency': '0', 'zk_packets_received': '543', 'zk_packets_sent': '542', 'zk_num_alive_connections': '1', 'zk_outstanding_requests': '0', 'zk_server_state': 'follower', 'zk_znode_count': '17159', 'zk_watch_count': '0', 'zk_ephemerals_count': '1', 'zk_approximate_data_size': '6666471', 'zk_open_file_descriptor_count': '27', 'zk_max_file_descriptor_count': '102400'}
    
    
    {'zookeeper.status[zk_version]': '3.4.6-1569965, built on 02/20/2014 09:09 GMT', 'zookeeper.status[zk_avg_latency]': '0', 'zookeeper.status[zk_max_latency]': '0', 'zookeeper.status[zk_min_latency]': '0', 'zookeeper.status[zk_packets_received]': '543', 'zookeeper.status[zk_packets_sent]': '542', 'zookeeper.status[zk_num_alive_connections]': '1', 'zookeeper.status[zk_outstanding_requests]': '0', 'zookeeper.status[zk_server_state]': 'follower', 'zookeeper.status[zk_znode_count]': '17159', 'zookeeper.status[zk_watch_count]': '0', 'zookeeper.status[zk_ephemerals_count]': '1', 'zookeeper.status[zk_approximate_data_size]': '6666471', 'zookeeper.status[zk_open_file_descriptor_count]': '27', 'zookeeper.status[zk_max_file_descriptor_count]': '102400'}

    The full script:

    #!/usr/bin/python
    """ Check Zookeeper Cluster

    Runs under Python 2.7 and Python 3.
    ZooKeeper must be 3.4.x or newer, because the script relies on 'mntr'.

    # echo mntr|nc 127.0.0.1 2181
    zk_version	3.4.6-1569965, built on 02/20/2014 09:09 GMT
    zk_avg_latency	0
    zk_max_latency	4
    zk_min_latency	0
    zk_packets_received	84467
    zk_packets_sent	84466
    zk_num_alive_connections	3
    zk_outstanding_requests	0
    zk_server_state	follower
    zk_znode_count	17159
    zk_watch_count	2
    zk_ephemerals_count	1
    zk_approximate_data_size	6666471
    zk_open_file_descriptor_count	29
    zk_max_file_descriptor_count	102400

    # echo ruok|nc 127.0.0.1 2181
    imok

    """

    from __future__ import print_function

    import os
    import socket
    import subprocess
    import sys


    zabbix_sender = '/opt/app/zabbix/sbin/zabbix_sender'
    zabbix_conf = '/opt/app/zabbix/conf/zabbix_agentd.conf'
    send_to_zabbix = 1   # set to 0 to print the commands instead of running them


    ############# get zookeeper server status
    class ZooKeeperServer(object):

        def __init__(self, host='localhost', port='2181', timeout=1):
            self._address = (host, int(port))
            self._timeout = timeout
            self._result = {}

        def _create_socket(self):
            return socket.socket()

        def _send_cmd(self, cmd):
            """ Send a 4letter word command to the server and return the reply as text """
            s = self._create_socket()
            s.settimeout(self._timeout)
            try:
                s.connect(self._address)
                s.sendall(cmd.encode('ascii'))
                # The server closes the connection after replying, so read until EOF
                # instead of trusting a single recv() to return everything.
                chunks = []
                while True:
                    chunk = s.recv(2048)
                    if not chunk:
                        break
                    chunks.append(chunk)
            finally:
                s.close()
            return b''.join(chunks).decode('utf-8', 'replace')

        def get_stats(self):
            """ Get ZooKeeper server stats as a map """
            data_mntr = self._send_cmd('mntr')
            data_ruok = self._send_cmd('ruok')
            result = {}
            if data_mntr:
                result.update(self._parse(data_mntr))
            if data_ruok:
                result.update(self._parse_ruok(data_ruok))

            # These three metrics are only exposed by the leader; report 0 for them
            # on followers so the Zabbix items always receive a value.
            for key in ('zk_followers', 'zk_synced_followers', 'zk_pending_syncs'):
                result.setdefault(key, 0)

            self._result = result
            return self._result

        def _parse(self, data):
            """ Parse the output from the 'mntr' 4letter word command """
            result = {}
            for line in data.splitlines():
                try:
                    key, value = self._parse_line(line)
                    result[key] = value
                except ValueError:
                    pass  # ignore broken lines
            return result

        def _parse_ruok(self, data):
            """ Parse the output from the 'ruok' 4letter word command """
            result = {}
            ruok = data.strip()
            if ruok:
                result['zk_server_ruok'] = ruok
            return result

        def _parse_line(self, line):
            try:
                key, value = [part.strip() for part in line.split('\t')]
            except ValueError:
                raise ValueError('Found invalid line: %s' % line)

            if not key:
                raise ValueError('The key is mandatory and should not be empty')

            try:
                value = int(value)
            except (TypeError, ValueError):
                pass  # keep non-numeric values (version, server state) as strings

            return key, value

        def get_pid(self):
            """ Return the pid of the ZooKeeper JVM as a string, or '' if it is not running """
            pidarg = "ps -ef|grep java|grep zookeeper|grep -v grep|awk '{print $2}'"
            output = subprocess.check_output(pidarg, shell=True, universal_newlines=True)
            lines = output.splitlines()
            return lines[0].strip() if lines else ''

        def send_to_zabbix(self, metric):
            """ Send one metric from the collected stats with zabbix_sender """
            key = "zookeeper.status[" + metric + "]"
            cmd = [zabbix_sender, "-c", zabbix_conf, "-k", key, "-o", str(self._result[metric])]

            if send_to_zabbix > 0:
                try:
                    with open(os.devnull, 'w') as devnull:
                        subprocess.call(cmd, stdout=devnull, stderr=devnull, shell=False)
                except OSError as detail:
                    print("Something went wrong while executing zabbix_sender : ", detail)
            else:
                print("Simulation: the following command would be executed :\n", " ".join(cmd), "\n")


    def usage():
        """Display program usage"""
        print("\nUsage : ", sys.argv[0], " alive|all")
        print("Modes : \n\talive : Return pid of running zookeeper\n\tall : Send zookeeper stats as well")
        sys.exit(1)


    def main():
        accepted_modes = ['alive', 'all']

        if len(sys.argv) == 2 and sys.argv[1] in accepted_modes:
            mode = sys.argv[1]
        else:
            usage()

        zk = ZooKeeperServer()
        pid = zk.get_pid()

        if pid != "" and mode == 'all':
            # Push every metric, then print the pid as the value of the agent item.
            for key in zk.get_stats():
                zk.send_to_zabbix(key)
            print(pid)
        elif pid != "" and mode == "alive":
            print(pid)
        else:
            print(0)


    if __name__ == '__main__':
        main()

    Zabbix agent configuration file check_zookeeper.conf

    UserParameter=zookeeper.status[*],/usr/bin/python /opt/app/zabbix/sbin/check_zookeeper.py $1

    Restart the Zabbix agent service.

    Adapted from:

    http://www.cnblogs.com/405845829qq/p/6494478.html

    http://blog.csdn.net/hackerwin7/article/details/43985049

  • Original article: https://www.cnblogs.com/smail-bao/p/7201091.html