  • Monitoring ZooKeeper with Prometheus

    Git project: https://github.com/jiankunking/zookeeper_exporter
    Exporter download: https://github.com/carlpett/zookeeper_exporter/releases/download/v1.0.2/zookeeper_exporter
    Note: the exporter works with ZooKeeper 3.4 and later.
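    ① Download zookeeper_exporter (saved straight to the path used in step ② and made executable)
    wget -O /usr/local/bin/zookeeper_exporter https://github.com/carlpett/zookeeper_exporter/releases/download/v1.0.2/zookeeper_exporter
    chmod +x /usr/local/bin/zookeeper_exporter
    ② Start zookeeper_exporter in the background
    nohup /usr/local/bin/zookeeper_exporter >>/dev/null 2>&1 &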
    ③ Check that the exporter is running properly
    [Screenshot: 1.jpg]
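
One way to run this check from the command line is to fetch the exporter's metrics page and look at `zk_up`. The sketch below assumes the exporter listens on `localhost:9141` and serves `/metrics`; these are believed to be the defaults of carlpett's exporter, so adjust them to your setup. `zk_exporter_status` is a hypothetical helper name, not part of the exporter.

```shell
#!/bin/sh
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# reports whether the exporter could reach ZooKeeper (zk_up == 1).
# Assumes the value is the last field on the line (no timestamps).
zk_exporter_status() {
    awk '
        $1 == "zk_up" || index($1, "zk_up{") == 1 { seen = 1; up = ($NF == 1) }
        END {
            if (!seen) {
                print "zk_up not found"; exit 2
            } else if (up) {
                print "zookeeper is up"
            } else {
                print "zookeeper is down"; exit 1
            }
        }'
}

# Usage (assumed address, not executed here):
#   curl -s http://localhost:9141/metrics | zk_exporter_status
```

A non-zero exit status makes the helper usable in scripts: 1 means the exporter answered but ZooKeeper is down, 2 means no `zk_up` line was found at all.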


    ④ Add the exporter as a scrape target on the Prometheus server.
    [Screenshot: 2.jpg]
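
As a rough illustration of this step, a scrape job along the following lines could be merged into `prometheus.yml`. The job name `zookeeper` and the target `localhost:9141` are assumptions (9141 is believed to be the exporter's default port); use the address your exporter actually listens on.

```yaml
# prometheus.yml (fragment): merge into the existing scrape_configs list.
scrape_configs:
  - job_name: zookeeper          # assumed job name
    static_configs:
      - targets:
          - localhost:9141       # assumed exporter address, one entry per exporter
```

Prometheus has to reload its configuration (or be restarted) before the new target appears on its Targets page.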


    ⑤ Log in to Grafana and import a dashboard: search for "Zookeeper Exporter Overview", or enter dashboard ID 9236.
    [Screenshot: 3.jpg]


    Reference alert rules for ZooKeeper:

    groups:
    - name: zookeeperStatsAlert
      rules:
      - alert: ZookeeperOutstandingRequestsHigh
        expr: avg(zk_outstanding_requests) by (instance) > 10
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }}"
          description: "Too many outstanding (queued) requests"
      - alert: ZookeeperPendingSyncsHigh
        expr: avg(zk_pending_syncs) by (instance) > 10
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }}"
          description: "Too many pending syncs"
      - alert: ZookeeperAvgLatencyHigh
        expr: avg(zk_avg_latency) by (instance) > 10
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }}"
          description: "Average response latency is too high"
      - alert: ZookeeperOpenFileDescriptorsHigh
        expr: zk_open_file_descriptor_count > zk_max_file_descriptor_count * 0.85
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }}"
          description: "Open file descriptors exceed 85% of the configured maximum"
      - alert: ZookeeperDown
        expr: zk_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }}"
          description: "ZooKeeper server is down"
      - alert: ZookeeperLeaderMissing
        # absent() returns 1 only when no matching series exists, so it needs
        # no comparison; the original "!= 1" filtered that result out and the
        # alert could never fire. The result carries no instance label.
        expr: absent(zk_server_state{state="leader"})
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "No instance reports state=leader"
          description: "ZooKeeper leader is missing"
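
For Prometheus to evaluate these rules, the rule file has to be listed in `prometheus.yml`. A minimal fragment, assuming the rules were saved under the hypothetical name `zookeeper_alerts.yml` next to the main configuration:

```yaml
# prometheus.yml (fragment): merge into the existing rule_files list.
rule_files:
  - zookeeper_alerts.yml   # hypothetical file name holding the rule group
```

Running `promtool check rules zookeeper_alerts.yml` before reloading Prometheus is a cheap way to catch indentation or expression mistakes.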
    
     
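    Metrics that need a threshold

    zk_outstanding_requests          number of outstanding (queued) requests
    zk_pending_syncs                 number of pending sync operations
    zk_avg_latency                   average response latency
    zk_open_file_descriptor_count    number of open file descriptors
    zk_max_file_descriptor_count     maximum number of file descriptors
    zk_up                            1 when the server is up (the rule above fires on 0)
    zk_server_state                  leader/follower state
    zk_num_alive_connections         number of active connections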
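
Most of these values come from ZooKeeper's `mntr` four-letter command (`zk_up` does not; it is produced by the exporter), so the raw numbers can be inspected directly when choosing thresholds. The sketch below is only an illustration: `zk_watched_keys` is a hypothetical helper, and the usage line assumes ZooKeeper's client port is `localhost:2181` and that `nc` is installed.

```shell
#!/bin/sh
# Hypothetical helper: keeps only the mntr keys listed above.
# mntr prints one "key<TAB>value" pair per line.
zk_watched_keys() {
    keys="zk_outstanding_requests zk_pending_syncs zk_avg_latency"
    keys="$keys zk_open_file_descriptor_count zk_max_file_descriptor_count"
    keys="$keys zk_server_state zk_num_alive_connections"
    awk -v keys="$keys" '
        BEGIN { n = split(keys, k, " "); for (i = 1; i <= n; i++) want[k[i]] = 1 }
        ($1 in want) { print $1 "=" $2 }'
}

# Usage (assumed address, not executed here):
#   echo mntr | nc localhost 2181 | zk_watched_keys
```

On ZooKeeper 3.5.3 and later, `mntr` answers only if it is enabled through `4lw.commands.whitelist` in the server configuration. `zk_pending_syncs` is reported by the leader only, so it will be missing on followers.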


    Source: https://hacpai.com/article/1575868724409

  • Original article: https://www.cnblogs.com/weifeng1463/p/12880880.html