  • Prometheus: WeChat Work Alerts, Inhibit Rules, and Silences (Part 2)

    Create a WeChat Work application

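    Register for WeChat Work: go to https://work.weixin.qq.com/ and register an enterprise. The registration details can be arbitrary, and no verification is required.
    Create an application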



    Create the alert rule

    vim /usr/local/prometheus-2.1/rule2.yml
    groups:
    - name: cluster
      rules:
      - alert: HIGHCPU
        expr: (1-irate(node_cpu_seconds_total{mode="idle",job="export_test2"}[1m]))*100 > 10
        for: 5s
        labels:
          for: 'highcpu'
        annotations:
          description: CPU MORE THAN 10%
          summary: 'cpu more than 10%'
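
To make the threshold concrete, the arithmetic behind the rule's expression can be sketched in Python. This is only an illustration and not how Prometheus is implemented: irate uses the last two samples inside the 1m window to compute a per-second rate. The function name cpu_busy_percent and the sample values are made up for this example, and counter resets are ignored.

```python
def cpu_busy_percent(prev_sample, last_sample):
    """Approximate (1 - irate(idle_counter)) * 100 for one CPU core.

    Each sample is a (timestamp_seconds, idle_seconds_total) pair.
    Counter resets are not handled in this sketch.
    """
    (t0, idle0), (t1, idle1) = prev_sample, last_sample
    idle_rate = (idle1 - idle0) / (t1 - t0)  # idle seconds per second
    return (1 - idle_rate) * 100


# Hypothetical samples taken 15 s apart: the core was idle for 7.5 s of them.
busy = cpu_busy_percent((0.0, 1000.0), (15.0, 1007.5))
fires = busy > 10  # same threshold as the HIGHCPU rule
print(busy, fires)
```

With these sample values the core is 50% busy, so the condition holds. Because the expression is evaluated per series, each cpu label value (each core) can raise its own alert.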
    

    Add the rule above to the Prometheus configuration

    vim /usr/local/prometheus-2.1/prometheus.yml 
    rule_files:
      - "/usr/local/prometheus-2.1/rule.yml"
      - "/usr/local/prometheus-2.1/rule2.yml"   # add this rule file
    

    Create the alerting policy

    vim /usr/local/alertmanager-0.15.2/alertmanager.yml
    global:
      wechat_api_corp_id: 'ww0cxxxxf5b5'
      wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
      wechat_api_secret: 'K4jHxxxxxxxL4_4Xj-lvQ'
    
    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 5s
      repeat_interval: 10s
      receiver: 'weixin'
      routes:
      - receiver: 'weixin'
        match:
          severity: 'critical'
      - receiver: 'weixin'
        match:
          for: 'highcpu'
    receivers:
    - name: 'weixin'
      wechat_configs:
      - send_resolved: true # also notify when the alert is resolved
        to_party: '1'
        agent_id: '1000003'
        corp_id: 'ww0cxxxxf5b5'
        api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
        api_secret: 'K4jHxxxxxxxL4_4Xj-lvQ'
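
How the route tree above selects a receiver can be sketched roughly as follows. This is a simplified model under stated assumptions: it only handles exact match entries (no match_re and no continue), and the receiver names weixin-critical, weixin-highcpu and weixin-default are hypothetical, chosen so the outcome is visible. In the configuration above every route points at the same weixin receiver.

```python
def pick_receiver(labels, route):
    """Return the receiver of the first child route whose `match` labels
    all equal the alert's labels; otherwise use the parent's receiver."""
    for child in route.get("routes", []):
        wanted = child.get("match", {})
        if all(labels.get(name) == value for name, value in wanted.items()):
            return pick_receiver(labels, child)
    return route["receiver"]


# Same shape as the route section above, with hypothetical receiver names.
ROUTE = {
    "receiver": "weixin-default",
    "routes": [
        {"receiver": "weixin-critical", "match": {"severity": "critical"}},
        {"receiver": "weixin-highcpu", "match": {"for": "highcpu"}},
    ],
}

highcpu = {"alertname": "HIGHCPU", "for": "highcpu", "instance": "node2"}
print(pick_receiver(highcpu, ROUTE))                 # second child route matches
print(pick_receiver({"alertname": "other"}, ROUTE))  # falls back to the root
```

An alert that matches no child route is still delivered, because the top-level receiver acts as the default.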
    

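    corp_id: in WeChat Work, under My Enterprise --> Enterprise Info --> Enterprise ID
    agent_id and api_secret: open the application you created (Prometheus) to see its AgentId and Secret
    to_party: the ID of the department that should receive the messages
    api_url: the WeChat Work API address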
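
As a rough illustration of how these four values fit together, the sketch below builds the two requests that a WeChat Work text message needs, as far as I understand the public API: a gettoken call that exchanges the corp ID and secret for an access token, and a message/send body addressed to a department. The helper names token_url and text_message are hypothetical, the credentials are the redacted placeholders from the configuration, and nothing is sent over the network.

```python
import json
from urllib.parse import urlencode

API_URL = "https://qyapi.weixin.qq.com/cgi-bin/"


def token_url(corp_id, api_secret):
    """URL that exchanges the corp ID and app secret for an access token."""
    query = urlencode({"corpid": corp_id, "corpsecret": api_secret})
    return f"{API_URL}gettoken?{query}"


def text_message(to_party, agent_id, content):
    """JSON body for message/send, addressed to a department ID."""
    return json.dumps({
        "toparty": to_party,
        "msgtype": "text",
        "agentid": int(agent_id),
        "text": {"content": content},
    })


# Placeholder credentials copied from the redacted configuration above.
url = token_url("ww0cxxxxf5b5", "K4jHxxxxxxxL4_4Xj-lvQ")
body = text_message("1", "1000003", "test message")
print(url)
print(body)
```

Requesting the token URL by hand and posting the body to message/send with the returned access_token is one way to check the credentials independently of Alertmanager.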

    Restart the prometheus and alertmanager services

    Test

    Raise the CPU load on the monitored host

    cat /dev/urandom | md5sum
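
The pipeline above runs until it is interrupted. If a load that stops by itself is preferred, a time-limited busy loop is one alternative. This is a minimal sketch: burn_cpu is a hypothetical helper, and it keeps only one core busy.

```python
import time


def burn_cpu(seconds):
    """Busy-loop on one core for roughly `seconds`; return the iteration count."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations


# Short demo run; on the monitored host use a larger value such as 120
# so the load outlasts the 1m range and the rule's `for` period.
count = burn_cpu(0.1)
print(count)
```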
    

    WeChat Work receives the alert message

    [FIRING:1] HIGHCPU (0 highcpu node2 export_test2 idle)
    CPU MORE THAN 10% cpu more than 10%
    Alerts Firing:
    Labels:
     - alertname = HIGHCPU
     - cpu = 0
     - for = highcpu
     - instance = node2
     - job = export_test2
     - mode = idle
    Annotations:
     - description = CPU MORE THAN 10%
     - summary = cpu more than 10%
    Source: http://centos1.com:9090/graph?g0.expr=%281+-+irate%28node_cpu_seconds_total%7Bjob%3D%22export_test2%22%2Cmode%3D%22idle%22%7D%5B1m%5D%29%29+%2A+100+%3E+10&g0.tab=1
    
    AlertmanagerUrl:
    http://centos1.com:9093/#/alerts?receiver=weixin
    

    After the CPU recovers, a resolved notification is received

    [RESOLVED] HIGHCPU (0 highcpu node2 export_test2 idle)
    CPU MORE THAN 10% cpu more than 10%
    
    Alerts Resolved:
    Labels:
     - alertname = HIGHCPU
     - cpu = 0
     - for = highcpu
     - instance = node2
     - job = export_test2
     - mode = idle
    Annotations:
     - description = CPU MORE THAN 10%
     - summary = cpu more than 10%
    Source: http://centos1.com:9090/graph?g0.expr=%281+-+irate%28node_cpu_seconds_total%7Bjob%3D%22export_test2%22%2Cmode%3D%22idle%22%7D%5B1m%5D%29%29+%2A+100+%3E+10&g0.tab=1
    
    AlertmanagerUrl:
    http://centos1.com:9093/#/alerts?receiver=weixin
    

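    Note: if WeChat Work does not receive the alert and you are sure the configuration is correct, try registering a new WeChat Work enterprise.

    Trying out inhibit rules

    Note: the two alerts used to configure inhibition here have no real logical relationship; they only serve to test the inhibit feature.

    Add a new alert rule

    vim /usr/local/prometheus-2.1/rule2.yml  # append the following at the end of the file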
    - name: test
      rules:
      - alert: go_goroutines
        expr: go_goroutines{instance="node2",job="export_test2"} > 5
        for: 10s
        labels:
          severity: 'warning'
        annotations:
          description: go_goroutines > 5
    

    Add the notification route and the inhibit configuration for the rule above

    vim /usr/local/alertmanager-0.15.2/alertmanager.yml
    global:
      wechat_api_corp_id: 'ww0xxxxxxx5b5'
      wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
      wechat_api_secret: 'K4jH8xxxxxxxxxxxXj-lvQ'
    
    #templates:
    #  - '/alertmanager/template/wechat.tmpl'
    
    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 5s
      repeat_interval: 10s
      receiver: 'weixin'
      routes:
      - receiver: 'weixin'
        match:
          severity: 'critical'
      - receiver: 'weixin'
        match:
          for: 'highcpu'
      - receiver: 'weixin'              # newly added route (these three lines)
        match:
          severity: 'warning'
    receivers:
    - name: 'weixin'
      wechat_configs:
      - send_resolved: true
        to_party: '1'
        agent_id: '1000003'
        corp_id: 'ww0xxxxxxxxx5b5'
        api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
        api_secret: 'K4jH8xxxxxxxxxxxxxxxxxxxXj-lvQ'
    
    inhibit_rules:                   # newly added inhibit rule
      - source_match:
          for: 'highcpu'
        target_match:
          severity: 'warning'
        equal: ['instance','job']
    

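    Result: when both the CPU alert and the go_goroutines alert meet their conditions, the CPU alert is sent and the go_goroutines alert is inhibited.
    In general, while an alert matching source_match (or source_match_re) is firing, any alert that matches target_match (or target_match_re) and has identical values for every label listed in equal is inhibited, and no notification is sent for it.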
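
The inhibit decision can be sketched in Python as follows. This is a simplified model and not Alertmanager's actual code: it covers only exact source_match and target_match entries, the helper name is_inhibited is hypothetical, and the node3 alert is invented for contrast.

```python
def is_inhibited(target, firing_alerts, rule):
    """True if `target` matches target_match and some firing alert matches
    source_match with identical values for every label in `equal`."""
    def matches(labels, wanted):
        return all(labels.get(name) == value for name, value in wanted.items())

    if not matches(target, rule["target_match"]):
        return False
    return any(
        matches(source, rule["source_match"])
        and all(source.get(name) == target.get(name) for name in rule["equal"])
        for source in firing_alerts
    )


# The inhibit rule from the configuration above.
RULE = {
    "source_match": {"for": "highcpu"},
    "target_match": {"severity": "warning"},
    "equal": ["instance", "job"],
}

cpu = {"alertname": "HIGHCPU", "for": "highcpu",
       "instance": "node2", "job": "export_test2"}
goroutines = {"alertname": "go_goroutines", "severity": "warning",
              "instance": "node2", "job": "export_test2"}
other_host = {**goroutines, "instance": "node3"}  # hypothetical second host

print(is_inhibited(goroutines, [cpu], RULE))  # same instance and job
print(is_inhibited(other_host, [cpu], RULE))  # instance differs
print(is_inhibited(goroutines, [], RULE))     # no source alert firing
```

Only the first call reports inhibition: the warning alert is muted while HIGHCPU is firing on the same instance and job, and it is delivered again once HIGHCPU resolves.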

    Trying out silences

    In the Alertmanager web UI, create a temporary silence for alerts carrying the label for: highcpu (Silences --> New Silence).

    Note: the start and end times of the silence are in UTC, which is 8 hours behind Beijing time.
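
To avoid mistakes with the 8-hour offset, the conversion can be done in code. This is a minimal sketch: the helper name beijing_to_utc and the example window are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # fixed UTC+8 offset, no DST


def beijing_to_utc(year, month, day, hour, minute=0):
    """Convert a Beijing wall-clock time to an ISO 8601 UTC string."""
    local = datetime(year, month, day, hour, minute, tzinfo=BEIJING)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Hypothetical silence window: 14:00 to 16:00 Beijing time on 2019-02-13.
start = beijing_to_utc(2019, 2, 13, 14)
end = beijing_to_utc(2019, 2, 13, 16)
print(start, end)
```

For this window the values to enter are 06:00 and 08:00 UTC on the same day.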


    Once the silence is added, no alerts with the label for: highcpu are received during the specified period.

  • Original article: https://www.cnblogs.com/huandada/p/10371169.html