  • Prometheus alerting (Alertmanager): sending email notifications

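    Important: make sure notifications are not sent too quickly or too frequently, otherwise recipients will be flooded with alert emails.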

    Download Alertmanager

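    Download page: https://prometheus.io/download/
    After downloading and extracting the archive, double-click the exe file to start Alertmanager, then open http://localhost:9093. Once Prometheus has been configured as described below, restart it and give it a moment.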

    Edit alertmanager.yml

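    global:
      resolve_timeout: 5m                        # when an alert without an end time is declared resolved
      smtp_from: 'xxxxxxxx@qq.com'               # sender address
      smtp_smarthost: 'smtp.qq.com:465'          # SMTP server as host:port
      smtp_auth_username: 'xxxxxxxxxxx@qq.com'   # SMTP login
      smtp_auth_password: 'xxxxxxxxxxxxxxx'      # SMTP password
      smtp_require_tls: false                    # do not require STARTTLS (465 is the implicit-TLS SMTP port)
      smtp_hello: 'qq.com'                       # hostname announced to the SMTP server
    route:
      group_by: ['alertname']                    # alerts sharing an alertname go into one notification
      group_wait: 5s                             # wait before the first notification for a new group
      group_interval: 5s                         # wait before notifying about new alerts in an existing group
      repeat_interval: 5m                        # wait before re-sending a notification that is still firing
      receiver: 'email'                          # default receiver
    receivers:
    - name: 'email'
      email_configs:
      - to: 'xxxxxxxxxx@qq.com'                  # recipient address
        send_resolved: true                      # also send an email when the alert resolves
    # Mute 'warning' alerts while a 'critical' alert with the same
    # alertname, dev and instance labels is firing.
    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'dev', 'instance']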
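
    The note at the top warns against sending notifications too often, yet the route above uses very short intervals (5s, 5s and 5m), which is handy for a quick test but means an email every five minutes for as long as an alert keeps firing. A more conservative sketch is shown here. It is not the author's configuration: the three timing values are simply Alertmanager's documented defaults, and suitable values depend on your environment.

    ```yaml
    route:
      group_by: ['alertname']
      group_wait: 30s        # assumption: Alertmanager default, lets related alerts arrive before the first email
      group_interval: 5m     # assumption: Alertmanager default, batches new alerts joining an existing group
      repeat_interval: 4h    # assumption: Alertmanager default, a still-firing alert is re-sent every 4 hours
      receiver: 'email'
    ```

    Only the route section changes; the global, receivers and inhibit_rules sections stay as they are.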
    

    Edit prometheus.yml

    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          - 127.0.0.1:9093
    rule_files:
      - "machine_alert_rules.yml"
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
        - targets: ['localhost:9090']
      - job_name: 'node_liux_70'
        static_configs:
        - targets: ['10.0.0.70:9100']
    

    Add machine_alert_rules.yml

    groups:
    - name: simulator-alert-rule
      rules:
      - alert: check_node_liux_70
        expr: sum(up{job="node_liux_70"}) == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          description: "Has been down or offline for more than 1 minute."
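
    One thing to be aware of in the rule above: sum(...) without a by clause drops every label, so the resulting alert carries no instance label. The email therefore cannot say which host is down, and the instance entry in the inhibit rule's equal list has nothing to match against. A possible variant, offered as a sketch rather than the author's rule, compares up directly so the job and instance labels survive and can be used in the annotation:

    ```yaml
    groups:
    - name: simulator-alert-rule
      rules:
      - alert: check_node_liux_70
        # No sum(): each target of the job yields its own alert with its labels intact.
        expr: up{job="node_liux_70"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          # {{ $labels.instance }} is filled in by Prometheus when the alert fires.
          description: "{{ $labels.instance }} has been down or offline for more than 1 minute."
    ```

    With the single target configured in prometheus.yml, this fires under the same condition as the original rule; the difference only shows in the alert's labels and in the email text.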
    

  • Original article: https://www.cnblogs.com/daikainan/p/14443973.html