  • [k8s] Prometheus + Alertmanager binary installation: a simple email alert

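    • The goal here is to send an alert email with Alertmanager.
    • The environment uses the Prometheus components installed from release binaries.
    • We monitor one node's memory and send an email alert when usage exceeds 2% (a deliberately low threshold, for testing).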

    Official Prometheus documentation for use with a k8s cluster

    Environment preparation

    Download the binaries from https://prometheus.io/download/

    https://github.com/prometheus/prometheus/releases/download/v2.0.0/prometheus-2.0.0.linux-amd64.tar.gz
    https://github.com/prometheus/alertmanager/releases/download/v0.12.0/alertmanager-0.12.0.linux-amd64.tar.gz
    https://github.com/prometheus/node_exporter/releases/download/v0.15.2/node_exporter-0.15.2.linux-amd64.tar.gz
    

    Extract the archives

    /root/
    ├── alertmanager -> alertmanager-0.12.0.linux-amd64
    ├── alertmanager-0.12.0.linux-amd64
    ├── alertmanager-0.12.0.linux-amd64.tar.gz
    ├── node_exporter-0.15.2.linux-amd64
    ├── node_exporter-0.15.2.linux-amd64.tar.gz
    ├── prometheus -> prometheus-2.0.0.linux-amd64
    ├── prometheus-2.0.0.linux-amd64
    └── prometheus-2.0.0.linux-amd64.tar.gz
    

    Lab architecture

    Configure Alertmanager

    Create alert.yml

    [root@n1 alertmanager]# ls
    alertmanager  alert.yml  amtool  data  LICENSE  NOTICE  simple.yml
    
    

    alert.yml defines who sends the notification, for which events, to whom, and how.

    cat alert.yml 
    global:
      smtp_smarthost: 'smtp.163.com:25'
      smtp_from: 'maotai@163.com'
      smtp_auth_username: 'maotai@163.com'
      smtp_auth_password: '123456'
    
    
    templates:
      - '/root/alertmanager/template/*.tmpl'
    
    route:
      group_by: ['alertname', 'cluster', 'service']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 10m
      receiver: default-receiver
    
    
    receivers:
    - name: 'default-receiver'
      email_configs:
      - to: 'maotai@foxmail.com'
      
      
    # Once the configuration is in place, start Alertmanager:
    ./alertmanager -config.file=./alert.yml
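
    To check the email path without waiting for a real rule to fire, a hand-made alert can be pushed to Alertmanager. The following is a minimal sketch, not part of the original setup: the helper names, the alert name ManualTest, and the /api/v1/alerts path (assumed to be served by the 0.12 release on localhost:9093) are all assumptions.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_alert(alertname, instance, summary):
    """Build the JSON body for one alert: a list holding a single alert object."""
    alert = {
        "labels": {"alertname": alertname, "instance": instance, "severity": "warning"},
        "annotations": {"summary": summary},
        "startsAt": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps([alert]).encode("utf-8")


def send_test_alert(body, base_url="http://localhost:9093"):
    """POST the alert to Alertmanager (endpoint path is an assumption, see above)."""
    req = urllib.request.Request(
        base_url + "/api/v1/alerts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


body = build_test_alert("ManualTest", "db1", "manual test alert")
# send_test_alert(body)  # uncomment with Alertmanager running on localhost:9093
```

    If the request is accepted, the email should arrive after group_wait (30s in the alert.yml above).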
    

    Configure Prometheus

    Alert rule configuration in rule.yml (referenced from prometheus.yml)

    Send an email alert when memory usage exceeds 2% (for testing)

    $ cat rule.yml 
    groups:
    - name: test-rule
      rules:
      - alert: NodeMemoryUsage
        expr: (node_memory_MemTotal - (node_memory_MemFree+node_memory_Buffers+node_memory_Cached )) / node_memory_MemTotal * 100 > 2
        for: 1m
        labels:
          severity: warning 
        annotations:
          summary: "{{$labels.instance}}: High Memory usage detected"
          description: "{{$labels.instance}}: Memory usage is above 2% (current value is: {{ $value }})"
    

    The key part is this expression:

    (node_memory_MemTotal - (node_memory_MemFree+node_memory_Buffers+node_memory_Cached )) / node_memory_MemTotal * 100 > 2
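
    As a worked example of the arithmetic, the sketch below evaluates the same expression in Python. The byte values are hypothetical and were not read from the original host; the function name is likewise made up for illustration.

```python
def memory_usage_percent(mem_total, mem_free, buffers, cached):
    """Mirror of the rule expression: (total - (free + buffers + cached)) / total * 100."""
    if mem_total <= 0:
        raise ValueError("mem_total must be positive")
    used = mem_total - (mem_free + buffers + cached)
    return used / mem_total * 100


# Hypothetical readings in bytes (illustrative only).
sample = dict(
    mem_total=8_000_000_000,
    mem_free=1_000_000_000,
    buffers=500_000_000,
    cached=2_500_000_000,
)
usage = memory_usage_percent(**sample)
fires = usage > 2  # same threshold as the test rule
print(f"usage={usage:.1f}% fires={fires}")  # usage=50.0% fires=True
```

    With a 2% threshold practically any running host satisfies the condition, which is why it suits a test.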
    

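    labels: attaches labels to alerts produced by this rule

    annotations (alert description): this part becomes the content of the notification

    Where do the metric names come from? (Explained further below.) node_memory_MemTotal/node_memory_Buffers/node_memory_Cached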

    prometheus.yml configuration

    • Add a scrape job for node_exporter

    • Add the alert rules under rule_files; the rule_files section references rule.yml

    $ cat prometheus.yml 
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
    
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ["localhost:9093"]
    
    rule_files:
      - /root/prometheus/rule.yml
    
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['192.168.14.11:9090']
      - job_name: linux
        static_configs:
          - targets: ['192.168.14.11:9100']
            labels:
              instance: db1
    
    

    Once configured, start Prometheus and open its web UI; the node target now shows up.

    View the metrics exposed by node_exporter

    View the alerts page, which shows the current state of each alert rule
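
    The states shown on that page follow from the rule's for: 1m and the 15s evaluation_interval. The sketch below is a simplified model of that behaviour (the function name and its inputs are hypothetical, and real Prometheus works on timestamps rather than list indices): a rule is pending while its condition has held for less than the for duration, and firing after that.

```python
def alert_states(results, interval_s=15, for_s=60):
    """Return the alert state after each evaluation, given per-evaluation condition results."""
    states = []
    active_since = None  # time at which the condition first became true
    for i, condition_true in enumerate(results):
        now = i * interval_s
        if not condition_true:
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held_for = now - active_since
        states.append("firing" if held_for >= for_s else "pending")
    return states


# Condition true at t = 0s, 15s, 30s, 45s, 60s, 75s.
print(alert_states([True] * 6))
```

    Only the firing state results in a notification being sent to Alertmanager.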

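    The metric names used in these expressions are listed here (provided the corresponding exporter is installed); write alert expressions using these names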
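
    The same names can also be read straight from the exporter's text output. The sketch below is a minimal parser for that output, with hypothetical helper names and made-up sample values; it assumes node_exporter serves its metrics at /metrics on the 192.168.14.11:9100 target used above.

```python
import urllib.request


def parse_metrics(text):
    """Parse exposition-format lines into {name_with_labels: float}, skipping comments."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labelled sample: keep everything up to the closing brace as the key.
            end = line.rindex("}") + 1
            key, rest = line[:end], line[end:].split()
        else:
            parts = line.split()
            key, rest = parts[0], parts[1:]
        metrics[key] = float(rest[0])  # rest[1], if present, is a timestamp
    return metrics


def fetch_metrics(url="http://192.168.14.11:9100/metrics"):
    """Download and parse the exporter output (needs a reachable exporter)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Made-up sample in the exposition format, for illustration only.
SAMPLE = """\
# HELP node_memory_MemTotal Memory information field MemTotal.
# TYPE node_memory_MemTotal gauge
node_memory_MemTotal 8e+09
node_memory_MemFree 1e+09
node_cpu{cpu="cpu0",mode="idle"} 12345.6
"""
parsed = parse_metrics(SAMPLE)
print(sorted(parsed))
```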

    Check the received email


    WeChat alert configuration

    global:
      # The smarthost and SMTP sender used for mail notifications.
      resolve_timeout: 6m
      smtp_smarthost: '172.16.100.14:25'
      smtp_from: 'svnbuild_yf@iflytek.com'
      smtp_auth_username: 'svnbuild_yf'
      smtp_auth_password: 'tag#write@2015313'
      smtp_require_tls: false
    
      # The auth token for Hipchat.
      hipchat_auth_token: '1234556789'
      # Alternative host for Hipchat. 
      hipchat_api_url: 'https://hipchat.foobar.org/'
      wechat_api_url: "https://qyapi.weixin.qq.com/cgi-bin/"
      wechat_api_secret: "4tQroVeB0xUcccccccc65Yfkj2Nkt90a80MH3ayI"
      wechat_api_corp_id: "wxaf5acxxxx5f8eb98"
      
    
    # The directory from which notification templates are read.
    templates:
    - 'templates/*.tmpl'
    
    # The root route on which each incoming alert enters.
    route:
      # The labels by which incoming alerts are grouped together. For example,
      # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
      # be batched into a single group.
      group_by: ['alertname']
    
      # When a new group of alerts is created by an incoming alert, wait at
      # least 'group_wait' to send the initial notification.
      # This way ensures that you get multiple alerts for the same group that start
      # firing shortly after another are batched together on the first 
      # notification.
      group_wait: 3s
    
      # When the first notification was sent, wait 'group_interval' to send a batch
      # of new alerts that started firing for that group.
      group_interval: 5m
    
      # If an alert has successfully been sent, wait 'repeat_interval' to
      # resend them.
      repeat_interval: 1h
    
      # A default receiver
      receiver: ybyang2
    
    
      routes:
      - match:
          job: "11"
          #service: "node_exporter"
        routes:
        - match:
            status: yellow
          receiver: ybyang2
        - match:
            status: orange
          receiver: berlin
    
    
    # Inhibition rules allow to mute a set of alerts given that another alert is
    # firing.
    # We use this to mute any warning-level notifications if the same alert is 
    # already critical.
    inhibit_rules:
    - source_match:
        service: 'up'
      target_match:
        service: 'mysql'
      # Apply inhibition if the instance label is the same.
      equal: ["instance"]
    
    - source_match:
        service: "mysql"
      target_match:
        service: "mysql-query"
      equal: ['instance']
    
    - source_match:
        service: "A"
      target_match:
        service: "B"
      equal: ["instance"]
    
    - source_match:
        service: "B"
      target_match:
        service: "C"
      equal: ["instance"]
    
    receivers:
    - name: 'ybyang2'
      email_configs:
      - to: 'ybyang2@iflytek.com'
        send_resolved: true
        html: '{{ template "email.default.html" . }}'
        headers: { Subject: "[mail] Test technical department monitoring alert email" }
        
    - name: "berlin"
      wechat_configs:
      - send_resolved: true
        to_user: "@all"
        to_party: ""
        to_tag: ""
        agent_id: "1"
        corp_id: "wxaf5a99ccccc5f8eb98"
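
    To make the nested routes in this configuration easier to follow, the sketch below models how an alert's labels select a receiver. It is a simplified illustration with a hypothetical function name: it handles only exact match entries, picks the first matching child route, lets children inherit the parent's receiver, and ignores options such as match_re and continue.

```python
def pick_receiver(route, labels, inherited=None):
    """Walk the routing tree and return the receiver of the deepest matching route."""
    receiver = route.get("receiver", inherited)
    for child in route.get("routes", []):
        match = child.get("match", {})
        if all(labels.get(name) == value for name, value in match.items()):
            return pick_receiver(child, labels, receiver)
    return receiver


# The route section of the configuration above, written as a dict.
ROOT = {
    "receiver": "ybyang2",
    "routes": [
        {
            "match": {"job": "11"},
            "routes": [
                {"match": {"status": "yellow"}, "receiver": "ybyang2"},
                {"match": {"status": "orange"}, "receiver": "berlin"},
            ],
        }
    ],
}

print(pick_receiver(ROOT, {"job": "11", "status": "orange"}))  # berlin
print(pick_receiver(ROOT, {"job": "linux", "status": "orange"}))  # ybyang2
```

    Under this model only alerts with job="11" and status="orange" reach the WeChat receiver berlin; everything else falls back to the email receiver ybyang2.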
    
    
    
    
  • Original article: https://www.cnblogs.com/iiiiher/p/8277040.html