  • K8s Series: Email Alerting with Prometheus

    Thanks to the author for sharing: http://bjbsair.com/2020-04-07/tech-info/30650.html

    1. Specify the Alertmanager service and the rule files

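    Tell Prometheus which alert management service to send alerts to and which alerting rule files to load. Here the alert service is deployed in Kubernetes and exposed under the service name alertmanager on port 9093. The alerting rules are all *.yml files in the "/etc/prometheus/rules/" directory.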

    global:
      scrape_interval: 15s     # Scrape targets every 15 seconds. The default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).

    # Alertmanager instance(s) that receive the alerts
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
                - alertmanager:9093

    # Alerting rule files to load
    rule_files:
      - "/etc/prometheus/rules/*.yml"
      # - "second_rules.yml"

    # Scrape configurations: Prometheus itself plus the exporters listed below.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
        static_configs:
          - targets: ['localhost:9090']
      - job_name: 'redis'
        static_configs:
          - targets: ['redis-exporter-np:9121']
      - job_name: 'node'
        static_configs:
          - targets: ['prometheus-prometheus-node-exporter:9100']
      - job_name: 'windows-node-001'
        static_configs:
          - targets: ['10.0.32.148:9182']
      - job_name: 'windows-node-002'
        static_configs:
          - targets: ['10.0.34.4:9182']
      - job_name: 'rabbit'
        static_configs:
          - targets: ['prom-rabbit-prometheus-rabbitmq-exporter:9419']
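
    The target alertmanager:9093 resolves only if a Kubernetes Service with that name exists in the same namespace as Prometheus. A minimal sketch of such a Service follows; the `app: alertmanager` selector label is an assumption and not taken from the original setup.

    ```yaml
    # Hypothetical Service exposing Alertmanager as alertmanager:9093.
    apiVersion: v1
    kind: Service
    metadata:
      name: alertmanager        # must match the host in the Prometheus alerting target
    spec:
      selector:
        app: alertmanager       # assumed pod label; adjust to your Alertmanager pods
      ports:
        - name: web
          port: 9093            # port used in the Prometheus alerting target
          targetPort: 9093      # Alertmanager's default listen port
    ```

    If Prometheus runs in a different namespace, the target would need the qualified name, for example alertmanager.<namespace>:9093.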
    

    2. Define the alerting rules

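    Define the alerting rules. Prometheus evaluates them and sends the resulting alerts to the alert service. The rule below reports every instance whose scrape fails (up == 0), so you can see which instances did not start properly.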

    # rules
    groups:
      - name: node-rules
        rules:
          - alert: InstanceDown   # alert name
            expr: up == 0         # condition that triggers the alert
            for: 3s               # how long the condition must hold before the alert fires
            labels:               # labels attached to the alert
              team: k8s
            annotations:          # alert details
              summary: "{{$labels.instance}}: has been down"
              description: "{{$labels.instance}}: job {{$labels.job}} has been down "
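
    The rule can be checked offline before it is deployed. The following is a sketch of a unit test file for `promtool test rules`, assuming the node-rules group is saved as node-rules.yml next to the test file (both file names are hypothetical). It simulates the redis exporter being unreachable and expects InstanceDown to fire.

    ```yaml
    # node-rules-test.yml (hypothetical name)
    # Run with: promtool test rules node-rules-test.yml
    rule_files:
      - node-rules.yml          # assumed file containing the node-rules group

    evaluation_interval: 15s    # same value as in the Prometheus config

    tests:
      - interval: 15s
        input_series:
          # up stays at 0 for five consecutive samples (0s to 60s).
          - series: 'up{job="redis", instance="redis-exporter-np:9121"}'
            values: '0 0 0 0 0'
        alert_rule_test:
          - eval_time: 1m       # well past the 3s "for" duration
            alertname: InstanceDown
            exp_alerts:
              - exp_labels:
                  team: k8s
                  job: redis
                  instance: 'redis-exporter-np:9121'
                exp_annotations:
                  summary: "redis-exporter-np:9121: has been down"
                  # The trailing space matches the description template in the rule.
                  description: "redis-exporter-np:9121: job redis has been down "
    ```

    Because `for: 3s` is shorter than the 15s evaluation interval, the alert is pending on the first evaluation that sees up == 0 and fires on the next one.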
    

    3. Configure alert routing and receivers

    Here alerts are delivered by email: when the alert service receives an alert, it emails the alert to the configured recipient.

    global:
      resolve_timeout: 5m
      smtp_smarthost: 'smtp.163.com:25'       # SMTP server (host:port) of the sending mailbox
      smtp_from: 'xxx@163.com'                # sender address
      smtp_auth_username: 'xxx'               # mailbox user name
      smtp_auth_password: 'SYNUNQBZMIWUQXGZ'  # mailbox password or authorization code

    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: 'email'
    receivers:
      - name: 'email'
        email_configs:
          - to: 'xxxxxx@aliyun.com'                    # mailbox that receives the alerts
            headers: { Subject: "[WARN] Alert email" } # subject of the alert email

    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'dev', 'instance']
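
    The InstanceDown rule attaches the label team: k8s, which the alert service can use for routing. As a sketch, the route below keeps 'email' as the default receiver and sends alerts labelled team="k8s" to a separate receiver. The receiver name 'k8s-team-email' and the address k8s-team@example.com are placeholders, and the `matchers` syntax assumes Alertmanager 0.22 or later (older versions use `match`). The global SMTP settings stay the same and are omitted.

    ```yaml
    route:
      receiver: 'email'                  # default receiver
      group_by: ['alertname']
      routes:
        - matchers:
            - 'team="k8s"'               # label set by the InstanceDown rule
          receiver: 'k8s-team-email'     # hypothetical receiver name

    receivers:
      - name: 'email'
        email_configs:
          - to: 'xxxxxx@aliyun.com'
      - name: 'k8s-team-email'
        email_configs:
          - to: 'k8s-team@example.com'   # placeholder address
            send_resolved: true          # also send an email when the alert resolves
    ```

    Note that the inhibit rule matches on a severity label, which the InstanceDown rule does not set, so it has no effect on that alert as written.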
    

    4. Verification

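    Among the instances monitored by Prometheus in this setup, redis and windows-node-002 are not running, so according to the alerting rule above, alerts for them should be sent to the recipient's mailbox.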

    [Image: K8s Series: Prometheus email-based alerting]

    The alert received in the recipient's mailbox looks like this.

    [Image: K8s Series: Prometheus email-based alerting]


  • Original article: https://www.cnblogs.com/lihanlin/p/12657690.html