  • alertmanager

    Alertmanager is mainly used to receive the alerts sent by Prometheus.

    Download it with wget and extract the archive.

    Configure alertmanager.yml with the following content:
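    A minimal alertmanager.yml, sketched under assumptions: the receiver name `default` and the webhook URL `http://127.0.0.1:5001/` are hypothetical placeholders, to be replaced with your own email, webhook, or other receiver settings. It sends every alert to a single receiver and groups notifications per alert name and instance.

```yaml
# Minimal alertmanager.yml sketch; receiver name and URL are hypothetical placeholders.
route:
  receiver: default                  # alerts go here unless a sub-route matches
  group_by: ['alertname', 'instance']
  group_wait: 30s                    # wait before the first notification for a new group
  group_interval: 5m                 # wait before notifying about alerts added to an existing group
  repeat_interval: 4h                # re-send a still-firing alert at most this often

receivers:
  - name: default
    webhook_configs:
      - url: 'http://127.0.0.1:5001/'   # hypothetical webhook endpoint
```

    By default Alertmanager listens on port 9093, which is the address Prometheus must be able to reach.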

    In the prometheus directory, add a rules.yml file with the following content:

    groups:
      - name: test-rules
        rules:
          - alert: InstanceDown
            expr: up == 0
            for: 2m
            labels:
              status: warning
            annotations:
              summary: "{{$labels.instance}}: has been down"
              description: "{{$labels.instance}}: job {{$labels.job}} has been down"
      - name: base-monitor-rule
        rules:
          - alert: NodeCpuUsage
            expr: (100 - (avg by (instance) (rate(node_cpu{job=~".*",mode="idle"}[2m])) * 100)) > 99
            for: 15m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: CPU usage is above 99% (current value is: {{ $value }})"
          - alert: NodeMemUsage
            expr: avg by (instance) ((1- (node_memory_MemFree{} + node_memory_Buffers{} + node_memory_Cached{})/node_memory_MemTotal{}) * 100) > 90
            for: 15m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: MEM usage is above 90% (current value is: {{ $value }})"
          - alert: NodeDiskUsage
            expr: (1 - node_filesystem_free{fstype!="rootfs",mountpoint!="",mountpoint!~"/(run|var|sys|dev).*"} / node_filesystem_size) * 100 > 80
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Disk usage is above 80% (current value is: {{ $value }})"
          - alert: NodeFDUsage
            expr: avg by (instance) (node_filefd_allocated{} / node_filefd_maximum{}) * 100 > 80
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: File Descriptor usage is above 80% (current value is: {{ $value }})"
          - alert: NodeLoad15
            expr: avg by (instance) (node_load15{}) > 100
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Load15 is above 100 (current value is: {{ $value }})"
          - alert: NodeAgentStatus
            expr: avg by (instance) (up{}) == 0
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Node Agent is down (current value is: {{ $value }})"
          - alert: NodeProcsBlocked
            expr: avg by (instance) (node_procs_blocked{}) > 100
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Node Blocked Procs detected! (current value is: {{ $value }})"
          - alert: NodeTransmitRate
            expr: avg by (instance) (floor(irate(node_network_transmit_bytes{device="eth0"}[2m]) / 1024 / 1024)) > 100
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Node Transmit Rate is above 100MB/s (current value is: {{ $value }})"
          - alert: NodeReceiveRate
            expr: avg by (instance) (floor(irate(node_network_receive_bytes{device="eth0"}[2m]) / 1024 / 1024)) > 100
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Node Receive Rate is above 100MB/s (current value is: {{ $value }})"
          - alert: NodeDiskReadRate
            expr: avg by (instance) (floor(irate(node_disk_bytes_read{}[2m]) / 1024 / 1024)) > 50
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Node Disk Read Rate is above 50MB/s (current value is: {{ $value }})"
          - alert: NodeDiskWriteRate
            expr: avg by (instance) (floor(irate(node_disk_bytes_written{}[2m]) / 1024 / 1024)) > 50
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Node Disk Write Rate is above 50MB/s (current value is: {{ $value }})"
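    The expressions in these rules use node_exporter metric names from before version 0.16, which renamed most metrics to carry unit suffixes (for example `node_cpu` became `node_cpu_seconds_total`). With an older exporter the rules work as written; with 0.16 or newer they silently match nothing. A sketch of the CPU, memory, and disk-usage rules with the renamed metrics follows, keeping the thresholds and labels from above. Use it in place of the corresponding entries rather than alongside them, since Prometheus rejects a file that repeats a group name, and check the names against your exporter's /metrics output first.

```yaml
# Sketch for node_exporter >= 0.16 (unit-suffixed metric names); verify against your /metrics output.
groups:
  - name: base-monitor-rule
    rules:
      - alert: NodeCpuUsage
        # node_cpu was renamed node_cpu_seconds_total
        expr: (100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100)) > 99
        for: 15m
        labels:
          service_name: test
          level: warning
        annotations:
          description: "{{$labels.instance}}: CPU usage is above 99% (current value is: {{ $value }})"
      - alert: NodeMemUsage
        # the node_memory_* gauges gained a _bytes suffix
        expr: avg by (instance) ((1 - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes) / node_memory_MemTotal_bytes) * 100) > 90
        for: 15m
        labels:
          service_name: test
          level: warning
        annotations:
          description: "{{$labels.instance}}: MEM usage is above 90% (current value is: {{ $value }})"
      - alert: NodeDiskUsage
        # node_filesystem_free / node_filesystem_size gained a _bytes suffix
        expr: (1 - node_filesystem_free_bytes{fstype!="rootfs",mountpoint!="",mountpoint!~"/(run|var|sys|dev).*"} / node_filesystem_size_bytes) * 100 > 80
        for: 2m
        labels:
          service_name: test
          level: warning
        annotations:
          description: "{{$labels.instance}}: Disk usage is above 80% (current value is: {{ $value }})"
```

    The network and disk-I/O rules change the same way: `node_network_transmit_bytes` and `node_network_receive_bytes` gained a `_total` suffix, and `node_disk_bytes_read` / `node_disk_bytes_written` became `node_disk_read_bytes_total` / `node_disk_written_bytes_total`.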

    Add the following to prometheus.yml in the prometheus directory:
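    A sketch of the two sections prometheus.yml typically needs for this setup, assuming Alertmanager runs on the same host on its default port 9093 and rules.yml sits next to prometheus.yml. The `alerting` block tells Prometheus where to send firing alerts, and `rule_files` loads the rules defined above.

```yaml
# Fragment of prometheus.yml; assumes Alertmanager on localhost at its default port 9093.
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

# Relative paths are resolved against the directory containing prometheus.yml.
rule_files:
  - "rules.yml"
```

    Before restarting, `promtool check config prometheus.yml` validates the file together with the rule files it references.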

    Start Alertmanager in the background: nohup ./alertmanager --config.file=alertmanager.yml &

    Restart the Prometheus service so it loads the new configuration: systemctl restart prometheus

  • Original article: https://www.cnblogs.com/canglongdao/p/12053653.html