  • alertmanager

    Alertmanager mainly receives the alerts that Prometheus sends and takes care of delivering them as notifications.

    Download the release with wget and extract it.

    Configure alertmanager.yml in the extracted directory.
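
    As a point of reference, a minimal alertmanager.yml might look like the sketch below. It is not the configuration from the original post: the receiver name, the webhook URL, and all timing values are placeholders chosen for illustration, so adjust them to your own notification channel.

    ```yaml
    # Hypothetical minimal alertmanager.yml (all values are illustrative assumptions).
    global:
      resolve_timeout: 5m          # treat an alert as resolved if it is not re-sent within 5m

    route:
      receiver: default-receiver   # placeholder receiver name, defined below
      group_by: ['alertname', 'instance']
      group_wait: 30s              # wait before sending the first notification for a new group
      group_interval: 5m           # wait before notifying about new alerts added to a group
      repeat_interval: 4h          # wait before re-sending a notification that is still firing

    receivers:
      - name: default-receiver
        webhook_configs:
          - url: http://127.0.0.1:5001/   # placeholder endpoint, replace with your own
    ```

    A webhook receiver is used here only because it needs no credentials; email or other receiver types follow the same route/receivers structure.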

    In the prometheus directory, add rules.yml with the following content:

    groups:
      - name: test-rules
        rules:
          - alert: InstanceDown
            expr: up == 0
            for: 2m
            labels:
              status: warning
            annotations:
              summary: "{{$labels.instance}}: has been down"
              description: "{{$labels.instance}}: job {{$labels.job}} has been down"
      - name: base-monitor-rule
        rules:
          - alert: NodeCpuUsage
            expr: (100 - (avg by (instance) (rate(node_cpu{job=~".*",mode="idle"}[2m])) * 100)) > 99
            for: 15m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: CPU usage is above 99% (current value is: {{ $value }})"
          - alert: NodeMemUsage
            expr: avg by (instance) ((1- (node_memory_MemFree{} + node_memory_Buffers{} + node_memory_Cached{})/node_memory_MemTotal{}) * 100) > 90
            for: 15m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: MEM usage is above 90% (current value is: {{ $value }})"
          - alert: NodeDiskUsage
            expr: (1 - node_filesystem_free{fstype!="rootfs",mountpoint!="",mountpoint!~"/(run|var|sys|dev).*"} / node_filesystem_size) * 100 > 80
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Disk usage is above 80% (current value is: {{ $value }})"
          - alert: NodeFDUsage
            expr: avg by (instance) (node_filefd_allocated{} / node_filefd_maximum{}) * 100 > 80
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: File Descriptor usage is above 80% (current value is: {{ $value }})"
          - alert: NodeLoad15
            expr: avg by (instance) (node_load15{}) > 100
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Load15 is above 100 (current value is: {{ $value }})"
          - alert: NodeAgentStatus
            expr: avg by (instance) (up{}) == 0
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Node Agent is down (current value is: {{ $value }})"
          - alert: NodeProcsBlocked
            expr: avg by (instance) (node_procs_blocked{}) > 100
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Node Blocked Procs detected! (current value is: {{ $value }})"
          - alert: NodeTransmitRate
            expr: avg by (instance) (floor(irate(node_network_transmit_bytes{device="eth0"}[2m]) / 1024 / 1024)) > 100
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Node Transmit Rate is above 100MB/s (current value is: {{ $value }})"
          - alert: NodeReceiveRate
            expr: avg by (instance) (floor(irate(node_network_receive_bytes{device="eth0"}[2m]) / 1024 / 1024)) > 100
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Node Receive Rate is above 100MB/s (current value is: {{ $value }})"
          - alert: NodeDiskReadRate
            expr: avg by (instance) (floor(irate(node_disk_bytes_read{}[2m]) / 1024 / 1024)) > 50
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Node Disk Read Rate is above 50MB/s (current value is: {{ $value }})"
          - alert: NodeDiskWriteRate
            expr: avg by (instance) (floor(irate(node_disk_bytes_written{}[2m]) / 1024 / 1024)) > 50
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Node Disk Write Rate is above 50MB/s (current value is: {{ $value }})"

    In the prometheus directory, add the rule file and the Alertmanager address to prometheus.yml.
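
    The additions to prometheus.yml would typically look like the sketch below. This is an assumption rather than the original post's snippet: it presumes Alertmanager runs on the same host on its default port 9093 and that rules.yml sits next to prometheus.yml.

    ```yaml
    # Sketch of the prometheus.yml additions (target address and file path are assumptions).
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
                - localhost:9093   # assumed: Alertmanager on the same host, default port

    rule_files:
      - rules.yml                  # assumed: rules.yml is in the same directory as prometheus.yml
    ```

    Keep the existing global and scrape_configs sections as they are; these two top-level keys are added alongside them.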

    Start Alertmanager in the background: nohup ./alertmanager --config.file=alertmanager.yml &

    Restart the Prometheus service: systemctl restart prometheus

  • Original article: https://www.cnblogs.com/canglongdao/p/12053653.html