  • alertmanager

    Alertmanager mainly receives the alerts that Prometheus sends.

    Download the release with wget and extract it.

    Configure alertmanager.yml.
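
    A minimal sketch of what alertmanager.yml might contain, offered only as an illustration. The receiver name `default-webhook` and the webhook URL are hypothetical placeholders, and the timing values are assumptions to adjust for your environment:

    ```yaml
    global:
      resolve_timeout: 5m            # assumption: how long before an alert with no updates is treated as resolved

    route:
      receiver: default-webhook      # hypothetical receiver name, must match an entry under receivers
      group_by: ['alertname']        # batch alerts that share the same alert name
      group_wait: 30s                # wait before sending the first notification for a new group
      group_interval: 5m             # wait before notifying about new alerts added to a group
      repeat_interval: 4h            # wait before re-sending a notification that is still firing

    receivers:
      - name: default-webhook
        webhook_configs:
          - url: http://127.0.0.1:5001/   # hypothetical endpoint that accepts Alertmanager webhooks
    ```

    A single catch-all route is used here so that every alert reaches one receiver; more specific routing can be added later under `route.routes`.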

    In the Prometheus directory, add rules.yml with the following content:

    groups:
      - name: test-rules
        rules:
          - alert: InstanceDown
            expr: up == 0
            for: 2m
            labels:
              status: warning
            annotations:
              summary: "{{$labels.instance}}: has been down"
              description: "{{$labels.instance}}: job {{$labels.job}} has been down"
      - name: base-monitor-rule
        rules:
          - alert: NodeCpuUsage
            expr: (100 - (avg by (instance) (rate(node_cpu{job=~".*",mode="idle"}[2m])) * 100)) > 99
            for: 15m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: CPU usage is above 99% (current value is: {{ $value }})"
          - alert: NodeMemUsage
            expr: avg by (instance) ((1- (node_memory_MemFree{} + node_memory_Buffers{} + node_memory_Cached{})/node_memory_MemTotal{}) * 100) > 90
            for: 15m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: MEM usage is above 90% (current value is: {{ $value }})"
          - alert: NodeDiskUsage
            expr: (1 - node_filesystem_free{fstype!="rootfs",mountpoint!="",mountpoint!~"/(run|var|sys|dev).*"} / node_filesystem_size) * 100 > 80
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Disk usage is above 80% (current value is: {{ $value }})"
          - alert: NodeFDUsage
            expr: avg by (instance) (node_filefd_allocated{} / node_filefd_maximum{}) * 100 > 80
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: File Descriptor usage is above 80% (current value is: {{ $value }})"
          - alert: NodeLoad15
            expr: avg by (instance) (node_load15{}) > 100
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Load15 is above 100 (current value is: {{ $value }})"
          - alert: NodeAgentStatus
            expr: avg by (instance) (up{}) == 0
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Node Agent is down (current value is: {{ $value }})"
          - alert: NodeProcsBlocked
            expr: avg by (instance) (node_procs_blocked{}) > 100
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Node Blocked Procs detected! (current value is: {{ $value }})"
          - alert: NodeTransmitRate
            expr: avg by (instance) (floor(irate(node_network_transmit_bytes{device="eth0"}[2m]) / 1024 / 1024)) > 100
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Node Transmit Rate is above 100MB/s (current value is: {{ $value }})"
          - alert: NodeReceiveRate
            expr: avg by (instance) (floor(irate(node_network_receive_bytes{device="eth0"}[2m]) / 1024 / 1024)) > 100
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Node Receive Rate is above 100MB/s (current value is: {{ $value }})"
          - alert: NodeDiskReadRate
            expr: avg by (instance) (floor(irate(node_disk_bytes_read{}[2m]) / 1024 / 1024)) > 50
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Node Disk Read Rate is above 50MB/s (current value is: {{ $value }})"
          - alert: NodeDiskWriteRate
            expr: avg by (instance) (floor(irate(node_disk_bytes_written{}[2m]) / 1024 / 1024)) > 50
            for: 2m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: Node Disk Write Rate is above 50MB/s (current value is: {{ $value }})"
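
    The metric names in these rules match node_exporter releases before 0.16. From 0.16 onward several metrics were renamed (for example, `node_cpu` became `node_cpu_seconds_total`, and the memory metrics gained a `_bytes` suffix). If a newer exporter is in use, the rules would need the new names. A sketch of two of them, with thresholds copied from the rules above; the group name `base-monitor-rule-v016` is a hypothetical label chosen to avoid clashing with the existing group:

    ```yaml
    groups:
      - name: base-monitor-rule-v016   # hypothetical group name
        rules:
          - alert: NodeCpuUsage
            # node_cpu was renamed to node_cpu_seconds_total in node_exporter 0.16
            expr: (100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100)) > 99
            for: 15m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: CPU usage is above 99% (current value is: {{ $value }})"
          - alert: NodeMemUsage
            # memory metrics carry a _bytes suffix in node_exporter 0.16 and later
            expr: avg by (instance) ((1 - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes) / node_memory_MemTotal_bytes) * 100) > 90
            for: 15m
            labels:
              service_name: test
              level: warning
            annotations:
              description: "{{$labels.instance}}: MEM usage is above 90% (current value is: {{ $value }})"
    ```

    The remaining rules would follow the same pattern. Checking which names the exporter actually exposes on its /metrics endpoint is the safest way to confirm which variant applies.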

    Add the alerting configuration to prometheus.yml in the Prometheus directory.
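
    A sketch of the entries prometheus.yml would typically need for this setup. It assumes Alertmanager runs on the same host on its default port 9093 and that rules.yml sits next to prometheus.yml; both are assumptions to adapt:

    ```yaml
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
                - localhost:9093   # assumption: Alertmanager on the same host, default port

    rule_files:
      - rules.yml                  # resolved relative to the directory of prometheus.yml
    ```

    `rule_files` tells Prometheus which alerting rules to evaluate, and `alerting.alertmanagers` tells it where to send the alerts that fire.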

    Start Alertmanager in the background: nohup ./alertmanager --config.file=alertmanager.yml &

    Restart the Prometheus service: systemctl restart prometheus

  • Original article: https://www.cnblogs.com/canglongdao/p/12053653.html