  • Prometheus alerting rule configuration

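    In a Prometheus monitoring system, alerting rules are configured in the Prometheus server itself. Prometheus supports two types of rules: recording rules and alerting rules. Recording rules mainly serve to shorten alerting expressions and make them reusable; alerting rules are what actually decide whether an alert should fire. Alerting rules can reference recording rules.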
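
    As a minimal sketch of how the two rule types relate (the group name, the record name example:memory:used:percent, and the alert name ExampleMemoryHigh are made up for illustration and are not part of the files that follow): the recording rule stores a precomputed expression under a new series name, and the alerting rule compares that series against a threshold. Rules inside one group are evaluated in order, so the recording rule is listed first.

    ```yaml
    groups:
      - name: example            # hypothetical group name
        rules:
          # Recording rule: evaluate the expression and store the result
          # as a new series named example:memory:used:percent.
          - record: example:memory:used:percent
            expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

          # Alerting rule: reuse the recorded series instead of repeating
          # the full expression. The threshold and duration are placeholders.
          - alert: ExampleMemoryHigh
            expr: example:memory:used:percent > 85
            for: 3m
            labels:
              severity: info
            annotations:
              summary: "memory usage on {{ $labels.instance }} is {{ $value }}%"
    ```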

    Here are the node-exporter recording rules and alerting rules I have put together.

    node-exporter-record-rules.yml

    groups:
      - name: node-exporter-record
        rules:
        - expr: up{job=~"node-exporter"}
          record: node_exporter:up 
          labels: 
            desc: "Whether the node is up: 1 = up, 0 = down"
            unit: " "
            job: "node-exporter"
        - expr: time() - node_boot_time_seconds{}
          record: node_exporter:node_uptime
          labels: 
            desc: "Uptime of the node"
            unit: "s"
            job: "node-exporter"
    ##############################################################################################
    #                              cpu                                                           #
        - expr: (1 - avg by (environment,instance) (irate(node_cpu_seconds_total{job="node-exporter",mode="idle"}[5m])))  * 100 
          record: node_exporter:cpu:total:percent
          labels: 
            desc: "Total CPU usage of the node, in percent"
            unit: "%"
            job: "node-exporter"
    
        - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job="node-exporter",mode="idle"}[5m])))  * 100 
          record: node_exporter:cpu:idle:percent
          labels: 
            desc: "CPU idle percentage of the node"
            unit: "%"
            job: "node-exporter"
    
        - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job="node-exporter",mode="iowait"}[5m])))  * 100 
          record: node_exporter:cpu:iowait:percent
          labels: 
            desc: "CPU iowait percentage of the node"
            unit: "%"
            job: "node-exporter"
    
    
        - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job="node-exporter",mode="system"}[5m])))  * 100 
          record: node_exporter:cpu:system:percent
          labels: 
            desc: "CPU system percentage of the node"
            unit: "%"
            job: "node-exporter"
    
        - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job="node-exporter",mode="user"}[5m])))  * 100 
          record: node_exporter:cpu:user:percent
          labels: 
            desc: "CPU user percentage of the node"
            unit: "%"
            job: "node-exporter"
    
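        # Sum the four modes per CPU first, then average across CPUs; a plain avg over all series would average the modes instead of adding them.
        - expr: (avg by (environment,instance) (sum by (environment,instance,cpu) (irate(node_cpu_seconds_total{job="node-exporter",mode=~"softirq|nice|irq|steal"}[5m]))))  * 100 
          record: node_exporter:cpu:other:percent
          labels: 
            desc: "CPU percentage of the node spent in other modes (softirq, nice, irq, steal)"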
            unit: "%"
            job: "node-exporter"
    ##############################################################################################
    
    
    ##############################################################################################
    #                                    memory                                                  #
        - expr: node_memory_MemTotal_bytes{job="node-exporter"}
          record: node_exporter:memory:total
          labels: 
            desc: "Total memory of the node"
            unit: byte
            job: "node-exporter"
    
        - expr: node_memory_MemFree_bytes{job="node-exporter"}
          record: node_exporter:memory:free
          labels: 
            desc: "Free memory of the node"
            unit: byte
            job: "node-exporter"
    
        - expr: node_memory_MemTotal_bytes{job="node-exporter"} - node_memory_MemFree_bytes{job="node-exporter"}
          record: node_exporter:memory:used
          labels: 
            desc: "Used memory of the node (MemTotal - MemFree)"
            unit: byte
            job: "node-exporter"
    
        - expr: node_memory_MemTotal_bytes{job="node-exporter"} - node_memory_MemAvailable_bytes{job="node-exporter"}
          record: node_exporter:memory:actualused
          labels: 
            desc: "Memory actually in use on the node (MemTotal - MemAvailable)"
            unit: byte
            job: "node-exporter"
    
        - expr: (1-(node_memory_MemAvailable_bytes{job="node-exporter"} / (node_memory_MemTotal_bytes{job="node-exporter"})))* 100
          record: node_exporter:memory:used:percent
          labels: 
            desc: "Memory usage percentage of the node"
            unit: "%"
            job: "node-exporter"
    
        - expr: ((node_memory_MemAvailable_bytes{job="node-exporter"} / (node_memory_MemTotal_bytes{job="node-exporter"})))* 100
          record: node_exporter:memory:free:percent
          labels: 
            desc: "Available memory percentage of the node"
            unit: "%"
            job: "node-exporter"
    ##############################################################################################
    #                                   load                                                     #
        - expr: sum by (instance) (node_load1{job="node-exporter"})
          record: node_exporter:load:load1
          labels: 
            desc: "1-minute system load"
            unit: " "
            job: "node-exporter"
    
        - expr: sum by (instance) (node_load5{job="node-exporter"})
          record: node_exporter:load:load5
          labels: 
            desc: "5-minute system load"
            unit: " "
            job: "node-exporter"
    
        - expr: sum by (instance) (node_load15{job="node-exporter"})
          record: node_exporter:load:load15
          labels: 
            desc: "15-minute system load"
            unit: " "
            job: "node-exporter"
       
    ##############################################################################################
    #                                 disk                                                       #
        - expr: node_filesystem_size_bytes{job="node-exporter" ,fstype=~"ext4|xfs"}
          record: node_exporter:disk:usage:total
          labels: 
            desc: "Total filesystem size on the node"
            unit: byte
            job: "node-exporter"
    
        - expr: node_filesystem_avail_bytes{job="node-exporter",fstype=~"ext4|xfs"}
          record: node_exporter:disk:usage:free
          labels: 
            desc: "Available filesystem space on the node"
            unit: byte
            job: "node-exporter"
    
        - expr: node_filesystem_size_bytes{job="node-exporter",fstype=~"ext4|xfs"} - node_filesystem_avail_bytes{job="node-exporter",fstype=~"ext4|xfs"}
          record: node_exporter:disk:usage:used
          labels: 
            desc: "Used filesystem space on the node"
            unit: byte
            job: "node-exporter"
    
        - expr:  (1 - node_filesystem_avail_bytes{job="node-exporter",fstype=~"ext4|xfs"} / node_filesystem_size_bytes{job="node-exporter",fstype=~"ext4|xfs"}) * 100 
          record: node_exporter:disk:used:percent    
          labels: 
            desc: "Filesystem usage percentage on the node"
            unit: "%"
            job: "node-exporter"
    
        - expr: irate(node_disk_reads_completed_total{job="node-exporter"}[1m])
          record: node_exporter:disk:read:count:rate
          labels: 
            desc: "Disk read operations per second on the node"
            unit: "ops/s"
            job: "node-exporter"
    
        - expr: irate(node_disk_writes_completed_total{job="node-exporter"}[1m])
          record: node_exporter:disk:write:count:rate
          labels: 
            desc: "Disk write operations per second on the node"
            unit: "ops/s"
            job: "node-exporter"
    
        # The read rule must use node_disk_read_bytes_total and the write rule node_disk_written_bytes_total.
        - expr: (irate(node_disk_read_bytes_total{job="node-exporter"}[1m]))/1024/1024
          record: node_exporter:disk:read:mb:rate
          labels: 
            desc: "Disk read throughput of the node, in MB per second"
            unit: "MB/s"
            job: "node-exporter"
    
        - expr: (irate(node_disk_written_bytes_total{job="node-exporter"}[1m]))/1024/1024
          record: node_exporter:disk:write:mb:rate
          labels: 
            desc: "Disk write throughput of the node, in MB per second"
            unit: "MB/s"
            job: "node-exporter"
    
    ##############################################################################################
    #                                filesystem                                                  #
        - expr:   (1 -node_filesystem_files_free{job="node-exporter",fstype=~"ext4|xfs"} / node_filesystem_files{job="node-exporter",fstype=~"ext4|xfs"}) * 100 
          record: node_exporter:filesystem:used:percent    
          labels: 
            desc: "Percentage of inodes in use per filesystem on the node"
            unit: "%"
            job: "node-exporter"
    #############################################################################################
    #                                filefd                                                     #
        - expr: node_filefd_allocated{job="node-exporter"}
          record: node_exporter:filefd_allocated:count
          labels: 
            desc: "Number of allocated file descriptors on the node"
            unit: "count"
            job: "node-exporter"
     
        - expr: node_filefd_allocated{job="node-exporter"}/node_filefd_maximum{job="node-exporter"} * 100 
          record: node_exporter:filefd_allocated:percent
          labels: 
            desc: "Allocated file descriptors as a percentage of the maximum"
            unit: "%"
            job: "node-exporter"
    
    #############################################################################################
    #                                network                                                    #
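        # The counter is in bytes; multiply by 8 so the value matches the bit/s unit.
        - expr: avg by (environment,instance,device) (irate(node_network_receive_bytes_total{device=~"eth0|eth1|ens33|ens37"}[1m])) * 8
          record: node_exporter:network:netin:bit:rate
          labels: 
            desc: "Bits received per second per network interface on the node"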
            unit: "bit/s"
            job: "node-exporter"
    
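        # The counter is in bytes; multiply by 8 so the value matches the bit/s unit.
        - expr: avg by (environment,instance,device) (irate(node_network_transmit_bytes_total{device=~"eth0|eth1|ens33|ens37"}[1m])) * 8
          record: node_exporter:network:netout:bit:rate
          labels: 
            desc: "Bits transmitted per second per network interface on the node"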
            unit: "bit/s"
            job: "node-exporter"
    
        - expr: avg by (environment,instance,device) (irate(node_network_receive_packets_total{device=~"eth0|eth1|ens33|ens37"}[1m]))
          record: node_exporter:network:netin:packet:rate
          labels: 
            desc: "Packets received per second per network interface on the node"
            unit: "packets/s"
            job: "node-exporter"
    
        - expr: avg by (environment,instance,device) (irate(node_network_transmit_packets_total{device=~"eth0|eth1|ens33|ens37"}[1m]))
          record: node_exporter:network:netout:packet:rate
          labels: 
            desc: "Packets transmitted per second per network interface on the node"
            unit: "packets/s"
            job: "node-exporter"
    
        - expr: avg by (environment,instance,device) (irate(node_network_receive_errs_total{device=~"eth0|eth1|ens33|ens37"}[1m]))
          record: node_exporter:network:netin:error:rate
          labels: 
            desc: "Receive errors per second detected by the device driver"
            unit: "errors/s"
            job: "node-exporter"
    
        - expr: avg by (environment,instance,device) (irate(node_network_transmit_errs_total{device=~"eth0|eth1|ens33|ens37"}[1m]))
          record: node_exporter:network:netout:error:rate
          labels: 
            desc: "Transmit errors per second detected by the device driver"
            unit: "errors/s"
            job: "node-exporter"
          
        - expr: node_tcp_connection_states{job="node-exporter", state="established"}
          record: node_exporter:network:tcp:established:count
          labels: 
            desc: "Number of TCP connections currently in the established state"
            unit: "count"
            job: "node-exporter"
    
        - expr: node_tcp_connection_states{job="node-exporter", state="time_wait"}
          record: node_exporter:network:tcp:timewait:count
          labels: 
            desc: "Number of TCP connections in the time_wait state"
            unit: "count"
            job: "node-exporter"
    
        - expr: sum by (environment,instance) (node_tcp_connection_states{job="node-exporter"})
          record: node_exporter:network:tcp:total:count
          labels: 
            desc: "Total number of TCP connections on the node"
            unit: "count"
            job: "node-exporter"
       
    #############################################################################################
    #                                process                                                    #
        - expr: node_processes_state{state="Z"}
          record: node_exporter:process:zoom:total:count
          labels: 
            desc: "Number of processes currently in the zombie (Z) state"
            unit: "count"
            job: "node-exporter"
    #############################################################################################
    #                                other                                                    #
        - expr: abs(node_timex_offset_seconds{job="node-exporter"})
          record: node_exporter:time:offset
          labels: 
            desc: "Absolute clock offset of the node"
            unit: "s"
            job: "node-exporter"
    
    #############################################################################################
       
        - expr: count by (instance) ( count by (instance,cpu) (node_cpu_seconds_total{ mode='system'}) ) 
          record: node_exporter:cpu:count
    #

    node-exporter-alert-rules.yml

    groups:
      - name: node-exporter-alert
        rules:
        - alert: node-exporter-down
          expr: node_exporter:up == 0 
          for: 1m
          labels: 
            severity: info
          annotations: 
            summary: "instance: {{ $labels.instance }} is down"
            description: "instance: {{ $labels.instance }} \n- job: {{ $labels.job }} has been down for more than 1 minute."
            value: "{{ $value }}"
            instance: "{{ $labels.instance }}"
            grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }} "
            console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
            cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
            id: "{{ $labels.instanceid }}"
            type: "aliyun_meta_ecs_info"
    
    
    
        - alert: node-exporter-cpu-high 
          expr:  node_exporter:cpu:total:percent > 80
          for: 3m
          labels: 
            severity: info
          annotations: 
            summary: "instance: {{ $labels.instance }} CPU usage is high, current value {{ $value }}"
            description: ""    
            value: "{{ $value }}"
            instance: "{{ $labels.instance }}"
            grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }} "
            console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
            cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
            id: "{{ $labels.instanceid }}"
            type: "aliyun_meta_ecs_info"
    
        - alert: node-exporter-cpu-iowait-high 
          expr:  node_exporter:cpu:iowait:percent >= 12
          for: 3m
          labels: 
            severity: info
          annotations: 
            summary: "instance: {{ $labels.instance }} CPU iowait is high, current value {{ $value }}"
            description: ""    
            value: "{{ $value }}"
            instance: "{{ $labels.instance }}"
            grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }} "
            console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
            cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
            id: "{{ $labels.instanceid }}"
            type: "aliyun_meta_ecs_info"
    
        - alert: node-exporter-load-load1-high 
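          # The two recorded series carry different label sets (load1 has desc/unit/job, cpu:count does not), so match on instance only.
          expr:  node_exporter:load:load1 > on (instance) (node_exporter:cpu:count * 1.2)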
          for: 3m
          labels: 
            severity: info
          annotations: 
            summary: "instance: {{ $labels.instance }} load1 is high, current value {{ $value }}"
            description: ""    
            value: "{{ $value }}"
            instance: "{{ $labels.instance }}"
            grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }} "
            console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
            cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
            id: "{{ $labels.instanceid }}"
            type: "aliyun_meta_ecs_info"
    
        - alert: node-exporter-memory-high
          expr:  node_exporter:memory:used:percent > 85
          for: 3m
          labels: 
            severity: info
          annotations: 
            summary: "instance: {{ $labels.instance }} memory usage is high, current value {{ $value }}"
            description: ""    
            value: "{{ $value }}"
            instance: "{{ $labels.instance }}"
            grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }} "
            console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
            cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
            id: "{{ $labels.instanceid }}"
            type: "aliyun_meta_ecs_info"
    
        - alert: node-exporter-disk-high
          expr:  node_exporter:disk:used:percent > 88
          for: 10m
          labels: 
            severity: info
          annotations: 
            summary: "instance: {{ $labels.instance }} disk usage is high, current value {{ $value }}"
            description: ""    
            value: "{{ $value }}"
            instance: "{{ $labels.instance }}"
            grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }} "
            console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
            cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
            id: "{{ $labels.instanceid }}"
            type: "aliyun_meta_ecs_info"
    
        - alert: node-exporter-disk-read-count-high
          expr:  node_exporter:disk:read:count:rate > 3000
          for: 2m
          labels: 
            severity: info
          annotations: 
            summary: "instance: {{ $labels.instance }} read IOPS is high, current value {{ $value }}"
            description: ""    
            value: "{{ $value }}"
            instance: "{{ $labels.instance }}"
            grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }} "
            console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
            cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
            id: "{{ $labels.instanceid }}"
            type: "aliyun_meta_ecs_info"
    
        - alert: node-exporter-disk-write-count-high
          expr:  node_exporter:disk:write:count:rate > 3000
          for: 2m
          labels: 
            severity: info
          annotations: 
            summary: "instance: {{ $labels.instance }} write IOPS is high, current value {{ $value }}"
            description: ""    
            value: "{{ $value }}"
            instance: "{{ $labels.instance }}"
            grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }} "
            console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
            cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
            id: "{{ $labels.instanceid }}"
            type: "aliyun_meta_ecs_info"
    
    
    
    
        - alert: node-exporter-disk-read-mb-high
          expr:  node_exporter:disk:read:mb:rate > 60 
          for: 2m
          labels: 
            severity: info
          annotations: 
            summary: "instance: {{ $labels.instance }} disk read throughput (MB/s) is high, current value {{ $value }}"
            description: ""    
            instance: "{{ $labels.instance }}"
            value: "{{ $value }}"
            grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }} "
            console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
            cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
            id: "{{ $labels.instanceid }}"
            type: "aliyun_meta_ecs_info"
    
        - alert: node-exporter-disk-write-mb-high
          expr:  node_exporter:disk:write:mb:rate > 60
          for: 2m
          labels: 
            severity: info
          annotations: 
            summary: "instance: {{ $labels.instance }} disk write throughput (MB/s) is high, current value {{ $value }}"
            description: ""    
            value: "{{ $value }}"
            instance: "{{ $labels.instance }}"
            grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }} "
            console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
            cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
            id: "{{ $labels.instanceid }}"
            type: "aliyun_meta_ecs_info"
    
        - alert: node-exporter-filefd-allocated-percent-high 
          expr:  node_exporter:filefd_allocated:percent > 80
          for: 10m
          labels: 
            severity: info
          annotations: 
            summary: "instance: {{ $labels.instance }} allocated file descriptor percentage is high, current value {{ $value }}"
            description: ""    
            value: "{{ $value }}"
            instance: "{{ $labels.instance }}"
            grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }} "
            console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
            cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
            id: "{{ $labels.instanceid }}"
            type: "aliyun_meta_ecs_info"
    
        - alert: node-exporter-network-netin-error-rate-high
          expr:  node_exporter:network:netin:error:rate > 4
          for: 1m
          labels: 
            severity: info
          annotations: 
            summary: "instance: {{ $labels.instance }} inbound packet error rate is high, current value {{ $value }}"
            description: ""    
            value: "{{ $value }}"
            instance: "{{ $labels.instance }}"
            grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }} "
            console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
            cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
            id: "{{ $labels.instanceid }}"
            type: "aliyun_meta_ecs_info"
        - alert: node-exporter-network-netin-packet-rate-high
          expr:  node_exporter:network:netin:packet:rate > 35000
          for: 1m
          labels: 
            severity: info
          annotations: 
            summary: "instance: {{ $labels.instance }} inbound packet rate is high, current value {{ $value }}"
            description: ""    
            value: "{{ $value }}"
            instance: "{{ $labels.instance }}"
            grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }} "
            console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
            cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
            id: "{{ $labels.instanceid }}"
            type: "aliyun_meta_ecs_info"
    
        - alert: node-exporter-network-netout-packet-rate-high
          expr:  node_exporter:network:netout:packet:rate > 35000
          for: 1m
          labels: 
            severity: info
          annotations: 
            summary: "instance: {{ $labels.instance }} outbound packet rate is high, current value {{ $value }}"
            description: ""    
            value: "{{ $value }}"
            instance: "{{ $labels.instance }}"
            grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }} "
            console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
            cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
            id: "{{ $labels.instanceid }}"
            type: "aliyun_meta_ecs_info"
    
        - alert: node-exporter-network-tcp-total-count-high
          expr:  node_exporter:network:tcp:total:count > 40000
          for: 1m
          labels: 
            severity: info
          annotations: 
            summary: "instance: {{ $labels.instance }} TCP connection count is high, current value {{ $value }}"
            description: ""    
            value: "{{ $value }}"
            instance: "{{ $labels.instance }}"
            grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }} "
            console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
            cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
            id: "{{ $labels.instanceid }}"
            type: "aliyun_meta_ecs_info"
    
        - alert: node-exporter-process-zoom-total-count-high 
          expr:  node_exporter:process:zoom:total:count > 10
          for: 10m
          labels: 
            severity: info
          annotations: 
            summary: "instance: {{ $labels.instance }} zombie process count is high, current value {{ $value }}"
            description: ""    
            value: "{{ $value }}"
            instance: "{{ $labels.instance }}"
            grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }} "
            console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
            cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
            id: "{{ $labels.instanceid }}"
            type: "aliyun_meta_ecs_info"
    
        - alert: node-exporter-time-offset-high
          expr:  node_exporter:time:offset > 0.03
          for: 2m
          labels: 
            severity: info
          annotations:
            summary: "instance: {{ $labels.instance }} {{ $labels.desc }}  {{ $value }} {{ $labels.unit }}"  
            description: ""    
            value: "{{ $value }}"
            instance: "{{ $labels.instance }}"
            grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }} "
            console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
            cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
            id: "{{ $labels.instanceid }}"
            type: "aliyun_meta_ecs_info"

    Place these two files in the /usr/local/prometheus/prometheus/rules directory and make sure the main Prometheus configuration file contains the following section:

    rule_files:
      - "rules/*rules.yml"
      # - "second_rules.yml"
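
    For context, here is a minimal sketch of what the surrounding main configuration file might look like. Everything apart from the rule_files entry is an assumption: the intervals, the Alertmanager and node-exporter addresses, and the environment and instanceid values are placeholders. Relative rule_files paths are resolved against the directory of the configuration file. The job_name has to be node-exporter because the rules select on job="node-exporter". The rules and annotations also refer to environment and instanceid labels, which node-exporter does not emit itself, so they have to be attached at scrape time, for example as static target labels.

    ```yaml
    global:
      scrape_interval: 15s        # assumed value
      evaluation_interval: 15s    # how often recording and alerting rules are evaluated (assumed value)

    rule_files:
      - "rules/*rules.yml"        # matches both node-exporter-*-rules.yml files

    alerting:
      alertmanagers:
        - static_configs:
            - targets: ["localhost:9093"]   # assumed Alertmanager address

    scrape_configs:
      - job_name: "node-exporter"           # must match job="node-exporter" in the rules
        static_configs:
          - targets: ["localhost:9100"]     # assumed node-exporter address
            labels:
              environment: "example-env"    # hypothetical value, used by the avg/sum by (environment, ...) rules
              instanceid: "i-example"       # hypothetical value, used in the console/cloudmonitor annotations
    ```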

    Restart the Prometheus service; the rules then show up in the web UI.

    The recording rules defined above can now be queried directly by their record names in the expression browser.
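
    Rules can also be checked without a running server. The following is a sketch of a promtool unit-test file, assuming it is saved in the same directory as node-exporter-record-rules.yml; the file name record-rules-test.yml and the instance host1:9100 are made up for illustration. It feeds in synthetic counters for two CPUs and expects node_exporter:cpu:count to evaluate to 2 for that instance. It can be run with `promtool test rules record-rules-test.yml`.

    ```yaml
    # record-rules-test.yml (hypothetical file name)
    rule_files:
      - node-exporter-record-rules.yml

    evaluation_interval: 1m

    tests:
      - interval: 1m
        # Synthetic input: one "system" counter per CPU, growing by 10 each minute.
        input_series:
          - series: 'node_cpu_seconds_total{instance="host1:9100",cpu="0",mode="system"}'
            values: '0+10x10'
          - series: 'node_cpu_seconds_total{instance="host1:9100",cpu="1",mode="system"}'
            values: '0+10x10'
        promql_expr_test:
          # The recording rule counts distinct cpu label values per instance.
          - expr: node_exporter:cpu:count
            eval_time: 5m
            exp_samples:
              - labels: 'node_exporter:cpu:count{instance="host1:9100"}'
                value: 2
    ```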

    Other

    There is relatively little material online about Prometheus rules. Here is one collection that is worth consulting: https://awesome-prometheus-alerts.grep.to/rules

  • Original article: https://www.cnblogs.com/cheyunhua/p/13808889.html