  • Monitoring with Prometheus

    I. prometheus-webhook-dingtalk

    GitHub: [Releases · timonwong/prometheus-webhook-dingtalk · GitHub](https://github.com/timonwong/prometheus-webhook-dingtalk/releases)
    Download: [prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz](https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v0.3.0/prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz)

    Download the version you need from GitHub, then extract it:

    wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v0.3.0/prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz
    tar xf prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz -C /data; cd /data
    mv prometheus-webhook-dingtalk-0.3.0.linux-amd64 prometheus-webhook-dingtalk
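    The same download, extract and rename steps are repeated below for Alertmanager and Prometheus. As a hedged sketch, they can be wrapped in one Bash helper. The function name install_release is hypothetical; it assumes each tarball unpacks into a directory named after the archive (true for the three releases used here), uses curl rather than wget, and skips the download when given a local file.

```shell
#!/bin/bash
# Hypothetical helper: download (or reuse) a release tarball, extract it under an
# install root and rename the versioned directory to a short name.
# Usage: install_release <url-or-local-tarball> <install-root> <short-name>
install_release() {
  local src="$1" root="$2" name="$3"
  local tarball="${src##*/}"          # e.g. prometheus-2.3.2.linux-amd64.tar.gz
  local topdir="${tarball%.tar.gz}"   # directory the archive is assumed to contain

  # Refuse to overwrite an existing install (it may hold edited config files).
  if [ -e "$root/$name" ]; then
    echo "install_release: $root/$name already exists, not overwriting" >&2
    return 1
  fi
  mkdir -p "$root" || return 1

  if [ -f "$src" ]; then
    tar xzf "$src" -C "$root" || return 1
  else
    local tmp rc
    tmp="$(mktemp)" || return 1
    curl -fsSL -o "$tmp" "$src" && tar xzf "$tmp" -C "$root"
    rc=$?
    rm -f "$tmp"
    [ "$rc" -eq 0 ] || return 1
  fi

  mv "$root/$topdir" "$root/$name"
}
```

    For example: install_release https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v0.3.0/prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz /data prometheus-webhook-dingtalk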

    Edit the message template file:
    # cat default.tmpl

    {{ define "__subject" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }}({{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}){{ end }}{{ end }}
    {{ define "__alertmanagerURL" }}{{ .ExternalURL }}/#/alerts?receiver={{ .Receiver }}{{ end }}
    
    {{ define "__text_alert_list" }}{{ range . }}
    **Labels**
    {{ range .Labels.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
    {{ end }}
    **Annotations**
    {{ range .Annotations.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
    {{ end }}
    **Source:** [{{ .GeneratorURL }}]({{ .GeneratorURL }})
    
    {{ end }}{{ end }}
    
    {{ define "ding.link.title" }}{{ template "__subject" . }}{{ end }}
    {{ define "ding.link.content" }}#### [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**
    {{ template "__text_alert_list" .Alerts.Firing }}
    {{ end }}

    Start the service:
    # cat prometheus-webhook-dingtalk.sh

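    #!/bin/bash
    # The binary is not on $PATH, so call it by full path; the log goes into the renamed install directory.
    nohup /data/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk --web.listen-address="0.0.0.0:8060" --ding.profile="test=https://oapi.dingtalk.com/robot/send?access_token=89f3cedfb3c3cdb031bdf10f8fc52bf1add575e9b5fb6f462a8cca6859af4" >>/data/prometheus-webhook-dingtalk/nohup.out 2>&1 &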

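    The --ding.profile value has the form <name>=<robot webhook URL>. The URL, including its access_token, is generated by DingTalk when you add a custom robot to a group, so create your own robot and use its URL.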
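    To check the bridge and the DingTalk robot before wiring up Alertmanager, you can POST a hand-written notification in Alertmanager's webhook JSON format to the bridge's /dingtalk/<name>/send endpoint. This sketch assumes the bridge listens on 127.0.0.1:8060 with the profile test, as in the start script above; the alert name DingtalkSmokeTest and all label values are made up for illustration. It only sends when run with --send; otherwise it just prints the payload.

```shell
#!/bin/bash
# Smoke test for prometheus-webhook-dingtalk: one fake "firing" notification shaped
# like Alertmanager's webhook payload. All values are illustrative.
BRIDGE_URL="${BRIDGE_URL:-http://127.0.0.1:8060/dingtalk/test/send}"

build_webhook_payload() {
  local now
  now="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  cat <<EOF
{
  "version": "4",
  "groupKey": "{}:{alertname=\"DingtalkSmokeTest\"}",
  "status": "firing",
  "receiver": "test",
  "groupLabels": {"alertname": "DingtalkSmokeTest"},
  "commonLabels": {"alertname": "DingtalkSmokeTest", "severity": "warning"},
  "commonAnnotations": {"summary": "manual test of the DingTalk bridge"},
  "externalURL": "http://127.0.0.1:9093",
  "alerts": [
    {
      "status": "firing",
      "labels": {"alertname": "DingtalkSmokeTest", "severity": "warning"},
      "annotations": {"summary": "manual test of the DingTalk bridge"},
      "startsAt": "${now}",
      "endsAt": "0001-01-01T00:00:00Z",
      "generatorURL": "http://127.0.0.1:9090/graph"
    }
  ]
}
EOF
}

if [ "${1:-}" = "--send" ]; then
  build_webhook_payload | curl -fsS -H 'Content-Type: application/json' --data-binary @- "$BRIDGE_URL"
else
  build_webhook_payload   # dry run: show what would be sent
fi
```

    If the robot posts a message to the group, the access_token and the bridge are working.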

    II. Alertmanager
    GitHub (releases and downloads): [Releases · prometheus/alertmanager · GitHub](https://github.com/prometheus/alertmanager/releases)

    Download the version you need from GitHub, then extract it:

    wget https://github.com/prometheus/alertmanager/releases/download/v0.15.1/alertmanager-0.15.1.linux-amd64.tar.gz
    tar xf alertmanager-0.15.1.linux-amd64.tar.gz -C /data ;cd /data
    mv alertmanager-0.15.1.linux-amd64 alertmanager

    Edit the configuration file. I send my alerts to DingTalk, so this article uses a DingTalk receiver:
    # cat alertmanager.yml

    global:
      resolve_timeout: 5m
    
    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: 'test'
    receivers:
    - name: 'test'
      webhook_configs:
       - url: "http://127.0.0.1:8060/dingtalk/test/send"
         send_resolved: true

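    The url here is the address of prometheus-webhook-dingtalk, which converts Alertmanager's notifications into a message format DingTalk accepts. Its path is /dingtalk/<name>/send, where <name> must match the profile name given with --ding.profile (test in this article).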
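    Because the profile name has to match in two places, one option is to derive the Alertmanager webhook url from the --ding.profile value instead of typing it twice. A minimal sketch, assuming the bridge runs on the same host at 127.0.0.1:8060; the function name dingtalk_webhook_url is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: turn a --ding.profile value ("<name>=<robot URL>") into the
# bridge endpoint Alertmanager should call: http://<bridge>/dingtalk/<name>/send
dingtalk_webhook_url() {
  local profile="$1" bridge="${2:-127.0.0.1:8060}"
  # Keep only the text before the FIRST "=": the robot URL itself contains "=".
  local name="${profile%%=*}"
  if [ -z "$name" ] || [ "$name" = "$profile" ]; then
    echo "dingtalk_webhook_url: expected <name>=<url>, got '$profile'" >&2
    return 1
  fi
  printf 'http://%s/dingtalk/%s/send\n' "$bridge" "$name"
}
```

    For example, dingtalk_webhook_url 'test=https://oapi.dingtalk.com/robot/send?access_token=...' prints http://127.0.0.1:8060/dingtalk/test/send, the url used in alertmanager.yml above.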

    Start Alertmanager:
    # cat alertmanager.sh

    #!/bin/bash
    # Call the binary by full path (it is not on $PATH).
    nohup /data/alertmanager/alertmanager --config.file="/data/alertmanager/alertmanager.yml" --storage.path="/data/alertmanager/data" --web.listen-address="0.0.0.0:9093" >>/data/alertmanager/nohup.out 2>&1 &

    Alertmanager web UI:
    http://ip:9093
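    Once Alertmanager is running, you can push a synthetic alert into it and watch it travel through the route to the bridge and on to DingTalk. This sketch assumes Alertmanager 0.15.x listening on 127.0.0.1:9093, whose v1 API accepts a JSON array of alerts on POST /api/v1/alerts (newer releases use /api/v2/alerts, and recent ones no longer serve v1). The alert name AlertmanagerSmokeTest is made up. With group_wait: 10s in the route above, the DingTalk message should arrive roughly 10 seconds later.

```shell
#!/bin/bash
# Post one synthetic alert to Alertmanager's v1 API (Alertmanager 0.15.x assumed).
# No endsAt is sent, so Alertmanager applies resolve_timeout (5m in alertmanager.yml)
# and the alert resolves on its own afterwards.
AM_URL="${AM_URL:-http://127.0.0.1:9093}"

build_alerts_json() {
  local name="${1:-AlertmanagerSmokeTest}"
  cat <<EOF
[
  {
    "labels": {"alertname": "${name}", "severity": "warning", "instance": "${HOSTNAME:-localhost}"},
    "annotations": {"summary": "manual test alert sent to Alertmanager"},
    "generatorURL": "http://127.0.0.1:9090/graph"
  }
]
EOF
}

if [ "${1:-}" = "--send" ]; then
  build_alerts_json "${2:-}" | curl -fsS -H 'Content-Type: application/json' --data-binary @- "${AM_URL}/api/v1/alerts"
fi
```

    Run the script with --send (optionally followed by an alert name); without arguments it only defines the function.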

    III. Prometheus

    GitHub: [Releases · prometheus/prometheus · GitHub](https://github.com/prometheus/prometheus/releases)

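    1. Prometheus components
    1) prometheus: the main server. It scrapes and stores the monitoring data, and provides PromQL for querying and aggregating it;
    2) *_exporter: exporters expose a metrics endpoint to the Prometheus server; Prometheus polls (scrapes) these exporters and stores what it collects;
    3) alertmanager: handles alerting and delivers notifications, for example by email or DingTalk;
    4) pushgateway: for short-lived processes such as batch jobs, Prometheus provides the Pushgateway. These clients push their metrics to the Pushgateway, which then exposes them on a pull endpoint for the Prometheus server to scrape.
    5) Prometheus collects data mainly by pulling it, so the monitored side only has to answer scrape requests, which keeps its load and resource usage low.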
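    Pushing from a batch job to the Pushgateway is a plain HTTP request whose body is in the Prometheus text exposition format. A hedged sketch, assuming a Pushgateway on its default port 9091 (it is not installed in this article); the metric names and the job name nightly_backup are hypothetical. The job label comes from the /metrics/job/<job> URL path.

```shell
#!/bin/bash
# Sketch: report the outcome of a batch job to a Pushgateway.
# Metric and job names are illustrative, not from any real exporter.
PUSHGATEWAY_URL="${PUSHGATEWAY_URL:-http://127.0.0.1:9091}"

# Print metrics in the text exposition format (one sample per line, trailing newline).
format_batch_metrics() {
  local duration_seconds="$1" finished_at="$2"
  cat <<EOF
# TYPE batch_job_duration_seconds gauge
batch_job_duration_seconds ${duration_seconds}
# TYPE batch_job_last_success_timestamp_seconds gauge
batch_job_last_success_timestamp_seconds ${finished_at}
EOF
}

# POST to /metrics/job/<job>; the grouping key job="<job>" is attached to the series.
push_batch_metrics() {
  local job="$1" duration_seconds="$2" finished_at="$3"
  format_batch_metrics "$duration_seconds" "$finished_at" |
    curl -fsS --data-binary @- "${PUSHGATEWAY_URL}/metrics/job/${job}"
}

if [ "${1:-}" = "--demo" ]; then
  start=$(date +%s)
  sleep 2   # stand-in for the real batch work
  end=$(date +%s)
  push_batch_metrics nightly_backup "$((end - start))" "$end"
fi
```

    Prometheus then scrapes the Pushgateway like any other target, which is how short-lived jobs fit into the pull model.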

    2. Installation
    Download the version you need from the releases page above, then extract it:

    wget https://github.com/prometheus/prometheus/releases/download/v2.3.2/prometheus-2.3.2.linux-amd64.tar.gz
    tar xf prometheus-2.3.2.linux-amd64.tar.gz -C /data ;cd /data
    mv prometheus-2.3.2.linux-amd64 prometheus

    Then edit the configuration file and define a scrape job for each target you want to monitor:
    # cat prometheus.yml

    # my global config
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).
    #remote_write:
    #  - url: "http://10.2.79.208:9201/write"
    #remote_read:
    #  - url: "http://10.2.79.208:9201/read"
    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          - 127.0.0.1:9093
    
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"
      - "/data/prometheus/mongodb-rules.yml"
      - "/data/prometheus/consul-rules.yml"
      - "/data/prometheus/redis-rules.yml"
      - "/data/prometheus/nginx-rules.yml"
    
    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'
    
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
    
        static_configs:
        - targets: ['localhost:9090']
      - job_name: 'mongodb1'
        static_configs:
        - targets: ['10.10.8.70:9218']
      - job_name: 'mongodb1-system'
        static_configs:
        - targets: ['10.10.8.70:9100']
    
      - job_name: 'mongodb2'
        static_configs:
        - targets: ['10.10.5.108:9218']


    rule_files lists the paths of the alerting-rule files; define your own alert rules in those files:
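    Before putting an expression into a rule file, it can be previewed against the running server through the HTTP API's instant-query endpoint, /api/v1/query. Every series the query returns is one that would become a pending alert (and a firing one once the for: duration has passed). A sketch assuming Prometheus at 127.0.0.1:9090 as started in this article; the function name preview_rule_expr is hypothetical.

```shell
#!/bin/bash
# Evaluate a PromQL expression once via Prometheus's instant-query API.
# Each series in the JSON result corresponds to an alert that would be pending/firing.
PROM_URL="${PROM_URL:-http://127.0.0.1:9090}"

preview_rule_expr() {
  local expr="$1"
  # -G sends the --data-urlencode data as a URL-encoded query string on a GET request.
  curl -fsS -G "${PROM_URL}/api/v1/query" --data-urlencode "query=${expr}"
}

if [ -n "${1:-}" ]; then
  preview_rule_expr "$1"
fi
```

    For example, pass 'consul_catalog_service_node_healthy < 1' as the argument; an empty "result" list means the rule would not fire right now.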

    # cat consul-rules.yml

    ---
    groups:
    - name: consul
      rules:
      - alert: consul_catalog_service_node_healthy
        expr: consul_catalog_service_node_healthy < 1
        for: 60s
        labels:
          severity: critical
        annotations:
          description: '{{ $labels.node }} {{ $labels.service_id }} is unhealthy'
          summary: 'some service is unhealthy, check it in Consul'

      - alert: consul_node_health
        expr: consul_exporter_build_info < 1
        for: 60s
        labels:
          severity: critical
        annotations:
          description: '{{ $labels.instance }} consul server is down'
          summary: 'consul server is down'

      - alert: consul_health_service_status
        expr: consul_health_service_status < 1
        for: 60s
        labels:
          severity: critical
        annotations:
          description: '{{ $labels.node }} {{ $labels.service_id }} is unhealthy'
          summary: 'some service is unhealthy, check it in Consul'

    # cat mongodb-rules.yml

    ---
    groups:
    - name: mongodb
      rules:
      - alert: mongodb_mongod_connections
        expr: mongodb_mongod_connections{state='current'} and mongodb_mongod_connections < 0
        for: 10s
        labels:
          severity: critical
        annotations:
          description: '{{ $labels.instance }} of {{ $labels.job }}: connections are too low ({{ $value }})'
          summary: 'connections are too low, MongoDB may be down!'

      - alert: mongodb_mongod_connections
        expr: mongodb_mongod_connections{state='current'} and mongodb_mongod_connections > 600
        for: 10s
        labels:
          severity: warning
        annotations:
          description: '{{ $labels.instance }} of {{ $labels.job }}: connections are too high ({{ $value }})'
          summary: 'too many connections'

      - alert: mongodb_mongod_memory
        expr: mongodb_mongod_memory{type='virtual'} and mongodb_mongod_memory < 5000
        for: 5s
        labels:
          severity: critical
        annotations:
          description: '{{ $labels.instance }} of {{ $labels.job }}: {{ $labels.type }} memory is too low'
          summary: 'MongoDB may be down'

      - alert: mongodb_mongod_replset_member_health
        expr: mongodb_mongod_replset_member_health != 1
        for: 5s
        labels:
          severity: critical
        annotations:
          description: '{{ $labels.name }} {{ $labels.state }} is down'
          summary: 'one of the replica set members is down'

      - alert: mongodb_mongod_replset_my_state
        expr: mongodb_mongod_replset_my_state{job='mongodb3'} and mongodb_mongod_replset_my_state != 1
        for: 5s
        labels:
          severity: critical
        annotations:
          description: 'the replica set primary has changed, {{ $labels.job }} is not the primary'
          summary: 'the mongodb3 primary is down, check the replica set status'

    # cat redis-rules.yml

    ---
    groups:
    - name: redis
      rules:
      - alert: redis_instantaneous_ops_per_sec
        expr: redis_instantaneous_ops_per_sec < 50
        for: 120s
        labels:
          severity: critical
        annotations:
          description: '{{ $labels.job }} is unhealthy'
          summary: 'redis-prod ops/sec is too low, Redis may be blocked; check it with "redis-cli slowlog get"'

    # cat nginx-rules.yml

    ---
    groups:
    - name: nginx-exporter
      rules:
      - alert: status_code_499
        expr: status_code_499 > 300
        for: 60s
        labels:
          severity: critical
        annotations:
          description: 'status_code_499: {{ $value }}'
          summary: 'too many nginx 499 responses, check the load balancer log /var/log/nginx/share.log'

      - alert: status_code_400
        expr: status_code_400 > 50
        for: 60s
        labels:
          severity: critical
        annotations:
          description: 'status_code_400: {{ $value }}'
          summary: 'too many nginx 400 responses, check the load balancer log /var/log/nginx/share.log'

    The nginx metrics come from an exporter I wrote myself: https://github.com/cuishuaigit/nginx_exporter

    Start Prometheus:
    # cat prometheus.sh

    #!/bin/bash
    # Call the binary by full path (it is not on $PATH).
    nohup /data/prometheus/prometheus --config.file="/data/prometheus/prometheus.yml" --web.listen-address="0.0.0.0:9090" --storage.tsdb.path="/data/prometheus/data" --web.console.libraries="/data/prometheus/console_libraries" --web.console.templates="/data/prometheus/consoles" --web.enable-admin-api --log.level=info >>/data/prometheus/nohup.out 2>&1 &

    Prometheus web UI:
    http://ip:9090
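    After editing prometheus.yml or one of the rule files, the configuration can be checked with promtool (shipped in the same tarball; in Prometheus 2.x, check config also checks the files listed under rule_files) and then applied without a restart by sending SIGHUP to the process. The HTTP /-/reload endpoint would additionally need --web.enable-lifecycle, which the start script above does not set. A sketch assuming the /data/prometheus layout used here; the function name validate_and_reload and the use of pgrep to find the PID are assumptions.

```shell
#!/bin/bash
# Validate the configuration (and the rule files it references), then hot-reload.
PROM_HOME="${PROM_HOME:-/data/prometheus}"
PROMTOOL="${PROMTOOL:-$PROM_HOME/promtool}"

validate_and_reload() {
  local pid="${1:-}"
  if ! "$PROMTOOL" check config "$PROM_HOME/prometheus.yml"; then
    echo "config check failed, not reloading" >&2
    return 1
  fi
  # Find the running server if no PID was given (assumes a single instance).
  if [ -z "$pid" ]; then
    pid="$(pgrep -x prometheus | head -n 1)"
  fi
  if [ -z "$pid" ]; then
    echo "prometheus is not running" >&2
    return 1
  fi
  # Prometheus re-reads its configuration and rule files on SIGHUP.
  kill -HUP "$pid"
}

if [ "${1:-}" = "--reload" ]; then
  validate_and_reload "${2:-}"
fi
```

    Checking first matters because a broken file would otherwise only show up as a failed reload in nohup.out.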

    IV. Exporters

    1. https://github.com/prometheus/node_exporter

    2. https://github.com/prometheus/influxdb_exporter

    3. https://github.com/prometheus/mysqld_exporter

    4. https://github.com/prometheus/jmx_exporter

    5. https://github.com/prometheus/consul_exporter

    6. https://github.com/prometheus/haproxy_exporter
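    Besides dedicated exporters, node_exporter can publish custom metrics read from a directory of *.prom files through its textfile collector (enabled by pointing --collector.textfile.directory at that directory). Because node_exporter may read a file while it is being written, the usual pattern is to write a temporary file and rename it into place. A hedged sketch: the directory /data/node_exporter/textfile, the function name write_textfile_metric and the metric names are hypothetical.

```shell
#!/bin/bash
# Write one gauge for node_exporter's textfile collector, atomically.
# Directory and metric names are illustrative assumptions.
TEXTFILE_DIR="${TEXTFILE_DIR:-/data/node_exporter/textfile}"

# write_textfile_metric <file-stem> <metric-name> <value>
write_textfile_metric() {
  local stem="$1" metric="$2" value="$3"
  local target="${TEXTFILE_DIR}/${stem}.prom"
  local tmp
  mkdir -p "$TEXTFILE_DIR" || return 1
  # Temp file in the same directory so mv is an atomic rename; its name does not
  # end in .prom, so the collector ignores it until the rename.
  tmp="$(mktemp "${TEXTFILE_DIR}/.${stem}.XXXXXX")" || return 1
  if printf '# TYPE %s gauge\n%s %s\n' "$metric" "$metric" "$value" >"$tmp" &&
    chmod 0644 "$tmp" &&   # mktemp creates 0600; node_exporter may run as another user
    mv -f "$tmp" "$target"; then
    return 0
  fi
  rm -f "$tmp"
  return 1
}
```

    For example, a backup cron job could finish with write_textfile_metric backup backup_last_success_timestamp_seconds "$(date +%s)", and an alert rule could then fire when that timestamp gets too old.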

  • Original article: https://www.cnblogs.com/cuishuai/p/9378623.html