  • Prometheus notes

    Preface

      Prometheus is monitoring software, broadly similar to Nagios.

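    Installation

      1. Download the prometheus-2.2.0.linux-amd64 tarball from the official site, extract it, and run ./prometheus. The important part is the configuration file.

         a. To hot-reload the configuration remotely, start Prometheus with the --web.enable-lifecycle flag, then trigger a reload with: curl -X POST http://localhost:9090/-/reload

         b. The key thing to learn is the prometheus.yml configuration file, which Prometheus loads at startup.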
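      The reload call from item a can also be made from a script instead of curl. A minimal sketch using only Python's standard library, assuming Prometheus listens on localhost:9090 and was started with --web.enable-lifecycle; the helper names build_reload_request and reload_config are hypothetical.

```python
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumption: default listen address


def build_reload_request(base_url: str = PROMETHEUS_URL) -> urllib.request.Request:
    """Build the same request as `curl -X POST http://localhost:9090/-/reload`."""
    return urllib.request.Request(base_url.rstrip("/") + "/-/reload", method="POST")


def reload_config(base_url: str = PROMETHEUS_URL, timeout: float = 5.0) -> int:
    """Send the reload request and return the HTTP status code.

    Only works if Prometheus was started with --web.enable-lifecycle;
    otherwise the endpoint rejects the request and urlopen raises HTTPError.
    """
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    print(reload_config())
```

      Splitting request construction from sending keeps the URL logic checkable without a running server.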

    [root@vm-local1 prometheus-2.2.0.linux-amd64]# cat prometheus.yml 
    # my global config
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).
    
    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ["localhost:9093"]  # Alertmanager is a separate download; it listens on port 9093 by default
          
        
    
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"
      - rules/mengyuan.rules     # alerts are defined by rules, so point Prometheus at a rule file (path relative to prometheus.yml)
    
    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:    # scrape configuration: which hosts (targets) to scrape
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'  # job name
    
        # metrics_path defaults to '/metrics' (the URL path scraped on each target)
        # scheme defaults to 'http'.
    
        static_configs:
          - targets: ['localhost:9090','localhost:9100']
            labels:
              group: 'zus'    # targets are the hosts (clients) to scrape; both of mine get the label group='zus'
      - job_name: dj   # a second job
        static_configs:
          - targets: ['localhost:8000']  # my custom client written with Django
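      To see which labels the scraped series end up with, here is a simplified Python model of how each static_configs target gets its job and instance labels. It is a sketch, not Prometheus code: it ignores relabeling and internal labels such as __address__, the function name target_labels is hypothetical, and the config above is transcribed into a Python dict by hand.

```python
def target_labels(job_name: str, static_config: dict) -> list[dict]:
    """Return the label set for every target in one static_configs entry.

    Simplified model: labels from the config are kept, then `job` and
    `instance` are filled in only if they are not already set.
    """
    result = []
    for target in static_config["targets"]:
        labels = dict(static_config.get("labels", {}))
        labels.setdefault("job", job_name)     # from job_name
        labels.setdefault("instance", target)  # from the host:port target
        result.append(labels)
    return result


# The scrape_configs from prometheus.yml above, transcribed by hand.
scrape_configs = [
    {"job_name": "prometheus",
     "static_configs": [{"targets": ["localhost:9090", "localhost:9100"],
                         "labels": {"group": "zus"}}]},
    {"job_name": "dj",
     "static_configs": [{"targets": ["localhost:8000"]}]},
]

for job in scrape_configs:
    for sc in job["static_configs"]:
        for labels in target_labels(job["job_name"], sc):
            print(labels)
```

      The localhost:9100 target comes out with group, instance and job labels, the same set that appears in the alert payload received in step 6.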

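     Note:

         localhost:9090 is Prometheus itself: by default it exposes its own metrics for scraping. Port 9100 is node_exporter, a monitoring client (exporter) provided by the Prometheus project.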

    2. Install a Prometheus client

      Download the node_exporter-0.16.0-rc.1.linux-amd64 client from the official site, extract it, and run ./node_exporter. It listens on port 9100 by default.
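      What Prometheus receives from node_exporter (or any client) at /metrics is plain text: optional "# HELP" and "# TYPE" comment lines, then one sample per line as name{label="value",...} value. The sketch below fetches a /metrics page with the standard library and parses the sample lines into (name, labels, value) tuples. It is a simplified parser for illustration, not the official one: it ignores timestamps and does not unescape label values, and the function names are hypothetical.

```python
import re
import urllib.request

# label_name="value" pairs; the value may contain escaped characters such as \"
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(url: str = "http://localhost:9100/metrics", timeout: float = 5.0) -> str:
    """Download a /metrics page, e.g. from node_exporter on its default port 9100."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_exposition(text: str) -> list[tuple[str, dict, float]]:
    """Parse sample lines of the text format; comments and blank lines are skipped."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # "# HELP" / "# TYPE" lines and blank lines
        if "{" in line:
            open_ = line.index("{")
            close = line.rindex("}")
            name = line[:open_]
            labels = dict(LABEL_RE.findall(line[open_ + 1:close]))
            rest = line[close + 1:].split()
        else:
            name, *rest = line.split()
            labels = {}
        # rest[0] is the value; an optional timestamp in rest[1] is ignored
        samples.append((name, labels, float(rest[0])))
    return samples


if __name__ == "__main__":
    for sample in parse_exposition(fetch_metrics())[:10]:
        print(sample)
```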

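    3. Writing a custom client is simple: the endpoint just has to return data in the format below. I used Django, but any framework works as long as the format is correct. The dj job above scrapes localhost:8000 at the default path /metrics, so this view needs to be routed to /metrics.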

      

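    from django.http import HttpResponse

    def metrics(req):
        # Prometheus text format: one "<metric_name> <value>" sample per line,
        # and the last line must also end with a newline.
        ss = "feiji 32\n" + "caidian 31\n"
        return HttpResponse(ss, content_type="text/plain; version=0.0.4")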
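      Any framework works as long as the response body has that format. As an illustration, here is the same endpoint using only Python's standard library (http.server) instead of Django. It is a sketch under assumptions: feiji and caidian are just the placeholder metrics from the example above, port 8000 matches the dj job's target, and render_metrics and MetricsHandler are hypothetical names.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder values, same as the Django example above.
METRICS = {"feiji": 32, "caidian": 31}


def render_metrics(metrics: dict) -> str:
    """Render {name: value} as Prometheus text format, one sample per line."""
    return "".join(f"{name} {value}\n" for name, value in metrics.items())


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":  # Prometheus scrapes /metrics by default
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Listen on port 8000, the target of the `dj` job in prometheus.yml.
    HTTPServer(("0.0.0.0", 8000), MetricsHandler).serve_forever()
```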
    

    4. Write the rules in rules/mengyuan.rules; alerts can only be sent once rules are defined.

    [root@vm-local1 rules]# cat mengyuan.rules 
    groups:
    - name: zus
      rules:
    
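      # Alert for any instance that is unreachable (up == 0) for longer than the `for` duration.
      - alert: InstanceDown   # alert name, any name you like
        expr: up == 0   # up is 1 if the last scrape of a target succeeded, 0 if it failed (host or exporter down); while true the alert is active, and the value is available as $value
        for: 5s         # if up is still 0 after 5s, the alert fires and is sent to Alertmanager (5s is handy for testing; the description text says 5 minutes, which would be for: 5m)
        labels:
          severity: page  # extra label attached to this alert; Alertmanager routes on it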
        annotations:
          summary: "Instance {{ $labels.instance }} down dangqian  {{ $value }}"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
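      The for clause makes an alert go through a pending state before it fires. Rules are evaluated every evaluation_interval (15s in prometheus.yml above), so with for: 5s the alert is pending at the first evaluation where up == 0 and fires at the next one if up is still 0; only firing alerts are sent to Alertmanager. Below is a simplified Python model of that state machine, not Prometheus's implementation (it ignores how resolved alerts are kept and re-sent); alert_states is a hypothetical name.

```python
def alert_states(evaluations, hold_seconds):
    """Return the alert state after each rule evaluation.

    evaluations: list of (eval_time_in_seconds, condition_is_true)
    hold_seconds: the rule's `for` duration in seconds
    """
    states = []
    active_since = None  # first evaluation time of the current run where the condition held
    for t, condition in evaluations:
        if not condition:
            active_since = None  # condition cleared: the alert resolves
            states.append("inactive")
            continue
        if active_since is None:
            active_since = t  # condition just became true
        held = t - active_since
        states.append("firing" if held >= hold_seconds else "pending")
    return states


# up == 0 starts at t=15s; rules are evaluated every 15s; for: 5s
print(alert_states([(0, False), (15, True), (30, True), (45, False)], hold_seconds=5))
# ['inactive', 'pending', 'firing', 'inactive']
```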
    

      

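    5. Now install Alertmanager

      a. Download alertmanager-0.15.0-rc.1.linux-amd64 from the official site, extract it, and run ./alertmanager (it listens on port 9093 by default).

        Again, the important part is the configuration file; create or edit alertmanager.yml: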

      

    [root@vm-local1 alertmanager-0.15.0-rc.1.linux-amd64]# cat alertmanager.yml 
    route:
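      receiver: mengyuan2  # default receiver; the top-level route must have one, and it must match a "- name" under receivers
      group_wait: 1s  # wait 1s before sending the first notification for a new alert group
      group_interval: 1s # wait 1s before notifying about new alerts added to an existing group
      repeat_interval: 1m  # re-send a notification that is still firing after 1 minute
      group_by: ["zus"]   # group_by takes label NAMES; no label is named "zus" (it is the value of the "group" label), so all alerts land in one group; use ["group"] to group by that label
      routes:      # child routes: alerts labeled severity: page go to receiver mengyuan; without routes, or if nothing matches, alerts go to the default mengyuan2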
      - receiver: mengyuan  
        match:
          severity: page
    
    receivers:
    - name: 'mengyuan'
      webhook_configs:  # webhook receiver: Alertmanager POSTs the alert notification as JSON to http://127.0.0.1:8000
      - url: http://127.0.0.1:8000
        send_resolved: true
    - name: 'mengyuan2'
      webhook_configs:
      - url: http://127.0.0.1:8000/2
        send_resolved: true
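      The routing in this file boils down to: try the child routes in order, and if none matches, use the top-level receiver. A simplified Python model of that, assuming only exact-match match rules and no continue or nested routes (Alertmanager's real routing tree supports both); pick_receiver is a hypothetical name.

```python
def pick_receiver(alert_labels: dict, routes: list[dict], default_receiver: str) -> str:
    """Return the first child route whose `match` labels all equal the alert's, else the default."""
    for route in routes:
        if all(alert_labels.get(k) == v for k, v in route["match"].items()):
            return route["receiver"]
    return default_receiver


# Routing from alertmanager.yml above, transcribed by hand.
ROUTES = [{"receiver": "mengyuan", "match": {"severity": "page"}}]
DEFAULT_RECEIVER = "mengyuan2"

# Labels of the InstanceDown alert defined in the rule file.
print(pick_receiver({"alertname": "InstanceDown", "severity": "page"}, ROUTES, DEFAULT_RECEIVER))
# "SomethingElse" is a hypothetical alert without a severity label.
print(pick_receiver({"alertname": "SomethingElse"}, ROUTES, DEFAULT_RECEIVER))
```

      The first call picks mengyuan (webhook http://127.0.0.1:8000); the second falls through to mengyuan2 (http://127.0.0.1:8000/2).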
    

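    6. Receiving the alert notifications in Django

      In the Django view, request.body receives the notification as JSON, roughly like the example below. With Django's default CSRF middleware enabled, the view has to be decorated with @csrf_exempt, because Alertmanager sends a plain POST without a CSRF token.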

      {
        "receiver": "mengyuan",
        "status": "resolved",
        "alerts": [
          {
            "status": "resolved",
            "labels": {"alertname": "InstanceDown", "group": "zus", "instance": "localhost:9100", "job": "prometheus", "severity": "page"},
            "annotations": {
              "description": "localhost:9100 of job prometheus has been down for more than 5 minutes.",
              "summary": "Instance localhost:9100 down dangqian  0"
            },
            "startsAt": "2018-04-06T22:34:13.51281763+08:00",
            "endsAt": "2018-04-06T23:07:43.514552824+08:00",
            "generatorURL": "http://vm-local1:9090/graph?g0.expr=up+%3D%3D+0\u0026g0.tab=1"
          }
        ],
        "groupLabels": {},
        "commonLabels": {"alertname": "InstanceDown", "group": "zus", "instance": "localhost:9100", "job": "prometheus", "severity": "page"},
        "commonAnnotations": {
          "description": "localhost:9100 of job prometheus has been down for more than 5 minutes.",
          "summary": "Instance localhost:9100 down dangqian  0"
        },
        "externalURL": "http://vm-local1:9093",
        "version": "4",
        "groupKey": "{}/{severity=\"page\"}:{}"
      }

     From here, I can use the received data to call an email API or any other third-party alerting service.
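      A framework-free sketch of the receiving side: parse the JSON body shown above and turn each alert into a one-line message that could then be handed to an email or other notification API. The function name format_alerts, the handler class, and the message layout are hypothetical; only the field names (status, alerts, labels, annotations) come from the payload above.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def format_alerts(body: bytes) -> list[str]:
    """Turn an Alertmanager webhook payload into one text line per alert."""
    payload = json.loads(body)
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        summary = alert.get("annotations", {}).get("summary", "")
        lines.append(
            f"[{alert.get('status', '').upper()}] "
            f"{labels.get('alertname', '?')} on {labels.get('instance', '?')}: {summary}"
        )
    return lines


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        for line in format_alerts(self.rfile.read(length)):
            print(line)  # placeholder: call your email / IM API here instead
        self.send_response(200)
        self.end_headers()


if __name__ == "__main__":
    # Same address as the webhook URL in alertmanager.yml (http://127.0.0.1:8000).
    HTTPServer(("127.0.0.1", 8000), WebhookHandler).serve_forever()
```

      In Django the equivalent is calling format_alerts(request.body) inside the view routed at the webhook URL.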

    Summary:

       I'm a beginner myself; these are just my notes.

  • Original post: https://www.cnblogs.com/whf191/p/8729460.html