  • Prometheus + Grafana + Alertmanager: Usage Summary

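    Prometheus is an open-source monitoring and alerting system with a built-in time-series database. Grafana is commonly used alongside it to visualize the data.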

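    1. Monitoring System Architecture

    1.1 Core Components

    • Prometheus Server: scrapes and stores time-series data, and also provides querying and alert rule configuration management.
    • Exporters: data collectors, such as node_exporter for host metrics and MongoDB exporter for MongoDB metrics.
    • Alertmanager: manages alert notifications.
    • Grafana: renders monitoring data as charts.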

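    2. Installing the Base Components

    Because this setup is for learning and experimentation, the environment is installed quickly with Docker.

    2.1 Install Node Exporter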

    • docker-compose-node-export.yml

      version: '3'
      services:
        node-exporter:
          image: prom/node-exporter
          container_name: node-exporter
          hostname: node-exporter
          restart: always
          ports:
            - "9100:9100"
      

    2.2 Install Alertmanager

    • docker-compose-alertmanager.yml

      version: '3'
      services:
        alertmanager:
          image: prom/alertmanager
          container_name: alertmanager
          hostname: alertmanager
          restart: always
          volumes:
            - /data/docker_file/monitor/conf/alertmanager.yml:/etc/alertmanager/alertmanager.yml
          ports:
            - "9093:9093"
      
    • alertmanager.yml

      global:
        smtp_smarthost: 'smtp.qq.com:25'           # QQ Mail SMTP server
        smtp_from: '793272861@qq.com'              # sender address
        smtp_auth_username: '793272861@qq.com'     # SMTP username (the sender's mailbox)
        smtp_auth_password: '****************'     # SMTP password
        smtp_require_tls: false                    # do not require TLS
       
      route:
        group_by: ['alertname']
        group_wait: 10s
        group_interval: 10s
        repeat_interval: 10m
        receiver: live-monitoring
      
      receivers:
      - name: 'live-monitoring'
        email_configs:
        - to: '793272861@qq.com'                   # recipient address
      

    2.3 Install Prometheus

    • docker-compose-prometheus.yml

      version: '3'
      services:
        prometheus:
          image: prom/prometheus
          container_name: prometheus
          hostname: prometheus
          restart: always
          volumes:
            - /data/docker_file/prometheus/data:/prometheus
            - /data/docker_file/prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml
          ports:
            - "9090:9090"
      
    • prometheus.yml

      # my global config
      global:
        scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
        evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
        # scrape_timeout is set to the global default (10s).
       
      # Alertmanager configuration
      alerting:
        alertmanagers:
        - static_configs:
          - targets: ['alertmanager:9093']
       
      # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
      rule_files:
        # - "first_rules.yml"
        # - "second_rules.yml"
       
      # Scrape configuration: one job for Prometheus itself and one for node-exporter.
      # Prometheus polls (pulls from) each target on the configured interval.
      scrape_configs:
        - job_name: 'prometheus'
          static_configs:
            - targets: ['prometheus:9090']
        - job_name: 'node-exporter'
          scrape_interval: 5s
          static_configs:
            - targets: ['node-exporter:9100']
      
    • Prometheus service discovery mechanism

    • Access: http://localhost:9090/

    2.4 Install Grafana

    • docker-compose-grafana.yml

      version: '3'
      services:
        grafana:
          image: grafana/grafana
          container_name: grafana
          hostname: grafana
          restart: always
          environment:
            - GF_SECURITY_ADMIN_PASSWORD=admin
          volumes:
            - /data/docker_file/grafana/data:/var/lib/grafana
            - /data/docker_file/grafana/log:/var/log/grafana
          ports:
            - "3000:3000"
      
    • Add a data source (Prometheus)

    • Access: http://localhost:3000/ (default username: admin, password: admin)

    2.5 Docker Compose Script

    version: '3'
    services:
      prometheus:
        image: prom/prometheus
        container_name: prometheus
        hostname: prometheus
        restart: always
        volumes:
          - /data/docker_file/prometheus/data:/prometheus
          - /data/docker_file/prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml
        ports:
          - "9090:9090"
        networks:
          - monitor
      alertmanager:
        image: prom/alertmanager
        container_name: alertmanager
        hostname: alertmanager
        restart: always
        volumes:
          - /data/docker_file/monitor/conf/alertmanager.yml:/etc/alertmanager/alertmanager.yml
        ports:
          - "9093:9093"
        networks:
          - monitor
      grafana:
        image: grafana/grafana
        container_name: grafana
        hostname: grafana
        restart: always
        environment:
          - GF_SECURITY_ADMIN_PASSWORD=admin
        volumes:
          - /data/docker_file/grafana/data:/var/lib/grafana
          - /data/docker_file/grafana/log:/var/log/grafana
        ports:
          - "3000:3000"
        networks:
          - monitor
      node-exporter:
        image: prom/node-exporter
        container_name: node-exporter
        hostname: node-exporter
        restart: always
        ports:
          - "9100:9100"
        networks:
          - monitor
    networks:
      monitor:
        driver: bridge
     
    

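    3. Configuring Grafana Dashboards

    Grafana pulls data from Prometheus using PromQL queries and renders it in panels; a Grafana dashboard is composed of individual panels.

    3.1 Download Grafana Dashboard Files

    Prebuilt Grafana dashboard files can be downloaded from the official site and imported into Grafana for immediate use.

    Recommended Grafana dashboards

    Importing a Grafana dashboard

    3.2 Add or Modify a Grafana Panel (Extension)

    The official Spring Boot 2.1 Statistics dashboard does not chart requests made to third-party services. Using it as an example, we add a Client Request Count panel and a Client Response Time panel.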

    Client Request Count

    irate(http_client_requests_seconds_count{instance="$instance", application="$application", uri!~".*actuator.*"}[5m])

    Note: the Meter in the application must be named http.client.requests.
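
    The query uses http_client_requests_seconds_count while the Meter is named http.client.requests because the Prometheus registry converts dotted Timer names to underscores, appends the base unit seconds, and exports _count and _sum series. The following is a simplified sketch of that mapping, not Micrometer's actual code; the class and method names are hypothetical, and only dots are converted here.

```java
/** Hypothetical helper: a simplified view of how a Timer name maps to Prometheus series names. */
final class TimerSeriesNames {

    /**
     * Converts a dotted Micrometer Timer name into a Prometheus series name.
     * Simplification: only dots are converted; the real naming convention
     * also sanitizes other characters.
     */
    static String seriesName(String meterName, String suffix) {
        // Dots become underscores, Timers get the base unit "seconds",
        // and the suffix selects the exported series (for example count or sum).
        return meterName.replace('.', '_') + "_seconds_" + suffix;
    }

    public static void main(String[] args) {
        System.out.println(seriesName("http.client.requests", "count"));
        System.out.println(seriesName("http.client.requests", "sum"));
    }
}
```

    If the Meter had a different name, the panel queries would match no series and the panels would stay empty.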

    Client Response Time

    irate(http_client_requests_seconds_sum{instance="$instance", application="$application",uri!~".*actuator.*"}[5m]) / irate(http_client_requests_seconds_count{instance="$instance", application="$application",uri!~".*actuator.*"}[5m])
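
    The Client Response Time query divides the per-second growth of total request time by the per-second growth of the request count, which gives the average seconds per request over the most recent pair of samples. Below is a minimal sketch of that arithmetic, assuming two consecutive scrapes and no counter reset; the class name, method names, and numbers are illustrative only.

```java
/** Hypothetical helper mirroring irate(sum) / irate(count) for two consecutive scrapes. */
final class AverageLatency {

    /** Per-second rate between the last two samples of a counter (counter resets ignored). */
    static double irate(double previous, double latest, double secondsBetween) {
        return (latest - previous) / secondsBetween;
    }

    /** Average seconds per request = rate of total time spent / rate of requests. */
    static double averageSeconds(double sumPrev, double sumLatest,
                                 double countPrev, double countLatest,
                                 double secondsBetween) {
        double timeRate = irate(sumPrev, sumLatest, secondsBetween);
        double requestRate = irate(countPrev, countLatest, secondsBetween);
        return timeRate / requestRate; // NaN when no requests arrived in the interval
    }

    public static void main(String[] args) {
        // Illustrative numbers: 30 new requests taking 6 s in total over a 15 s interval,
        // so the average is about 0.2 s per request.
        System.out.println(averageSeconds(20.0, 26.0, 100.0, 130.0, 15.0));
    }
}
```

    Because both rates use the same time interval, the interval cancels out and the result depends only on the two deltas.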

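    4. Integrating Micrometer with Spring Boot

    Metrics

    Micrometer provides a vendor-neutral interface for timers, gauges, counters, distribution summaries, and long task timers. It has a dimensional data model: when paired with a dimensional monitoring system, a specific named metric can be accessed efficiently and drilled into across its dimensions.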
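
    To make the dimensional data model concrete, here is a small stdlib-only sketch. It is a hypothetical stand-in, not Micrometer's API: one meter name holds a separate count for each distinct set of tags, and a query can either total everything or drill down by a tag.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for a dimensional counter; this is not Micrometer's API. */
final class DimensionalCounters {

    // meter name -> (tag set -> count); each distinct tag set is its own series
    private final Map<String, Map<Map<String, String>, LongAdder>> meters = new ConcurrentHashMap<>();

    void increment(String name, Map<String, String> tags) {
        meters.computeIfAbsent(name, n -> new ConcurrentHashMap<>())
              .computeIfAbsent(Map.copyOf(tags), t -> new LongAdder())
              .increment();
    }

    /** Sums every series of the meter whose tags contain all entries of the filter. */
    long total(String name, Map<String, String> filter) {
        Map<Map<String, String>, LongAdder> seriesByTags = meters.getOrDefault(name, Map.of());
        return seriesByTags.entrySet().stream()
                .filter(e -> e.getKey().entrySet().containsAll(filter.entrySet()))
                .mapToLong(e -> e.getValue().sum())
                .sum();
    }

    public static void main(String[] args) {
        DimensionalCounters counters = new DimensionalCounters();
        counters.increment("http.client.requests", Map.of("method", "GET", "status", "200"));
        counters.increment("http.client.requests", Map.of("method", "GET", "status", "500"));
        counters.increment("http.client.requests", Map.of("method", "POST", "status", "200"));
        System.out.println(counters.total("http.client.requests", Map.of()));                // all series
        System.out.println(counters.total("http.client.requests", Map.of("method", "GET"))); // drill down
    }
}
```

    The tag filter plays the same role as a label matcher such as {method="GET"} in a PromQL query.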

    4.1 Add Dependencies

    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-registry-prometheus</artifactId>
        <version>${micrometer.version}</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    

    4.2 Enable the Prometheus Endpoint

    spring:
      application:
        name: spring-boot-node
    
    management:
      metrics:
        # 1. Add global tags; they can later be used as variables when querying data
        tags:
          application: ${spring.application.name}
      endpoints:
        web:
          exposure:
            # 2. Expose the prometheus endpoint
            include: 'health,prometheus'
    

    4.3 Monitoring Third-Party Requests

    OkHttpMetricsEventListener makes it straightforward to monitor requests made through an OkHttp client.

    Configure the OkHttp client event listener

    // Assumes meterRegistry (MeterRegistry) and metricsProperties (MetricsProperties)
    // are fields injected into the enclosing configuration class.
    @Bean("okHttpClient")
    public OkHttpClient okHttpClient(ConnectionPool connectionPool) {
        return new OkHttpClient.Builder()
                .connectionPool(connectionPool)
                .connectTimeout(5, TimeUnit.SECONDS)
                .readTimeout(10, TimeUnit.SECONDS)
                .eventListener(eventListener())
                .build();
    }
    
    /**
     * Builds the OkHttpMetricsEventListener event listener.
     * metricsProperties.getWeb().getClient().getRequestsMetricName() returns
     * 'http.client.requests' by default; this is the Meter name.
     *
     * @return the listener that records a Timer for each OkHttp call
     */
    private EventListener eventListener() {
        return OkHttpMetricsEventListener.builder(
                meterRegistry, metricsProperties.getWeb().getClient().getRequestsMetricName())
                .build();
    }
    

    How it works: OkHttpMetricsEventListener.java

    public class OkHttpMetricsEventListener extends EventListener {
    
        /**
         * Header name for URI patterns which will be used for tag values.
         */
        public static final String URI_PATTERN = "URI_PATTERN";
    
        @Override
        public void callFailed(Call call, IOException e) {
            CallState state = callState.remove(call);
            if (state != null) {
                state.exception = e;
                // The request has finished (with a failure): record the metric
                time(state);
            }
        }
    
        @Override
        public void responseHeadersEnd(Call call, Response response) {
            CallState state = callState.remove(call);
            if (state != null) {
                state.response = response;
                // The request has finished (response headers received): record the metric
                time(state);
            }
        }
    
        private void time(CallState state) {
            String uri = state.response == null ? "UNKNOWN" :
                (state.response.code() == 404 || state.response.code() == 301 ? "NOT_FOUND" : urlMapper.apply(state.request));
    
            // Define tags (dimensions) that can be used as variables in Prometheus and Grafana
            Iterable<Tag> tags = Tags.concat(extraTags, Tags.of(
                "method", state.request != null ? state.request.method() : "UNKNOWN",
                "uri", uri,
                "status", getStatusMessage(state.response, state.exception),
                "host", state.request != null ? state.request.url().host() : "UNKNOWN"
            ));
    
            // Register the Timer and record the duration; Prometheus can then pull the data
            // from the /actuator/prometheus endpoint exposed by Spring Boot Actuator
            Timer.builder(this.requestsMetricName)
                .tags(tags)
                .description("Timer of OkHttp operation")
                .register(registry)
                .record(registry.config().clock().monotonicTime() - state.startTime, TimeUnit.NANOSECONDS);
        }
    
    }
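
    The uri tag selection in time(CallState) can be isolated as a small function. The sketch below is a hypothetical stdlib-only stand-in: it takes a status code and a URL string instead of OkHttp's Response and Request, and the mapper that replaces numeric path segments is an assumption made for illustration. Collapsing 404 and 301 into one value presumably keeps arbitrary paths from creating an unbounded number of tag values.

```java
import java.util.function.Function;

/** Hypothetical stand-in for the uri tag selection in time(CallState). */
final class UriTagSketch {

    /**
     * @param statusCode HTTP status, or null when no response was received
     * @param url        the requested URL
     * @param urlMapper  maps a URL to a low-cardinality pattern
     */
    static String uriTag(Integer statusCode, String url, Function<String, String> urlMapper) {
        if (statusCode == null) {
            return "UNKNOWN";   // the call failed before any response arrived
        }
        if (statusCode == 404 || statusCode == 301) {
            return "NOT_FOUND"; // collapse missing or moved paths into one value
        }
        return urlMapper.apply(url);
    }

    public static void main(String[] args) {
        // Assumed mapper: replace numeric path segments with a placeholder.
        Function<String, String> mapper = u -> u.replaceAll("/\\d+", "/{id}");
        System.out.println(uriTag(200, "/users/42", mapper));
        System.out.println(uriTag(404, "/users/42", mapper));
        System.out.println(uriTag(null, "/users/42", mapper));
    }
}
```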
    

    4.4 Spring Boot Integration Example


  • Original article: https://www.cnblogs.com/kancy/p/12810117.html