zoukankan      html  css  js  c++  java
  • prometheus集群方案-thanos

     场景:

    随着监控数据的增长,单个prometheus采集数据性能无法满足,即使100G+内存,也会出现OOM现象。

    解决思路:

    1.减少prometheus驻留内存的数据量,将数据持久化到tsdb或对象存储;

    2.根据业务切割成多个prometheus,分模块存储数据。若需要进行多个promenade之间的汇聚,利用thanos的query实现。

    搭建thanos前提假设:

    1.已经安装docker和docker compose(本例子通过docker-compose进行安装部署)

    2.通过2个prometheus验证thanos的可用性

    安装步骤:

    1.

    #定义2个prometheus的存储路径

    mkdir -p /home/dockerdata/prometheus   
    
    mkdir -p /home/dockerdata/prometheus2

    #定义minio(用于对象存储)和docker-compose的路径

    mkdir -p /home/dockerfile/thanos
    mkdir -p /home/minio/data

    2.minio配置文件(位于/home/dockerfile/thanos/bucket_config.yaml)

    type: S3
    config:
      bucket: "thanos"
      endpoint: 'minio:9000'
      access_key: "danny"
      insecure: true  #是否使用安全协议http或https
      signature_version2: false
      encrypt_sse: false
      secret_key: "xxxxxxxx" #设置s3密码,保证8位以上的长度
      put_user_metadata: {}
      http_config:
        idle_conn_timeout: 90s
        response_header_timeout: 2m
        insecure_skip_verify: false
      trace:
        enable: false
      part_size: 134217728

    3.prometheus配置文件(位于/home/dockerfile/thanos/prometheus.yml和/home/dockerfile/thanos/prometheus2.yml)。有2个prometheus配置文件主要是区分端口和extends label。必须定义extends label,用于区分相同metrics的数据源

    prometheus:

    # my global config
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).
      external_labels:
        monitor: 'danny-ecs'
    
    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          # - alertmanager:9093
    
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"
    
    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'
    
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
    
        static_configs:
        #- targets: ['localhost:9090']
        - targets: ['node-exporter:9100']

    prometheus2.yml

    # my global config
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).
      external_labels:
        monitor: 'prometheus2'
    
    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          # - alertmanager:9093
    
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"
    
    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'
    
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
    
        static_configs:
        #- targets: ['localhost:9091']
        - targets: ['node-exporter:9100']

    4.配置Dockerfile(docker-compose用到的,不确定是否必须。位于:/home/dockerfile/thanos)

    FROM quay.io/prometheus/busybox:latest
    LABEL maintainer="danny"
    
    COPY /thanos_tmp_for_docker /bin/thanos
    
    ENTRYPOINT [ "/bin/thanos" ]

    5.docker-compose配置文件定义(基本所有内容都在这里了)

    version: '2'
    services:
      prometheus1:
        container_name: prometheus1
        image: prom/prometheus
        ports: 
          - 9090:9090
        logging:
          driver: "json-file"
          options:
            max-size: "5m"
            max-file: "3"
        volumes:
        - /home/dockerdata/prometheus:/prometheus
        - /home/dockerfile/thanos/prometheus.yml:/etc/prometheus/prometheus.yml  
        command:
          - --web.enable-lifecycle
          - --web.enable-admin-api
          - --config.file=/etc/prometheus/prometheus.yml
          - --storage.tsdb.path=/prometheus
          - --web.console.libraries=/usr/share/prometheus/console_libraries
          - --web.console.templates=/usr/share/prometheus/consoles
          - --storage.tsdb.min-block-duration=30m # small just to not wait hours to test :)
          - --storage.tsdb.max-block-duration=30m # small just to not wait hours to test :)
        depends_on:
        - minio
    
      sidecar1:
        container_name: sidecar1
        image: quay.io/thanos/thanos:v0.13.0-rc.2
        logging:
          driver: "json-file"
          options:
            max-size: "5m"
            max-file: "3"
        volumes:
        - /home/dockerdata/prometheus:/var/prometheus
        - /home/dockerfile/thanos/bucket_config.yaml:/bucket_config.yaml
        command:
        - sidecar
        - --tsdb.path=/var/prometheus
        - --prometheus.url=http://prometheus1:9090
        - --objstore.config-file=/bucket_config.yaml
        - --http-address=0.0.0.0:19191
        - --grpc-address=0.0.0.0:19090
        depends_on:
        - minio
        - prometheus1
    
      prometheus2:
        container_name: prometheus2
        image: prom/prometheus
        ports:
          - 9091:9090
        logging:
          driver: "json-file"
          options:
            max-size: "5m"
            max-file: "3"
        volumes:
        - /home/dockerdata/prometheus2:/prometheus
        - /home/dockerfile/thanos/prometheus2.yml:/etc/prometheus/prometheus.yml
        command:
          - --web.enable-lifecycle
          - --web.enable-admin-api
          - --config.file=/etc/prometheus/prometheus.yml
          - --storage.tsdb.path=/prometheus
          - --web.console.libraries=/usr/share/prometheus/console_libraries
          - --web.console.templates=/usr/share/prometheus/consoles
          - --storage.tsdb.min-block-duration=30m 
          - --storage.tsdb.max-block-duration=30m 
        depends_on:
        - minio
    
      sidecar2:
        container_name: sidecar2
        image: quay.io/thanos/thanos:v0.13.0-rc.2
        logging:
          driver: "json-file"
          options:
            max-size: "5m"
            max-file: "3"
        volumes:
        - /home/dockerdata/prometheus2:/var/prometheus
        - /home/dockerfile/thanos/bucket_config.yaml:/bucket_config.yaml
        command:
        - sidecar
        - --tsdb.path=/var/prometheus
        - --prometheus.url=http://prometheus2:9090
        - --objstore.config-file=/bucket_config.yaml
        - --http-address=0.0.0.0:19191
        - --grpc-address=0.0.0.0:19090
        depends_on:
        - minio
        - prometheus2
    
      grafana:
        container_name: grafana
        image: grafana/grafana
        ports: 
        - "3000:3000"
    
      # to search on old metrics
      storer:
        container_name: storer
        image: quay.io/thanos/thanos:v0.13.0-rc.2
        volumes:
        - /home/dockerfile/thanos/bucket_config.yaml:/bucket_config.yaml
        command:
        - store
        - --data-dir=/var/thanos/store
        - --objstore.config-file=bucket_config.yaml
        - --http-address=0.0.0.0:19191
        - --grpc-address=0.0.0.0:19090
        depends_on:
        - minio  
    
      # downsample metrics on the bucket
      compactor:
        container_name: compactor
        image: quay.io/thanos/thanos:v0.13.0-rc.2
        volumes:
        - /home/dockerfile/thanos/bucket_config.yaml:/bucket_config.yaml
        command:
        - compact
        - --data-dir=/var/thanos/compact
        - --objstore.config-file=bucket_config.yaml
        - --http-address=0.0.0.0:19191
        - --wait
        depends_on:
        - minio 
    
      # querier component which can be scaled
      querier:
        container_name: querier
        image: quay.io/thanos/thanos:v0.13.0-rc.2
        labels:
        - "traefik.enable=true"
        - "traefik.port=19192"
        - "traefik.frontend.rule=PathPrefix:/"
        ports: 
        - "19192:19192"
        command:
        - query
        - --http-address=0.0.0.0:19192
        - --store=sidecar1:19090
        - --store=sidecar2:19090
        - --store=storer:19090
        - --query.replica-label=replica
    
      minio:
        image: minio/minio:latest
        container_name: minio
        ports:
          - 9000:9000
        volumes: 
          - "/home/minio/data:/data"
        environment:
          MINIO_ACCESS_KEY: "danny"
          MINIO_SECRET_KEY: "xxxxxxxx" #输入8位以上的密码
        command: server /data
        restart: always
        logging:
          driver: "json-file"
          options:
            max-size: "5m"
            max-file: "3"
    
      node-exporter:
        image: prom/node-exporter:latest
        container_name: node-exporter
        ports:
          - '9100:9100'

    6.启动。第一次up的时候,storer会启动失败,需要单独重启一次storer

    cd /home/dockerfile/thanos
    docker-compose up -d
    docker-compose up -d storer

    7.验证

    docker ps -a

    证明所有容器已经启动成功

    8.promenade 的block数据成功上传到minio验证

    访问地址:http://ip:9000/minio/login

    账号密码在bucket_config.yaml中定义的

    创建bucket:thanos

    如果正常运行,block数据会成功上传到thanos这个bucket(即sidecar组建安装成功):

    9.验证query组建和store组建是否安装成功。

    访问query页面(跟promenade的页面基本一致)

    http://ip:19192

    输入metrices:node_uname_info

    安装部署参考链接:https://www.cnblogs.com/rongfengliang/p/11319933.html

    对thanos架构理解参考链接:http://www.dockone.io/article/10035

  • 相关阅读:
    (转)【web前端培训之前后端的配合(中)】继续昨日的故事
    ural(Timus) 1136. Parliament
    scau Josephus Problem
    ACMICPC Live Archive 6204 Poker End Games
    uva 10391 Compound Words
    ACMICPC Live Archive 3222 Joke with Turtles
    uva 10132 File Fragmentation
    uva 270 Lining Up
    【转】各种字符串哈希函数比较
    uva 10905 Children's Game
  • 原文地址:https://www.cnblogs.com/danny-djy/p/13230529.html
Copyright © 2011-2022 走看看