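  • Prometheus Learning Series, Part 7: The Prometheus PromQL Query Language

    The Prometheus PromQL Query Language

    Prometheus provides a functional query language called PromQL (Prometheus Query Language) that lets users select and aggregate time series data in real time. The result of an expression can be shown as a graph, viewed as tabular data in Prometheus's expression browser, or consumed by external systems through the HTTP API.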
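
    To illustrate the HTTP API route, here is a minimal Python sketch that only builds the URL for an instant query. It assumes the server listens on localhost:9090 (the address used in the configuration in this article) and uses the standard /api/v1/query endpoint with its query parameter; the helper name instant_query_url is made up for this example, and no request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def instant_query_url(base_url, promql):
    """Build the URL of an instant query against Prometheus's /api/v1/query endpoint."""
    # urlencode takes care of escaping braces, quotes and other special characters.
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Assumption: Prometheus is reachable at localhost:9090.
url = instant_query_url("http://localhost:9090", 'node_forks_total{job="node"}')
print(url)

# Decode the URL again to confirm that the expression survives the round trip.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded)
```

    An external system would send a GET request to this URL and read the JSON response.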

    Preparation

    Before running any queries, here are the configuration files I am using:

    [root@node00 prometheus]# cat prometheus.yml
    # my global config
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).
    
    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          # - alertmanager:9093
    
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"
    
    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'
    
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
    
        static_configs:
        - targets: ['localhost:9090']
      - job_name: "node"
        file_sd_configs:
        - refresh_interval: 1m
          files: 
          - "/usr/local/prometheus/prometheus/conf/node*.yml"
    remote_write:
      - url: "http://localhost:8086/api/v1/prom/write?db=prometheus"
    
    remote_read:
      - url: "http://localhost:8086/api/v1/prom/read?db=prometheus"
    
    
    [root@node00 prometheus]# cat conf/node-dis.yml 
    - targets: 
      - "192.168.100.10:20001"
      labels: 
        __datacenter__: dc0
        __hostname__: node00
        __businees_line__: "line_a"
        __region_id__: "cn-beijing"
        __availability_zone__: "a"
    - targets: 
      - "192.168.100.11:20001"
      labels: 
        __datacenter__: dc1
        __hostname__: node01
        __businees_line__: "line_a"
        __region_id__: "cn-beijing"
        __availability_zone__: "a"
    - targets: 
      - "192.168.100.12:20001"
      labels: 
        __datacenter__: dc0
        __hostname__: node02
        __businees_line__: "line_c"
        __region_id__: "cn-beijing"
        __availability_zone__: "b"
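
    One thing worth noting about the file above: Prometheus treats label names that start with two underscores as internal and removes them once target relabeling has finished. That is consistent with the query results later in this article, which carry only instance and job. The Python sketch below imitates that behaviour; the function name final_labels is hypothetical, and the logic is a simplification of what Prometheus does, not its actual implementation.

```python
def final_labels(discovered):
    """Simplified view of the labels a scraped series ends up with."""
    labels = dict(discovered)
    # If no instance label is set, it defaults to the target address.
    labels.setdefault("instance", labels.get("__address__", ""))
    # Labels whose names start with "__" are dropped after relabeling.
    return {name: value for name, value in labels.items() if not name.startswith("__")}


# Labels of the first target as written in conf/node-dis.yml, plus job and address.
discovered = {
    "__address__": "192.168.100.10:20001",
    "job": "node",
    "__datacenter__": "dc0",
    "__hostname__": "node00",
    "__region_id__": "cn-beijing",
}
print(final_labels(discovered))
```

    To keep such metadata on the series, the labels would need names without the leading double underscore (for example datacenter instead of __datacenter__).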

    Simple time series queries

    Query a specific metric name directly

    # Total number of forks on each node
    node_forks_total
    # Result:


    Element    Value
    node_forks_total{instance="192.168.100.10:20001",job="node"} 201518
    node_forks_total{instance="192.168.100.11:20001",job="node"} 23951
    node_forks_total{instance="192.168.100.12:20001",job="node"} 24127
     

    Query with a label matcher

    node_forks_total{instance="192.168.100.10:20001"}
    # Result:
    Element    Value
    node_forks_total{instance="192.168.100.10:20001",job="node"} 201816

    Query with multiple label matchers

    node_forks_total{instance="192.168.100.10:20001",job="node"}

    # Result:
    Element    Value
    node_forks_total{instance="192.168.100.10:20001",job="node"} 201932

    Query the samples from the last 2 minutes

    node_forks_total{instance="192.168.100.10:20001",job="node"}[2m]

    Element    Value
    node_forks_total{instance="192.168.100.10:20001",job="node"} 201932 @1569492864.036
    201932 @1569492879.036
    201932 @1569492894.035
    201932 @1569492909.036
    201985 @1569492924.036
    201989 @1569492939.036
    201993 @1569492954.036

    Regex matching

    node_forks_total{instance=~"192.168.*:20001",job="node"}
    Element    Value
    node_forks_total{instance="192.168.100.10:20001",job="node"} 202107
    node_forks_total{instance="192.168.100.11:20001",job="node"} 24014
    node_forks_total{instance="192.168.100.12:20001",job="node"} 24186
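
    PromQL regex matchers are fully anchored, so the pattern has to match the whole label value. The sketch below imitates this with Python's re.fullmatch. Python's re module is not the RE2 engine Prometheus uses, but the two behave the same for a pattern this simple. The fourth instance and the "192x168..." value are made-up inputs; they show that the unescaped dots in the pattern above match any character.

```python
import re


def label_matches(pattern, value):
    """Mimic a PromQL =~ matcher: the pattern must match the entire value."""
    return re.fullmatch(pattern, value) is not None


instances = [
    "192.168.100.10:20001",
    "192.168.100.11:20001",
    "192.168.100.12:20001",
    "192.168.100.10:9100",  # hypothetical target on another port
]

loose = "192.168.*:20001"       # pattern used in the query above
strict = r"192\.168\..*:20001"  # same intent with the dots escaped

selected = [i for i in instances if label_matches(loose, i)]
print(selected)

# An unescaped "." matches any character, so the loose pattern accepts this value too.
print(label_matches(loose, "192x168.100.10:20001"))   # True
print(label_matches(strict, "192x168.100.10:20001"))  # False
```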

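    Commonly used functions

    Prometheus ships with many functions; the full list is in the official documentation: https://prometheus.io/docs/prometheus/latest/querying/functions/

    This section demonstrates the most commonly used ones.

    irate

    irate computes the instantaneous per-second rate of increase of a counter, based on the last two samples in the range.

    # Select one CPU on one instance and job by labels, then compute the per-second rate of CPU time spent in idle mode
    irate(node_cpu_seconds_total{cpu="0",instance="192.168.100.10:20001",job="node",mode="idle"}[1m])

    Element    Value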
    {cpu="0",instance="192.168.100.10:20001",job="node",mode="idle"} 0.9833988932595507
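
    To make the arithmetic concrete, the Python sketch below applies the same idea to the node_forks_total samples returned by the 2-minute range query earlier in this article. irate uses only the last two samples. The simple_rate helper is a simplified stand-in for rate() that divides the total increase by the elapsed time, and it ignores the counter-reset handling and extrapolation that Prometheus performs.

```python
# (value, unix timestamp) pairs copied from the [2m] range query result above.
samples = [
    (201932, 1569492864.036),
    (201932, 1569492879.036),
    (201932, 1569492894.035),
    (201932, 1569492909.036),
    (201985, 1569492924.036),
    (201989, 1569492939.036),
    (201993, 1569492954.036),
]


def irate(samples):
    """Per-second increase between the last two samples (no counter-reset handling)."""
    (v1, t1), (v2, t2) = samples[-2], samples[-1]
    return (v2 - v1) / (t2 - t1)


def simple_rate(samples):
    """Average per-second increase between the first and last sample (simplified)."""
    (v1, t1), (v2, t2) = samples[0], samples[-1]
    return (v2 - v1) / (t2 - t1)


print(round(irate(samples), 3))        # 0.267 forks per second
print(round(simple_rate(samples), 3))  # 0.678 forks per second
```

    The two numbers differ because irate reacts only to the most recent interval, while a rate over the whole window averages out the burst in the middle.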

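    count_over_time

    Counts the number of samples in each series over the given time range.

    # The count depends on the scrape interval: we scrape every 15s, so each series has 4 samples per minute.
    count_over_time(node_boot_time_seconds[1m])

    Element    Value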
    {instance="192.168.100.10:20001",job="node"} 4
    {instance="192.168.100.11:20001",job="node"} 4
    {instance="192.168.100.12:20001",job="node"} 4
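
    The relationship between scrape interval and sample count can be checked with a small Python sketch. The timestamps and the evaluation time are made up, and the window is treated as open on the left and closed on the right. That boundary handling is an assumption and only matters when a sample falls exactly on the window edge.

```python
def count_over_time(timestamps, eval_time, window_seconds):
    """Count samples t with eval_time - window_seconds < t <= eval_time."""
    return sum(1 for t in timestamps if eval_time - window_seconds < t <= eval_time)


scrape_interval = 15  # seconds, as in the global config of this article
timestamps = [i * scrape_interval for i in range(41)]  # scrapes at 0, 15, ..., 600

# Hypothetical evaluation time a few seconds after the latest scrape.
count = count_over_time(timestamps, eval_time=607, window_seconds=60)
print(count)  # 4
```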

    Subqueries

    # Over the past 10 minutes, compute the 5-minute rate once per minute. 10m / 1m yields 10 values in total.
    rate(node_cpu_seconds_total{cpu="0",instance="192.168.100.10:20001",job="node",mode="idle"}[5m])[10m:1m]
    Element    Value
    {cpu="0",instance="192.168.100.10:20001",job="node",mode="idle"} 0.9865228543057867 @1569494040
    0.9862807017543735 @1569494100
    0.9861087231885309 @1569494160
    0.9864946894550303 @1569494220
    0.9863192502430038 @1569494280
    0.9859649122807017 @1569494340
    0.9859298245613708 @1569494400
    0.9869122807017177 @1569494460
    0.9867368421052672 @1569494520
    0.987438596491273 @1569494580
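
    The timestamps in the result are all multiples of 60, because subquery evaluation steps are aligned to the step size rather than to the moment the query runs. The Python sketch below reproduces the ten timestamps. The evaluation time 1569494600 is an assumption chosen for illustration, as is treating the start of the range as exclusive.

```python
def subquery_steps(eval_time, range_seconds, step_seconds):
    """Step-aligned evaluation timestamps inside (eval_time - range, eval_time]."""
    start = eval_time - range_seconds
    # First multiple of the step that lies strictly after the start of the range.
    first = (start // step_seconds + 1) * step_seconds
    return list(range(first, eval_time + 1, step_seconds))


# Assumed evaluation time; [10m:1m] means a 600s range with a 60s step.
steps = subquery_steps(eval_time=1569494600, range_seconds=600, step_seconds=60)
print(len(steps))           # 10
print(steps[0], steps[-1])  # 1569494040 1569494580
```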

    More complex queries

    Compute the percentage of free memory (MemFree / MemTotal)

    node_memory_MemFree_bytes / node_memory_MemTotal_bytes  * 100 

    Element    Value
    {instance="192.168.100.10:20001",job="node"} 9.927579722322251
    {instance="192.168.100.11:20001",job="node"} 59.740727403673034
    {instance="192.168.100.12:20001",job="node"} 63.2080982675149

    Get the top 2 instances by free-memory percentage

    topk(2,node_memory_MemFree_bytes / node_memory_MemTotal_bytes  * 100 )
    Element    Value
    {instance="192.168.100.12:20001",job="node"} 63.20129636298163
    {instance="192.168.100.11:20001",job="node"} 59.50586164125955
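
    As a rough picture of what topk does here, the Python sketch below sorts the per-instance percentages from the previous result and keeps the two largest. The values are copied from that earlier output, so they differ slightly from the topk result above, which was evaluated a moment later.

```python
# Free-memory percentages per instance, copied from the previous query result.
free_pct = {
    "192.168.100.10:20001": 9.927579722322251,
    "192.168.100.11:20001": 59.740727403673034,
    "192.168.100.12:20001": 63.2080982675149,
}


def topk(k, values):
    """Return the k (label, value) pairs with the largest values, biggest first."""
    return sorted(values.items(), key=lambda item: item[1], reverse=True)[:k]


top2 = topk(2, free_pct)
for instance, pct in top2:
    print(instance, round(pct, 2))
```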

    Practical query examples

    Get the number of CPU cores

    # Number of CPU cores for every instance
    count by (instance) ( count by (instance,cpu) (node_cpu_seconds_total{mode="system"}) )
    # Number of CPU cores for a single instance
    count by (instance) ( count by (instance,cpu) (node_cpu_seconds_total{mode="system",instance="192.168.100.11:20001"}) )
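
    The nested aggregation works in two stages: the inner count by (instance,cpu) collapses the series into one group per CPU, and the outer count by (instance) counts those groups. The Python sketch below walks through the same two stages on a handful of made-up series (two CPUs on one instance, one CPU on another).

```python
from collections import Counter

# Hypothetical node_cpu_seconds_total series, reduced to the labels that matter.
series = [
    {"instance": "192.168.100.10:20001", "cpu": "0", "mode": "system"},
    {"instance": "192.168.100.10:20001", "cpu": "0", "mode": "idle"},
    {"instance": "192.168.100.10:20001", "cpu": "1", "mode": "system"},
    {"instance": "192.168.100.10:20001", "cpu": "1", "mode": "idle"},
    {"instance": "192.168.100.11:20001", "cpu": "0", "mode": "system"},
    {"instance": "192.168.100.11:20001", "cpu": "0", "mode": "idle"},
]

# Selector {mode="system"} followed by the inner count by (instance, cpu):
# one group per distinct (instance, cpu) pair.
inner = Counter((s["instance"], s["cpu"]) for s in series if s["mode"] == "system")

# Outer count by (instance): count how many (instance, cpu) groups each instance has.
cores = Counter(instance for instance, _cpu in inner)
print(dict(cores))
```

    Filtering on a single mode keeps each CPU from being counted once per mode.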

    Compute memory utilization

    (1 - (node_memory_MemAvailable_bytes{instance=~"192.168.100.10:20001"} / (node_memory_MemTotal_bytes{instance=~"192.168.100.10:20001"})))* 100
    Element    Value
    {instance="192.168.100.10:20001",job="node"} 87.09358620413717
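
    This query uses MemAvailable rather than MemFree. On Linux, MemFree counts only memory that is completely unused, while MemAvailable is the kernel's estimate of memory that new workloads could use, including reclaimable cache. The Python sketch below runs the formula on made-up byte counts to show how the two views differ; none of the numbers come from the hosts in this article.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Hypothetical values for one host.
mem_total = 8 * GIB
mem_free = 512 * MIB      # completely unused memory
mem_available = 1 * GIB   # unused plus reclaimable memory

free_pct = mem_free / mem_total * 100
used_pct = (1 - mem_available / mem_total) * 100

print(free_pct)  # 6.25
print(used_pct)  # 87.5
```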
     

    Compute root filesystem utilization

    100 - ((node_filesystem_avail_bytes{instance="192.168.100.10:20001",mountpoint="/",fstype=~"ext4|xfs"} * 100) / node_filesystem_size_bytes {instance=~"192.168.100.10:20001",mountpoint="/",fstype=~"ext4|xfs"})
    Element    Value
    {device="/dev/mapper/centos-root",fstype="xfs",instance="192.168.100.10:20001",job="node",mountpoint="/"} 4.175111443575972

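    Predict disk space

    # The expression has two parts joined by and: the first selects root filesystems whose utilization is at least 85%; the second uses the last 6 hours of data to predict whether the available space will drop below 0 within the next 24 hours.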
    (1- node_filesystem_avail_bytes{fstype=~"ext4|xfs",mountpoint="/"} / node_filesystem_size_bytes{fstype=~"ext4|xfs",mountpoint="/"}) * 100 >= 85 and (predict_linear(node_filesystem_avail_bytes[6h],3600 * 24) < 0)
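
    predict_linear fits a straight line through the samples in the range using simple linear regression and extends it into the future. The Python sketch below performs an ordinary least-squares fit on made-up data (a disk losing 1 GiB of free space per hour over 6 hours) to show how a negative prediction comes about. It illustrates the idea and is not Prometheus's own implementation.

```python
def predict_linear(samples, eval_time, seconds_ahead):
    """Fit v = slope * t + intercept by least squares and evaluate it in the future."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = covariance / variance
    intercept = mean_v - slope * mean_t
    return slope * (eval_time + seconds_ahead) + intercept


GIB = 1024 ** 3

# Hypothetical samples: (seconds, available bytes), one per hour for 6 hours,
# falling from 26 GiB to 20 GiB.
samples = [(hour * 3600, (26 - hour) * GIB) for hour in range(7)]

predicted = predict_linear(samples, eval_time=6 * 3600, seconds_ahead=24 * 3600)
print(round(predicted / GIB, 3))  # about -4.0, so the "< 0" condition would hold
```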
  • Original article: https://www.cnblogs.com/zhaojiedi1992/p/zhaojiedi_liunx_63_prometheus_promql.html