  • Kubernetes in Practice

    I. Overview

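    The overall Prometheus monitoring stack is fairly complex, and deploying its components one by one is not simple. Monitoring Kubernetes also requires access to in-cluster data, which means dealing with authentication, authorization, and admission control. Doing all of this by hand is harder still and takes a fair amount of time, so unless you have especially demanding requirements, I recommend adopting one of the better open-source solutions.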

    I will not introduce Prometheus itself in detail here; see my other post: Kubernetes in Practice - Prometheus Deployment (v0.3.0)

    This post focuses on the configuration involved in deploying Prometheus on Kubernetes. I use the open-source deployment project on GitHub: /kube-prometheus

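    kube-prometheus is probably the best open-source option available at the moment. The repository collects Kubernetes manifests, Grafana dashboards, and Prometheus rules, together with documentation and scripts, to provide easy-to-operate, end-to-end Kubernetes cluster monitoring with Prometheus using the Prometheus Operator. Everything is deployed to the k8s cluster as containers, and the configuration can be customized, which is very convenient.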

     

    Note: I use kubernetes-1.17.5 with kube-prometheus release-0.3. Because of network restrictions, I have changed all image addresses.

     


    II. Structure Analysis

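    The kube-prometheus deployment files live in the manifests directory, 65 YAML files in total. The setup folder contains all the CustomResourceDefinition (CRD) manifests (these usually need no changes and should not be modified lightly), so this folder must be applied first when deploying.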

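    These include definitions for three kinds of resources: alerting (Alertmanager), monitoring (Prometheus), and rules (PrometheusRule). Because the Prometheus Operator generates the underlying workloads from these custom resources, editing the generated controller directly in k8s (for example, kubectl edit sts prometheus-k8s -n monitoring) has no lasting effect. Change the custom resource instead.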

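    There appear to be a lot of YAML files, but they are easy to understand once sorted out. First, they fall into 7 components: prometheus-operator, prometheus-adapter, prometheus, alertmanager, grafana, kube-state-metrics, and node-exporter.

    Each component then defines a controller, configuration, cluster permissions, access configuration, and monitoring configuration. Usually we only need to customize the alerting configuration and the rules, so after filtering, only a few files need to be modified (the files annotated with comments in the listing below are explained in detail later; resource settings in the other manifests can be adjusted to suit the project).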

    [root@ymt108 manifests]# tree
    .
    ├── alertmanager-alertmanager.yaml
    ├── alertmanager-secret.yaml    # alerting configuration
    ├── alertmanager-serviceAccount.yaml
    ├── alertmanager-serviceMonitor.yaml
    ├── alertmanager-service.yaml
    ├── grafana-dashboardDatasources.yaml
    ├── grafana-dashboardDefinitions.yaml
    ├── grafana-dashboardSources.yaml
    ├── grafana-deployment.yaml
    ├── grafana-serviceAccount.yaml
    ├── grafana-serviceMonitor.yaml
    ├── grafana-service.yaml
    ├── kube-state-metrics-clusterRoleBinding.yaml
    ├── kube-state-metrics-clusterRole.yaml
    ├── kube-state-metrics-deployment.yaml
    ├── kube-state-metrics-roleBinding.yaml
    ├── kube-state-metrics-role.yaml
    ├── kube-state-metrics-serviceAccount.yaml
    ├── kube-state-metrics-serviceMonitor.yaml
    ├── kube-state-metrics-service.yaml
    ├── node-exporter-clusterRoleBinding.yaml
    ├── node-exporter-clusterRole.yaml
    ├── node-exporter-daemonset.yaml
    ├── node-exporter-serviceAccount.yaml
    ├── node-exporter-serviceMonitor.yaml
    ├── node-exporter-service.yaml
    ├── prometheus-adapter-apiService.yaml
    ├── prometheus-adapter-clusterRoleAggregatedMetricsReader.yaml
    ├── prometheus-adapter-clusterRoleBindingDelegator.yaml
    ├── prometheus-adapter-clusterRoleBinding.yaml
    ├── prometheus-adapter-clusterRoleServerResources.yaml
    ├── prometheus-adapter-clusterRole.yaml
    ├── prometheus-adapter-configMap.yaml
    ├── prometheus-adapter-deployment.yaml
    ├── prometheus-adapter-roleBindingAuthReader.yaml
    ├── prometheus-adapter-serviceAccount.yaml
    ├── prometheus-adapter-service.yaml
    ├── prometheus-clusterRoleBinding.yaml
    ├── prometheus-clusterRole.yaml
    ├── prometheus-operator-serviceMonitor.yaml
    ├── prometheus-prometheus.yaml  # monitoring configuration
    ├── prometheus-roleBindingConfig.yaml
    ├── prometheus-roleBindingSpecificNamespaces.yaml
    ├── prometheus-roleConfig.yaml
    ├── prometheus-roleSpecificNamespaces.yaml
    ├── prometheus-rules.yaml  # default rules
    ├── prometheus-serviceAccount.yaml
    ├── prometheus-serviceMonitorApiserver.yaml
    ├── prometheus-serviceMonitorCoreDNS.yaml
    ├── prometheus-serviceMonitorKubeControllerManager.yaml
    ├── prometheus-serviceMonitorKubelet.yaml
    ├── prometheus-serviceMonitorKubeScheduler.yaml
    ├── prometheus-serviceMonitor.yaml
    ├── prometheus-service.yaml
    └── setup
        ├── 0namespace-namespace.yaml
        ├── prometheus-operator-0alertmanagerCustomResourceDefinition.yaml
        ├── prometheus-operator-0podmonitorCustomResourceDefinition.yaml
        ├── prometheus-operator-0prometheusCustomResourceDefinition.yaml
        ├── prometheus-operator-0prometheusruleCustomResourceDefinition.yaml
        ├── prometheus-operator-0servicemonitorCustomResourceDefinition.yaml
        ├── prometheus-operator-clusterRoleBinding.yaml
        ├── prometheus-operator-clusterRole.yaml
        ├── prometheus-operator-deployment.yaml
        ├── prometheus-operator-serviceAccount.yaml
        └── prometheus-operator-service.yaml
    
    1 directories, 65 files
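
    The grouping into seven components described above can be reproduced mechanically from the file names. The following is only a minimal sketch: it assumes every manifest is named "<component>-<something>.yaml", as in the listing, and the helper names component_of and group_by_component are made up for this illustration.

```python
from collections import defaultdict

# The seven components named in the text. Longer names are checked first so
# that "prometheus-operator-..." is not mistaken for plain "prometheus-...".
COMPONENTS = sorted(
    [
        "prometheus-operator", "prometheus-adapter", "prometheus",
        "alertmanager", "grafana", "kube-state-metrics", "node-exporter",
    ],
    key=len,
    reverse=True,
)


def component_of(filename: str) -> str:
    """Return the component a manifest file belongs to, or "other"."""
    for name in COMPONENTS:
        if filename.startswith(name + "-"):
            return name
    return "other"


def group_by_component(filenames):
    """Map each component name to the sorted list of its manifest files."""
    groups = defaultdict(list)
    for filename in filenames:
        groups[component_of(filename)].append(filename)
    return {name: sorted(files) for name, files in groups.items()}


if __name__ == "__main__":
    # A few file names copied from the tree listing above.
    sample = [
        "alertmanager-secret.yaml",
        "prometheus-prometheus.yaml",
        "prometheus-rules.yaml",
        "prometheus-operator-deployment.yaml",
        "prometheus-adapter-configMap.yaml",
        "0namespace-namespace.yaml",
    ]
    for name, files in sorted(group_by_component(sample).items()):
        print(f"{name}: {', '.join(files)}")
```

    Files that match none of the seven prefixes, such as 0namespace-namespace.yaml, end up under "other".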

     


    III. Modifying the Prometheus Configuration

    To keep the original file intact, we make a copy of prometheus-prometheus.yaml and modify it as follows:

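    1) replicas: adjust the number of replicas to suit the project.

    2) retention: how long Prometheus keeps its data. The default is "24h", and the value must match the regular expression "[0-9]+(ms|s|m|h|d|w|y)".

    3) additionalScrapeConfigs: adds extra scrape configurations; see Part V, "Adding monitoring for targets outside k8s", for the details.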
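
    As a quick sanity check before applying the manifest, a retention value can be tested against the pattern quoted in item 2. This is a sketch only: it assumes the whole string has to match the pattern, and the function name is_valid_retention is invented for this example.

```python
import re

# Pattern quoted in the text for the `retention` field.
RETENTION_PATTERN = re.compile(r"[0-9]+(ms|s|m|h|d|w|y)")


def is_valid_retention(value: str) -> bool:
    """Return True if the whole string matches the retention pattern."""
    # Assumption: the value must match in full, not just contain a match.
    return RETENTION_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["15d", "24h", "15 d", "2weeks"]:
        print(candidate, is_valid_retention(candidate))
```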

    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      labels:
        prometheus: k8s
      name: k8s
      namespace: monitoring
    spec:
      alerting:
        alertmanagers:
        - name: alertmanager-main
          namespace: monitoring
          port: web
      # baseImage: quay.io/prometheus/prometheus
      baseImage: registry.cn-shanghai.aliyuncs.com/leozhanggg/prometheus/prometheus
      additionalScrapeConfigs:
        name: additional-scrape-configs
        key: prometheus-additional.yaml
      retention: 15d
      nodeSelector:
        kubernetes.io/os: linux
      podMonitorSelector: {}
      replicas: 2
      resources:
        requests:
          memory: 400Mi
      ruleSelector:
        matchLabels:
          prometheus: k8s
          role: alert-rules
      securityContext:
        fsGroup: 2000
        runAsNonRoot: true
        runAsUser: 1000
      serviceAccountName: prometheus-k8s
      serviceMonitorNamespaceSelector: {}
      serviceMonitorSelector: {}
      version: v2.11.0
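
    The additionalScrapeConfigs field in the manifest above points at a key inside a Secret in the same namespace, so that Secret has to exist before Prometheus can load the extra jobs. The sketch below renders such a Secret; the Secret name, key, and namespace are taken from the manifest, while the scrape job itself (job name and target address) is purely a placeholder and the function name build_secret_manifest is invented for this example.

```python
import base64

# Placeholder scrape job for illustration only; the job name and the target
# address (a documentation-only IP) are assumptions, not values from the post.
EXAMPLE_SCRAPE_CONFIG = """\
- job_name: external-node
  static_configs:
  - targets:
    - 192.0.2.10:9100
"""


def build_secret_manifest(
    scrape_config: str,
    name: str = "additional-scrape-configs",
    key: str = "prometheus-additional.yaml",
    namespace: str = "monitoring",
) -> str:
    """Render a Secret manifest whose data key holds the scrape config."""
    # Values under `data` in a Secret must be base64-encoded.
    encoded = base64.b64encode(scrape_config.encode("utf-8")).decode("ascii")
    return "\n".join(
        [
            "apiVersion: v1",
            "kind: Secret",
            "metadata:",
            f"  name: {name}",
            f"  namespace: {namespace}",
            "type: Opaque",
            "data:",
            f"  {key}: {encoded}",
            "",
        ]
    )


if __name__ == "__main__":
    print(build_secret_manifest(EXAMPLE_SCRAPE_CONFIG), end="")
```

    The printed manifest can be saved to a file and applied like any other manifest; the name and key must stay in sync with the additionalScrapeConfigs entry.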

     


    IV. Modifying the PrometheusRule Configuration

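    First, look at the default rules in prometheus-rules.yaml. The file contains 76 alerts that cover most of the common k8s monitoring points. Again, to keep the source file intact, we make a copy of prometheus-rules.yaml and modify the copy.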

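    The general.rules group conflicted with my own custom rules, so I commented it out. I also added a platform annotation to every alert to distinguish environments, and translated some of the alert messages into Chinese.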
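
    After editing a file of this size it is easy to miss an alert. The sketch below counts the alerts in a rules file and lists those without a platform annotation. It is a rough, line-based check that assumes the layout used in this file (each rule starts with "- alert:" or "- expr:", and platform sits on its own line); a real YAML parser would be more robust. The function name check_platform_annotations is invented for this example.

```python
import re

# A new list item that starts a rule or a rule group.
RULE_START = re.compile(r"^\s*-\s+(alert|expr|record|name):\s*(\S*)")
# A `platform:` annotation line with a non-empty value.
PLATFORM_LINE = re.compile(r"^\s*platform:\s*\S")


def check_platform_annotations(rules_text: str):
    """Return (number_of_alerts, names_of_alerts_without_platform)."""
    alerts = []      # list of [alert_name, has_platform]
    current = None   # the alert currently being scanned, if any
    for line in rules_text.splitlines():
        start = RULE_START.match(line)
        if start:
            current = None
            if start.group(1) == "alert":
                current = [start.group(2), False]
                alerts.append(current)
        elif current is not None and PLATFORM_LINE.match(line):
            current[1] = True
    missing = [name for name, has_platform in alerts if not has_platform]
    return len(alerts), missing


if __name__ == "__main__":
    # Tiny made-up sample in the same layout as prometheus-rules.yaml.
    sample = """\
groups:
- name: demo
  rules:
  - expr: up
    record: demo:up
  - alert: WithPlatform
    annotations:
      platform: "test"
      message: hello
    expr: up == 0
  - alert: WithoutPlatform
    annotations:
      message: hello
    expr: up == 0
"""
    print(check_platform_annotations(sample))
```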

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      labels:
        prometheus: k8s
        role: alert-rules
      name: prometheus-k8s-rules
      namespace: monitoring
    spec:
      groups:
      - name: node-exporter.rules
        rules:
        - expr: |
            count without (cpu) (
              count without (mode) (
                node_cpu_seconds_total{job="node-exporter"}
              )
            )
          record: instance:node_num_cpu:sum
        - expr: |
            1 - avg without (cpu, mode) (
              rate(node_cpu_seconds_total{job="node-exporter", mode="idle"}[1m])
            )
          record: instance:node_cpu_utilisation:rate1m
        - expr: |
            (
              node_load1{job="node-exporter"}
            /
              instance:node_num_cpu:sum{job="node-exporter"}
            )
          record: instance:node_load1_per_cpu:ratio
        - expr: |
            1 - (
              node_memory_MemAvailable_bytes{job="node-exporter"}
            /
              node_memory_MemTotal_bytes{job="node-exporter"}
            )
          record: instance:node_memory_utilisation:ratio
        - expr: |
            rate(node_vmstat_pgmajfault{job="node-exporter"}[1m])
          record: instance:node_vmstat_pgmajfault:rate1m
        - expr: |
            rate(node_disk_io_time_seconds_total{job="node-exporter", device=~"nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+"}[1m])
          record: instance_device:node_disk_io_time_seconds:rate1m
        - expr: |
            rate(node_disk_io_time_weighted_seconds_total{job="node-exporter", device=~"nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+"}[1m])
          record: instance_device:node_disk_io_time_weighted_seconds:rate1m
        - expr: |
            sum without (device) (
              rate(node_network_receive_bytes_total{job="node-exporter", device!="lo"}[1m])
            )
          record: instance:node_network_receive_bytes_excluding_lo:rate1m
        - expr: |
            sum without (device) (
              rate(node_network_transmit_bytes_total{job="node-exporter", device!="lo"}[1m])
            )
          record: instance:node_network_transmit_bytes_excluding_lo:rate1m
        - expr: |
            sum without (device) (
              rate(node_network_receive_drop_total{job="node-exporter", device!="lo"}[1m])
            )
          record: instance:node_network_receive_drop_excluding_lo:rate1m
        - expr: |
            sum without (device) (
              rate(node_network_transmit_drop_total{job="node-exporter", device!="lo"}[1m])
            )
          record: instance:node_network_transmit_drop_excluding_lo:rate1m
      - name: kube-apiserver.rules
        rules:
        - expr: |
            histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver"}[5m])) without(instance, pod))
          labels:
            quantile: "0.99"
          record: cluster_quantile:apiserver_request_duration_seconds:histogram_quantile
        - expr: |
            histogram_quantile(0.9, sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver"}[5m])) without(instance, pod))
          labels:
            quantile: "0.9"
          record: cluster_quantile:apiserver_request_duration_seconds:histogram_quantile
        - expr: |
            histogram_quantile(0.5, sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver"}[5m])) without(instance, pod))
          labels:
            quantile: "0.5"
          record: cluster_quantile:apiserver_request_duration_seconds:histogram_quantile
      - name: k8s.rules
        rules:
        - expr: |
            sum(rate(container_cpu_usage_seconds_total{job="kubelet", image!="", container!="POD"}[5m])) by (namespace)
          record: namespace:container_cpu_usage_seconds_total:sum_rate
        - expr: |
            sum by (namespace, pod, container) (
              rate(container_cpu_usage_seconds_total{job="kubelet", image!="", container!="POD"}[5m])
            ) * on (namespace, pod) group_left(node) max by(namespace, pod, node) (kube_pod_info)
          record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate
        - expr: |
            container_memory_working_set_bytes{job="kubelet", image!=""}
            * on (namespace, pod) group_left(node) max by(namespace, pod, node) (kube_pod_info)
          record: node_namespace_pod_container:container_memory_working_set_bytes
        - expr: |
            container_memory_rss{job="kubelet", image!=""}
            * on (namespace, pod) group_left(node) max by(namespace, pod, node) (kube_pod_info)
          record: node_namespace_pod_container:container_memory_rss
        - expr: |
            container_memory_cache{job="kubelet", image!=""}
            * on (namespace, pod) group_left(node) max by(namespace, pod, node) (kube_pod_info)
          record: node_namespace_pod_container:container_memory_cache
        - expr: |
            container_memory_swap{job="kubelet", image!=""}
            * on (namespace, pod) group_left(node) max by(namespace, pod, node) (kube_pod_info)
          record: node_namespace_pod_container:container_memory_swap
        - expr: |
            sum(container_memory_usage_bytes{job="kubelet", image!="", container!="POD"}) by (namespace)
          record: namespace:container_memory_usage_bytes:sum
        - expr: |
            sum by (namespace, label_name) (
                sum(kube_pod_container_resource_requests_memory_bytes{job="kube-state-metrics"} * on (endpoint, instance, job, namespace, pod, service) group_left(phase) (kube_pod_status_phase{phase=~"Pending|Running"} == 1)) by (namespace, pod)
              * on (namespace, pod)
                group_left(label_name) kube_pod_labels{job="kube-state-metrics"}
            )
          record: namespace:kube_pod_container_resource_requests_memory_bytes:sum
        - expr: |
            sum by (namespace, label_name) (
                sum(kube_pod_container_resource_requests_cpu_cores{job="kube-state-metrics"} * on (endpoint, instance, job, namespace, pod, service) group_left(phase) (kube_pod_status_phase{phase=~"Pending|Running"} == 1)) by (namespace, pod)
              * on (namespace, pod)
                group_left(label_name) kube_pod_labels{job="kube-state-metrics"}
            )
          record: namespace:kube_pod_container_resource_requests_cpu_cores:sum
        - expr: |
            sum(
              label_replace(
                label_replace(
                  kube_pod_owner{job="kube-state-metrics", owner_kind="ReplicaSet"},
                  "replicaset", "$1", "owner_name", "(.*)"
                ) * on(replicaset, namespace) group_left(owner_name) kube_replicaset_owner{job="kube-state-metrics"},
                "workload", "$1", "owner_name", "(.*)"
              )
            ) by (namespace, workload, pod)
          labels:
            workload_type: deployment
          record: mixin_pod_workload
        - expr: |
            sum(
              label_replace(
                kube_pod_owner{job="kube-state-metrics", owner_kind="DaemonSet"},
                "workload", "$1", "owner_name", "(.*)"
              )
            ) by (namespace, workload, pod)
          labels:
            workload_type: daemonset
          record: mixin_pod_workload
        - expr: |
            sum(
              label_replace(
                kube_pod_owner{job="kube-state-metrics", owner_kind="StatefulSet"},
                "workload", "$1", "owner_name", "(.*)"
              )
            ) by (namespace, workload, pod)
          labels:
            workload_type: statefulset
          record: mixin_pod_workload
      - name: kube-scheduler.rules
        rules:
        - expr: |
            histogram_quantile(0.99, sum(rate(scheduler_e2e_scheduling_duration_seconds_bucket{job="kube-scheduler"}[5m])) without(instance, pod))
          labels:
            quantile: "0.99"
          record: cluster_quantile:scheduler_e2e_scheduling_duration_seconds:histogram_quantile
        - expr: |
            histogram_quantile(0.99, sum(rate(scheduler_scheduling_algorithm_duration_seconds_bucket{job="kube-scheduler"}[5m])) without(instance, pod))
          labels:
            quantile: "0.99"
          record: cluster_quantile:scheduler_scheduling_algorithm_duration_seconds:histogram_quantile
        - expr: |
            histogram_quantile(0.99, sum(rate(scheduler_binding_duration_seconds_bucket{job="kube-scheduler"}[5m])) without(instance, pod))
          labels:
            quantile: "0.99"
          record: cluster_quantile:scheduler_binding_duration_seconds:histogram_quantile
        - expr: |
            histogram_quantile(0.9, sum(rate(scheduler_e2e_scheduling_duration_seconds_bucket{job="kube-scheduler"}[5m])) without(instance, pod))
          labels:
            quantile: "0.9"
          record: cluster_quantile:scheduler_e2e_scheduling_duration_seconds:histogram_quantile
        - expr: |
            histogram_quantile(0.9, sum(rate(scheduler_scheduling_algorithm_duration_seconds_bucket{job="kube-scheduler"}[5m])) without(instance, pod))
          labels:
            quantile: "0.9"
          record: cluster_quantile:scheduler_scheduling_algorithm_duration_seconds:histogram_quantile
        - expr: |
            histogram_quantile(0.9, sum(rate(scheduler_binding_duration_seconds_bucket{job="kube-scheduler"}[5m])) without(instance, pod))
          labels:
            quantile: "0.9"
          record: cluster_quantile:scheduler_binding_duration_seconds:histogram_quantile
        - expr: |
            histogram_quantile(0.5, sum(rate(scheduler_e2e_scheduling_duration_seconds_bucket{job="kube-scheduler"}[5m])) without(instance, pod))
          labels:
            quantile: "0.5"
          record: cluster_quantile:scheduler_e2e_scheduling_duration_seconds:histogram_quantile
        - expr: |
            histogram_quantile(0.5, sum(rate(scheduler_scheduling_algorithm_duration_seconds_bucket{job="kube-scheduler"}[5m])) without(instance, pod))
          labels:
            quantile: "0.5"
          record: cluster_quantile:scheduler_scheduling_algorithm_duration_seconds:histogram_quantile
        - expr: |
            histogram_quantile(0.5, sum(rate(scheduler_binding_duration_seconds_bucket{job="kube-scheduler"}[5m])) without(instance, pod))
          labels:
            quantile: "0.5"
          record: cluster_quantile:scheduler_binding_duration_seconds:histogram_quantile
      - name: node.rules
        rules:
        - expr: sum(min(kube_pod_info) by (node))
          record: ':kube_pod_info_node_count:'
        - expr: |
            max(label_replace(kube_pod_info{job="kube-state-metrics"}, "pod", "$1", "pod", "(.*)")) by (node, namespace, pod)
          record: 'node_namespace_pod:kube_pod_info:'
        - expr: |
            count by (node) (sum by (node, cpu) (
              node_cpu_seconds_total{job="node-exporter"}
            * on (namespace, pod) group_left(node)
              node_namespace_pod:kube_pod_info:
            ))
          record: node:node_num_cpu:sum
        - expr: |
            sum(
              node_memory_MemAvailable_bytes{job="node-exporter"} or
              (
                node_memory_Buffers_bytes{job="node-exporter"} +
                node_memory_Cached_bytes{job="node-exporter"} +
                node_memory_MemFree_bytes{job="node-exporter"} +
                node_memory_Slab_bytes{job="node-exporter"}
              )
            )
          record: :node_memory_MemAvailable_bytes:sum
      - name: kube-prometheus-node-recording.rules
        rules:
        - expr: sum(rate(node_cpu_seconds_total{mode!="idle",mode!="iowait"}[3m])) BY
            (instance)
          record: instance:node_cpu:rate:sum
        - expr: sum((node_filesystem_size_bytes{mountpoint="/"} - node_filesystem_free_bytes{mountpoint="/"}))
            BY (instance)
          record: instance:node_filesystem_usage:sum
        - expr: sum(rate(node_network_receive_bytes_total[3m])) BY (instance)
          record: instance:node_network_receive_bytes:rate:sum
        - expr: sum(rate(node_network_transmit_bytes_total[3m])) BY (instance)
          record: instance:node_network_transmit_bytes:rate:sum
        - expr: sum(rate(node_cpu_seconds_total{mode!="idle",mode!="iowait"}[5m])) WITHOUT
            (cpu, mode) / ON(instance) GROUP_LEFT() count(sum(node_cpu_seconds_total)
            BY (instance, cpu)) BY (instance)
          record: instance:node_cpu:ratio
        - expr: sum(rate(node_cpu_seconds_total{mode!="idle",mode!="iowait"}[5m]))
          record: cluster:node_cpu:sum_rate5m
        # Uses cluster:node_cpu:sum_rate5m, the series recorded by the rule above.
        - expr: cluster:node_cpu:sum_rate5m / count(sum(node_cpu_seconds_total)
            BY (instance, cpu))
          record: cluster:node_cpu:ratio
      - name: node-exporter
        rules:
        - alert: NodeFilesystemSpaceFillingUp
          annotations:
            platform: "育苗通测试平台"
            description: Filesystem on {{ $labels.device }} at {{ $labels.instance }}
              has only {{ printf "%.2f" $value }}% available space left and is filling
              up.
            summary: "预计文件系统将在接下来的24小时内用完空间。"
          expr: |
            (
              node_filesystem_avail_bytes{job="node-exporter",fstype!=""} / node_filesystem_size_bytes{job="node-exporter",fstype!=""} * 100 < 40
            and
              predict_linear(node_filesystem_avail_bytes{job="node-exporter",fstype!=""}[6h], 24*60*60) < 0
            and
              node_filesystem_readonly{job="node-exporter",fstype!=""} == 0
            )
          for: 1h
          labels:
            severity: warning
        - alert: NodeFilesystemSpaceFillingUp
          annotations:
            platform: "育苗通测试平台"
            description: Filesystem on {{ $labels.device }} at {{ $labels.instance }}
              has only {{ printf "%.2f" $value }}% available space left and is filling
              up fast.
            summary: "预计文件系统将在接下来的4个小时内用完空间。"
          expr: |
            (
              node_filesystem_avail_bytes{job="node-exporter",fstype!=""} / node_filesystem_size_bytes{job="node-exporter",fstype!=""} * 100 < 20
            and
              predict_linear(node_filesystem_avail_bytes{job="node-exporter",fstype!=""}[6h], 4*60*60) < 0
            and
              node_filesystem_readonly{job="node-exporter",fstype!=""} == 0
            )
          for: 1h
          labels:
            severity: critical
        - alert: NodeFilesystemAlmostOutOfSpace
          annotations:
            platform: "育苗通测试平台"
            description: Filesystem on {{ $labels.device }} at {{ $labels.instance }}
              has only {{ printf "%.2f" $value }}% available space left.
            summary: "文件系统剩余空间不到5%。"
          expr: |
            (
              node_filesystem_avail_bytes{job="node-exporter",fstype!=""} / node_filesystem_size_bytes{job="node-exporter",fstype!=""} * 100 < 5
            and
              node_filesystem_readonly{job="node-exporter",fstype!=""} == 0
            )
          for: 1h
          labels:
            severity: warning
        - alert: NodeFilesystemAlmostOutOfSpace
          annotations:
            platform: "育苗通测试平台"
            description: Filesystem on {{ $labels.device }} at {{ $labels.instance }}
              has only {{ printf "%.2f" $value }}% available space left.
            summary: "文件系统剩余空间不到3%。"
          expr: |
            (
              node_filesystem_avail_bytes{job="node-exporter",fstype!=""} / node_filesystem_size_bytes{job="node-exporter",fstype!=""} * 100 < 3
            and
              node_filesystem_readonly{job="node-exporter",fstype!=""} == 0
            )
          for: 1h
          labels:
            severity: critical
        - alert: NodeFilesystemFilesFillingUp
          annotations:
            platform: "育苗通测试平台"
            description: Filesystem on {{ $labels.device }} at {{ $labels.instance }}
              has only {{ printf "%.2f" $value }}% available inodes left and is filling
              up.
            summary: "预计文件系统将在接下来的24小时内用尽inodes。"
          expr: |
            (
              node_filesystem_files_free{job="node-exporter",fstype!=""} / node_filesystem_files{job="node-exporter",fstype!=""} * 100 < 40
            and
              predict_linear(node_filesystem_files_free{job="node-exporter",fstype!=""}[6h], 24*60*60) < 0
            and
              node_filesystem_readonly{job="node-exporter",fstype!=""} == 0
            )
          for: 1h
          labels:
            severity: warning
        - alert: NodeFilesystemFilesFillingUp
          annotations:
            platform: "育苗通测试平台"
            description: Filesystem on {{ $labels.device }} at {{ $labels.instance }}
              has only {{ printf "%.2f" $value }}% available inodes left and is filling
              up fast.
            summary: "预计文件系统将在接下来的4小时内用尽inodes。"
          expr: |
            (
              node_filesystem_files_free{job="node-exporter",fstype!=""} / node_filesystem_files{job="node-exporter",fstype!=""} * 100 < 20
            and
              predict_linear(node_filesystem_files_free{job="node-exporter",fstype!=""}[6h], 4*60*60) < 0
            and
              node_filesystem_readonly{job="node-exporter",fstype!=""} == 0
            )
          for: 1h
          labels:
            severity: critical
        - alert: NodeFilesystemAlmostOutOfFiles
          annotations:
            platform: "育苗通测试平台"
            description: Filesystem on {{ $labels.device }} at {{ $labels.instance }}
              has only {{ printf "%.2f" $value }}% available inodes left.
            summary: "文件系统仅剩不到5%的inodes。"
          expr: |
            (
              node_filesystem_files_free{job="node-exporter",fstype!=""} / node_filesystem_files{job="node-exporter",fstype!=""} * 100 < 5
            and
              node_filesystem_readonly{job="node-exporter",fstype!=""} == 0
            )
          for: 1h
          labels:
            severity: warning
        - alert: NodeFilesystemAlmostOutOfFiles
          annotations:
            platform: "育苗通测试平台"
            description: Filesystem on {{ $labels.device }} at {{ $labels.instance }}
              has only {{ printf "%.2f" $value }}% available inodes left.
            summary: "文件系统仅剩不到3%的inodes。"
          expr: |
            (
              node_filesystem_files_free{job="node-exporter",fstype!=""} / node_filesystem_files{job="node-exporter",fstype!=""} * 100 < 3
            and
              node_filesystem_readonly{job="node-exporter",fstype!=""} == 0
            )
          for: 1h
          labels:
            severity: critical
        - alert: NodeNetworkReceiveErrs
          annotations:
            platform: "育苗通测试平台"
            description: '{{ $labels.instance }} interface {{ $labels.device }} has encountered
              {{ printf "%.0f" $value }} receive errors in the last two minutes.'
            summary: "网络接口报告许多接收错误。"
          expr: |
            increase(node_network_receive_errs_total[2m]) > 10
          for: 1h
          labels:
            severity: warning
        - alert: NodeNetworkTransmitErrs
          annotations:
            platform: "育苗通测试平台"
            description: '{{ $labels.instance }} interface {{ $labels.device }} has encountered
              {{ printf "%.0f" $value }} transmit errors in the last two minutes.'
            summary: "网络接口报告许多传输错误。"
          expr: |
            increase(node_network_transmit_errs_total[2m]) > 10
          for: 1h
          labels:
            severity: warning
      - name: kubernetes-apps
        rules:
        - alert: KubePodCrashLooping
          annotations:
            platform: "育苗通测试平台"
            message: Pod {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container
              }}) is restarting {{ printf "%.2f" $value }} times / 5 minutes.
          expr: |
            rate(kube_pod_container_status_restarts_total{job="kube-state-metrics"}[15m]) * 60 * 5 > 0
          for: 15m
          labels:
            severity: critical
        - alert: KubePodNotReady
          annotations:
            platform: "育苗通测试平台"
            message: "Pod {{$labels.namespace}}/{{$labels.pod}}处于未就绪状态的时间超过15分钟。"
          expr: |
            sum by (namespace, pod) (kube_pod_status_phase{job="kube-state-metrics", phase=~"Failed|Pending|Unknown"} * on(namespace, pod) group_left(owner_kind) kube_pod_owner{owner_kind!="Job"}) > 0
          for: 15m
          labels:
            severity: critical
        - alert: KubeDeploymentGenerationMismatch
          annotations:
            platform: "育苗通测试平台"
            message: "Deployment {{$labels.namespace}}/{{$labels.deployment}}生成不匹配,这表明Deployment已失败但尚未回滚。"
          expr: |
            kube_deployment_status_observed_generation{job="kube-state-metrics"}
              !=
            kube_deployment_metadata_generation{job="kube-state-metrics"}
          for: 15m
          labels:
            severity: critical
        - alert: KubeDeploymentReplicasMismatch
          annotations:
            platform: "育苗通测试平台"
            message: "Deployment {{$labels.namespace}}/{{$labels.deployment}}超过15分钟未匹配预期的副本数。"
          expr: |
            kube_deployment_spec_replicas{job="kube-state-metrics"}
              !=
            kube_deployment_status_replicas_available{job="kube-state-metrics"}
          for: 15m
          labels:
            severity: critical
        - alert: KubeStatefulSetReplicasMismatch
          annotations:
            platform: "育苗通测试平台"
            message: "StatefulSet {{$labels.namespace}}/{{$labels.statefulset}}超过15分钟未匹配预期的副本数。"
          expr: |
            kube_statefulset_status_replicas_ready{job="kube-state-metrics"}
              !=
            kube_statefulset_status_replicas{job="kube-state-metrics"}
          for: 15m
          labels:
            severity: critical
        - alert: KubeStatefulSetGenerationMismatch
          annotations:
            platform: "育苗通测试平台"
            message: "StatefulSet {{$labels.namespace}}/{{$labels.statefulset}}生成不匹配,这表明StatefulSet已失败但尚未回滚。"
          expr: |
            kube_statefulset_status_observed_generation{job="kube-state-metrics"}
              !=
            kube_statefulset_metadata_generation{job="kube-state-metrics"}
          for: 15m
          labels:
            severity: critical
        - alert: KubeStatefulSetUpdateNotRolledOut
          annotations:
            platform: "育苗通测试平台"
            message: StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} update
              has not been rolled out.
          expr: |
            max without (revision) (
              kube_statefulset_status_current_revision{job="kube-state-metrics"}
                unless
              kube_statefulset_status_update_revision{job="kube-state-metrics"}
            )
              *
            (
              kube_statefulset_replicas{job="kube-state-metrics"}
                !=
              kube_statefulset_status_replicas_updated{job="kube-state-metrics"}
            )
          for: 15m
          labels:
            severity: critical
        - alert: KubeDaemonSetRolloutStuck
          annotations:
            platform: "育苗通测试平台"
            message: Only {{ $value | humanizePercentage }} of the desired Pods of DaemonSet
              {{ $labels.namespace }}/{{ $labels.daemonset }} are scheduled and ready.
          expr: |
            kube_daemonset_status_number_ready{job="kube-state-metrics"}
              /
            kube_daemonset_status_desired_number_scheduled{job="kube-state-metrics"} < 1.00
          for: 15m
          labels:
            severity: critical
        - alert: KubeContainerWaiting
          annotations:
            platform: "育苗通测试平台"
            message: Pod {{ $labels.namespace }}/{{ $labels.pod }} container {{ $labels.container}}
              has been in waiting state for longer than 1 hour.
          expr: |
            sum by (namespace, pod, container) (kube_pod_container_status_waiting_reason{job="kube-state-metrics"}) > 0
          for: 1h
          labels:
            severity: warning
        - alert: KubeDaemonSetNotScheduled
          annotations:
            platform: "育苗通测试平台"
            message: '{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset
              }} are not scheduled.'
          expr: |
            kube_daemonset_status_desired_number_scheduled{job="kube-state-metrics"}
              -
            kube_daemonset_status_current_number_scheduled{job="kube-state-metrics"} > 0
          for: 10m
          labels:
            severity: warning
        - alert: KubeDaemonSetMisScheduled
          annotations:
            platform: "育苗通测试平台"
            message: '{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset
              }} are running where they are not supposed to run.'
          expr: |
            kube_daemonset_status_number_misscheduled{job="kube-state-metrics"} > 0
          for: 10m
          labels:
            severity: warning
        - alert: KubeCronJobRunning
          annotations:
            platform: "育苗通测试平台"
            message: CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} is taking more
              than 1h to complete.
          expr: |
            time() - kube_cronjob_next_schedule_time{job="kube-state-metrics"} > 3600
          for: 1h
          labels:
            severity: warning
        - alert: KubeJobCompletion
          annotations:
            platform: "育苗通测试平台"
            message: Job {{ $labels.namespace }}/{{ $labels.job_name }} is taking more
              than one hour to complete.
          expr: |
            kube_job_spec_completions{job="kube-state-metrics"} - kube_job_status_succeeded{job="kube-state-metrics"}  > 0
          for: 1h
          labels:
            severity: warning
        - alert: KubeJobFailed
          annotations:
            platform: "育苗通测试平台"
            message: Job {{ $labels.namespace }}/{{ $labels.job_name }} failed to complete.
          expr: |
            kube_job_failed{job="kube-state-metrics"}  > 0
          for: 15m
          labels:
            severity: warning
        - alert: KubeHpaReplicasMismatch
          annotations:
            platform: "育苗通测试平台"
            message: HPA {{ $labels.namespace }}/{{ $labels.hpa }} has not matched the
              desired number of replicas for longer than 15 minutes.
          expr: |
            (kube_hpa_status_desired_replicas{job="kube-state-metrics"}
              !=
            kube_hpa_status_current_replicas{job="kube-state-metrics"})
              and
            changes(kube_hpa_status_current_replicas[15m]) == 0
          for: 15m
          labels:
            severity: warning
        - alert: KubeHpaMaxedOut
          annotations:
            platform: "育苗通测试平台"
            message: HPA {{ $labels.namespace }}/{{ $labels.hpa }} has been running at
              max replicas for longer than 15 minutes.
          expr: |
            kube_hpa_status_current_replicas{job="kube-state-metrics"}
              ==
            kube_hpa_spec_max_replicas{job="kube-state-metrics"}
          for: 15m
          labels:
            severity: warning
      - name: kubernetes-resources
        rules:
        - alert: KubeCPUOvercommit
          annotations:
            platform: "育苗通测试平台"
            message: "Cluster has overcommitted CPU resource requests for Pods and cannot tolerate node failure."
          expr: |
            sum(namespace:kube_pod_container_resource_requests_cpu_cores:sum)
              /
            sum(kube_node_status_allocatable_cpu_cores)
              >
            (count(kube_node_status_allocatable_cpu_cores)-1) / count(kube_node_status_allocatable_cpu_cores)
          for: 5m
          labels:
            severity: warning
        - alert: KubeMemOvercommit
          annotations:
            platform: "育苗通测试平台"
            message: "Cluster has overcommitted memory resource requests for Pods and cannot tolerate node failure."
          expr: |
            sum(namespace:kube_pod_container_resource_requests_memory_bytes:sum)
              /
            sum(kube_node_status_allocatable_memory_bytes)
              >
            (count(kube_node_status_allocatable_memory_bytes)-1)
              /
            count(kube_node_status_allocatable_memory_bytes)
          for: 5m
          labels:
            severity: warning
        - alert: KubeCPUOvercommit
          annotations:
            platform: "育苗通测试平台"
            message: "Cluster has overcommitted CPU resource requests for Namespaces."
          expr: |
            sum(kube_resourcequota{job="kube-state-metrics", type="hard", resource="cpu"})
              /
            sum(kube_node_status_allocatable_cpu_cores)
              > 1.5
          for: 5m
          labels:
            severity: warning
        - alert: KubeMemOvercommit
          annotations:
            platform: "育苗通测试平台"
            message: "Cluster has overcommitted memory resource requests for Namespaces."
          expr: |
            sum(kube_resourcequota{job="kube-state-metrics", type="hard", resource="memory"})
              /
            sum(kube_node_status_allocatable_memory_bytes)
              > 1.5
          for: 5m
          labels:
            severity: warning
        - alert: KubeQuotaExceeded
          annotations:
            platform: "育苗通测试平台"
            message: Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage
              }} of its {{ $labels.resource }} quota.
          expr: |
            kube_resourcequota{job="kube-state-metrics", type="used"}
              / ignoring(instance, job, type)
            (kube_resourcequota{job="kube-state-metrics", type="hard"} > 0)
              > 0.90
          for: 15m
          labels:
            severity: warning
        - alert: CPUThrottlingHigh
          annotations:
            message: '{{ $value | humanizePercentage }} throttling of CPU in namespace
              {{ $labels.namespace }} for container {{ $labels.container }} in pod {{
              $labels.pod }}.'
          expr: |
            sum(increase(container_cpu_cfs_throttled_periods_total{container!="", }[5m])) by (container, pod, namespace)
              /
            sum(increase(container_cpu_cfs_periods_total{}[5m])) by (container, pod, namespace)
              > ( 25 / 100 )
          for: 15m
          labels:
            severity: warning
      - name: kubernetes-storage
        rules:
        - alert: KubePersistentVolumeUsageCritical
          annotations:
            platform: "育苗通测试平台"
            message: The PersistentVolume claimed by {{ $labels.persistentvolumeclaim
              }} in Namespace {{ $labels.namespace }} is only {{ $value | humanizePercentage
              }} free.
          expr: |
            kubelet_volume_stats_available_bytes{job="kubelet"}
              /
            kubelet_volume_stats_capacity_bytes{job="kubelet"}
              < 0.03
          for: 1m
          labels:
            severity: critical
        - alert: KubePersistentVolumeFullInFourDays
          annotations:
            platform: "育苗通测试平台"
            message: "Based on recent sampling, the PersistentVolume claimed by {{$labels.persistentvolumeclaim}} in Namespace {{$labels.namespace}} is expected to fill up within four days. Currently {{$value | humanizePercentage}} is available."
          expr: |
            (
              kubelet_volume_stats_available_bytes{job="kubelet"}
                /
              kubelet_volume_stats_capacity_bytes{job="kubelet"}
            ) < 0.15
            and
            predict_linear(kubelet_volume_stats_available_bytes{job="kubelet"}[6h], 4 * 24 * 3600) < 0
          for: 1h
          labels:
            severity: critical
        - alert: KubePersistentVolumeErrors
          annotations:
            platform: "育苗通测试平台"
            message: The persistent volume {{ $labels.persistentvolume }} has status {{
              $labels.phase }}.
          expr: |
            kube_persistentvolume_status_phase{phase=~"Failed|Pending",job="kube-state-metrics"} > 0
          for: 5m
          labels:
            severity: critical
      - name: kubernetes-system
        rules:
        - alert: KubeVersionMismatch
          annotations:
            platform: "育苗通测试平台"
            message: There are {{ $value }} different semantic versions of Kubernetes
              components running.
          expr: |
            count(count by (gitVersion) (label_replace(kubernetes_build_info{job!~"kube-dns|coredns"},"gitVersion","$1","gitVersion","(v[0-9]*.[0-9]*.[0-9]*).*"))) > 1
          for: 15m
          labels:
            severity: warning
        - alert: KubeClientErrors
          annotations:
            platform: "育苗通测试平台"
            message: Kubernetes API server client '{{ $labels.job }}/{{ $labels.instance
              }}' is experiencing {{ $value | humanizePercentage }} errors.
          expr: |
            (sum(rate(rest_client_requests_total{code=~"5.."}[5m])) by (instance, job)
              /
            sum(rate(rest_client_requests_total[5m])) by (instance, job))
            > 0.01
          for: 15m
          labels:
            severity: warning
      - name: kubernetes-system-apiserver
        rules:
        - alert: KubeAPILatencyHigh
          annotations:
            platform: "育苗通测试平台"
            message: The API server has a 99th percentile latency of {{ $value }} seconds
              for {{ $labels.verb }} {{ $labels.resource }}.
          expr: |
            cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{job="apiserver",quantile="0.99",subresource!="log",verb!~"LIST|WATCH|WATCHLIST|PROXY|CONNECT"} > 1
          for: 10m
          labels:
            severity: warning
        - alert: KubeAPILatencyHigh
          annotations:
            platform: "育苗通测试平台"
            message: The API server has a 99th percentile latency of {{ $value }} seconds
              for {{ $labels.verb }} {{ $labels.resource }}.
          expr: |
            cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{job="apiserver",quantile="0.99",subresource!="log",verb!~"LIST|WATCH|WATCHLIST|PROXY|CONNECT"} > 4
          for: 10m
          labels:
            severity: critical
        - alert: KubeAPIErrorsHigh
          annotations:
            platform: "育苗通测试平台"
            message: API server is returning errors for {{ $value | humanizePercentage
              }} of requests.
          expr: |
            sum(rate(apiserver_request_total{job="apiserver",code=~"5.."}[5m]))
              /
            sum(rate(apiserver_request_total{job="apiserver"}[5m])) > 0.03
          for: 10m
          labels:
            severity: critical
        - alert: KubeAPIErrorsHigh
          annotations:
            platform: "育苗通测试平台"
            message: API server is returning errors for {{ $value | humanizePercentage
              }} of requests.
          expr: |
            sum(rate(apiserver_request_total{job="apiserver",code=~"5.."}[5m]))
              /
            sum(rate(apiserver_request_total{job="apiserver"}[5m])) > 0.01
          for: 10m
          labels:
            severity: warning
        - alert: KubeAPIErrorsHigh
          annotations:
            platform: "育苗通测试平台"
            message: API server is returning errors for {{ $value | humanizePercentage
              }} of requests for {{ $labels.verb }} {{ $labels.resource }} {{ $labels.subresource
              }}.
          expr: |
            sum(rate(apiserver_request_total{job="apiserver",code=~"5.."}[5m])) by (resource,subresource,verb)
              /
            sum(rate(apiserver_request_total{job="apiserver"}[5m])) by (resource,subresource,verb) > 0.10
          for: 10m
          labels:
            severity: critical
        - alert: KubeAPIErrorsHigh
          annotations:
            platform: "育苗通测试平台"
            message: API server is returning errors for {{ $value | humanizePercentage
              }} of requests for {{ $labels.verb }} {{ $labels.resource }} {{ $labels.subresource
              }}.
          expr: |
            sum(rate(apiserver_request_total{job="apiserver",code=~"5.."}[5m])) by (resource,subresource,verb)
              /
            sum(rate(apiserver_request_total{job="apiserver"}[5m])) by (resource,subresource,verb) > 0.05
          for: 10m
          labels:
            severity: warning
        - alert: KubeClientCertificateExpiration
          annotations:
            platform: "育苗通测试平台"
            message: "A client certificate used to authenticate to the apiserver is expiring in less than 7 days."
          expr: |
            apiserver_client_certificate_expiration_seconds_count{job="apiserver"} > 0 and histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m]))) < 604800
          labels:
            severity: warning
        - alert: KubeClientCertificateExpiration
          annotations:
            platform: "育苗通测试平台"
            message: "A client certificate used to authenticate to the apiserver is expiring in less than 24 hours."
          expr: |
            apiserver_client_certificate_expiration_seconds_count{job="apiserver"} > 0 and histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m]))) < 86400
          labels:
            severity: critical
        - alert: KubeAPIDown
          annotations:
            platform: "育苗通测试平台"
            message: "KubeAPI has disappeared from Prometheus target discovery."
          expr: |
            absent(up{job="apiserver"} == 1)
          for: 15m
          labels:
            severity: critical
      - name: kubernetes-system-kubelet
        rules:
        - alert: KubeNodeNotReady
          annotations:
            platform: "育苗通测试平台"
            message: "{{$labels.node}} has been unready for more than 15 minutes."
          expr: |
            kube_node_status_condition{job="kube-state-metrics",condition="Ready",status="true"} == 0
          for: 15m
          labels:
            severity: warning
        - alert: KubeNodeUnreachable
          annotations:
            platform: "育苗通测试平台"
            message: "{{$labels.node}} is unreachable and some workloads may be rescheduled."
          expr: |
            kube_node_spec_taint{job="kube-state-metrics",key="node.kubernetes.io/unreachable",effect="NoSchedule"} == 1
          labels:
            severity: warning
        - alert: KubeletTooManyPods
          annotations:
            platform: "育苗通测试平台"
            message: Kubelet '{{ $labels.node }}' is running at {{ $value | humanizePercentage
              }} of its Pod capacity.
          expr: |
            max(max(kubelet_running_pod_count{job="kubelet"}) by(instance) * on(instance) group_left(node) kubelet_node_name{job="kubelet"}) by(node) / max(kube_node_status_capacity_pods{job="kube-state-metrics"}) by(node) > 0.95
          for: 15m
          labels:
            severity: warning
        - alert: KubeletDown
          annotations:
            platform: "育苗通测试平台"
            message: "Kubelet has disappeared from Prometheus target discovery."
          expr: |
            absent(up{job="kubelet"} == 1)
          for: 15m
          labels:
            severity: critical
      - name: kubernetes-system-scheduler
        rules:
        - alert: KubeSchedulerDown
          annotations:
            message: "KubeScheduler has disappeared from Prometheus target discovery."
          expr: |
            absent(up{job="kube-scheduler"} == 1)
          for: 15m
          labels:
            severity: critical
      - name: kubernetes-system-controller-manager
        rules:
        - alert: KubeControllerManagerDown
          annotations:
            message: "KubeControllerManager has disappeared from Prometheus target discovery."
          expr: |
            absent(up{job="kube-controller-manager"} == 1)
          for: 15m
          labels:
            severity: critical
      - name: prometheus
        rules:
        - alert: PrometheusBadConfig
          annotations:
            platform: "育苗通测试平台"
            description: Prometheus {{$labels.namespace}}/{{$labels.pod}} has failed to
              reload its configuration.
            summary: "Failed Prometheus configuration reload."
          expr: |
            # Without max_over_time, failed scrapes could create false negatives, see
            # https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.
            max_over_time(prometheus_config_last_reload_successful{job="prometheus-k8s",namespace="monitoring"}[5m]) == 0
          for: 10m
          labels:
            severity: critical
        - alert: PrometheusNotificationQueueRunningFull
          annotations:
            platform: "育苗通测试平台"
            description: Alert notification queue of Prometheus {{$labels.namespace}}/{{$labels.pod}}
              is running full.
            summary: "Prometheus alert notification queue is predicted to run full in less than 30m."
          expr: |
            # Without min_over_time, failed scrapes could create false negatives, see
            # https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.
            (
              predict_linear(prometheus_notifications_queue_length{job="prometheus-k8s",namespace="monitoring"}[5m], 60 * 30)
            >
              min_over_time(prometheus_notifications_queue_capacity{job="prometheus-k8s",namespace="monitoring"}[5m])
            )
          for: 15m
          labels:
            severity: warning
        - alert: PrometheusErrorSendingAlertsToSomeAlertmanagers
          annotations:
            platform: "育苗通测试平台"
            description: '{{ printf "%.1f" $value }}% errors while sending alerts from
              Prometheus {{$labels.namespace}}/{{$labels.pod}} to Alertmanager {{$labels.alertmanager}}.'
            summary: "Prometheus has encountered more than 1% errors sending alerts to a specific Alertmanager."
          expr: |
            (
              rate(prometheus_notifications_errors_total{job="prometheus-k8s",namespace="monitoring"}[5m])
            /
              rate(prometheus_notifications_sent_total{job="prometheus-k8s",namespace="monitoring"}[5m])
            )
            * 100
            > 1
          for: 15m
          labels:
            severity: warning
        - alert: PrometheusErrorSendingAlertsToAnyAlertmanager
          annotations:
            platform: "育苗通测试平台"
            description: '{{ printf "%.1f" $value }}% minimum errors while sending alerts
              from Prometheus {{$labels.namespace}}/{{$labels.pod}} to any Alertmanager.'
            summary: "Prometheus encounters more than 3% errors sending alerts to any Alertmanager."
          expr: |
            min without(alertmanager) (
              rate(prometheus_notifications_errors_total{job="prometheus-k8s",namespace="monitoring"}[5m])
            /
              rate(prometheus_notifications_sent_total{job="prometheus-k8s",namespace="monitoring"}[5m])
            )
            * 100
            > 3
          for: 15m
          labels:
            severity: critical
        - alert: PrometheusNotConnectedToAlertmanagers
          annotations:
            platform: "育苗通测试平台"
            description: Prometheus {{$labels.namespace}}/{{$labels.pod}} is not connected
              to any Alertmanagers.
            summary: "Prometheus is not connected to any Alertmanagers."
          expr: |
            # Without max_over_time, failed scrapes could create false negatives, see
            # https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.
            max_over_time(prometheus_notifications_alertmanagers_discovered{job="prometheus-k8s",namespace="monitoring"}[5m]) < 1
          for: 10m
          labels:
            severity: warning
        - alert: PrometheusTSDBReloadsFailing
          annotations:
            platform: "育苗通测试平台"
            description: Prometheus {{$labels.namespace}}/{{$labels.pod}} has detected
              {{$value | humanize}} reload failures over the last 3h.
            summary: "Prometheus has issues reloading blocks from disk."
          expr: |
            increase(prometheus_tsdb_reloads_failures_total{job="prometheus-k8s",namespace="monitoring"}[3h]) > 0
          for: 4h
          labels:
            severity: warning
        - alert: PrometheusTSDBCompactionsFailing
          annotations:
            platform: "育苗通测试平台"
            description: Prometheus {{$labels.namespace}}/{{$labels.pod}} has detected
              {{$value | humanize}} compaction failures over the last 3h.
            summary: "Prometheus has issues compacting blocks."
          expr: |
            increase(prometheus_tsdb_compactions_failed_total{job="prometheus-k8s",namespace="monitoring"}[3h]) > 0
          for: 4h
          labels:
            severity: warning
        - alert: PrometheusNotIngestingSamples
          annotations:
            platform: "育苗通测试平台"
            description: Prometheus {{$labels.namespace}}/{{$labels.pod}} is not ingesting
              samples.
            summary: "Prometheus is not ingesting samples."
          expr: |
            rate(prometheus_tsdb_head_samples_appended_total{job="prometheus-k8s",namespace="monitoring"}[5m]) <= 0
          for: 10m
          labels:
            severity: warning
        - alert: PrometheusDuplicateTimestamps
          annotations:
            platform: "育苗通测试平台"
            description: Prometheus {{$labels.namespace}}/{{$labels.pod}} is dropping
              {{ printf "%.4g" $value  }} samples/s with different values but duplicated
              timestamp.
            summary: "Prometheus is dropping samples with duplicate timestamps."
          expr: |
            rate(prometheus_target_scrapes_sample_duplicate_timestamp_total{job="prometheus-k8s",namespace="monitoring"}[5m]) > 0
          for: 10m
          labels:
            severity: warning
        - alert: PrometheusOutOfOrderTimestamps
          annotations:
            platform: "育苗通测试平台"
            description: Prometheus {{$labels.namespace}}/{{$labels.pod}} is dropping
              {{ printf "%.4g" $value  }} samples/s with timestamps arriving out of order.
            summary: "Prometheus is dropping samples with out-of-order timestamps."
          expr: |
            rate(prometheus_target_scrapes_sample_out_of_order_total{job="prometheus-k8s",namespace="monitoring"}[5m]) > 0
          for: 10m
          labels:
            severity: warning
        - alert: PrometheusRemoteStorageFailures
          annotations:
            platform: "育苗通测试平台"
            description: Prometheus {{$labels.namespace}}/{{$labels.pod}} failed to send
              {{ printf "%.1f" $value }}% of the samples to queue {{$labels.queue}}.
            summary: "Prometheus fails to send samples to remote storage."
          expr: |
            (
              rate(prometheus_remote_storage_failed_samples_total{job="prometheus-k8s",namespace="monitoring"}[5m])
            /
              (
                rate(prometheus_remote_storage_failed_samples_total{job="prometheus-k8s",namespace="monitoring"}[5m])
              +
                rate(prometheus_remote_storage_succeeded_samples_total{job="prometheus-k8s",namespace="monitoring"}[5m])
              )
            )
            * 100
            > 1
          for: 15m
          labels:
            severity: critical
        - alert: PrometheusRemoteWriteBehind
          annotations:
            platform: "育苗通测试平台"
            description: Prometheus {{$labels.namespace}}/{{$labels.pod}} remote write
              is {{ printf "%.1f" $value }}s behind for queue {{$labels.queue}}.
            summary: "Prometheus remote write is behind."
          expr: |
            # Without max_over_time, failed scrapes could create false negatives, see
            # https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.
            (
              max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds{job="prometheus-k8s",namespace="monitoring"}[5m])
            - on(job, instance) group_right
              max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds{job="prometheus-k8s",namespace="monitoring"}[5m])
            )
            > 120
          for: 15m
          labels:
            severity: critical
        - alert: PrometheusRemoteWriteDesiredShards
          annotations:
            platform: "育苗通测试平台"
            description: Prometheus {{$labels.namespace}}/{{$labels.pod}} remote write
              desired shards calculation wants to run {{ $value }} shards, which is more
              than the max of {{ printf `prometheus_remote_storage_shards_max{instance="%s",job="prometheus-k8s",namespace="monitoring"}`
              $labels.instance | query | first | value }}.
            summary: "Prometheus remote write desired shards calculation wants to run more than the configured max shards."
          expr: |
            # Without max_over_time, failed scrapes could create false negatives, see
            # https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.
            (
              max_over_time(prometheus_remote_storage_shards_desired{job="prometheus-k8s",namespace="monitoring"}[5m])
            >
              max_over_time(prometheus_remote_storage_shards_max{job="prometheus-k8s",namespace="monitoring"}[5m])
            )
          for: 15m
          labels:
            severity: warning
        - alert: PrometheusRuleFailures
          annotations:
            platform: "育苗通测试平台"
            description: Prometheus {{$labels.namespace}}/{{$labels.pod}} has failed to
              evaluate {{ printf "%.0f" $value }} rules in the last 5m.
            summary: "Prometheus is failing rule evaluations."
          expr: |
            increase(prometheus_rule_evaluation_failures_total{job="prometheus-k8s",namespace="monitoring"}[5m]) > 0
          for: 15m
          labels:
            severity: critical
        - alert: PrometheusMissingRuleEvaluations
          annotations:
            platform: "育苗通测试平台"
            description: Prometheus {{$labels.namespace}}/{{$labels.pod}} has missed {{
              printf "%.0f" $value }} rule group evaluations in the last 5m.
            summary: "Prometheus is missing rule evaluations due to slow rule group evaluation."
          expr: |
            increase(prometheus_rule_group_iterations_missed_total{job="prometheus-k8s",namespace="monitoring"}[5m]) > 0
          for: 15m
          labels:
            severity: warning
      - name: alertmanager.rules
        rules:
        - alert: AlertmanagerConfigInconsistent
          annotations:
            platform: "育苗通测试平台"
            message: "The configuration of the instances of the Alertmanager cluster {{$labels.service}} is out of sync."
          expr: |
            count_values("config_hash", alertmanager_config_hash{job="alertmanager-main",namespace="monitoring"}) BY (service) / ON(service) GROUP_LEFT() label_replace(max(prometheus_operator_spec_replicas{job="prometheus-operator",namespace="monitoring",controller="alertmanager"}) by (name, job, namespace, controller), "service", "alertmanager-$1", "name", "(.*)") != 1
          for: 5m
          labels:
            severity: critical
        - alert: AlertmanagerFailedReload
          annotations:
            platform: "育苗通测试平台"
            message: "Reloading Alertmanager's configuration has failed for {{$labels.namespace}}/{{$labels.pod}}."
          expr: |
            alertmanager_config_last_reload_successful{job="alertmanager-main",namespace="monitoring"} == 0
          for: 10m
          labels:
            severity: warning
        - alert: AlertmanagerMembersInconsistent
          annotations:
            platform: "育苗通测试平台"
            message: "Alertmanager has not found all other members of the cluster."
          expr: |
            alertmanager_cluster_members{job="alertmanager-main",namespace="monitoring"}
              != on (service) GROUP_LEFT()
            count by (service) (alertmanager_cluster_members{job="alertmanager-main",namespace="monitoring"})
          for: 5m
          labels:
            severity: critical
      # - name: general.rules
        # rules:
        # - alert: TargetDown
          # annotations:
            # platform: "育苗通测试平台"
            # message: '{{ printf "%.4g" $value }}% of the {{ $labels.job }} targets in
              # {{ $labels.namespace }} namespace are down.'
            # runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md
          # expr: 100 * (count(up == 0) BY (job, namespace, service) / count(up) BY (job,
            # namespace, service)) > 10
          # for: 10m
          # labels:
            # severity: warning
        # - alert: Watchdog
          # annotations:
            # platform: "育苗通测试平台"
            # message: "This alert is always firing. It is meant to ensure that the entire alerting pipeline is functional."
          # expr: vector(1)
          # labels:
            # severity: none
      - name: node-time
        rules:
        - alert: ClockSkewDetected
          annotations:
            platform: "育苗通测试平台"
            message: Clock skew detected on node-exporter {{ $labels.namespace }}/{{ $labels.pod
              }}. Ensure NTP is configured correctly on this host.
          expr: |
            abs(node_timex_offset_seconds{job="node-exporter"}) > 0.05
          for: 2m
          labels:
            severity: warning
      - name: node-network
        rules:
        - alert: NodeNetworkInterfaceFlapping
          annotations:
            platform: "育苗通测试平台"
            message: Network interface "{{ $labels.device }}" changing its up status
              often on node-exporter {{ $labels.namespace }}/{{ $labels.pod }}
          expr: |
            changes(node_network_up{job="node-exporter",device!~"veth.+"}[2m]) > 2
          for: 2m
          labels:
            severity: warning
      - name: prometheus-operator
        rules:
        - alert: PrometheusOperatorReconcileErrors
          annotations:
            platform: "育苗通测试平台"
            message: Errors while reconciling {{ $labels.controller }} in {{ $labels.namespace
              }} Namespace.
          expr: |
            rate(prometheus_operator_reconcile_errors_total{job="prometheus-operator",namespace="monitoring"}[5m]) > 0.1
          for: 10m
          labels:
            severity: warning
        - alert: PrometheusOperatorNodeLookupErrors
          annotations:
            platform: "育苗通测试平台"
            message: Errors while reconciling Prometheus in {{ $labels.namespace }} Namespace.
          expr: |
            rate(prometheus_operator_node_address_lookup_errors_total{job="prometheus-operator",namespace="monitoring"}[5m]) > 0.1
          for: 10m
          labels:
            severity: warning
    prometheus-rules.yaml

    Next, using prometheus-rules.yaml as a reference, create a file for our own custom alert rules, prometheus-additional-rules.yaml:

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      labels:
        prometheus: k8s
        role: alert-rules
      name: prometheus-additional-rules
      namespace: monitoring
    spec:
      groups:
        - name: general.rules
          rules:
          - alert: InstanceDown
            expr: up == 0
            for: 1m
            labels:
              status: critical
            annotations:
              platform: "育苗通测试平台"
              summary: "{{$labels.instance}}: exporter has stopped responding"
              description: "{{$labels.instance}} has been unreachable for more than 1 minute"
              
          - alert: NodeCPUUsage
            expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by(instance) * 100) > 80
            for: 1m
            labels:
              status: critical
            annotations:
              platform: "育苗通测试平台"
              summary: "{{$labels.instance}}: CPU usage is too high"
              description: "{{$labels.instance}}: CPU usage is above 80% (current value: {{$value}}%)"
      
          - alert: NodeMemoryUsage
            expr: 100 - (node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100 > 80
            for: 1m
            labels:
              status: critical
            annotations:
              platform: "育苗通测试平台"
              summary: "{{$labels.instance}}: memory usage is too high"
              description: "{{$labels.instance}}: memory usage is above 80% (current value: {{$value}}%)"
              
          - alert: NodeFilesystemUsage
            expr: 100 - (node_filesystem_free_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 80
            for: 1m
            labels:
              status: critical
            annotations:
              platform: "育苗通测试平台"
              summary: "{{$labels.instance}} {{$labels.mountpoint}}: disk partition usage is too high"
              description: "{{$labels.instance}} {{$labels.mountpoint}}: disk partition usage is above 80% (current value: {{$value}}%)"
          
          - alert: NodeDiskIOUsage
            expr: (avg(irate(node_disk_io_time_seconds_total[1m])) by(instance) * 100) > 80
            for: 1m
            labels:
              status: critical
            annotations:
              platform: "育苗通测试平台"
              summary: "{{$labels.instance}}: disk I/O utilization is too high"
              description: "{{$labels.instance}}: disk I/O utilization is above 80% (current value: {{$value}}%)"
              
          - alert: NodeNetworkReceive
            expr: sum by (instance) (rate(node_network_receive_bytes_total{device!~'tap.*|veth.*|br.*|docker.*|virbr.*|lo'}[5m])) > 100 * 1024 * 1024
            for: 1m
            labels:
              status: critical
            annotations:
              platform: "育苗通测试平台"
              summary: "{{$labels.instance}}: inbound network bandwidth is too high"
              description: "{{$labels.instance}}: inbound traffic averaged over 5 minutes is above 100 MiB/s (current RX rate: {{$value}} bytes/s)"
     
          - alert: NodeNetworkTransmit
            expr: sum by (instance) (rate(node_network_transmit_bytes_total{device!~'tap.*|veth.*|br.*|docker.*|virbr.*|lo'}[5m])) > 100 * 1024 * 1024
            for: 1m
            labels:
              status: critical
            annotations:
              platform: "育苗通测试平台"
              summary: "{{$labels.instance}}: outbound network bandwidth is too high"
              description: "{{$labels.instance}}: outbound traffic averaged over 5 minutes is above 100 MiB/s (current TX rate: {{$value}} bytes/s)"
          
          - alert: NodeTCPCurrEstab
            expr: node_netstat_Tcp_CurrEstab > 1000
            for: 1m
            labels:
              status: critical
            annotations:
              platform: "育苗通测试平台"
              summary: "{{$labels.instance}}: too many established TCP connections"
              description: "{{$labels.instance}}: TCP_ESTABLISHED is above 1000 (current value: {{$value}})"
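
    The operator only loads a PrometheusRule whose labels match the ruleSelector of the Prometheus object, which is why the file above carries `prometheus: k8s` and `role: alert-rules`. A sketch of the matching fragment, assuming the stock release-0.3 prometheus-prometheus.yaml with a Prometheus object named `k8s` (check your own copy, the selector may differ):

```yaml
# Fragment of manifests/prometheus-prometheus.yaml (other spec fields omitted).
# Assumption: the object name "k8s" and this selector are unchanged from the
# stock manifest; verify against your own file.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  ruleSelector:
    matchLabels:
      prometheus: k8s      # must equal metadata.labels.prometheus of the PrometheusRule
      role: alert-rules    # must equal metadata.labels.role of the PrometheusRule
```

    If the labels match, applying the new file with kubectl should be enough. The operator regenerates the rule files and Prometheus reloads them without a restart.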
              

     


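    5. Monitoring targets outside k8s

    At the start of a project it is often hard to containerize everything, for example databases or a CDH cluster. We still need to monitor these components, and running two separate Prometheus setups is harder to manage, so we add them to kube-prometheus as well.

    Next, create the file prometheus-additional.yaml and put the extra scrape_configs entries for these components in it: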

    # Note: '*.*.*' stands for the redacted part of each IP address.
    # Targets are quoted because an unquoted YAML scalar must not start with '*'.
    - job_name: 'node-exporter-others'
      static_configs:
        - targets:
          - '*.*.*.149:31190'
          - '*.*.*.150:31190'
          - '*.*.*.122:31190'

    - job_name: 'mysql-exporter'
      static_configs:
        - targets:
          - '*.*.*.104:9592'
          - '*.*.*.125:9592'
          - '*.*.*.128:9592'

    - job_name: 'nacos-exporter'
      metrics_path: '/nacos/actuator/prometheus'
      static_configs:
        - targets:
          - '*.*.*.113:8848'
          - '*.*.*.114:8848'
          - '*.*.*.118:8848'

    - job_name: 'elasticsearch-exporter'
      static_configs:
        - targets:
          - '*.*.*.110:9597'
          - '*.*.*.107:9597'
          - '*.*.*.117:9597'

    - job_name: 'zookeeper-exporter'
      static_configs:
        - targets:
          - '*.*.*.115:9595'
          - '*.*.*.121:9595'
          - '*.*.*.120:9595'

    - job_name: 'nginx-exporter'
      static_configs:
        - targets:
          - '*.*.*.149:9593'
          - '*.*.*.150:9593'
          - '*.*.*.122:9593'

    # Metrics of the redis exporter process itself.
    - job_name: 'redis-exporter'
      static_configs:
        - targets:
          - '*.*.*.109:9594'

    # Several Redis instances scraped through the single exporter above.
    - job_name: 'redis-exporter-targets'
      static_configs:
        - targets:
          - 'redis://*.*.*.146:7090'
          - 'redis://*.*.*.144:7090'
          - 'redis://*.*.*.133:7091'
      metrics_path: /scrape
      relabel_configs:
        # Pass the Redis address to the exporter as the ?target= URL parameter.
        - source_labels: [__address__]
          target_label: __param_target
        # Keep the Redis address as the instance label.
        - source_labels: [__param_target]
          target_label: instance
        # Send the actual scrape request to the exporter.
        - target_label: __address__
          replacement: '*.*.*.109:9594'
    prometheus-additional.yaml

    Then store this scrape configuration in the k8s cluster as a Secret.

    kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml -n monitoring
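
    Prometheus does not read the Secret on its own: the Prometheus custom resource managed by the Operator has to reference it through additionalScrapeConfigs. A minimal sketch, assuming the stock manifests/prometheus-prometheus.yaml from kube-prometheus (resource name k8s, namespace monitoring) and the Secret created by the command above, whose key defaults to the file name:

    ```yaml
    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: k8s              # assumed: the default name used by kube-prometheus
      namespace: monitoring
    spec:
      # ... keep the existing fields of prometheus-prometheus.yaml unchanged ...
      additionalScrapeConfigs:
        name: additional-scrape-configs   # the Secret created above
        key: prometheus-additional.yaml   # key inside the Secret (file name passed to --from-file)
    ```

    After the change is applied, the extra jobs should appear under Status -> Targets once the Operator has regenerated and reloaded the configuration.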

     

     


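    6. Adding k8s Services

    Open the Status menu and look at Targets: only the two jobs shown in the screenshot (kube-scheduler and kube-controller-manager) have no targets. This is related to their ServiceMonitor resources.

    Look at the manifest prometheus-serviceMonitorKubeScheduler.yaml: its selector matches Service labels, but the kube-system namespace contains no Service labeled k8s-app=kube-scheduler.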

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        k8s-app: kube-scheduler
      name: kube-scheduler
      namespace: monitoring
    spec:
      endpoints:
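      - interval: 30s  # scrape every 30s
        port: http-metrics  # name of the Service port to scrape
      jobLabel: k8s-app
      namespaceSelector:  # namespaces whose Services are matched; use any: true to match all namespaces
        matchNames:
        - kube-system
      # Labels of the Services to match. With matchLabels a Service must carry all listed labels;
      # matchExpressions entries are ANDed as well, but allow set-based operators (In, NotIn, Exists, DoesNotExist).
      selector: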
        matchLabels:
          k8s-app: kube-scheduler

    Create prometheus-kubeSchedulerService.yaml:

    apiVersion: v1
    kind: Service
    metadata:
      namespace: kube-system
      name: kube-scheduler
      labels:
        k8s-app: kube-scheduler # matches the selector in the ServiceMonitor
    spec:
      selector:
        component: kube-scheduler # must match the labels on the scheduler pod
      ports:
      - name: http-metrics
        port: 10251
        targetPort: 10251
        protocol: TCP

    Likewise, create prometheus-kubeControllerManagerService.yaml:

    apiVersion: v1
    kind: Service
    metadata:
      namespace: kube-system
      name: kube-controller-manager
      labels:
        k8s-app: kube-controller-manager
    spec:
      selector:
        component: kube-controller-manager
      ports:
      - name: http-metrics
        port: 10252
        targetPort: 10252
        protocol: TCP

     


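    7. Configuring Alertmanager

    Scrape targets and alert rules are now in place, so the next step is the Alertmanager notification configuration.

    Email is the most common receiver, but here alerts are delivered through WeChat Work (企业微信). For that we developed an application, appalertservice, which connects to WeChat and forwards and processes the messages.

    You can of course also configure the WeChat account and message template directly in Alertmanager; see "第3章 Prometheus告警处理" (Chapter 3, Prometheus alert handling).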

    global:
      resolve_timeout: 5m
      # smtp_smarthost: 'smtp.sina.com:25'
      # smtp_from: '******@sina.com'
      # smtp_auth_username: '******@sina.com'
      # smtp_auth_password: '******'
    route:
      group_by: ['job']
      group_wait: 20s
      group_interval: 30m
      repeat_interval: 12h
      receiver: webhook
    receivers:
      - name: webhook
        webhook_configs:
        - url: 'http://appalertservice:20119/'
      # - name: 'email'
      #   email_configs:
      #   - to: '******@163.com'
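
    For the direct WeChat route mentioned above, Alertmanager provides a built-in wechat_configs receiver. A hedged sketch of what such a receiver could look like; the corp ID, agent ID, secret and party ID are placeholders, not values from this setup:

    ```yaml
    route:
      receiver: wechat
    receivers:
      - name: wechat
        wechat_configs:
          - corp_id: 'your-corp-id'        # placeholder: WeChat Work corporation ID
            agent_id: '1000002'            # placeholder: ID of the alerting application
            api_secret: 'your-app-secret'  # placeholder: secret of that application
            to_party: '1'                  # placeholder: department that receives the alerts
            send_resolved: true            # also notify when an alert is resolved
    ```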

    Then store the Alertmanager configuration in the k8s cluster as a Secret.

    kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml -n monitoring
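
    kube-prometheus keeps this Secret in manifests/alertmanager-secret.yaml, so it can also be managed declaratively. A sketch of an equivalent manifest, assuming the configuration shown above is embedded with stringData under the alertmanager.yaml key so that it stays readable:

    ```yaml
    apiVersion: v1
    kind: Secret
    metadata:
      name: alertmanager-main
      namespace: monitoring
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 5m
        route:
          group_by: ['job']
          group_wait: 20s
          group_interval: 30m
          repeat_interval: 12h
          receiver: webhook
        receivers:
          - name: webhook
            webhook_configs:
              - url: 'http://appalertservice:20119/'
    ```

    Applying the manifest with kubectl apply -f updates the Secret in place, whereas kubectl create fails if alertmanager-main already exists.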

     


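    8. Persisting Grafana data

    By default Grafana does not persist its data, so dashboards, accounts, passwords and other settings are lost when the service is recreated. Persisting Grafana's data is therefore well worth doing.

    Out of the box the data lives in an emptyDir volume, whose lifecycle is tied to the pod. When something goes wrong and the pod is deleted or rescheduled, everything configured in Grafana is gone.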

      volumeMounts:
      - mountPath: /var/lib/grafana
        name: grafana-storage
        readOnly: false
    ...
    volumes:
    - emptyDir: {}
      name: grafana-storage

    Change the emptyDir to a PVC:

    volumes:
    - name: grafana-storage
      persistentVolumeClaim:
        claimName: grafana

    To persist data through a PVC, an available PV is needed for the PVC to bind to. The contents of grafana-volume.yaml are as follows:

    # mkdir -p /data/k8s/grafana && chmod 777 /data/k8s/grafana && chown nfsnobody /data/k8s/grafana
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: grafana
    spec:
      capacity:
        storage: 10Gi
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain  # keep the data if the claim is deleted (Recycle would scrub the volume)
      nfs:
        server: 10.88.88.108
        path: /data/k8s/grafana
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: grafana
      namespace: monitoring
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi

    Prometheus data should really be persisted too, but that is not covered in detail here; see: https://blog.z0ukun.com/?p=2605#toc-8
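
    For orientation, the Prometheus Operator exposes persistence through spec.storage.volumeClaimTemplate on the Prometheus resource. A minimal sketch, assuming a StorageClass named nfs-storage exists in the cluster (a hypothetical name) and that the retention period and size below suit your environment:

    ```yaml
    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: k8s
      namespace: monitoring
    spec:
      # ... existing fields of prometheus-prometheus.yaml unchanged ...
      retention: 15d                        # assumed retention period
      storage:
        volumeClaimTemplate:
          spec:
            storageClassName: nfs-storage   # hypothetical StorageClass
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi               # assumed size
    ```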


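    9. Configuring Grafana dashboards

    Monitoring and alerting are now configured, so all that remains is visualization. Open Grafana -> click the add (+) button -> Import -> Upload .json file, and import the monitoring dashboard.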

    {
      "annotations": {
        "list": [
          {
            "builtIn": 1,
            "datasource": "-- Grafana --",
            "enable": true,
            "hide": true,
            "iconColor": "rgba(0, 211, 255, 1)",
            "name": "Annotations & Alerts",
            "type": "dashboard"
          }
        ]
      },
      "description": "This dashboard provides cluster admins with the ability to monitor nodes and identify workload bottlenecks. It can be deployed with PSPs enabled using the following helm chart - https://github.com/pivotal-cf/charts-grafana",
      "editable": true,
      "gnetId": 10000,
      "graphTooltip": 0,
      "id": 102,
      "iteration": 1597137794957,
      "links": [],
      "panels": [
        {
          "collapsed": false,
          "datasource": null,
          "gridPos": {
            "h": 1,
            "w": 24,
            "x": 0,
            "y": 0
          },
          "id": 34,
          "panels": [],
          "repeat": null,
          "title": "Summary",
          "type": "row"
        },
        {
          "cacheTimeout": null,
          "colorBackground": false,
          "colorValue": true,
          "colors": [
            "rgba(50, 172, 45, 0.97)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(245, 54, 54, 0.9)"
          ],
          "datasource": "prometheus",
          "editable": true,
          "error": false,
          "format": "percent",
          "gauge": {
            "maxValue": 100,
            "minValue": 0,
            "show": true,
            "thresholdLabels": false,
            "thresholdMarkers": true
          },
          "gridPos": {
            "h": 5,
            "w": 8,
            "x": 0,
            "y": 1
          },
          "height": "180px",
          "id": 4,
          "interval": null,
          "links": [],
          "mappingType": 1,
          "mappingTypes": [
            {
              "name": "value to text",
              "value": 1
            },
            {
              "name": "range to text",
              "value": 2
            }
          ],
          "maxDataPoints": 100,
          "nullPointMode": "connected",
          "nullText": null,
          "options": {},
          "postfix": "",
          "postfixFontSize": "50%",
          "prefix": "",
          "prefixFontSize": "50%",
          "rangeMaps": [
            {
              "from": "null",
              "text": "N/A",
              "to": "null"
            }
          ],
          "sparkline": {
            "fillColor": "rgba(31, 118, 189, 0.18)",
            "full": false,
            "lineColor": "rgb(31, 120, 193)",
            "show": false
          },
          "tableColumn": "",
          "targets": [
            {
              "expr": "sum (container_memory_working_set_bytes{id=\"/\",kubernetes_io_hostname=~\"^$Node$\"}) / sum (machine_memory_bytes{kubernetes_io_hostname=~\"^$Node$\"}) * 100",
              "format": "time_series",
              "interval": "10s",
              "intervalFactor": 1,
              "refId": "A",
              "step": 10
            }
          ],
          "thresholds": "65, 90",
          "title": "Cluster memory usage",
          "type": "singlestat",
          "valueFontSize": "80%",
          "valueMaps": [
            {
              "op": "=",
              "text": "N/A",
              "value": "null"
            }
          ],
          "valueName": "current"
        },
        {
          "cacheTimeout": null,
          "colorBackground": false,
          "colorValue": true,
          "colors": [
            "rgba(50, 172, 45, 0.97)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(245, 54, 54, 0.9)"
          ],
          "datasource": "prometheus",
          "decimals": 2,
          "editable": true,
          "error": false,
          "format": "percent",
          "gauge": {
            "maxValue": 100,
            "minValue": 0,
            "show": true,
            "thresholdLabels": false,
            "thresholdMarkers": true
          },
          "gridPos": {
            "h": 5,
            "w": 8,
            "x": 8,
            "y": 1
          },
          "height": "180px",
          "id": 6,
          "interval": null,
          "links": [],
          "mappingType": 1,
          "mappingTypes": [
            {
              "name": "value to text",
              "value": 1
            },
            {
              "name": "range to text",
              "value": 2
            }
          ],
          "maxDataPoints": 100,
          "nullPointMode": "connected",
          "nullText": null,
          "options": {},
          "postfix": "",
          "postfixFontSize": "50%",
          "prefix": "",
          "prefixFontSize": "50%",
          "rangeMaps": [
            {
              "from": "null",
              "text": "N/A",
              "to": "null"
            }
          ],
          "sparkline": {
            "fillColor": "rgba(31, 118, 189, 0.18)",
            "full": false,
            "lineColor": "rgb(31, 120, 193)",
            "show": false
          },
          "tableColumn": "",
          "targets": [
            {
              "expr": "sum (rate (container_cpu_usage_seconds_total{id=\"/\",kubernetes_io_hostname=~\"^$Node$\"}[$interval])) / sum (machine_cpu_cores{kubernetes_io_hostname=~\"^$Node$\"}) * 100",
              "format": "time_series",
              "interval": "10s",
              "intervalFactor": 1,
              "legendFormat": "",
              "refId": "A",
              "step": 10
            }
          ],
          "thresholds": "65, 90",
          "title": "Cluster CPU usage",
          "type": "singlestat",
          "valueFontSize": "80%",
          "valueMaps": [
            {
              "op": "=",
              "text": "N/A",
              "value": "null"
            }
          ],
          "valueName": "current"
        },
        {
          "cacheTimeout": null,
          "colorBackground": false,
          "colorValue": true,
          "colors": [
            "rgba(50, 172, 45, 0.97)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(245, 54, 54, 0.9)"
          ],
          "datasource": "prometheus",
          "decimals": 2,
          "editable": true,
          "error": false,
          "format": "percent",
          "gauge": {
            "maxValue": 100,
            "minValue": 0,
            "show": true,
            "thresholdLabels": false,
            "thresholdMarkers": true
          },
          "gridPos": {
            "h": 5,
            "w": 8,
            "x": 16,
            "y": 1
          },
          "height": "180px",
          "id": 7,
          "interval": null,
          "links": [],
          "mappingType": 1,
          "mappingTypes": [
            {
              "name": "value to text",
              "value": 1
            },
            {
              "name": "range to text",
              "value": 2
            }
          ],
          "maxDataPoints": 100,
          "nullPointMode": "connected",
          "nullText": null,
          "options": {},
          "postfix": "",
          "postfixFontSize": "50%",
          "prefix": "",
          "prefixFontSize": "50%",
          "rangeMaps": [
            {
              "from": "null",
              "text": "N/A",
              "to": "null"
            }
          ],
          "sparkline": {
            "fillColor": "rgba(31, 118, 189, 0.18)",
            "full": false,
            "lineColor": "rgb(31, 120, 193)",
            "show": false
          },
          "tableColumn": "",
          "targets": [
            {
              "expr": "sum (container_fs_usage_bytes{id=\"/\"}) / sum (container_fs_limit_bytes{id=\"/\"}) * 100",
              "format": "time_series",
              "interval": "10s",
              "intervalFactor": 1,
              "legendFormat": "",
              "metric": "",
              "refId": "A",
              "step": 10
            }
          ],
          "thresholds": "65, 90",
          "title": "Cluster filesystem usage",
          "type": "singlestat",
          "valueFontSize": "80%",
          "valueMaps": [
            {
              "op": "=",
              "text": "N/A",
              "value": "null"
            }
          ],
          "valueName": "current"
        },
        {
          "cacheTimeout": null,
          "colorBackground": false,
          "colorValue": false,
          "colors": [
            "rgba(50, 172, 45, 0.97)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(245, 54, 54, 0.9)"
          ],
          "datasource": "prometheus",
          "decimals": 2,
          "editable": true,
          "error": false,
          "format": "bytes",
          "gauge": {
            "maxValue": 100,
            "minValue": 0,
            "show": false,
            "thresholdLabels": false,
            "thresholdMarkers": true
          },
          "gridPos": {
            "h": 3,
            "w": 4,
            "x": 0,
            "y": 6
          },
          "height": "1px",
          "id": 9,
          "interval": null,
          "links": [],
          "mappingType": 1,
          "mappingTypes": [
            {
              "name": "value to text",
              "value": 1
            },
            {
              "name": "range to text",
              "value": 2
            }
          ],
          "maxDataPoints": 100,
          "nullPointMode": "connected",
          "nullText": null,
          "options": {},
          "postfix": "",
          "postfixFontSize": "20%",
          "prefix": "",
          "prefixFontSize": "20%",
          "rangeMaps": [
            {
              "from": "null",
              "text": "N/A",
              "to": "null"
            }
          ],
          "sparkline": {
            "fillColor": "rgba(31, 118, 189, 0.18)",
            "full": false,
            "lineColor": "rgb(31, 120, 193)",
            "show": false
          },
          "tableColumn": "",
          "targets": [
            {
              "expr": "sum (container_memory_working_set_bytes{id=\"/\",kubernetes_io_hostname=~\"^$Node$\"})",
              "format": "time_series",
              "interval": "10s",
              "intervalFactor": 1,
              "refId": "A",
              "step": 10
            }
          ],
          "thresholds": "",
          "title": "Used",
          "type": "singlestat",
          "valueFontSize": "50%",
          "valueMaps": [
            {
              "op": "=",
              "text": "N/A",
              "value": "null"
            }
          ],
          "valueName": "current"
        },
        {
          "cacheTimeout": null,
          "colorBackground": false,
          "colorValue": false,
          "colors": [
            "rgba(50, 172, 45, 0.97)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(245, 54, 54, 0.9)"
          ],
          "datasource": "prometheus",
          "decimals": 2,
          "editable": true,
          "error": false,
          "format": "bytes",
          "gauge": {
            "maxValue": 100,
            "minValue": 0,
            "show": false,
            "thresholdLabels": false,
            "thresholdMarkers": true
          },
          "gridPos": {
            "h": 3,
            "w": 4,
            "x": 4,
            "y": 6
          },
          "height": "1px",
          "id": 10,
          "interval": null,
          "links": [],
          "mappingType": 1,
          "mappingTypes": [
            {
              "name": "value to text",
              "value": 1
            },
            {
              "name": "range to text",
              "value": 2
            }
          ],
          "maxDataPoints": 100,
          "nullPointMode": "connected",
          "nullText": null,
          "options": {},
          "postfix": "",
          "postfixFontSize": "50%",
          "prefix": "",
          "prefixFontSize": "50%",
          "rangeMaps": [
            {
              "from": "null",
              "text": "N/A",
              "to": "null"
            }
          ],
          "sparkline": {
            "fillColor": "rgba(31, 118, 189, 0.18)",
            "full": false,
            "lineColor": "rgb(31, 120, 193)",
            "show": false
          },
          "tableColumn": "",
          "targets": [
            {
              "expr": "sum (machine_memory_bytes{kubernetes_io_hostname=~\"^$Node$\"})",
              "format": "time_series",
              "interval": "10s",
              "intervalFactor": 1,
              "refId": "A",
              "step": 10
            }
          ],
          "thresholds": "",
          "title": "Total",
          "type": "singlestat",
          "valueFontSize": "50%",
          "valueMaps": [
            {
              "op": "=",
              "text": "N/A",
              "value": "null"
            }
          ],
          "valueName": "current"
        },
        {
          "cacheTimeout": null,
          "colorBackground": false,
          "colorValue": false,
          "colors": [
            "rgba(50, 172, 45, 0.97)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(245, 54, 54, 0.9)"
          ],
          "datasource": "prometheus",
          "decimals": 2,
          "editable": true,
          "error": false,
          "format": "none",
          "gauge": {
            "maxValue": 100,
            "minValue": 0,
            "show": false,
            "thresholdLabels": false,
            "thresholdMarkers": true
          },
          "gridPos": {
            "h": 3,
            "w": 4,
            "x": 8,
            "y": 6
          },
          "height": "1px",
          "id": 11,
          "interval": null,
          "links": [],
          "mappingType": 1,
          "mappingTypes": [
            {
              "name": "value to text",
              "value": 1
            },
            {
              "name": "range to text",
              "value": 2
            }
          ],
          "maxDataPoints": 100,
          "nullPointMode": "connected",
          "nullText": null,
          "options": {},
          "postfix": " cores",
          "postfixFontSize": "30%",
          "prefix": "",
          "prefixFontSize": "50%",
          "rangeMaps": [
            {
              "from": "null",
              "text": "N/A",
              "to": "null"
            }
          ],
          "sparkline": {
            "fillColor": "rgba(31, 118, 189, 0.18)",
            "full": false,
            "lineColor": "rgb(31, 120, 193)",
            "show": false
          },
          "tableColumn": "",
          "targets": [
            {
              "expr": "sum (rate (container_cpu_usage_seconds_total{id=\"/\",kubernetes_io_hostname=~\"^$Node$\"}[$interval]))",
              "format": "time_series",
              "interval": "10s",
              "intervalFactor": 1,
              "legendFormat": "",
              "refId": "A",
              "step": 10
            }
          ],
          "thresholds": "",
          "title": "Used",
          "type": "singlestat",
          "valueFontSize": "50%",
          "valueMaps": [
            {
              "op": "=",
              "text": "N/A",
              "value": "null"
            }
          ],
          "valueName": "current"
        },
        {
          "cacheTimeout": null,
          "colorBackground": false,
          "colorValue": false,
          "colors": [
            "rgba(50, 172, 45, 0.97)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(245, 54, 54, 0.9)"
          ],
          "datasource": "prometheus",
          "decimals": 2,
          "editable": true,
          "error": false,
          "format": "none",
          "gauge": {
            "maxValue": 100,
            "minValue": 0,
            "show": false,
            "thresholdLabels": false,
            "thresholdMarkers": true
          },
          "gridPos": {
            "h": 3,
            "w": 4,
            "x": 12,
            "y": 6
          },
          "height": "1px",
          "id": 12,
          "interval": null,
          "links": [],
          "mappingType": 1,
          "mappingTypes": [
            {
              "name": "value to text",
              "value": 1
            },
            {
              "name": "range to text",
              "value": 2
            }
          ],
          "maxDataPoints": 100,
          "nullPointMode": "connected",
          "nullText": null,
          "options": {},
          "postfix": " cores",
          "postfixFontSize": "30%",
          "prefix": "",
          "prefixFontSize": "50%",
          "rangeMaps": [
            {
              "from": "null",
              "text": "N/A",
              "to": "null"
            }
          ],
          "sparkline": {
            "fillColor": "rgba(31, 118, 189, 0.18)",
            "full": false,
            "lineColor": "rgb(31, 120, 193)",
            "show": false
          },
          "tableColumn": "",
          "targets": [
            {
              "expr": "sum (machine_cpu_cores{kubernetes_io_hostname=~\"^$Node$\"})",
              "format": "time_series",
              "interval": "10s",
              "intervalFactor": 1,
              "refId": "A",
              "step": 10
            }
          ],
          "thresholds": "",
          "title": "Total",
          "type": "singlestat",
          "valueFontSize": "50%",
          "valueMaps": [
            {
              "op": "=",
              "text": "N/A",
              "value": "null"
            }
          ],
          "valueName": "current"
        },
        {
          "cacheTimeout": null,
          "colorBackground": false,
          "colorValue": false,
          "colors": [
            "rgba(50, 172, 45, 0.97)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(245, 54, 54, 0.9)"
          ],
          "datasource": "prometheus",
          "decimals": 2,
          "editable": true,
          "error": false,
          "format": "bytes",
          "gauge": {
            "maxValue": 100,
            "minValue": 0,
            "show": false,
            "thresholdLabels": false,
            "thresholdMarkers": true
          },
          "gridPos": {
            "h": 3,
            "w": 4,
            "x": 16,
            "y": 6
          },
          "height": "1px",
          "id": 13,
          "interval": null,
          "links": [],
          "mappingType": 1,
          "mappingTypes": [
            {
              "name": "value to text",
              "value": 1
            },
            {
              "name": "range to text",
              "value": 2
            }
          ],
          "maxDataPoints": 100,
          "nullPointMode": "connected",
          "nullText": null,
          "options": {},
          "postfix": "",
          "postfixFontSize": "50%",
          "prefix": "",
          "prefixFontSize": "50%",
          "rangeMaps": [
            {
              "from": "null",
              "text": "N/A",
              "to": "null"
            }
          ],
          "sparkline": {
            "fillColor": "rgba(31, 118, 189, 0.18)",
            "full": false,
            "lineColor": "rgb(31, 120, 193)",
            "show": false
          },
          "tableColumn": "",
          "targets": [
            {
              "expr": "sum (container_fs_usage_bytes{id=\"/\"})",
              "format": "time_series",
              "interval": "10s",
              "intervalFactor": 1,
              "legendFormat": "",
              "refId": "A",
              "step": 10
            }
          ],
          "thresholds": "",
          "title": "Used",
          "type": "singlestat",
          "valueFontSize": "50%",
          "valueMaps": [
            {
              "op": "=",
              "text": "N/A",
              "value": "null"
            }
          ],
          "valueName": "current"
        },
        {
          "cacheTimeout": null,
          "colorBackground": false,
          "colorValue": false,
          "colors": [
            "rgba(50, 172, 45, 0.97)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(245, 54, 54, 0.9)"
          ],
          "datasource": "prometheus",
          "decimals": 2,
          "editable": true,
          "error": false,
          "format": "bytes",
          "gauge": {
            "maxValue": 100,
            "minValue": 0,
            "show": false,
            "thresholdLabels": false,
            "thresholdMarkers": true
          },
          "gridPos": {
            "h": 3,
            "w": 4,
            "x": 20,
            "y": 6
          },
          "height": "1px",
          "id": 14,
          "interval": null,
          "links": [],
          "mappingType": 1,
          "mappingTypes": [
            {
              "name": "value to text",
              "value": 1
            },
            {
              "name": "range to text",
              "value": 2
            }
          ],
          "maxDataPoints": 100,
          "nullPointMode": "connected",
          "nullText": null,
          "options": {},
          "postfix": "",
          "postfixFontSize": "50%",
          "prefix": "",
          "prefixFontSize": "50%",
          "rangeMaps": [
            {
              "from": "null",
              "text": "N/A",
              "to": "null"
            }
          ],
          "sparkline": {
            "fillColor": "rgba(31, 118, 189, 0.18)",
            "full": false,
            "lineColor": "rgb(31, 120, 193)",
            "show": false
          },
          "tableColumn": "",
          "targets": [
            {
              "expr": "sum (container_fs_limit_bytes{id=\"/\"})",
              "format": "time_series",
              "interval": "10s",
              "intervalFactor": 1,
              "legendFormat": "",
              "refId": "A",
              "step": 10
            }
          ],
          "thresholds": "",
          "title": "Total",
          "type": "singlestat",
          "valueFontSize": "50%",
          "valueMaps": [
            {
              "op": "=",
              "text": "N/A",
              "value": "null"
            }
          ],
          "valueName": "current"
        },
        {
          "collapsed": false,
          "datasource": null,
          "gridPos": {
            "h": 1,
            "w": 24,
            "x": 0,
            "y": 9
          },
          "id": 35,
          "panels": [],
          "repeat": null,
          "title": "Memory",
          "type": "row"
        },
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "prometheus",
          "decimals": 2,
          "editable": true,
          "error": false,
          "fill": 0,
          "fillGradient": 0,
          "grid": {},
          "gridPos": {
            "h": 7,
            "w": 24,
            "x": 0,
            "y": 10
          },
          "id": 25,
          "legend": {
            "alignAsTable": true,
            "avg": true,
            "current": true,
            "hideEmpty": false,
            "max": false,
            "min": false,
            "rightSide": true,
            "show": true,
            "sideWidth": 200,
            "sort": "current",
            "sortDesc": true,
            "total": false,
            "values": true
          },
          "lines": true,
          "linewidth": 1,
          "links": [],
          "nullPointMode": "connected",
          "options": {
            "dataLinks": []
          },
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "spaceLength": 10,
          "stack": true,
          "steppedLine": false,
          "targets": [
            {
              "expr": "sum (container_memory_working_set_bytes{image!=\"\",name=~\"^k8s_.*\",kubernetes_io_hostname=~\"^$Node$\"}) by (pod)",
              "format": "time_series",
              "interval": "10s",
              "intervalFactor": 1,
              "legendFormat": "{{ pod }}",
              "metric": "container_memory_usage:sort_desc",
              "refId": "A",
              "step": 10
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "Pods memory usage",
          "tooltip": {
            "msResolution": false,
            "shared": true,
            "sort": 2,
            "value_type": "cumulative"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "format": "bytes",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": false
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        },
        {
          "collapsed": false,
          "datasource": null,
          "gridPos": {
            "h": 1,
            "w": 24,
            "x": 0,
            "y": 17
          },
          "id": 37,
          "panels": [],
          "title": "CPU",
          "type": "row"
        },
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "prometheus",
          "decimals": 3,
          "editable": true,
          "error": false,
          "fill": 0,
          "fillGradient": 0,
          "grid": {},
          "gridPos": {
            "h": 7,
            "w": 24,
            "x": 0,
            "y": 18
          },
          "height": "",
          "id": 17,
          "legend": {
            "alignAsTable": true,
            "avg": true,
            "current": true,
            "max": false,
            "min": false,
            "rightSide": true,
            "show": true,
            "sort": "current",
            "sortDesc": true,
            "total": false,
            "values": true
          },
          "lines": true,
          "linewidth": 1,
          "links": [],
          "nullPointMode": "connected",
          "options": {
            "dataLinks": []
          },
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "spaceLength": 10,
          "stack": true,
          "steppedLine": false,
          "targets": [
            {
              "expr": "sum (rate (container_cpu_usage_seconds_total{image!=\"\",name=~\"^k8s_.*\",kubernetes_io_hostname=~\"^$Node$\"}[$interval])) by (pod)",
              "format": "time_series",
              "interval": "10s",
              "intervalFactor": 1,
              "legendFormat": "{{ pod }}",
              "metric": "container_cpu",
              "refId": "A",
              "step": 10
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "Pods CPU usage",
          "tooltip": {
            "msResolution": true,
            "shared": true,
            "sort": 2,
            "value_type": "cumulative"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "format": "none",
              "label": "cores",
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": false
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        },
        {
          "collapsed": false,
          "datasource": null,
          "gridPos": {
            "h": 1,
            "w": 24,
            "x": 0,
            "y": 25
          },
          "id": 33,
          "panels": [],
          "repeat": null,
          "title": "Network I/O",
          "type": "row"
        },
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "prometheus",
          "decimals": 2,
          "editable": true,
          "error": false,
          "fill": 1,
          "fillGradient": 0,
          "grid": {},
          "gridPos": {
            "h": 7,
            "w": 24,
            "x": 0,
            "y": 26
          },
          "id": 16,
          "legend": {
            "alignAsTable": true,
            "avg": true,
            "current": true,
            "max": false,
            "min": false,
            "rightSide": true,
            "show": true,
            "sideWidth": 200,
            "sort": "current",
            "sortDesc": true,
            "total": false,
            "values": true
          },
          "lines": true,
          "linewidth": 2,
          "links": [],
          "nullPointMode": "connected",
          "options": {
            "dataLinks": []
          },
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "sum (rate (container_network_receive_bytes_total{image!=\"\",name=~\"^k8s_.*\",kubernetes_io_hostname=~\"^$Node$\"}[$interval])) by (pod)",
              "format": "time_series",
              "interval": "10s",
              "intervalFactor": 1,
              "legendFormat": "-> {{ pod }}",
              "metric": "network",
              "refId": "A",
              "step": 10
            },
            {
              "expr": "- sum (rate (container_network_transmit_bytes_total{image!=\"\",name=~\"^k8s_.*\",kubernetes_io_hostname=~\"^$Node$\"}[$interval])) by (pod)",
              "format": "time_series",
              "interval": "10s",
              "intervalFactor": 1,
              "legendFormat": "<- {{ pod }}",
              "metric": "network",
              "refId": "B",
              "step": 10
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "Pods network I/O",
          "tooltip": {
            "msResolution": false,
            "shared": true,
            "sort": 2,
            "value_type": "cumulative"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "format": "Bps",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": false
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        },
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "prometheus",
          "decimals": 2,
          "editable": true,
          "error": false,
          "fill": 1,
          "fillGradient": 0,
          "grid": {},
          "gridPos": {
            "h": 5,
            "w": 24,
            "x": 0,
            "y": 33
          },
          "height": "200px",
          "id": 32,
          "legend": {
            "alignAsTable": false,
            "avg": true,
            "current": true,
            "max": false,
            "min": false,
            "rightSide": false,
            "show": false,
            "sideWidth": 200,
            "sort": "current",
            "sortDesc": true,
            "total": false,
            "values": true
          },
          "lines": true,
          "linewidth": 2,
          "links": [],
          "nullPointMode": "connected",
          "options": {
            "dataLinks": []
          },
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "sum (rate (container_network_receive_bytes_total{kubernetes_io_hostname=~\"^$Node$\"}[$interval]))",
              "format": "time_series",
              "interval": "10s",
              "intervalFactor": 1,
              "legendFormat": "Received",
              "metric": "network",
              "refId": "A",
              "step": 10
            },
            {
              "expr": "- sum (rate (container_network_transmit_bytes_total{kubernetes_io_hostname=~\"^$Node$\"}[$interval]))",
              "format": "time_series",
              "interval": "10s",
              "intervalFactor": 1,
              "legendFormat": "Sent",
              "metric": "network",
              "refId": "B",
              "step": 10
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "Network I/O pressure",
          "tooltip": {
            "msResolution": false,
            "shared": true,
            "sort": 0,
            "value_type": "cumulative"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "format": "Bps",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "Bps",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": false
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        }
      ],
      "refresh": "10s",
      "schemaVersion": 20,
      "style": "dark",
      "tags": [
        "Prometheus",
        "Kubernetes"
      ],
      "templating": {
        "list": [
          {
            "auto": true,
            "auto_count": 20,
            "auto_min": "2m",
            "current": {
              "text": "auto",
              "value": "$__auto_interval_interval"
            },
            "hide": 2,
            "label": null,
            "name": "interval",
            "options": [
              {
                "selected": true,
                "text": "auto",
                "value": "$__auto_interval_interval"
              },
              {
                "selected": false,
                "text": "1m",
                "value": "1m"
              },
              {
                "selected": false,
                "text": "10m",
                "value": "10m"
              },
              {
                "selected": false,
                "text": "30m",
                "value": "30m"
              },
              {
                "selected": false,
                "text": "1h",
                "value": "1h"
              },
              {
                "selected": false,
                "text": "6h",
                "value": "6h"
              },
              {
                "selected": false,
                "text": "12h",
                "value": "12h"
              },
              {
                "selected": false,
                "text": "1d",
                "value": "1d"
              },
              {
                "selected": false,
                "text": "7d",
                "value": "7d"
              },
              {
                "selected": false,
                "text": "14d",
                "value": "14d"
              },
              {
                "selected": false,
                "text": "30d",
                "value": "30d"
              }
            ],
            "query": "1m,10m,30m,1h,6h,12h,1d,7d,14d,30d",
            "refresh": 2,
            "skipUrlSync": false,
            "type": "interval"
          },
          {
            "current": {
              "text": "prometheus",
              "value": "prometheus"
            },
            "hide": 0,
            "includeAll": false,
            "label": null,
            "multi": false,
            "name": "datasource",
            "options": [],
            "query": "prometheus",
            "refresh": 1,
            "regex": "",
            "skipUrlSync": false,
            "type": "datasource"
          },
          {
            "allValue": ".*",
            "current": {
              "text": "All",
              "value": "$__all"
            },
            "datasource": "prometheus",
            "definition": "",
            "hide": 0,
            "includeAll": true,
            "label": null,
            "multi": false,
            "name": "Node",
            "options": [],
            "query": "label_values(kubernetes_io_hostname)",
            "refresh": 1,
            "regex": "",
            "skipUrlSync": false,
            "sort": 0,
            "tagValuesQuery": "",
            "tags": [],
            "tagsQuery": "",
            "type": "query",
            "useTags": false
          }
        ]
      },
      "time": {
        "from": "now-5m",
        "to": "now"
      },
      "timepicker": {
        "refresh_intervals": [
          "5s",
          "10s",
          "30s",
          "1m",
          "5m",
          "15m",
          "30m",
          "1h",
          "2h",
          "1d"
        ],
        "time_options": [
          "5m",
          "15m",
          "1h",
          "6h",
          "12h",
          "24h",
          "2d",
          "7d",
          "30d"
        ]
      },
      "timezone": "browser",
      "title": "育苗通 K8S Cluster Monitoring",
      "uid": "6KoW2MIGk",
      "version": 15
    }
    k8s-model.json
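
The PromQL expressions in the dashboard above use double-quoted label matchers, and those inner quotes have to be escaped when the expression is stored in a JSON string. The following is a minimal sketch, not part of the original deployment, of building one panel target with Python's standard `json` module so that the escaping is done automatically. The helper name `make_target` is hypothetical, and the field names simply mirror the targets shown in the dashboard.

```python
import json


def make_target(expr: str, legend: str, ref_id: str) -> dict:
    """Build one panel target (hypothetical helper; fields mirror the dashboard above)."""
    return {
        "expr": expr,
        "format": "time_series",
        "intervalFactor": 1,
        "legendFormat": legend,
        "refId": ref_id,
    }


# PromQL with double-quoted label matchers, as in the "Pods CPU usage" panel.
expr = (
    "sum (rate (container_cpu_usage_seconds_total"
    '{image!="",name=~"^k8s_.*",kubernetes_io_hostname=~"^$Node$"}[$interval])) by (pod)'
)

target = make_target(expr, "{{ pod }}", "A")
encoded = json.dumps(target, indent=2)  # inner double quotes are written as \"
decoded = json.loads(encoded)           # parsing restores the original expression

print(encoded)
```

Running the file through `json.loads` before importing it into Grafana is a quick way to catch a dashboard whose expressions lost their escaping.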

     

    {
      "annotations": {
        "list": [
          {
            "builtIn": 1,
            "datasource": "-- Grafana --",
            "enable": true,
            "hide": true,
            "iconColor": "rgba(0, 211, 255, 1)",
            "name": "Annotations & Alerts",
            "type": "dashboard"
          }
        ]
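      },
      "description": "[Chinese version] Updated 2020.06.28: adds an overall resource overview. Supports Grafana 6 & 7 and Node Exporter v0.16 and later, with improved display of key metrics. Includes an overall resource overview and detailed resource charts: CPU, memory, disk, I/O, network, and other metrics. https://github.com/starsliao/Prometheus",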
      "editable": true,
      "gnetId": 8919,
      "graphTooltip": 0,
      "id": 72,
      "iteration": 1597137684806,
      "links": [
        {
          "icon": "external link",
          "tags": [],
          "targetBlank": true,
          "title": "Update node_exporter",
          "tooltip": "",
          "type": "link",
          "url": "https://github.com/prometheus/node_exporter/releases"
        },
        {
          "icon": "external link",
          "tags": [],
          "targetBlank": true,
          "title": "Update this dashboard",
          "tooltip": "",
          "type": "link",
          "url": "https://grafana.com/dashboards/8919"
        },
        {
          "icon": "external link",
          "tags": [],
          "targetBlank": true,
          "title": "StarsL.cn",
          "tooltip": "",
          "type": "link",
          "url": "https://starsl.cn"
        },
        {
          "asDropdown": true,
          "icon": "external link",
          "tags": [],
          "targetBlank": true,
          "title": "",
          "type": "dashboards"
        }
      ],
      "panels": [
        {
          "collapsed": false,
          "datasource": "prometheus",
          "gridPos": {
            "h": 1,
            "w": 24,
            "x": 0,
            "y": 0
          },
          "id": 187,
          "panels": [],
          "title": "Resource overview (tied to the job variable). Selected host: [$show_hostname], instance: $node",
          "type": "row"
        },
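        {
          "columns": [],
          "datasource": "prometheus",
          "description": "Partition usage, disk read, disk write, download bandwidth, and upload bandwidth: when a host has several NICs or partitions, the value shown is that of the NIC or partition with the highest usage.",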
          "fontSize": "100%",
          "gridPos": {
            "h": 12,
            "w": 24,
            "x": 0,
            "y": 1
          },
          "id": 185,
          "options": {},
          "pageSize": 10,
          "showHeader": true,
          "sort": {
            "col": 5,
            "desc": false
          },
          "styles": [
            {
              "alias": "Hostname",
              "align": "auto",
              "colorMode": null,
              "colors": [
                "rgba(245, 54, 54, 0.9)",
                "rgba(237, 129, 40, 0.89)",
                "rgba(50, 172, 45, 0.97)"
              ],
              "dateFormat": "YYYY-MM-DD HH:mm:ss",
              "decimals": 1,
              "link": false,
              "linkTooltip": "",
              "linkUrl": "",
              "mappingType": 1,
              "pattern": "nodename",
              "thresholds": [],
              "type": "string",
              "unit": "bytes"
            },
            {
              "alias": "IP (links to details)",
              "align": "auto",
              "colorMode": null,
              "colors": [
                "rgba(245, 54, 54, 0.9)",
                "rgba(237, 129, 40, 0.89)",
                "rgba(50, 172, 45, 0.97)"
              ],
              "dateFormat": "YYYY-MM-DD HH:mm:ss",
              "decimals": 2,
              "link": true,
              "linkTargetBlank": false,
              "linkTooltip": "View host details",
              "linkUrl": "/d/9CWBz0bik/node-exporter?orgId=1&var-job=${job}&var-hostname=All&var-node=${__cell}&var-device=All",
              "mappingType": 1,
              "pattern": "instance",
              "thresholds": [],
              "type": "number",
              "unit": "short"
            },
            {
              "alias": "Memory",
              "align": "auto",
              "colorMode": null,
              "colors": [
                "rgba(245, 54, 54, 0.9)",
                "rgba(237, 129, 40, 0.89)",
                "rgba(50, 172, 45, 0.97)"
              ],
              "dateFormat": "YYYY-MM-DD HH:mm:ss",
              "decimals": 2,
              "link": false,
              "mappingType": 1,
              "pattern": "Value #B",
              "thresholds": [],
              "type": "number",
              "unit": "bytes"
            },
            {
              "alias": "CPU cores",
              "align": "auto",
              "colorMode": null,
              "colors": [
                "rgba(245, 54, 54, 0.9)",
                "rgba(237, 129, 40, 0.89)",
                "rgba(50, 172, 45, 0.97)"
              ],
              "dateFormat": "YYYY-MM-DD HH:mm:ss",
              "decimals": null,
              "mappingType": 1,
              "pattern": "Value #C",
              "thresholds": [],
              "type": "number",
              "unit": "short"
            },
            {
              "alias": "Uptime",
              "align": "auto",
              "colorMode": null,
              "colors": [
                "rgba(245, 54, 54, 0.9)",
                "rgba(237, 129, 40, 0.89)",
                "rgba(50, 172, 45, 0.97)"
              ],
              "dateFormat": "YYYY-MM-DD HH:mm:ss",
              "decimals": 2,
              "mappingType": 1,
              "pattern": "Value #D",
              "thresholds": [],
              "type": "number",
              "unit": "s"
            },
            {
              "alias": "Partition usage*",
              "align": "auto",
              "colorMode": "cell",
              "colors": [
                "rgba(50, 172, 45, 0.97)",
                "rgba(237, 129, 40, 0.89)",
                "rgba(245, 54, 54, 0.9)"
              ],
              "dateFormat": "YYYY-MM-DD HH:mm:ss",
              "decimals": 2,
              "mappingType": 1,
              "pattern": "Value #E",
              "thresholds": [
                "70",
                "85"
              ],
              "type": "number",
              "unit": "percent"
            },
            {
              "alias": "CPU usage",
              "align": "auto",
              "colorMode": "cell",
              "colors": [
                "rgba(50, 172, 45, 0.97)",
                "rgba(237, 129, 40, 0.89)",
                "rgba(245, 54, 54, 0.9)"
              ],
              "dateFormat": "YYYY-MM-DD HH:mm:ss",
              "decimals": 2,
              "mappingType": 1,
              "pattern": "Value #F",
              "thresholds": [
                "70",
                "85"
              ],
              "type": "number",
              "unit": "percent"
            },
            {
              "alias": "Memory usage",
              "align": "auto",
              "colorMode": "cell",
              "colors": [
                "rgba(50, 172, 45, 0.97)",
                "rgba(237, 129, 40, 0.89)",
                "rgba(245, 54, 54, 0.9)"
              ],
              "dateFormat": "YYYY-MM-DD HH:mm:ss",
              "decimals": 2,
              "mappingType": 1,
              "pattern": "Value #G",
              "thresholds": [
                "70",
                "85"
              ],
              "type": "number",
              "unit": "percent"
            },
            {
              "alias": "Disk read*",
              "align": "auto",
              "colorMode": "cell",
              "colors": [
                "rgba(50, 172, 45, 0.97)",
                "rgba(237, 129, 40, 0.89)",
                "rgba(245, 54, 54, 0.9)"
              ],
              "dateFormat": "YYYY-MM-DD HH:mm:ss",
              "decimals": 2,
              "mappingType": 1,
              "pattern": "Value #H",
              "thresholds": [
                "10485760",
                "20485760"
              ],
              "type": "number",
              "unit": "Bps"
            },
            {
              "alias": "Disk write*",
              "align": "auto",
              "colorMode": "cell",
              "colors": [
                "rgba(50, 172, 45, 0.97)",
                "rgba(237, 129, 40, 0.89)",
                "rgba(245, 54, 54, 0.9)"
              ],
              "dateFormat": "YYYY-MM-DD HH:mm:ss",
              "decimals": 2,
              "mappingType": 1,
              "pattern": "Value #I",
              "thresholds": [
                "10485760",
                "20485760"
              ],
              "type": "number",
              "unit": "Bps"
            },
            {
              "alias": "Download bandwidth*",
              "align": "auto",
              "colorMode": "cell",
              "colors": [
                "rgba(50, 172, 45, 0.97)",
                "rgba(237, 129, 40, 0.89)",
                "rgba(245, 54, 54, 0.9)"
              ],
              "dateFormat": "YYYY-MM-DD HH:mm:ss",
              "decimals": 2,
              "mappingType": 1,
              "pattern": "Value #J",
              "thresholds": [
                "30485760",
                "104857600"
              ],
              "type": "number",
              "unit": "bps"
            },
            {
              "alias": "Upload bandwidth*",
              "align": "auto",
              "colorMode": "cell",
              "colors": [
                "rgba(50, 172, 45, 0.97)",
                "rgba(237, 129, 40, 0.89)",
                "rgba(245, 54, 54, 0.9)"
              ],
              "dateFormat": "YYYY-MM-DD HH:mm:ss",
              "decimals": 2,
              "mappingType": 1,
              "pattern": "Value #K",
              "thresholds": [
                "30485760",
                "104857600"
              ],
              "type": "number",
              "unit": "bps"
            },
            {
              "alias": "5m load",
              "align": "auto",
              "colorMode": null,
              "colors": [
                "rgba(245, 54, 54, 0.9)",
                "rgba(237, 129, 40, 0.89)",
                "rgba(50, 172, 45, 0.97)"
              ],
              "dateFormat": "YYYY-MM-DD HH:mm:ss",
              "decimals": 2,
              "mappingType": 1,
              "pattern": "Value #L",
              "thresholds": [],
              "type": "number",
              "unit": "short"
            },
            {
              "alias": "",
              "align": "right",
              "colorMode": null,
              "colors": [
                "rgba(245, 54, 54, 0.9)",
                "rgba(237, 129, 40, 0.89)",
                "rgba(50, 172, 45, 0.97)"
              ],
              "decimals": 2,
              "pattern": "/.*/",
              "thresholds": [],
              "type": "hidden",
              "unit": "short"
            }
          ],
          "targets": [
            {
              "expr": "node_uname_info{job=~\"$job\"} - 0",
              "format": "table",
              "instant": true,
              "interval": "",
              "legendFormat": "Hostname",
              "refId": "A"
            },
            {
              "expr": "sum(time() - node_boot_time_seconds{job=~\"$job\"})by(instance)",
              "format": "table",
              "hide": false,
              "instant": true,
              "interval": "",
              "legendFormat": "Uptime",
              "refId": "D"
            },
            {
              "expr": "node_memory_MemTotal_bytes{job=~\"$job\"} - 0",
              "format": "table",
              "hide": false,
              "instant": true,
              "interval": "",
              "legendFormat": "Total memory",
              "refId": "B"
            },
            {
              "expr": "count(node_cpu_seconds_total{job=~\"$job\",mode='system'}) by (instance)",
              "format": "table",
              "hide": false,
              "instant": true,
              "interval": "",
              "legendFormat": "Total cores",
              "refId": "C"
            },
            {
              "expr": "node_load5{job=~\"$job\"}",
              "format": "table",
              "instant": true,
              "interval": "",
              "legendFormat": "5-minute load",
              "refId": "L"
            },
            {
              "expr": "(1 - avg(irate(node_cpu_seconds_total{job=~\"$job\",mode=\"idle\"}[5m])) by (instance)) * 100",
              "format": "table",
              "hide": false,
              "instant": true,
              "interval": "",
              "legendFormat": "CPU usage",
              "refId": "F"
            },
            {
              "expr": "(1 - (node_memory_MemAvailable_bytes{job=~\"$job\"} / (node_memory_MemTotal_bytes{job=~\"$job\"})))* 100",
              "format": "table",
              "hide": false,
              "instant": true,
              "interval": "",
              "legendFormat": "Memory usage",
              "refId": "G"
            },
            {
              "expr": "max((node_filesystem_size_bytes{job=~\"$job\",fstype=~\"ext.?|xfs\"}-node_filesystem_free_bytes{job=~\"$job\",fstype=~\"ext.?|xfs\"}) *100/(node_filesystem_avail_bytes {job=~\"$job\",fstype=~\"ext.?|xfs\"}+(node_filesystem_size_bytes{job=~\"$job\",fstype=~\"ext.?|xfs\"}-node_filesystem_free_bytes{job=~\"$job\",fstype=~\"ext.?|xfs\"})))by(instance)",
              "format": "table",
              "hide": false,
              "instant": true,
              "interval": "",
              "legendFormat": "Partition usage",
              "refId": "E"
            },
            {
              "expr": "max(irate(node_disk_read_bytes_total{job=~\"$job\"}[5m])) by (instance)",
              "format": "table",
              "hide": false,
              "instant": true,
              "interval": "",
              "legendFormat": "Max read",
              "refId": "H"
            },
            {
              "expr": "max(irate(node_disk_written_bytes_total{job=~\"$job\"}[5m])) by (instance)",
              "format": "table",
              "hide": false,
              "instant": true,
              "interval": "",
              "legendFormat": "Max write",
              "refId": "I"
            },
            {
              "expr": "max(irate(node_network_receive_bytes_total{job=~\"$job\"}[5m])*8) by (instance)",
              "format": "table",
              "hide": false,
              "instant": true,
              "interval": "",
              "legendFormat": "Download bandwidth",
              "refId": "J"
            },
            {
              "expr": "max(irate(node_network_transmit_bytes_total{job=~\"$job\"}[5m])*8) by (instance)",
              "format": "table",
              "hide": false,
              "instant": true,
              "interval": "",
              "legendFormat": "Upload bandwidth",
              "refId": "K"
            }
          ],
          "timeFrom": null,
          "timeShift": null,
          "title": "Server resource overview table (10 rows per page)",
          "transform": "table",
          "type": "table"
        },
        {
          "aliasColors": {
            "192.168.200.241:9100_Total": "dark-red",
            "Idle - Waiting for something to happen": "#052B51",
            "guest": "#9AC48A",
            "idle": "#052B51",
            "iowait": "#EAB839",
            "irq": "#BF1B00",
            "nice": "#C15C17",
            "sdb_每秒I/O操作%": "#d683ce",
            "softirq": "#E24D42",
            "steal": "#FCE2DE",
            "system": "#508642",
            "user": "#5195CE",
            "磁盘花费在I/O操作占比": "#ba43a9"
          },
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "prometheus",
          "decimals": null,
          "description": "",
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "fill": 0,
          "fillGradient": 0,
          "gridPos": {
            "h": 8,
            "w": 8,
            "x": 0,
            "y": 13
          },
          "hiddenSeries": false,
          "id": 191,
          "legend": {
            "alignAsTable": false,
            "avg": false,
            "current": true,
            "hideEmpty": true,
            "hideZero": true,
            "max": false,
            "min": false,
            "rightSide": false,
            "show": true,
            "sideWidth": null,
            "sort": "current",
            "sortDesc": true,
            "total": false,
            "values": true
          },
          "lines": true,
          "linewidth": 2,
          "links": [],
          "maxPerRow": 6,
          "nullPointMode": "null",
          "options": {
            "dataLinks": []
          },
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "repeat": null,
          "seriesOverrides": [
            {
              "alias": "Overall average usage",
              "lines": false,
              "pointradius": 1,
              "points": true,
              "yaxis": 2
            },
            {
              "alias": "Total cores",
              "color": "#C4162A"
            }
          ],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "count(node_cpu_seconds_total{job=~\"$job\", mode='system'})",
              "format": "time_series",
              "hide": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "Total cores",
              "refId": "B",
              "step": 240
            },
            {
              "expr": "sum(node_load5{job=~\"$job\"})",
              "format": "time_series",
              "hide": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "Total 5-minute load",
              "refId": "A",
              "step": 240
            },
            {
              "expr": "avg(1 - avg(irate(node_cpu_seconds_total{job=~\"$job\",mode=\"idle\"}[5m])) by (instance)) * 100",
              "format": "time_series",
              "hide": false,
              "interval": "30m",
              "intervalFactor": 1,
              "legendFormat": "Overall average usage",
              "refId": "F",
              "step": 240
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "$job: overall total load and overall average CPU usage",
          "tooltip": {
            "shared": true,
            "sort": 2,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "decimals": null,
              "format": "short",
              "label": "Total load",
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "decimals": 0,
              "format": "percent",
              "label": "Average usage",
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        },
        {
          "aliasColors": {
            "192.168.200.241:9100_总内存": "dark-red",
            "内存_Avaliable": "#6ED0E0",
            "内存_Cached": "#EF843C",
            "内存_Free": "#629E51",
            "内存_Total": "#6d1f62",
            "内存_Used": "#eab839",
            "可用": "#9ac48a",
            "总内存": "#bf1b00"
          },
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "prometheus",
          "decimals": 1,
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "fill": 0,
          "fillGradient": 0,
          "gridPos": {
            "h": 8,
            "w": 8,
            "x": 8,
            "y": 13
          },
          "height": "300",
          "hiddenSeries": false,
          "id": 195,
          "legend": {
            "alignAsTable": false,
            "avg": false,
            "current": true,
            "max": false,
            "min": false,
            "rightSide": false,
            "show": true,
            "sort": "current",
            "sortDesc": false,
            "total": false,
            "values": true
          },
          "lines": true,
          "linewidth": 2,
          "links": [],
          "nullPointMode": "null",
          "options": {
            "dataLinks": []
          },
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [
            {
              "alias": "总内存",
              "color": "#C4162A",
              "fill": 0
            },
            {
              "alias": "总平均使用率",
              "lines": false,
              "pointradius": 1,
              "points": true,
              "yaxis": 2
            }
          ],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "sum(node_memory_MemTotal_bytes{job=~"$job"})",
              "format": "time_series",
              "hide": false,
              "instant": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "总内存",
              "refId": "A",
              "step": 4
            },
            {
              "expr": "sum(node_memory_MemTotal_bytes{job=~"$job"} - node_memory_MemAvailable_bytes{job=~"$job"})",
              "format": "time_series",
              "hide": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "总已用",
              "refId": "B",
              "step": 4
            },
            {
              "expr": "(sum(node_memory_MemTotal_bytes{job=~"$job"} - node_memory_MemAvailable_bytes{job=~"$job"}) / sum(node_memory_MemTotal_bytes{job=~"$job"}))*100",
              "format": "time_series",
              "hide": false,
              "interval": "30m",
              "intervalFactor": 1,
              "legendFormat": "总平均使用率",
              "refId": "H"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "$job:整体总内存与整体平均内存使用率",
          "tooltip": {
            "shared": true,
            "sort": 2,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "decimals": null,
              "format": "bytes",
              "label": "总内存量",
              "logBase": 1,
              "max": null,
              "min": "0",
              "show": true
            },
            {
              "decimals": null,
              "format": "percent",
              "label": "平均使用率",
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        },
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "prometheus",
          "decimals": 1,
          "description": "",
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "fill": 0,
          "fillGradient": 0,
          "gridPos": {
            "h": 8,
            "w": 8,
            "x": 16,
            "y": 13
          },
          "hiddenSeries": false,
          "id": 197,
          "legend": {
            "alignAsTable": false,
            "avg": false,
            "current": true,
            "hideEmpty": false,
            "hideZero": false,
            "max": false,
            "min": false,
            "rightSide": false,
            "show": true,
            "sideWidth": null,
            "sort": "current",
            "sortDesc": true,
            "total": false,
            "values": true
          },
          "lines": true,
          "linewidth": 2,
          "links": [],
          "nullPointMode": "null",
          "options": {
            "dataLinks": []
          },
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [
            {
              "alias": "总平均使用率",
              "lines": false,
              "pointradius": 1,
              "points": true,
              "yaxis": 2
            },
            {
              "alias": "总磁盘量",
              "color": "#C4162A"
            }
          ],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "sum(avg(node_filesystem_size_bytes{job=~"$job",fstype=~"xfs|ext.*"})by(device,instance))",
              "format": "time_series",
              "instant": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "总磁盘量",
              "refId": "E"
            },
            {
              "expr": "sum(avg(node_filesystem_size_bytes{job=~"$job",fstype=~"xfs|ext.*"})by(device,instance)) - sum(avg(node_filesystem_free_bytes{job=~"$job",fstype=~"xfs|ext.*"})by(device,instance))",
              "format": "time_series",
              "instant": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "总使用量",
              "refId": "C"
            },
            {
              "expr": "(sum(avg(node_filesystem_size_bytes{job=~"$job",fstype=~"xfs|ext.*"})by(device,instance)) - sum(avg(node_filesystem_free_bytes{job=~"$job",fstype=~"xfs|ext.*"})by(device,instance))) *100/(sum(avg(node_filesystem_avail_bytes{job=~"$job",fstype=~"xfs|ext.*"})by(device,instance))+(sum(avg(node_filesystem_size_bytes{job=~"$job",fstype=~"xfs|ext.*"})by(device,instance)) - sum(avg(node_filesystem_free_bytes{job=~"$job",fstype=~"xfs|ext.*"})by(device,instance))))",
              "format": "time_series",
              "instant": false,
              "interval": "30m",
              "intervalFactor": 1,
              "legendFormat": "总平均使用率",
              "refId": "A"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "$job:整体总磁盘与整体平均磁盘使用率",
          "tooltip": {
            "shared": true,
            "sort": 2,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "decimals": 1,
              "format": "bytes",
              "label": "总磁盘量",
              "logBase": 1,
              "max": null,
              "min": "0",
              "show": true
            },
            {
              "decimals": null,
              "format": "percent",
              "label": "平均使用率",
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        },
        {
          "collapsed": false,
          "datasource": "prometheus",
          "gridPos": {
            "h": 1,
            "w": 24,
            "x": 0,
            "y": 21
          },
          "id": 189,
          "panels": [],
          "title": "资源明细:【$show_hostname】",
          "type": "row"
        },
        {
          "cacheTimeout": null,
          "colorBackground": false,
          "colorPostfix": false,
          "colorPrefix": false,
          "colorValue": true,
          "colors": [
            "rgba(245, 54, 54, 0.9)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(50, 172, 45, 0.97)"
          ],
          "datasource": "prometheus",
          "decimals": 0,
          "description": "",
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "format": "s",
          "gauge": {
            "maxValue": 100,
            "minValue": 0,
            "show": false,
            "threshcisLabels": false,
            "threshcisMarkers": true
          },
          "gridPos": {
            "h": 2,
            "w": 2,
            "x": 0,
            "y": 22
          },
          "hideTimeOverride": true,
          "id": 15,
          "interval": null,
          "links": [],
          "mappingType": 1,
          "mappingTypes": [
            {
              "name": "value to text",
              "value": 1
            },
            {
              "name": "range to text",
              "value": 2
            }
          ],
          "maxDataPoints": 100,
          "nullPointMode": "null",
          "nullText": null,
          "options": {},
          "pluginVersion": "6.4.2",
          "postfix": "",
          "postfixFontSize": "50%",
          "prefix": "",
          "prefixFontSize": "50%",
          "rangeMaps": [
            {
              "from": "null",
              "text": "N/A",
              "to": "null"
            }
          ],
          "sparkline": {
            "fillColor": "rgba(31, 118, 189, 0.18)",
            "full": false,
            "lineColor": "rgb(31, 120, 193)",
            "show": false
          },
          "tableColumn": "",
          "targets": [
            {
              "expr": "avg(time() - node_boot_time_seconds{instance=~"$node"})",
              "format": "time_series",
              "hide": false,
              "instant": true,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "",
              "refId": "A",
              "step": 40
            }
          ],
          "threshciss": "1,2",
          "thresholds": "1,3",
          "title": "运行时间",
          "type": "singlestat",
          "valueFontSize": "70%",
          "valueMaps": [
            {
              "op": "=",
              "text": "N/A",
              "value": "null"
            }
          ],
          "valueName": "current"
        },
        {
          "datasource": "prometheus",
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "thresholds"
              },
              "custom": {},
              "decimals": 2,
              "displayName": "",
              "mappings": [
                {
                  "from": "",
                  "id": 1,
                  "operator": "",
                  "text": "N/A",
                  "to": "",
                  "type": 1,
                  "value": "0"
                }
              ],
              "max": 100,
              "min": 0,
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "red",
                    "value": 70
                  },
                  {
                    "color": "#EAB839",
                    "value": 90
                  }
                ]
              },
              "unit": "percent"
            },
            "overrides": []
          },
          "gridPos": {
            "h": 6,
            "w": 3,
            "x": 2,
            "y": 22
          },
          "id": 177,
          "options": {
            "displayMode": "lcd",
            "fieldOptions": {
              "calcs": [
                "last"
              ],
              "defaults": {
                "decimals": 1,
                "mappings": [
                  {
                    "from": "",
                    "id": 1,
                    "operator": "",
                    "text": "N/A",
                    "to": "",
                    "type": 1,
                    "value": "0"
                  }
                ],
                "max": 100,
                "min": 0.1,
                "thresholds": {
                  "0": {
                    "color": "green",
                    "value": null
                  },
                  "1": {
                    "color": "red",
                    "value": 80
                  },
                  "mode": "absolute",
                  "steps": [
                    {
                      "color": "green",
                      "value": null
                    },
                    {
                      "color": "#EAB839",
                      "value": 70
                    },
                    {
                      "color": "red",
                      "value": 90
                    }
                  ]
                },
                "unit": "percent"
              },
              "override": {},
              "overrides": [],
              "values": false
            },
            "orientation": "horizontal",
            "reduceOptions": {
              "calcs": [
                "mean"
              ],
              "values": false
            },
            "showUnfilled": true
          },
          "pluginVersion": "6.4.3",
          "targets": [
            {
              "expr": "100 - (avg(irate(node_cpu_seconds_total{instance=~"$node",mode="idle"}[5m])) * 100)",
              "instant": true,
              "interval": "",
              "legendFormat": "总CPU使用率",
              "refId": "A"
            },
            {
              "expr": "avg(irate(node_cpu_seconds_total{instance=~"$node",mode="iowait"}[5m])) * 100",
              "hide": true,
              "instant": true,
              "interval": "",
              "legendFormat": "IOwait使用率",
              "refId": "C"
            },
            {
              "expr": "(1 - (node_memory_MemAvailable_bytes{instance=~"$node"} / (node_memory_MemTotal_bytes{instance=~"$node"})))* 100",
              "instant": true,
              "interval": "",
              "legendFormat": "内存使用率",
              "refId": "B"
            },
            {
              "expr": "(node_filesystem_size_bytes{instance=~'$node',fstype=~"ext.*|xfs",mountpoint="$maxmount"}-node_filesystem_free_bytes{instance=~'$node',fstype=~"ext.*|xfs",mountpoint="$maxmount"})*100 /(node_filesystem_avail_bytes {instance=~'$node',fstype=~"ext.*|xfs",mountpoint="$maxmount"}+(node_filesystem_size_bytes{instance=~'$node',fstype=~"ext.*|xfs",mountpoint="$maxmount"}-node_filesystem_free_bytes{instance=~'$node',fstype=~"ext.*|xfs",mountpoint="$maxmount"}))",
              "hide": false,
              "instant": true,
              "interval": "",
              "legendFormat": "最大分区({{mountpoint}})使用率",
              "refId": "D"
            },
            {
              "expr": "(1 - ((node_memory_SwapFree_bytes{instance=~"$node"} + 1)/ (node_memory_SwapTotal_bytes{instance=~"$node"} + 1))) * 100",
              "instant": true,
              "legendFormat": "交换分区使用率",
              "refId": "F"
            }
          ],
          "timeFrom": null,
          "timeShift": null,
          "title": "",
          "type": "bargauge"
        },
        {
          "columns": [],
          "datasource": "prometheus",
          "description": "本看板中的:磁盘总量、使用量、可用量、使用率保持和df命令的Size、Used、Avail、Use% 列的值一致,并且Use%的值会四舍五入保留一位小数,会更加准确。
    
    注:df中Use%算法为:(size - free) * 100 / (avail + (size - free)),结果是整除则为该值,非整除则为该值+1,结果的单位是%。
    参考df命令源码:",
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "fontSize": "100%",
          "gridPos": {
            "h": 6,
            "w": 10,
            "x": 5,
            "y": 22
          },
          "id": 181,
          "links": [
            {
              "targetBlank": true,
              "title": "https://github.com/coreutils/coreutils/blob/master/src/df.c",
              "url": "https://github.com/coreutils/coreutils/blob/master/src/df.c"
            }
          ],
          "options": {},
          "pageSize": null,
          "scroll": true,
          "showHeader": true,
          "sort": {
            "col": 6,
            "desc": false
          },
          "styles": [
            {
              "alias": "分区",
              "align": "auto",
              "colorMode": null,
              "colors": [
                "rgba(50, 172, 45, 0.97)",
                "rgba(237, 129, 40, 0.89)",
                "rgba(245, 54, 54, 0.9)"
              ],
              "dateFormat": "YYYY-MM-DD HH:mm:ss",
              "decimals": 2,
              "mappingType": 1,
              "pattern": "mountpoint",
              "thresholds": [
                ""
              ],
              "type": "string",
              "unit": "bytes"
            },
            {
              "alias": "可用空间",
              "align": "auto",
              "colorMode": "value",
              "colors": [
                "rgba(245, 54, 54, 0.9)",
                "rgba(237, 129, 40, 0.89)",
                "rgba(50, 172, 45, 0.97)"
              ],
              "dateFormat": "YYYY-MM-DD HH:mm:ss",
              "decimals": 1,
              "mappingType": 1,
              "pattern": "Value #A",
              "thresholds": [
                "10000000000",
                "20000000000"
              ],
              "type": "number",
              "unit": "bytes"
            },
            {
              "alias": "使用率",
              "align": "auto",
              "colorMode": "cell",
              "colors": [
                "rgba(50, 172, 45, 0.97)",
                "rgba(237, 129, 40, 0.89)",
                "rgba(245, 54, 54, 0.9)"
              ],
              "dateFormat": "YYYY-MM-DD HH:mm:ss",
              "decimals": 1,
              "mappingType": 1,
              "pattern": "Value #B",
              "thresholds": [
                "70",
                "85"
              ],
              "type": "number",
              "unit": "percent"
            },
            {
              "alias": "总空间",
              "align": "auto",
              "colorMode": null,
              "colors": [
                "rgba(245, 54, 54, 0.9)",
                "rgba(237, 129, 40, 0.89)",
                "rgba(50, 172, 45, 0.97)"
              ],
              "dateFormat": "YYYY-MM-DD HH:mm:ss",
              "decimals": 0,
              "link": false,
              "mappingType": 1,
              "pattern": "Value #C",
              "thresholds": [],
              "type": "number",
              "unit": "bytes"
            },
            {
              "alias": "文件系统",
              "align": "auto",
              "colorMode": null,
              "colors": [
                "rgba(245, 54, 54, 0.9)",
                "rgba(237, 129, 40, 0.89)",
                "rgba(50, 172, 45, 0.97)"
              ],
              "dateFormat": "YYYY-MM-DD HH:mm:ss",
              "decimals": 2,
              "link": false,
              "mappingType": 1,
              "pattern": "fstype",
              "thresholds": [],
              "type": "string",
              "unit": "short"
            },
            {
              "alias": "设备名",
              "align": "auto",
              "colorMode": null,
              "colors": [
                "rgba(245, 54, 54, 0.9)",
                "rgba(237, 129, 40, 0.89)",
                "rgba(50, 172, 45, 0.97)"
              ],
              "dateFormat": "YYYY-MM-DD HH:mm:ss",
              "decimals": 2,
              "link": false,
              "mappingType": 1,
              "pattern": "device",
              "preserveFormat": false,
              "sanitize": false,
              "thresholds": [],
              "type": "string",
              "unit": "short"
            },
            {
              "alias": "",
              "align": "auto",
              "colorMode": null,
              "colors": [
                "rgba(245, 54, 54, 0.9)",
                "rgba(237, 129, 40, 0.89)",
                "rgba(50, 172, 45, 0.97)"
              ],
              "decimals": 2,
              "pattern": "/.*/",
              "preserveFormat": true,
              "sanitize": false,
              "thresholds": [],
              "type": "hidden",
              "unit": "short"
            }
          ],
          "targets": [
            {
              "expr": "node_filesystem_size_bytes{instance=~'$node',fstype=~"ext.*|xfs",mountpoint !~".*pod.*"}-0",
              "format": "table",
              "hide": false,
              "instant": true,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "总量",
              "refId": "C"
            },
            {
              "expr": "node_filesystem_avail_bytes {instance=~'$node',fstype=~"ext.*|xfs",mountpoint !~".*pod.*"}-0",
              "format": "table",
              "hide": false,
              "instant": true,
              "interval": "10s",
              "intervalFactor": 1,
              "legendFormat": "",
              "refId": "A"
            },
            {
              "expr": "(node_filesystem_size_bytes{instance=~'$node',fstype=~"ext.*|xfs",mountpoint !~".*pod.*"}-node_filesystem_free_bytes{instance=~'$node',fstype=~"ext.*|xfs",mountpoint !~".*pod.*"}) *100/(node_filesystem_avail_bytes {instance=~'$node',fstype=~"ext.*|xfs",mountpoint !~".*pod.*"}+(node_filesystem_size_bytes{instance=~'$node',fstype=~"ext.*|xfs",mountpoint !~".*pod.*"}-node_filesystem_free_bytes{instance=~'$node',fstype=~"ext.*|xfs",mountpoint !~".*pod.*"}))",
              "format": "table",
              "hide": false,
              "instant": true,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "",
              "refId": "B"
            }
          ],
          "title": "【$show_hostname】:各分区可用空间(EXT.*/XFS)",
          "transform": "table",
          "type": "table"
        },
        {
          "cacheTimeout": null,
          "colorBackground": false,
          "colorValue": true,
          "colors": [
            "rgba(50, 172, 45, 0.97)",
            "rgba(237, 129, 40, 0.89)",
            "#d44a3a"
          ],
          "datasource": "prometheus",
          "decimals": 2,
          "description": "",
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "format": "percent",
          "gauge": {
            "maxValue": 100,
            "minValue": 0,
            "show": false,
            "thresholdLabels": false,
            "thresholdMarkers": true
          },
          "gridPos": {
            "h": 2,
            "w": 2,
            "x": 15,
            "y": 22
          },
          "id": 20,
          "interval": null,
          "links": [],
          "mappingType": 1,
          "mappingTypes": [
            {
              "name": "value to text",
              "value": 1
            },
            {
              "name": "range to text",
              "value": 2
            }
          ],
          "maxDataPoints": 100,
          "nullPointMode": "connected",
          "nullText": null,
          "options": {},
          "pluginVersion": "6.4.2",
          "postfix": "",
          "postfixFontSize": "50%",
          "prefix": "",
          "prefixFontSize": "50%",
          "rangeMaps": [
            {
              "from": "null",
              "text": "N/A",
              "to": "null"
            }
          ],
          "sparkline": {
            "fillColor": "rgba(31, 118, 189, 0.18)",
            "full": true,
            "lineColor": "#3274D9",
            "show": true,
            "ymax": null,
            "ymin": null
          },
          "tableColumn": "",
          "targets": [
            {
              "expr": "avg(irate(node_cpu_seconds_total{instance=~"$node",mode="iowait"}[5m])) * 100",
              "format": "time_series",
              "hide": false,
              "instant": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "",
              "refId": "A",
              "step": 20
            }
          ],
          "thresholds": "20,50",
          "timeFrom": null,
          "timeShift": null,
          "title": "CPU iowait",
          "type": "singlestat",
          "valueFontSize": "80%",
          "valueMaps": [
            {
              "op": "=",
              "text": "N/A",
              "value": "null"
            }
          ],
          "valueName": "avg"
        },
        {
          "aliasColors": {
            "cn-shenzhen.i-wz9cq1dcb6zwc39ehw59_cni0_in": "light-red",
            "cn-shenzhen.i-wz9cq1dcb6zwc39ehw59_cni0_in下载": "green",
            "cn-shenzhen.i-wz9cq1dcb6zwc39ehw59_cni0_out上传": "yellow",
            "cn-shenzhen.i-wz9cq1dcb6zwc39ehw59_eth0_in下载": "purple",
            "cn-shenzhen.i-wz9cq1dcb6zwc39ehw59_eth0_out": "purple",
            "cn-shenzhen.i-wz9cq1dcb6zwc39ehw59_eth0_out上传": "blue"
          },
          "bars": true,
          "dashLength": 10,
          "dashes": false,
          "datasource": "prometheus",
          "editable": true,
          "error": false,
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "fill": 1,
          "fillGradient": 0,
          "grid": {},
          "gridPos": {
            "h": 6,
            "w": 7,
            "x": 17,
            "y": 22
          },
          "hiddenSeries": false,
          "id": 183,
          "legend": {
            "alignAsTable": true,
            "avg": true,
            "current": true,
            "hideEmpty": true,
            "hideZero": true,
            "max": true,
            "min": false,
            "show": false,
            "sort": "current",
            "sortDesc": true,
            "total": true,
            "values": true
          },
          "lines": false,
          "linewidth": 2,
          "links": [],
          "nullPointMode": "null as zero",
          "options": {
            "dataLinks": []
          },
          "percentage": false,
          "pointradius": 1,
          "points": false,
          "renderer": "flot",
          "repeat": null,
          "seriesOverrides": [
            {
              "alias": "/.*_out上传$/",
              "transform": "negative-Y"
            }
          ],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "increase(node_network_receive_bytes_total{instance=~"$node",device=~"$device"}[60m])",
              "interval": "60m",
              "intervalFactor": 1,
              "legendFormat": "{{device}}_in下载",
              "metric": "",
              "refId": "A",
              "step": 600,
              "target": ""
            },
            {
              "expr": "increase(node_network_transmit_bytes_total{instance=~"$node",device=~"$device"}[60m])",
              "hide": false,
              "interval": "60m",
              "intervalFactor": 1,
              "legendFormat": "{{device}}_out上传",
              "refId": "B",
              "step": 600
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "每小时流量$device",
          "tooltip": {
            "msResolution": false,
            "shared": true,
            "sort": 0,
            "value_type": "cumulative"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "format": "bytes",
              "label": "上传(-)/下载(+)",
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "logBase": 1,
              "max": null,
              "min": null,
              "show": false
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        },
        {
          "cacheTimeout": null,
          "colorBackground": false,
          "colorPostfix": false,
          "colorValue": true,
          "colors": [
            "rgba(245, 54, 54, 0.9)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(50, 172, 45, 0.97)"
          ],
          "datasource": "prometheus",
          "description": "",
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "format": "short",
          "gauge": {
            "maxValue": 100,
            "minValue": 0,
            "show": false,
            "thresholdLabels": false,
            "thresholdMarkers": true
          },
          "gridPos": {
            "h": 2,
            "w": 2,
            "x": 0,
            "y": 24
          },
          "id": 14,
          "interval": null,
          "links": [],
          "mappingType": 1,
          "mappingTypes": [
            {
              "name": "value to text",
              "value": 1
            },
            {
              "name": "range to text",
              "value": 2
            }
          ],
          "maxDataPoints": 100,
          "maxPerRow": 6,
          "nullPointMode": "null",
          "nullText": null,
          "options": {},
          "postfix": "",
          "postfixFontSize": "50%",
          "prefix": "",
          "prefixFontSize": "50%",
          "rangeMaps": [
            {
              "from": "null",
              "text": "N/A",
              "to": "null"
            }
          ],
          "sparkline": {
            "fillColor": "rgba(31, 118, 189, 0.18)",
            "full": false,
            "lineColor": "rgb(31, 120, 193)",
            "show": false
          },
          "tableColumn": "",
          "targets": [
            {
              "expr": "count(node_cpu_seconds_total{instance=~"$node", mode='system'})",
              "format": "time_series",
              "instant": true,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "",
              "refId": "A",
              "step": 20
            }
          ],
          "thresholds": "1,2",
          "title": "CPU 核数",
          "type": "singlestat",
          "valueFontSize": "80%",
          "valueMaps": [
            {
              "op": "=",
              "text": "N/A",
              "value": "null"
            }
          ],
          "valueName": "current"
        },
        {
          "cacheTimeout": null,
          "colorBackground": false,
          "colorPostfix": false,
          "colorValue": true,
          "colors": [
            "rgba(245, 54, 54, 0.9)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(50, 172, 45, 0.97)"
          ],
          "datasource": "prometheus",
          "decimals": null,
          "description": "",
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "format": "short",
          "gauge": {
            "maxValue": 100,
            "minValue": 0,
            "show": false,
            "thresholdLabels": false,
            "thresholdMarkers": true
          },
          "gridPos": {
            "h": 2,
            "w": 2,
            "x": 15,
            "y": 24
          },
          "id": 179,
          "interval": null,
          "links": [],
          "mappingType": 1,
          "mappingTypes": [
            {
              "name": "value to text",
              "value": 1
            },
            {
              "name": "range to text",
              "value": 2
            }
          ],
          "maxDataPoints": 100,
          "maxPerRow": 6,
          "nullPointMode": "null",
          "nullText": null,
          "options": {},
          "postfix": "",
          "postfixFontSize": "50%",
          "prefix": "",
          "prefixFontSize": "50%",
          "rangeMaps": [
            {
              "from": "null",
              "text": "N/A",
              "to": "null"
            }
          ],
          "sparkline": {
            "fillColor": "rgba(31, 118, 189, 0.18)",
            "full": false,
            "lineColor": "rgb(31, 120, 193)",
            "show": false
          },
          "tableColumn": "",
          "targets": [
            {
              "expr": "avg(node_filesystem_files_free{instance=~\"$node\",mountpoint=\"$maxmount\",fstype=~\"ext.?|xfs\"})",
              "format": "time_series",
              "instant": true,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "",
              "refId": "A",
              "step": 20
            }
          ],
          "thresholds": "100000,1000000",
          "title": "Free inodes: $maxmount",
          "type": "singlestat",
          "valueFontSize": "70%",
          "valueMaps": [
            {
              "op": "=",
              "text": "N/A",
              "value": "null"
            }
          ],
          "valueName": "current"
        },
        {
          "cacheTimeout": null,
          "colorBackground": false,
          "colorValue": true,
          "colors": [
            "rgba(245, 54, 54, 0.9)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(50, 172, 45, 0.97)"
          ],
          "datasource": "prometheus",
          "decimals": 0,
          "description": "",
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "format": "bytes",
          "gauge": {
            "maxValue": 100,
            "minValue": 0,
            "show": false,
            "thresholdLabels": false,
            "thresholdMarkers": true
          },
          "gridPos": {
            "h": 2,
            "w": 2,
            "x": 0,
            "y": 26
          },
          "id": 75,
          "interval": null,
          "links": [],
          "mappingType": 1,
          "mappingTypes": [
            {
              "name": "value to text",
              "value": 1
            },
            {
              "name": "range to text",
              "value": 2
            }
          ],
          "maxDataPoints": 100,
          "maxPerRow": 6,
          "nullPointMode": "null",
          "nullText": null,
          "options": {},
          "postfix": "",
          "postfixFontSize": "70%",
          "prefix": "",
          "prefixFontSize": "50%",
          "rangeMaps": [
            {
              "from": "null",
              "text": "N/A",
              "to": "null"
            }
          ],
          "sparkline": {
            "fillColor": "rgba(31, 118, 189, 0.18)",
            "full": false,
            "lineColor": "rgb(31, 120, 193)",
            "show": false
          },
          "tableColumn": "",
          "targets": [
            {
              "expr": "sum(node_memory_MemTotal_bytes{instance=~\"$node\"})",
              "format": "time_series",
              "instant": true,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "{{instance}}",
              "refId": "A",
              "step": 20
            }
          ],
          "thresholds": "2,3",
          "title": "Total memory",
          "type": "singlestat",
          "valueFontSize": "80%",
          "valueMaps": [
            {
              "op": "=",
              "text": "N/A",
              "value": "null"
            }
          ],
          "valueName": "current"
        },
        {
          "cacheTimeout": null,
          "colorBackground": false,
          "colorPostfix": false,
          "colorValue": true,
          "colors": [
            "rgba(245, 54, 54, 0.9)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(50, 172, 45, 0.97)"
          ],
          "datasource": "prometheus",
          "decimals": null,
          "description": "",
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "format": "locale",
          "gauge": {
            "maxValue": 100,
            "minValue": 0,
            "show": false,
            "thresholdLabels": false,
            "thresholdMarkers": true
          },
          "gridPos": {
            "h": 2,
            "w": 2,
            "x": 15,
            "y": 26
          },
          "id": 178,
          "interval": null,
          "links": [],
          "mappingType": 1,
          "mappingTypes": [
            {
              "name": "value to text",
              "value": 1
            },
            {
              "name": "range to text",
              "value": 2
            }
          ],
          "maxDataPoints": 100,
          "maxPerRow": 6,
          "nullPointMode": "null",
          "nullText": null,
          "options": {},
          "postfix": "",
          "postfixFontSize": "50%",
          "prefix": "",
          "prefixFontSize": "50%",
          "rangeMaps": [
            {
              "from": "null",
              "text": "N/A",
              "to": "null"
            }
          ],
          "sparkline": {
            "fillColor": "rgba(31, 118, 189, 0.18)",
            "full": false,
            "lineColor": "rgb(31, 120, 193)",
            "show": false
          },
          "tableColumn": "",
          "targets": [
            {
              "expr": "avg(node_filefd_maximum{instance=~\"$node\"})",
              "format": "time_series",
              "instant": true,
              "intervalFactor": 1,
              "legendFormat": "",
              "refId": "A",
              "step": 20
            }
          ],
          "thresholds": "1024,10000",
          "title": "Max file descriptors",
          "type": "singlestat",
          "valueFontSize": "70%",
          "valueMaps": [
            {
              "op": "=",
              "text": "N/A",
              "value": "null"
            }
          ],
          "valueName": "current"
        },
        {
          "aliasColors": {
            "192.168.200.241:9100_Total": "dark-red",
            "Idle - Waiting for something to happen": "#052B51",
            "guest": "#9AC48A",
            "idle": "#052B51",
            "iowait": "#EAB839",
            "irq": "#BF1B00",
            "nice": "#C15C17",
            "sdb_每秒I/O操作%": "#d683ce",
            "softirq": "#E24D42",
            "steal": "#FCE2DE",
            "system": "#508642",
            "user": "#5195CE",
            "磁盘花费在I/O操作占比": "#ba43a9"
          },
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "prometheus",
          "decimals": 2,
          "description": "",
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "fill": 1,
          "fillGradient": 0,
          "gridPos": {
            "h": 8,
            "w": 8,
            "x": 0,
            "y": 28
          },
          "hiddenSeries": false,
          "id": 7,
          "legend": {
            "alignAsTable": true,
            "avg": true,
            "current": true,
            "hideEmpty": true,
            "hideZero": true,
            "max": true,
            "min": true,
            "rightSide": false,
            "show": true,
            "sideWidth": null,
            "sort": "current",
            "sortDesc": true,
            "total": false,
            "values": true
          },
          "lines": true,
          "linewidth": 2,
          "links": [],
          "maxPerRow": 6,
          "nullPointMode": "null",
          "options": {
            "dataLinks": []
          },
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "repeat": null,
          "seriesOverrides": [
            {
              "alias": "/.*Total usage/",
              "color": "#C4162A",
              "fill": 0
            }
          ],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "avg(irate(node_cpu_seconds_total{instance=~\"$node\",mode=\"system\"}[5m])) by (instance) *100",
              "format": "time_series",
              "hide": false,
              "instant": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "System usage",
              "refId": "A",
              "step": 20
            },
            {
              "expr": "avg(irate(node_cpu_seconds_total{instance=~\"$node\",mode=\"user\"}[5m])) by (instance) *100",
              "format": "time_series",
              "hide": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "User usage",
              "refId": "B",
              "step": 240
            },
            {
              "expr": "avg(irate(node_cpu_seconds_total{instance=~\"$node\",mode=\"iowait\"}[5m])) by (instance) *100",
              "format": "time_series",
              "hide": false,
              "instant": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "I/O wait",
              "refId": "D",
              "step": 240
            },
            {
              "expr": "(1 - avg(irate(node_cpu_seconds_total{instance=~\"$node\",mode=\"idle\"}[5m])) by (instance))*100",
              "format": "time_series",
              "hide": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "Total usage",
              "refId": "F",
              "step": 240
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "CPU usage",
          "tooltip": {
            "shared": true,
            "sort": 2,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "decimals": 0,
              "format": "percent",
              "label": "",
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": false
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        },
        {
          "aliasColors": {
            "192.168.200.241:9100_总内存": "dark-red",
            "使用率": "yellow",
            "内存_Avaliable": "#6ED0E0",
            "内存_Cached": "#EF843C",
            "内存_Free": "#629E51",
            "内存_Total": "#6d1f62",
            "内存_Used": "#eab839",
            "可用": "#9ac48a",
            "总内存": "#bf1b00"
          },
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "prometheus",
          "decimals": 2,
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "fill": 1,
          "fillGradient": 0,
          "gridPos": {
            "h": 8,
            "w": 8,
            "x": 8,
            "y": 28
          },
          "height": "300",
          "hiddenSeries": false,
          "id": 156,
          "legend": {
            "alignAsTable": true,
            "avg": true,
            "current": true,
            "hideEmpty": true,
            "hideZero": true,
            "max": true,
            "min": true,
            "rightSide": false,
            "show": true,
            "sort": "current",
            "sortDesc": true,
            "total": false,
            "values": true
          },
          "lines": true,
          "linewidth": 2,
          "links": [],
          "nullPointMode": "null",
          "options": {
            "dataLinks": []
          },
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [
            {
              "alias": "总内存",
              "color": "#C4162A",
              "fill": 0
            },
            {
              "alias": "使用率",
              "color": "rgb(0, 209, 255)",
              "lines": false,
              "pointradius": 1,
              "points": true,
              "yaxis": 2
            }
          ],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "node_memory_MemTotal_bytes{instance=~\"$node\"}",
              "format": "time_series",
              "hide": false,
              "instant": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "总内存",
              "refId": "A",
              "step": 4
            },
            {
              "expr": "node_memory_MemTotal_bytes{instance=~\"$node\"} - node_memory_MemAvailable_bytes{instance=~\"$node\"}",
              "format": "time_series",
              "hide": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "已用",
              "refId": "B",
              "step": 4
            },
            {
              "expr": "node_memory_MemAvailable_bytes{instance=~\"$node\"}",
              "format": "time_series",
              "hide": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "可用",
              "refId": "F",
              "step": 4
            },
            {
              "expr": "node_memory_Buffers_bytes{instance=~\"$node\"}",
              "format": "time_series",
              "hide": true,
              "intervalFactor": 1,
              "legendFormat": "内存_Buffers",
              "refId": "D",
              "step": 4
            },
            {
              "expr": "node_memory_MemFree_bytes{instance=~\"$node\"}",
              "format": "time_series",
              "hide": true,
              "intervalFactor": 1,
              "legendFormat": "内存_Free",
              "refId": "C",
              "step": 4
            },
            {
              "expr": "node_memory_Cached_bytes{instance=~\"$node\"}",
              "format": "time_series",
              "hide": true,
              "intervalFactor": 1,
              "legendFormat": "内存_Cached",
              "refId": "E",
              "step": 4
            },
            {
              "expr": "node_memory_MemTotal_bytes{instance=~\"$node\"} - (node_memory_Cached_bytes{instance=~\"$node\"} + node_memory_Buffers_bytes{instance=~\"$node\"} + node_memory_MemFree_bytes{instance=~\"$node\"})",
              "format": "time_series",
              "hide": true,
              "intervalFactor": 1,
              "refId": "G"
            },
            {
              "expr": "(1 - (node_memory_MemAvailable_bytes{instance=~\"$node\"} / (node_memory_MemTotal_bytes{instance=~\"$node\"})))* 100",
              "format": "time_series",
              "hide": false,
              "interval": "30m",
              "intervalFactor": 10,
              "legendFormat": "使用率",
              "refId": "H"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "Memory information",
          "tooltip": {
            "shared": true,
            "sort": 2,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "format": "bytes",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": "0",
              "show": true
            },
            {
              "format": "percent",
              "label": "Memory usage",
              "logBase": 1,
              "max": "100",
              "min": "0",
              "show": true
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        },
        {
          "aliasColors": {
            "192.168.10.227:9100_em1_in下载": "super-light-green",
            "192.168.10.227:9100_em1_out上传": "dark-blue"
          },
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "prometheus",
          "decimals": 2,
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "fill": 1,
          "fillGradient": 0,
          "gridPos": {
            "h": 8,
            "w": 8,
            "x": 16,
            "y": 28
          },
          "height": "300",
          "hiddenSeries": false,
          "id": 157,
          "legend": {
            "alignAsTable": true,
            "avg": true,
            "current": true,
            "hideEmpty": true,
            "hideZero": true,
            "max": true,
            "min": true,
            "rightSide": false,
            "show": true,
            "sort": "current",
            "sortDesc": true,
            "total": false,
            "values": true
          },
          "lines": true,
          "linewidth": 1,
          "links": [],
          "nullPointMode": "null",
          "options": {
            "dataLinks": []
          },
          "percentage": false,
          "pointradius": 2,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [
            {
              "alias": "/.*_out_upload$/",
              "transform": "negative-Y"
            }
          ],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "irate(node_network_receive_bytes_total{instance=~'$node',device=~\"$device\"}[5m])*8",
              "format": "time_series",
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "{{device}}_in_download",
              "refId": "A",
              "step": 4
            },
            {
              "expr": "irate(node_network_transmit_bytes_total{instance=~'$node',device=~\"$device\"}[5m])*8",
              "format": "time_series",
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "{{device}}_out_upload",
              "refId": "B",
              "step": 4
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "Network bandwidth per second: $device",
          "tooltip": {
            "shared": true,
            "sort": 2,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "format": "bps",
              "label": "Upload (-) / Download (+)",
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": false
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        },
        {
          "aliasColors": {
            "15分钟": "#6ED0E0",
            "1分钟": "#BF1B00",
            "5分钟": "#CCA300"
          },
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "prometheus",
          "decimals": 2,
          "editable": true,
          "error": false,
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "fill": 1,
          "fillGradient": 1,
          "grid": {},
          "gridPos": {
            "h": 8,
            "w": 8,
            "x": 0,
            "y": 36
          },
          "height": "300",
          "hiddenSeries": false,
          "id": 13,
          "legend": {
            "alignAsTable": true,
            "avg": true,
            "current": true,
            "hideEmpty": true,
            "hideZero": true,
            "max": true,
            "min": true,
            "rightSide": false,
            "show": true,
            "sort": "current",
            "sortDesc": true,
            "total": false,
            "values": true
          },
          "lines": true,
          "linewidth": 2,
          "links": [],
          "maxPerRow": 6,
          "nullPointMode": "null as zero",
          "options": {
            "dataLinks": []
          },
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "repeat": null,
          "seriesOverrides": [
            {
              "alias": "/.*total cores/",
              "color": "#C4162A"
            }
          ],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "node_load1{instance=~\"$node\"}",
              "format": "time_series",
              "instant": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "1-minute load",
              "metric": "",
              "refId": "A",
              "step": 20,
              "target": ""
            },
            {
              "expr": "node_load5{instance=~\"$node\"}",
              "format": "time_series",
              "instant": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "5-minute load",
              "refId": "B",
              "step": 20
            },
            {
              "expr": "node_load15{instance=~\"$node\"}",
              "format": "time_series",
              "instant": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "15-minute load",
              "refId": "C",
              "step": 20
            },
            {
              "expr": "sum(count(node_cpu_seconds_total{instance=~\"$node\", mode='system'}) by (cpu,instance)) by(instance)",
              "format": "time_series",
              "instant": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "CPU total cores",
              "refId": "D",
              "step": 20
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "System load average",
          "tooltip": {
            "msResolution": false,
            "shared": true,
            "sort": 2,
            "value_type": "cumulative"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "format": "short",
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        },
        {
          "aliasColors": {
            "vda_write": "#6ED0E0"
          },
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "prometheus",
          "decimals": 2,
          "description": "Read bytes: bytes read per second on each disk partition\nWritten bytes: bytes written per second on each disk partition",
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "fill": 1,
          "fillGradient": 1,
          "gridPos": {
            "h": 8,
            "w": 8,
            "x": 8,
            "y": 36
          },
          "height": "300",
          "hiddenSeries": false,
          "id": 168,
          "legend": {
            "alignAsTable": true,
            "avg": true,
            "current": true,
            "hideEmpty": true,
            "hideZero": true,
            "max": true,
            "min": true,
            "show": true,
            "sort": "current",
            "sortDesc": true,
            "total": false,
            "values": true
          },
          "lines": true,
          "linewidth": 2,
          "links": [],
          "nullPointMode": "null",
          "options": {
            "dataLinks": []
          },
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [
            {
              "alias": "/.*_read$/",
              "transform": "negative-Y"
            }
          ],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "irate(node_disk_read_bytes_total{instance=~\"$node\"}[5m])",
              "format": "time_series",
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "{{device}}_read",
              "refId": "A",
              "step": 10
            },
            {
              "expr": "irate(node_disk_written_bytes_total{instance=~\"$node\"}[5m])",
              "format": "time_series",
              "hide": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "{{device}}_write",
              "refId": "B",
              "step": 10
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "Disk read/write throughput per second",
          "tooltip": {
            "shared": true,
            "sort": 2,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "decimals": null,
              "format": "Bps",
              "label": "Read (-) / Write (+)",
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": false
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        },
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "prometheus",
          "decimals": 1,
          "description": "",
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "fill": 0,
          "fillGradient": 0,
          "gridPos": {
            "h": 8,
            "w": 8,
            "x": 16,
            "y": 36
          },
          "hiddenSeries": false,
          "id": 174,
          "legend": {
            "alignAsTable": true,
            "avg": true,
            "current": true,
            "hideEmpty": true,
            "hideZero": true,
            "max": true,
            "min": true,
            "rightSide": false,
            "show": true,
            "sideWidth": null,
            "sort": "current",
            "sortDesc": true,
            "total": false,
            "values": true
          },
          "lines": true,
          "linewidth": 2,
          "links": [],
          "nullPointMode": "null",
          "options": {
            "dataLinks": []
          },
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [
            {
              "alias": "/Inodes.*/",
              "yaxis": 2
            }
          ],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "(node_filesystem_size_bytes{instance=~'$node',fstype=~\"ext.*|xfs\",mountpoint !~\".*pod.*\"}-node_filesystem_free_bytes{instance=~'$node',fstype=~\"ext.*|xfs\",mountpoint !~\".*pod.*\"}) *100/(node_filesystem_avail_bytes {instance=~'$node',fstype=~\"ext.*|xfs\",mountpoint !~\".*pod.*\"}+(node_filesystem_size_bytes{instance=~'$node',fstype=~\"ext.*|xfs\",mountpoint !~\".*pod.*\"}-node_filesystem_free_bytes{instance=~'$node',fstype=~\"ext.*|xfs\",mountpoint !~\".*pod.*\"}))",
              "format": "time_series",
              "instant": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "{{mountpoint}}",
              "refId": "A"
            },
            {
              "expr": "node_filesystem_files_free{instance=~'$node',fstype=~\"ext.?|xfs\"} / node_filesystem_files{instance=~'$node',fstype=~\"ext.?|xfs\"}",
              "hide": true,
              "interval": "",
              "legendFormat": "Inodes:{{instance}}:{{mountpoint}}",
              "refId": "B"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "Disk usage",
          "tooltip": {
            "shared": true,
            "sort": 2,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "decimals": null,
              "format": "percent",
              "label": "",
              "logBase": 1,
              "max": "100",
              "min": "0",
              "show": true
            },
            {
              "decimals": 2,
              "format": "percentunit",
              "label": null,
              "logBase": 1,
              "max": "1",
              "min": null,
              "show": false
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        },
        {
          "aliasColors": {
            "vda_write": "#6ED0E0"
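          },
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "prometheus",
          "decimals": 2,
          "description": "Reads completed: reads completed per second on each disk partition\n\nWrites completed: writes completed per second on each disk partition\n\nIO now: I/O requests currently in progress on each disk partition",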
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "fill": 0,
          "fillGradient": 0,
          "gridPos": {
            "h": 9,
            "w": 8,
            "x": 0,
            "y": 44
          },
          "height": "300",
          "hiddenSeries": false,
          "id": 161,
          "legend": {
            "alignAsTable": true,
            "avg": true,
            "current": true,
            "hideEmpty": true,
            "hideZero": true,
            "max": true,
            "min": true,
            "show": true,
            "sort": "current",
            "sortDesc": true,
            "total": false,
            "values": true
          },
          "lines": true,
          "linewidth": 1,
          "links": [],
          "nullPointMode": "null",
          "options": {
            "dataLinks": []
          },
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [
            {
              "alias": "/.*_读取$/",
              "transform": "negative-Y"
            }
          ],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "irate(node_disk_reads_completed_total{instance=~\"$node\"}[5m])",
              "format": "time_series",
              "hide": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "{{device}}_读取",
              "refId": "A",
              "step": 10
            },
            {
              "expr": "irate(node_disk_writes_completed_total{instance=~\"$node\"}[5m])",
              "format": "time_series",
              "hide": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "{{device}}_写入",
              "refId": "B",
              "step": 10
            },
            {
              "expr": "node_disk_io_now{instance=~\"$node\"}",
              "format": "time_series",
              "hide": true,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "{{device}}",
              "refId": "C"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "Disk read/write rate (IOPS)",
          "tooltip": {
            "shared": true,
            "sort": 2,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "decimals": null,
              "format": "iops",
              "label": "Read (-) / Write (+) I/O ops/sec",
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        },
        {
          "aliasColors": {
            "Idle - Waiting for something to happen": "#052B51",
            "guest": "#9AC48A",
            "idle": "#052B51",
            "iowait": "#EAB839",
            "irq": "#BF1B00",
            "nice": "#C15C17",
            "sdb_每秒I/O操作%": "#d683ce",
            "softirq": "#E24D42",
            "steal": "#FCE2DE",
            "system": "#508642",
            "user": "#5195CE",
            "磁盘花费在I/O操作占比": "#ba43a9"
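          },
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "prometheus",
          "decimals": null,
          "description": "Time spent on I/O within each second of wall-clock time.\n\nnode_disk_io_time_seconds_total:\nSeconds the disk has spent on I/O operations; a cumulative counter (the \"milliseconds spent doing I/Os\" field of /proc/diskstats, converted to seconds).\n\nirate(node_disk_io_time_seconds_total[5m]):\nPer-second rate computed as (last sample - previous sample) / difference of their timestamps, i.e. the fraction of each second the disk spent on I/O.",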
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "fill": 1,
          "fillGradient": 0,
          "gridPos": {
            "h": 9,
            "w": 8,
            "x": 8,
            "y": 44
          },
          "hiddenSeries": false,
          "id": 175,
          "legend": {
            "alignAsTable": true,
            "avg": true,
            "current": true,
            "hideEmpty": true,
            "hideZero": true,
            "max": true,
            "min": false,
            "rightSide": false,
            "show": true,
            "sideWidth": null,
            "sort": null,
            "sortDesc": null,
            "total": false,
            "values": true
          },
          "lines": true,
          "linewidth": 1,
          "links": [],
          "maxPerRow": 6,
          "nullPointMode": "null",
          "options": {
            "dataLinks": []
          },
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "irate(node_disk_io_time_seconds_total{instance=~\"$node\"}[5m])",
              "format": "time_series",
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "{{device}}_每秒I/O操作%",
              "refId": "C"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "Fraction of each second spent on I/O",
          "tooltip": {
            "shared": true,
            "sort": 2,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "decimals": null,
              "format": "percentunit",
              "label": "",
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": false
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        },
        {
          "aliasColors": {
            "vda": "#6ED0E0"
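          },
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "prometheus",
          "decimals": 2,
          "description": "Read time seconds: seconds spent on read operations, per disk partition\n\nWrite time seconds: seconds spent on write operations, per disk partition\n\nIO time seconds: seconds spent on I/O operations, per disk partition\n\nIO time weighted seconds: weighted seconds spent on I/O operations, per disk partition",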
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "fill": 1,
          "fillGradient": 1,
          "gridPos": {
            "h": 9,
            "w": 8,
            "x": 16,
            "y": 44
          },
          "height": "300",
          "hiddenSeries": false,
          "id": 160,
          "legend": {
            "alignAsTable": true,
            "avg": true,
            "current": true,
            "hideEmpty": true,
            "hideZero": true,
            "max": true,
            "min": true,
            "show": true,
            "sort": "current",
            "sortDesc": true,
            "total": false,
            "values": true
          },
          "lines": true,
          "linewidth": 2,
          "links": [],
          "nullPointMode": "null as zero",
          "options": {
            "dataLinks": []
          },
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [
            {
              "alias": "/.*_读取$/",
              "transform": "negative-Y"
            }
          ],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "irate(node_disk_read_time_seconds_total{instance=~\"$node\"}[5m]) / irate(node_disk_reads_completed_total{instance=~\"$node\"}[5m])",
              "format": "time_series",
              "hide": false,
              "instant": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "{{device}}_读取",
              "refId": "B"
            },
            {
              "expr": "irate(node_disk_write_time_seconds_total{instance=~\"$node\"}[5m]) / irate(node_disk_writes_completed_total{instance=~\"$node\"}[5m])",
              "format": "time_series",
              "hide": false,
              "instant": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "{{device}}_写入",
              "refId": "C"
            },
            {
              "expr": "irate(node_disk_io_time_seconds_total{instance=~\"$node\"}[5m])",
              "format": "time_series",
              "hide": true,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "{{device}}",
              "refId": "A",
              "step": 10
            },
            {
              "expr": "irate(node_disk_io_time_weighted_seconds_total{instance=~\"$node\"}[5m])",
              "format": "time_series",
              "hide": true,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "{{device}}_加权",
              "refId": "D"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "Time per I/O read/write (reference: under 100 ms) (beta)",
          "tooltip": {
            "shared": true,
            "sort": 2,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "format": "s",
              "label": "Read (-) / Write (+)",
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": false
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        },
        {
          "aliasColors": {
            "192.168.200.241:9100_TCP_alloc": "semi-dark-blue",
            "TCP": "#6ED0E0",
            "TCP_alloc": "blue"
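          },
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": "prometheus",
          "decimals": 2,
          "description": "Sockets_used - total sockets in use across all protocols\n\nCurrEstab - TCP connections whose current state is ESTABLISHED or CLOSE-WAIT\n\nTCP_alloc - allocated TCP sockets (established, with an sk_buff obtained)\n\nTCP_tw - TCP connections in TIME_WAIT, waiting to close\n\nUDP_inuse - UDP sockets in use\n\nRetransSegs - TCP segments retransmitted\n\nOutSegs - TCP segments sent\n\nInSegs - TCP segments received",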
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "fill": 0,
          "fillGradient": 0,
          "gridPos": {
            "h": 8,
            "w": 16,
            "x": 0,
            "y": 53
          },
          "height": "300",
          "hiddenSeries": false,
          "id": 158,
          "interval": "",
          "legend": {
            "alignAsTable": true,
            "avg": false,
            "current": true,
            "hideEmpty": true,
            "hideZero": true,
            "max": true,
            "min": false,
            "rightSide": true,
            "show": true,
            "sideWidth": null,
            "sort": "current",
            "sortDesc": true,
            "total": false,
            "values": true
          },
          "lines": true,
          "linewidth": 1,
          "links": [],
          "nullPointMode": "null",
          "options": {
            "dataLinks": []
          },
          "percentage": false,
          "pointradius": 5,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [
            {
              "alias": "/.*Sockets_used/",
              "color": "#E02F44",
              "lines": false,
              "pointradius": 1,
              "points": true,
              "yaxis": 2
            }
          ],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "node_netstat_Tcp_CurrEstab{instance=~'$node'}",
              "format": "time_series",
              "hide": false,
              "instant": false,
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "CurrEstab",
              "refId": "A",
              "step": 20
            },
            {
              "expr": "node_sockstat_TCP_tw{instance=~'$node'}",
              "format": "time_series",
              "interval": "",
              "intervalFactor": 1,
              "legendFormat": "TCP_tw",
              "refId": "D"
            },
            {
              "expr": "node_sockstat_sockets_used{instance=~'$node'}",
              "hide": false,
              "interval": "30m",
              "intervalFactor": 1,
              "legendFormat": "Sockets_used",
              "refId": "B"
            },
            {
              "expr": "node_sockstat_UDP_inuse{instance=~'$node'}",
              "interval": "",
              "legendFormat": "UDP_inuse",
              "refId": "C"
            },
            {
              "expr": "node_sockstat_TCP_alloc{instance=~'$node'}",
              "interval": "",
              "legendFormat": "TCP_alloc",
              "refId": "E"
            },
            {
              "expr": "irate(node_netstat_Tcp_PassiveOpens{instance=~'$node'}[5m])",
              "hide": true,
              "interval": "",
              "legendFormat": "{{instance}}_Tcp_PassiveOpens",
              "refId": "G"
            },
            {
              "expr": "irate(node_netstat_Tcp_ActiveOpens{instance=~'$node'}[5m])",
              "hide": true,
              "interval": "",
              "legendFormat": "{{instance}}_Tcp_ActiveOpens",
              "refId": "F"
            },
            {
              "expr": "irate(node_netstat_Tcp_InSegs{instance=~'$node'}[5m])",
              "interval": "",
              "legendFormat": "Tcp_InSegs",
              "refId": "H"
            },
            {
              "expr": "irate(node_netstat_Tcp_OutSegs{instance=~'$node'}[5m])",
              "interval": "",
              "legendFormat": "Tcp_OutSegs",
              "refId": "I"
            },
            {
              "expr": "irate(node_netstat_Tcp_RetransSegs{instance=~'$node'}[5m])",
              "hide": false,
              "interval": "",
              "legendFormat": "Tcp_RetransSegs",
              "refId": "J"
            },
            {
              "expr": "irate(node_netstat_TcpExt_ListenDrops{instance=~'$node'}[5m])",
              "hide": true,
              "interval": "",
              "legendFormat": "",
              "refId": "K"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "Network socket connection info",
          "tooltip": {
            "shared": true,
            "sort": 2,
            "value_type": "individual"
          },
          "transformations": [],
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "label": "Total sockets in use across all protocols",
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        },
        {
          "aliasColors": {
            "filefd_192.168.200.241:9100": "super-light-green",
            "switches_192.168.200.241:9100": "semi-dark-red",
            "使用的文件描述符_10.118.72.128:9100": "red",
            "每秒上下文切换次数_10.118.71.245:9100": "yellow",
            "每秒上下文切换次数_10.118.72.128:9100": "yellow"
          },
          "bars": false,
          "cacheTimeout": null,
          "dashLength": 10,
          "dashes": false,
          "datasource": "prometheus",
          "description": "",
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "fill": 0,
          "fillGradient": 1,
          "gridPos": {
            "h": 8,
            "w": 8,
            "x": 16,
            "y": 53
          },
          "hiddenSeries": false,
          "hideTimeOverride": false,
          "id": 16,
          "legend": {
            "alignAsTable": false,
            "avg": false,
            "current": true,
            "max": false,
            "min": false,
            "rightSide": false,
            "show": true,
            "total": false,
            "values": true
          },
          "lines": true,
          "linewidth": 2,
          "links": [],
          "nullPointMode": "null",
          "options": {
            "dataLinks": []
          },
          "percentage": false,
          "pluginVersion": "6.4.2",
          "pointradius": 1,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [
            {
              "alias": "/每秒上下文切换次数.*/",
              "color": "#FADE2A",
              "lines": false,
              "pointradius": 1,
              "points": true,
              "yaxis": 2
            },
            {
              "alias": "/使用的文件描述符.*/",
              "color": "#F2495C"
            }
          ],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "node_filefd_allocated{instance=~\"$node\"}",
              "format": "time_series",
              "instant": false,
              "interval": "",
              "intervalFactor": 5,
              "legendFormat": "使用的文件描述符",
              "refId": "B"
            },
            {
              "expr": "irate(node_context_switches_total{instance=~\"$node\"}[5m])",
              "interval": "",
              "intervalFactor": 5,
              "legendFormat": "每秒上下文切换次数",
              "refId": "A"
            },
            {
              "expr": "(node_filefd_allocated{instance=~\"$node\"}/node_filefd_maximum{instance=~\"$node\"}) *100",
              "format": "time_series",
              "hide": true,
              "instant": false,
              "interval": "",
              "intervalFactor": 5,
              "legendFormat": "使用的文件描述符占比_{{instance}}",
              "refId": "C"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "Open file descriptors (left) / context switches per second (right)",
          "tooltip": {
            "shared": true,
            "sort": 2,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "format": "short",
              "label": "Used file descriptors",
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "label": "context_switches",
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        }
      ],
      "refresh": "",
      "schemaVersion": 20,
      "style": "dark",
      "tags": [
        "Prometheus",
        "node_exporter"
      ],
      "templating": {
        "list": [
          {
            "allValue": null,
            "current": {
              "tags": [],
              "text": "node-exporter",
              "value": "node-exporter"
            },
            "datasource": "prometheus",
            "definition": "label_values(node_uname_info, job)",
            "hide": 0,
            "includeAll": false,
            "index": -1,
            "label": "JOB",
            "multi": false,
            "name": "job",
            "options": [],
            "query": "label_values(node_uname_info, job)",
            "refresh": 1,
            "regex": "",
            "skipUrlSync": false,
            "sort": 5,
            "tagValuesQuery": "",
            "tags": [],
            "tagsQuery": "",
            "type": "query",
            "useTags": false
          },
          {
            "allValue": null,
            "current": {
              "text": "All",
              "value": "$__all"
            },
            "datasource": "prometheus",
            "definition": "label_values(node_uname_info{job=~\"$job\"}, nodename)",
            "hide": 0,
            "includeAll": true,
            "index": -1,
            "label": "Hostname",
            "multi": false,
            "name": "hostname",
            "options": [],
            "query": "label_values(node_uname_info{job=~\"$job\"}, nodename)",
            "refresh": 1,
            "regex": "",
            "skipUrlSync": false,
            "sort": 5,
            "tagValuesQuery": "",
            "tags": [],
            "tagsQuery": "",
            "type": "query",
            "useTags": false
          },
          {
            "allFormat": "glob",
            "allValue": null,
            "current": {
              "text": "ymt108",
              "value": "ymt108"
            },
            "datasource": "prometheus",
            "definition": "label_values(node_uname_info{job=~\"$job\",nodename=~\"$hostname\"},instance)",
            "hide": 0,
            "includeAll": false,
            "index": -1,
            "label": "Instance",
            "multi": true,
            "multiFormat": "regex values",
            "name": "node",
            "options": [],
            "query": "label_values(node_uname_info{job=~\"$job\",nodename=~\"$hostname\"},instance)",
            "refresh": 1,
            "regex": "",
            "skipUrlSync": false,
            "sort": 5,
            "tagValuesQuery": "",
            "tags": [],
            "tagsQuery": "",
            "type": "query",
            "useTags": false
          },
          {
            "allFormat": "glob",
            "allValue": null,
            "current": {
              "text": "All",
              "value": "$__all"
            },
            "datasource": "prometheus",
            "definition": "label_values(node_network_info{device!~'tap.*|veth.*|br.*|docker.*|virbr.*|lo.*|cni.*'},device)",
            "hide": 0,
            "includeAll": true,
            "index": -1,
            "label": "Network interface",
            "multi": true,
            "multiFormat": "regex values",
            "name": "device",
            "options": [],
            "query": "label_values(node_network_info{device!~'tap.*|veth.*|br.*|docker.*|virbr.*|lo.*|cni.*'},device)",
            "refresh": 1,
            "regex": "",
            "skipUrlSync": false,
            "sort": 1,
            "tagValuesQuery": "",
            "tags": [],
            "tagsQuery": "",
            "type": "query",
            "useTags": false
          },
          {
            "allValue": null,
            "current": {
              "text": "/",
              "value": "/"
            },
            "datasource": "prometheus",
            "definition": "query_result(topk(1,sort_desc (max(node_filesystem_size_bytes{instance=~'$node',fstype=~\"ext.?|xfs\",mountpoint!~\".*pods.*\"}) by (mountpoint))))",
            "hide": 2,
            "includeAll": false,
            "index": -1,
            "label": "Largest mount point",
            "multi": false,
            "name": "maxmount",
            "options": [],
            "query": "query_result(topk(1,sort_desc (max(node_filesystem_size_bytes{instance=~'$node',fstype=~\"ext.?|xfs\",mountpoint!~\".*pods.*\"}) by (mountpoint))))",
            "refresh": 2,
            "regex": "/.*\"(.*)\".*/",
            "skipUrlSync": false,
            "sort": 5,
            "tagValuesQuery": "",
            "tags": [],
            "tagsQuery": "",
            "type": "query",
            "useTags": false
          },
          {
            "allValue": null,
            "current": {
              "text": "ymt108",
              "value": "ymt108"
            },
            "datasource": "prometheus",
            "definition": "label_values(node_uname_info{job=~\"$job\",instance=~\"$node\"}, nodename)",
            "hide": 2,
            "includeAll": false,
            "index": -1,
            "label": "Hostname shown in panels",
            "multi": false,
            "name": "show_hostname",
            "options": [],
            "query": "label_values(node_uname_info{job=~\"$job\",instance=~\"$node\"}, nodename)",
            "refresh": 1,
            "regex": "",
            "skipUrlSync": false,
            "sort": 5,
            "tagValuesQuery": "",
            "tags": [],
            "tagsQuery": "",
            "type": "query",
            "useTags": false
          }
        ]
      },
      "time": {
        "from": "now-12h",
        "to": "now"
      },
      "timepicker": {
        "hidden": false,
        "now": true,
        "refresh_intervals": [
          "15s",
          "30s",
          "1m",
          "5m",
          "15m",
          "30m"
        ],
        "time_options": [
          "5m",
          "15m",
          "1h",
          "6h",
          "12h",
          "24h",
          "2d",
          "7d",
          "30d"
        ]
      },
      "timezone": "browser",
      "title": "育苗通 Node Resource Monitoring",
      "uid": "hb7fSE0Zz",
      "version": 11
    }
    node-model.json

    Of course, many dashboards for Kubernetes resources are also built in by default.

     


    10. Summary

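    Special note 1: etcd monitoring can also be added as a custom target; for details see https://www.jianshu.com/p/2fbbe767870d

    • Step 1: Create a ServiceMonitor object so that Prometheus picks up the new scrape target;
    • Step 2: Associate the ServiceMonitor with a Service object that exposes the metrics endpoint;
    • Step 3: Make sure the Service can actually reach the metrics data.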
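
    The three steps above can be sketched as the manifests below. This is a minimal illustration and not the configuration from the linked article: the object name etcd-k8s, the Secret name etcd-certs, the certificate file names, and the component: etcd pod label (the label kubeadm normally puts on its etcd static pods) are all assumptions to adapt to your cluster. It also assumes the etcd-certs Secret is listed under spec.secrets of the Prometheus resource, which makes the Operator mount it under /etc/prometheus/secrets/.

    ```yaml
    # Step 1: ServiceMonitor that tells Prometheus what to scrape (name is hypothetical).
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: etcd-k8s
      namespace: monitoring
      labels:
        k8s-app: etcd-k8s
    spec:
      jobLabel: k8s-app              # the job name is taken from this Service label
      endpoints:
        - port: https-metrics        # must match the Service port name below
          interval: 30s
          scheme: https
          tlsConfig:
            # Assumed paths: files from a Secret named etcd-certs mounted by the Operator.
            caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
            certFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
            keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key
      selector:
        matchLabels:
          k8s-app: etcd-k8s          # selects the Service defined below
      namespaceSelector:
        matchNames:
          - kube-system
    ---
    # Step 2: Service that exposes the etcd metrics port and is selected by the ServiceMonitor.
    apiVersion: v1
    kind: Service
    metadata:
      name: etcd-k8s
      namespace: kube-system
      labels:
        k8s-app: etcd-k8s
    spec:
      clusterIP: None
      selector:
        component: etcd              # assumed label on the etcd pods
      ports:
        - name: https-metrics
          port: 2379
          targetPort: 2379
    ```

    For step 3, one way to check is to open the Targets page of the Prometheus UI and confirm that the etcd endpoints are listed as UP. If etcd runs outside the cluster, the Service would have no pod selector and would need a manually maintained Endpoints object instead.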

    Special note 2: When deploying kube-prometheus you may hit a problem connecting to the apiserver; for details see: [Solved] Error from server (ServiceUnavailable): the server is currently unable to handle the request

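    Once all the configuration is done, it is worth tidying things up and writing an upgrade script, upgrade.sh, to make later deployments and configuration updates easier.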

    #!/bin/sh
    # Stop at the first failing command so a broken manifest does not go unnoticed.
    set -e

    # Deploy the kube-controller-manager and kube-scheduler Services
    kubectl apply -f prometheus-kubeControllerManagerService.yaml
    kubectl apply -f prometheus-kubeSchedulerService.yaml

    # Regenerate and apply the Alertmanager configuration Secret
    # (on kubectl 1.18 or later, use --dry-run=client instead of the bare --dry-run)
    kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml -n monitoring --dry-run -o yaml > alertmanager-secret.yaml
    kubectl apply -f alertmanager-secret.yaml

    # Regenerate and apply the additional scrape configs Secret
    kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml -n monitoring --dry-run -o yaml > additional-scrape-configs.yaml
    kubectl apply -f additional-scrape-configs.yaml

    # Update the Prometheus rules
    kubectl apply -f prometheus-additional-rules.yaml
    kubectl apply -f prometheus-rules.yaml

    # Update the Prometheus configuration
    kubectl apply -f prometheus-prometheus.yaml

    # Update the Grafana configuration
    kubectl apply -f grafana-volume.yaml
    kubectl apply -f grafana-deployment.yaml

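    Author: Leozhanggg

    Source: https://www.cnblogs.com/leozhanggg/p/13502983.html

    Copyright of this article is shared by the author and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a prominent link to the original article must be given on the page; otherwise the author reserves the right to pursue legal liability.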
