  • Kubernetes Prometheus

    Introduction



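    Prometheus is a Cloud Native Computing Foundation (CNCF) project. It is fully open source and monitors systems, services, containers, databases, and more. It collects metrics from the targets defined in its configuration, evaluates rule expressions against the collected data, displays the results, and raises alerts when alert thresholds are crossed.
    Compared with other monitoring systems, it has the following features:
    1.   A multi-dimensional data model (time series identified by a metric name and key/value pairs)
    2.   A flexible query language (PromQL)
    3.   No dependency on distributed storage; each server stores its data independently
    4.   Time series are collected over HTTP with a pull model
    5.   Targets are discovered through service discovery or configuration files
    6.   Support for multiple kinds of dashboards
    Time series can also be pushed to a Pushgateway, from which the Prometheus server then scrapes them.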
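
The multi-dimensional data model in the feature list can be illustrated with a small Python sketch. It is not Prometheus code: the metric name, label values, and timestamp are made up for illustration. It only shows the idea that every distinct combination of metric name and labels is a separate time series.

```python
from collections import defaultdict

def series_key(name, **labels):
    """A series is identified by its metric name plus its full label set."""
    return (name, tuple(sorted(labels.items())))

def render(key, value):
    """Render one sample the way the text exposition format writes it."""
    name, labels = key
    if not labels:
        return f"{name} {value}"
    body = ",".join(f'{k}="{v}"' for k, v in labels)
    return f"{name}{{{body}}} {value}"

# Each distinct label combination gets its own list of (timestamp, value) samples.
# The numbers below are hypothetical.
store = defaultdict(list)
store[series_key("http_requests_total", method="GET", status="200")].append((1630569439, 1027))
store[series_key("http_requests_total", method="POST", status="500")].append((1630569439, 3))

for key, samples in store.items():
    print(render(key, samples[-1][1]))
```

Both entries share the metric name `http_requests_total`, yet they are stored and queried as two independent series because their labels differ.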

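    Collection methods

    Because samples can be lost during collection, Prometheus is not suitable when the collected data must be 100% accurate. For recording time series data, however, Prometheus offers strong query capabilities, and it fits microservice architectures well.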

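    Pull mode

    Prometheus collects data with a pull model: it scrapes metrics over HTTP. Any application that can expose an HTTP endpoint can be monitored, which makes integration simpler to develop than with a private or binary protocol.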
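
As a rough illustration of what "exposing an HTTP endpoint" means, here is a minimal sketch that uses only the Python standard library. The counter name `demo_requests_total`, the port, and the helper names are assumptions made for this example; real applications would normally use an official client library instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter; a real application would update it as it works.
COUNTERS = {"demo_requests_total": 0}

def render_metrics(counters):
    """Render counters in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(counters.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Prometheus scrapes /metrics by default; everything else is a 404 here.
        if self.path != "/metrics":
            self.send_error(404)
            return
        COUNTERS["demo_requests_total"] += 1
        body = render_metrics(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port=8000):
    """Block forever, answering scrapes on the given port (8000 is arbitrary)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint. Prometheus would then need a scrape job pointing at that address, and it decides when to fetch the data; the application never initiates the transfer.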

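    Push mode

    For short-lived work such as scheduled jobs, a pull model may miss the metrics because the job can finish before Prometheus scrapes it. In that case, add an intermediate layer: the client pushes its data to a Pushgateway, which caches it, and Prometheus pulls the metrics from the Pushgateway. (This requires deploying a Pushgateway and adding a scrape job that collects from it.)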
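
The push step can be sketched as follows. This assumes the Pushgateway HTTP API, where metrics are sent to `/metrics/job/<job>` with optional grouping label pairs appended to the path. The job name, metric name, and helper function are hypothetical; the default gateway address is the in-cluster DNS name that the Helm chart prints in its install notes further down.

```python
import urllib.request
from urllib.parse import quote

def build_push_request(gateway_url, job, body, grouping=None):
    """Build (but do not send) a request that pushes `body` to a Pushgateway.

    PUT replaces all metrics in the group identified by the job name
    and the grouping labels.
    """
    path = "/metrics/job/" + quote(job, safe="")
    for key, value in (grouping or {}).items():
        path += "/" + quote(key, safe="") + "/" + quote(value, safe="")
    return urllib.request.Request(
        gateway_url.rstrip("/") + path,
        data=body.encode("utf-8"),  # the body must end with a newline
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )

# Hypothetical batch job reporting when it last succeeded.
req = build_push_request(
    "http://prometheus-pushgateway.default.svc.cluster.local:9091",
    "nightly_backup",
    "backup_last_success_timestamp_seconds 1630569439\n",
    grouping={"instance": "db1"},
)
print(req.get_method(), req.full_url)
```

Sending it is a matter of `urllib.request.urlopen(req)` from somewhere that can reach the gateway. The job can then exit immediately, because the Pushgateway holds the value until Prometheus scrapes it.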

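    Components and architecture


    Prometheus components

    • Prometheus Server: scrapes, processes, and stores time series data, serves queries, evaluates alerting rules, and provides basic graphing
    • Alertmanager: receives and dispatches alerts; provides notification templates, alert delivery, and alert routing
    • Pushgateway: a relay for collected data that caches pushed metrics
    • Exporters: collect metrics and expose them for scraping
    • Client libraries: instrument application code

    Architecture diagram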

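    Deployment


    Deploying with Helm

    Reference documentation:

    Official documentation on configuring service discovery to monitor Kubernetes: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config

    Kubernetes service discovery exposes the following roles as scrape targets. Prometheus retrieves these targets from the Kubernetes REST API and stays synchronized with the cluster state.

    • node
    • endpoints
    • service
    • pod
    • ingress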
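
To make the roles a little more concrete: each discovered object becomes a candidate target carrying `__meta_kubernetes_*` labels, and relabeling rules decide which candidates are actually scraped. The following Python sketch imitates a relabel rule with action `keep`. It is an illustration, not Prometheus code: the addresses and pod names are invented, while the `prometheus.io/scrape` annotation is the same one that the values file in this guide sets on the node-exporter service.

```python
import re

# For the pod role, the annotation prometheus.io/scrape shows up as this meta label.
SCRAPE_LABEL = "__meta_kubernetes_pod_annotation_prometheus_io_scrape"

# Hypothetical targets, shaped like what the pod role hands to relabeling.
discovered = [
    {"__address__": "172.21.12.2:9091", "__meta_kubernetes_pod_name": "pushgateway-demo", SCRAPE_LABEL: "true"},
    {"__address__": "172.21.12.9:8080", "__meta_kubernetes_pod_name": "no-annotation-demo"},
    {"__address__": "172.21.12.10:8080", "__meta_kubernetes_pod_name": "opt-out-demo", SCRAPE_LABEL: "false"},
]

def keep(targets, source_label, regex):
    """Drop every target whose label value does not match the regex.

    Relabel regexes are anchored at both ends, hence fullmatch.
    A missing label is treated as an empty string.
    """
    pattern = re.compile(regex)
    return [t for t in targets if pattern.fullmatch(t.get(source_label, ""))]

scraped = keep(discovered, SCRAPE_LABEL, "true")
print([t["__meta_kubernetes_pod_name"] for t in scraped])
```

Only the pod that opted in with the annotation survives the filter; pods without the annotation, or with it set to another value, are never scraped.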

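    Installation options

    1. Install Prometheus outside the cluster; the cluster's CA certificate must be configured in prometheus.yml
    2. Install Prometheus inside the cluster; RBAC authorization must be created. See the official Helm chart for installation details; this guide mainly uses Helm

    There are two Helm sources. The first, the official helm/charts repository, is deprecated and now points to Artifact Hub. The second, Artifact Hub, hosts installable Kubernetes packages, including charts for CNCF projects.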

    1. https://github.com/helm/charts
    2. https://artifacthub.io

    Prometheus chart on artifacthub.io:

    https://artifacthub.io/packages/helm/prometheus-community/prometheus

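    The page above explains in detail how to install Prometheus with Helm, including the prerequisites.

    Why the chart depends on kube-state-metrics

    kube-state-metrics exposes the state of Kubernetes objects as metrics, for example the status of ReplicaSets, which Prometheus then scrapes.

    1. Add the repos: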
      <root@PROD-K8S-CP1 ~># helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
      "prometheus-community" has been added to your repositories
      <root@PROD-K8S-CP1 ~># helm repo add kube-state-metrics https://kubernetes.github.io/kube-state-metrics
      "kube-state-metrics" has been added to your repositories
      <root@PROD-K8S-CP1 ~># helm repo update
      Hang tight while we grab the latest from your chart repositories...
      ...Successfully got an update from the "kube-state-metrics" chart repository
      ...Successfully got an update from the "cilium" chart repository
      ...Successfully got an update from the "prometheus-community" chart repository
      Update Complete. ⎈ Happy Helming!⎈ 
    2. Install Prometheus
      <root@PROD-K8S-CP1 ~>#  helm install prometheus prometheus-community/prometheus --version 14.1.0
      NAME: prometheus
      LAST DEPLOYED: Thu Sep  2 15:57:19 2021
      NAMESPACE: default
      STATUS: deployed
      REVISION: 1
      TEST SUITE: None
      NOTES:
      The Prometheus server can be accessed via port 80 on the following DNS name from within your cluster:
      prometheus-server.default.svc.cluster.local
      
      
      Get the Prometheus server URL by running these commands in the same shell:
        export POD_NAME=$(kubectl get pods --namespace default -l "app=prometheus,component=server" -o jsonpath="{.items[0].metadata.name}")
        kubectl --namespace default port-forward $POD_NAME 9090
      
      
      The Prometheus alertmanager can be accessed via port 80 on the following DNS name from within your cluster:
      prometheus-alertmanager.default.svc.cluster.local
      
      
      Get the Alertmanager URL by running these commands in the same shell:
        export POD_NAME=$(kubectl get pods --namespace default -l "app=prometheus,component=alertmanager" -o jsonpath="{.items[0].metadata.name}")
        kubectl --namespace default port-forward $POD_NAME 9093
      #################################################################################
      ######   WARNING: Pod Security Policy has been moved to a global property.  #####
      ######            use .Values.podSecurityPolicy.enabled with pod-based      #####
      ######            annotations                                               #####
      ######            (e.g. .Values.nodeExporter.podSecurityPolicy.annotations) #####
      #################################################################################
      
      
      The Prometheus PushGateway can be accessed via port 9091 on the following DNS name from within your cluster:
      prometheus-pushgateway.default.svc.cluster.local
      
      
      Get the PushGateway URL by running these commands in the same shell:
        export POD_NAME=$(kubectl get pods --namespace default -l "app=prometheus,component=pushgateway" -o jsonpath="{.items[0].metadata.name}")
        kubectl --namespace default port-forward $POD_NAME 9091
      
      For more information on running Prometheus, visit:
      https://prometheus.io/
    3. Check the deployment status
      <root@PROD-K8S-CP1 ~># kubectl get pods --all-namespaces  -o wide| grep node
      default         prometheus-node-exporter-9rm22                   1/1     Running   0          51m     10.1.17.237     prod-be-k8s-wn7     <none>           <none>
      default         prometheus-node-exporter-qfxgz                   1/1     Running   0          51m     10.1.17.238     prod-be-k8s-wn8     <none>           <none>
      default         prometheus-node-exporter-z5znx                   1/1     Running   0          51m     10.1.17.236     prod-be-k8s-wn6     <none>           <none>
      <root@PROD-K8S-CP1 ~># kubectl get pods --all-namespaces  -o wide| grep prometheus
      default         prometheus-alertmanager-755d84cf4f-rfz4n         0/2     Pending   0          51m     <none>          <none>              <none>           <none>
      default         prometheus-kube-state-metrics-86dc6bb59f-wlpcl   1/1     Running   0          51m     172.21.12.167   prod-be-k8s-wn8     <none>           <none>
      default         prometheus-node-exporter-9rm22                   1/1     Running   0          51m     10.1.17.237     prod-be-k8s-wn7     <none>           <none>
      default         prometheus-node-exporter-qfxgz                   1/1     Running   0          51m     10.1.17.238     prod-be-k8s-wn8     <none>           <none>
      default         prometheus-node-exporter-z5znx                   1/1     Running   0          51m     10.1.17.236     prod-be-k8s-wn6     <none>           <none>
      default         prometheus-pushgateway-745d67dd5f-7ckvv          1/1     Running   0          51m     172.21.12.2     prod-be-k8s-wn7     <none>           <none>
      default         prometheus-server-867f854484-lcrq6 

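      # The default deployment enables persistent storage. Because the default values do not specify the PVC in detail, adjust the persistence section after installing. Alternatively, disable persistent storage, or map a local volume into the Pod.
    4. Custom installation
      # Export the chart's default values, edit them, then install with the edited file
      <root@PROD-K8S-CP1 ~># helm show values prometheus-community/prometheus --version 14.1.0 > prometheus-values.yaml
      <root@PROD-K8S-CP1 ~># helm install prometheus prometheus-community/prometheus -f prometheus-values.yaml
      
      Revised version: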
      rbac:
        create: true
      
      podSecurityPolicy:
        enabled: false
      
      imagePullSecrets:
      # - name: "image-pull-secret"
      
      ## Define serviceAccount names for components. Defaults to component's fully qualified name.
      ##
      serviceAccounts:
        alertmanager:
          create: true
          name:
          annotations: {}
        nodeExporter:
          create: true
          name:
          annotations: {}
        pushgateway:
          create: true
          name:
          annotations: {}
        server:
          create: true
          name:
          annotations: {}
      
      alertmanager:
        ## If false, alertmanager will not be installed
        ##
        enabled: true
      
        ## Use a ClusterRole (and ClusterRoleBinding)
        ## - If set to false - we define a Role and RoleBinding in the defined namespaces ONLY
        ## This makes alertmanager work - for users who do not have ClusterAdmin privs, but wants alertmanager to operate on their own namespaces, instead of clusterwide.
        useClusterRole: true
      
        ## Set to a rolename to use existing role - skipping role creating - but still doing serviceaccount and rolebinding to the rolename set here.
        useExistingRole: false
      
        ## alertmanager container name
        ##
        name: alertmanager
      
        ## alertmanager container image
        ##
        image:
          repository: quay.io/prometheus/alertmanager
          tag: v0.21.0
          pullPolicy: IfNotPresent
      
        ## alertmanager priorityClassName
        ##
        priorityClassName: ""
      
        ## Additional alertmanager container arguments
        ##
        extraArgs: {}
      
        ## Additional InitContainers to initialize the pod
        ##
        extraInitContainers: []
      
        ## The URL prefix at which the container can be accessed. Useful in the case the '-web.external-url' includes a slug
        ## so that the various internal URLs are still able to access as they are in the default case.
        ## (Optional)
        prefixURL: ""
      
        ## External URL which can access alertmanager
        baseURL: "http://localhost:9093"
      
        ## Additional alertmanager container environment variable
        ## For instance to add a http_proxy
        ##
        extraEnv: {}
      
        ## Additional alertmanager Secret mounts
        # Defines additional mounts with secrets. Secrets must be manually created in the namespace.
        extraSecretMounts: []
          # - name: secret-files
          #   mountPath: /etc/secrets
          #   subPath: ""
          #   secretName: alertmanager-secret-files
          #   readOnly: true
      
        ## ConfigMap override where fullname is {{.Release.Name}}-{{.Values.alertmanager.configMapOverrideName}}
        ## Defining configMapOverrideName will cause templates/alertmanager-configmap.yaml
        ## to NOT generate a ConfigMap resource
        ##
        configMapOverrideName: ""
      
        ## The name of a secret in the same kubernetes namespace which contains the Alertmanager config
        ## Defining configFromSecret will cause templates/alertmanager-configmap.yaml
        ## to NOT generate a ConfigMap resource
        ##
        configFromSecret: ""
      
        ## The configuration file name to be loaded to alertmanager
        ## Must match the key within configuration loaded from ConfigMap/Secret
        ##
        configFileName: alertmanager.yml
      
        ingress:
          ## If true, alertmanager Ingress will be created
          ##
          enabled: false
      
          ## alertmanager Ingress annotations
          ##
          annotations: {}
          #   kubernetes.io/ingress.class: nginx
          #   kubernetes.io/tls-acme: 'true'
      
          ## alertmanager Ingress additional labels
          ##
          extraLabels: {}
      
          ## alertmanager Ingress hostnames with optional path
          ## Must be provided if Ingress is enabled
          ##
          hosts: []
          #   - alertmanager.domain.com
          #   - domain.com/alertmanager
      
          ## Extra paths to prepend to every host configuration. This is useful when working with annotation based services.
          extraPaths: []
          # - path: /*
          #   backend:
          #     serviceName: ssl-redirect
          #     servicePort: use-annotation
      
          ## alertmanager Ingress TLS configuration
          ## Secrets must be manually created in the namespace
          ##
          tls: []
          #   - secretName: prometheus-alerts-tls
          #     hosts:
          #       - alertmanager.domain.com
      
        ## Alertmanager Deployment Strategy type
        # strategy:
        #   type: Recreate
      
        ## Node tolerations for alertmanager scheduling to nodes with taints
        ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
        ##
        ## Configure alertmanager taint tolerations
        tolerations:
          - key: resource
          #   operator: "Equal|Exists"
            value: base
          #   effect: "NoSchedule|PreferNoSchedule|NoExecute(1.6 only)"
            effect: NoExecute
      
        ## Node labels for alertmanager pod assignment
        ## Ref: https://kubernetes.io/docs/user-guide/node-selection/
        ##
        nodeSelector:
          kubernetes.io/hostname: prod-sys-k8s-wn3
      
        ## Pod affinity
        ##
        ## Configure alertmanager node affinity
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: kubernetes.io/resource
                      operator: In
                      values:
                        - base
      
        ## PodDisruptionBudget settings
        ## ref: https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
        ##
        podDisruptionBudget:
          enabled: false
          maxUnavailable: 1
      
        ## Use an alternate scheduler, e.g. "stork".
        ## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
        ##
        # schedulerName:
      
        persistentVolume:
          ## If true, alertmanager will create/use a Persistent Volume Claim
          ## If false, use emptyDir
          ##
          enabled: true
      
          ## alertmanager data Persistent Volume access modes
          ## Must match those of existing PV or dynamic provisioner
          ## Ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
          ##
          accessModes:
            - ReadWriteOnce
      
          ## alertmanager data Persistent Volume Claim annotations
          ##
          annotations: {}
      
          ## alertmanager data Persistent Volume existing claim name
          ## Requires alertmanager.persistentVolume.enabled: true
          ## If defined, PVC must be created manually before volume will be bound
          existingClaim: ""
      
          ## alertmanager data Persistent Volume mount root path
          ##
          mountPath: /data
      
          ## alertmanager data Persistent Volume size
          ##
          size: 20Gi
      
          ## alertmanager data Persistent Volume Storage Class
          ## If defined, storageClassName: <storageClass>
          ## If set to "-", storageClassName: "", which disables dynamic provisioning
          ## If undefined (the default) or set to null, no storageClassName spec is
          ##   set, choosing the default provisioner.  (gp2 on AWS, standard on
          ##   GKE, AWS & OpenStack)
          ##
          # storageClass: "-"
          ## Configure alertmanager persistent storage
          storageClass: "alicloud-disk-essd"
      
          ## alertmanager data Persistent Volume Binding Mode
          ## If defined, volumeBindingMode: <volumeBindingMode>
          ## If undefined (the default) or set to null, no volumeBindingMode spec is
          ##   set, choosing the default mode.
          ##
          # volumeBindingMode: ""
      
          ## Subdirectory of alertmanager data Persistent Volume to mount
          ## Useful if the volume's root directory is not empty
          ##
          subPath: ""
      
        emptyDir:
          ## alertmanager emptyDir volume size limit
          ##
          sizeLimit: ""
      
        ## Annotations to be added to alertmanager pods
        ##
        podAnnotations: {}
          ## Tell prometheus to use a specific set of alertmanager pods
          ## instead of all alertmanager pods found in the same namespace
          ## Useful if you deploy multiple releases within the same namespace
          ##
          ## prometheus.io/probe: alertmanager-teamA
      
        ## Labels to be added to Prometheus AlertManager pods
        ##
        podLabels: {}
      
        ## Specify if a Pod Security Policy for node-exporter must be created
        ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/
        ##
        podSecurityPolicy:
          annotations: {}
            ## Specify pod annotations
            ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#apparmor
            ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#seccomp
            ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#sysctl
            ##
            # seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
            # seccomp.security.alpha.kubernetes.io/defaultProfileName: 'docker/default'
            # apparmor.security.beta.kubernetes.io/defaultProfileName: 'runtime/default'
      
        ## Use a StatefulSet if replicaCount needs to be greater than 1 (see below)
        ##
        replicaCount: 1
      
        ## Annotations to be added to deployment
        ##
        deploymentAnnotations: {}
      
        statefulSet:
          ## If true, use a statefulset instead of a deployment for pod management.
          ## This allows to scale replicas to more than 1 pod
          ##
          enabled: false
      
          annotations: {}
          labels: {}
          podManagementPolicy: OrderedReady
      
          ## Alertmanager headless service to use for the statefulset
          ##
          headless:
            annotations: {}
            labels: {}
      
            ## Enabling peer mesh service end points for enabling the HA alert manager
            ## Ref: https://github.com/prometheus/alertmanager/blob/master/README.md
            enableMeshPeer: false
      
            servicePort: 80
      
        ## alertmanager resource requests and limits
        ## Ref: http://kubernetes.io/docs/user-guide/compute-resources/
        ##
        ## Configure resource requests and limits
        resources:
          limits:
            cpu: 2
            memory: 2Gi
          requests:
            cpu: 10m
            memory: 32Mi
      
        # Custom DNS configuration to be added to alertmanager pods
        dnsConfig: {}
          # nameservers:
          #   - 1.2.3.4
          # searches:
          #   - ns1.svc.cluster-domain.example
          #   - my.dns.search.suffix
          # options:
          #   - name: ndots
          #     value: "2"
        #   - name: edns0
      
        ## Configure the network mode
        hostNetwork: false
      
        ## Security context to be added to alertmanager pods
        ##
        securityContext:
          runAsUser: 65534
          runAsNonRoot: true
          runAsGroup: 65534
          fsGroup: 65534
      
        service:
          annotations: {}
          labels: {}
          clusterIP: ""
      
          ## Enabling peer mesh service end points for enabling the HA alert manager
          ## Ref: https://github.com/prometheus/alertmanager/blob/master/README.md
          # enableMeshPeer : true
      
          ## List of IP addresses at which the alertmanager service is available
          ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
          ##
          ## Configure the service's external IPs
          externalIPs:
            - 10.1.0.11
          loadBalancerIP: ""
          loadBalancerSourceRanges: []
          servicePort: 9093
          # nodePort: 30000
          sessionAffinity: None
          type: ClusterIP
      
      ## Monitors ConfigMap changes and POSTs to a URL
      ## Ref: https://github.com/jimmidyson/configmap-reload
      ##
      configmapReload:
        prometheus:
          ## If false, the configmap-reload container will not be deployed
          ##
          enabled: true
      
          ## configmap-reload container name
          ##
          name: configmap-reload
      
          ## configmap-reload container image
          ##
          image:
            repository: jimmidyson/configmap-reload
            tag: v0.5.0
            pullPolicy: IfNotPresent
      
          ## Additional configmap-reload container arguments
          ##
          extraArgs: {}
          ## Additional configmap-reload volume directories
          ##
          extraVolumeDirs: []
      
      
          ## Additional configmap-reload mounts
          ##
          extraConfigmapMounts: []
            # - name: prometheus-alerts
            #   mountPath: /etc/alerts.d
            #   subPath: ""
            #   configMap: prometheus-alerts
            #   readOnly: true
      
      
          ## configmap-reload resource requests and limits
          ## Ref: http://kubernetes.io/docs/user-guide/compute-resources/
          ##
          resources: {}
        alertmanager:
          ## If false, the configmap-reload container will not be deployed
          ##
          enabled: true
      
          ## configmap-reload container name
          ##
          name: configmap-reload
      
          ## configmap-reload container image
          ##
          image:
            repository: jimmidyson/configmap-reload
            tag: v0.5.0
            pullPolicy: IfNotPresent
      
          ## Additional configmap-reload container arguments
          ##
          extraArgs: {}
          ## Additional configmap-reload volume directories
          ##
          extraVolumeDirs: []
      
      
          ## Additional configmap-reload mounts
          ##
          extraConfigmapMounts: []
            # - name: prometheus-alerts
            #   mountPath: /etc/alerts.d
            #   subPath: ""
            #   configMap: prometheus-alerts
            #   readOnly: true
      
      
          ## configmap-reload resource requests and limits
          ## Ref: http://kubernetes.io/docs/user-guide/compute-resources/
          ##
          resources: {}
      
      kubeStateMetrics:
        ## If false, kube-state-metrics sub-chart will not be installed
        ##
        enabled: true
        ## Configure node affinity
        nodeSelector:
          kubernetes.io/hostname: prod-sys-k8s-wn3
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: kubernetes.io/resource
                      operator: In
                      values:
                        - base
        priorityClassName: "monitor-service"
        resources:
          limits:
            cpu: 1
            memory: 128Mi
          requests:
            cpu: 100m
            memory: 30Mi
        tolerations:
          - key: resource
          #   operator: "Equal|Exists"
            value: base
            effect: NoExecute
      
      ## kube-state-metrics sub-chart configurable values
      ## Please see https://github.com/kubernetes/kube-state-metrics/tree/master/charts/kube-state-metrics
      ##
      # kube-state-metrics:
      
      nodeExporter:
        ## If false, node-exporter will not be installed
        ##
        enabled: true
      
        ## If true, node-exporter pods share the host network namespace
        ##
        hostNetwork: true
      
        ## If true, node-exporter pods share the host PID namespace
        ##
        hostPID: true
      
        ## If true, node-exporter pods mounts host / at /host/root
        ##
        hostRootfs: true
      
        ## node-exporter container name
        ##
        name: node-exporter
      
        ## node-exporter container image
        ##
        image:
          repository: quay.io/prometheus/node-exporter
          tag: v1.1.2
          pullPolicy: IfNotPresent
      
        ## Specify if a Pod Security Policy for node-exporter must be created
        ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/
        ##
        podSecurityPolicy:
          annotations: {}
            ## Specify pod annotations
            ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#apparmor
            ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#seccomp
            ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#sysctl
            ##
            # seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
            # seccomp.security.alpha.kubernetes.io/defaultProfileName: 'docker/default'
            # apparmor.security.beta.kubernetes.io/defaultProfileName: 'runtime/default'
      
        ## node-exporter priorityClassName
        ## Configure the node-exporter priority class
        priorityClassName: "monitor-service"
      
        ## Custom Update Strategy
        ## Configure the rolling update strategy
        updateStrategy:
          type: RollingUpdate
          rollingUpdate:
            maxUnavailable: 1
      
        ## Additional node-exporter container arguments
        ##
        extraArgs: {}
      
        ## Additional InitContainers to initialize the pod
        ##
        extraInitContainers: []
      
        ## Additional node-exporter hostPath mounts
        ##
        extraHostPathMounts: []
          # - name: textfile-dir
          #   mountPath: /srv/txt_collector
          #   hostPath: /var/lib/node-exporter
          #   readOnly: true
          #   mountPropagation: HostToContainer
      
        extraConfigmapMounts: []
          # - name: certs-configmap
          #   mountPath: /prometheus
          #   configMap: certs-configmap
          #   readOnly: true
      
        ## Node tolerations for node-exporter scheduling to nodes with taints
        ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
        ##
        ## Configure node-exporter taint tolerations
        tolerations:
          # - key: "key"
          #   operator: "Equal|Exists"
          - operator: Exists
          #   value: "value"
          #   effect: "NoSchedule|PreferNoSchedule|NoExecute(1.6 only)"
      
        ## Node labels for node-exporter pod assignment
        ## Ref: https://kubernetes.io/docs/user-guide/node-selection/
        ##
        nodeSelector: {}
      
        ## Annotations to be added to node-exporter pods
        ##
        podAnnotations: {}
      
        ## Labels to be added to node-exporter pods
        ##
        pod:
          labels: {}
      
        ## PodDisruptionBudget settings
        ## ref: https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
        ##
        podDisruptionBudget:
          enabled: false
          maxUnavailable: 1
      
        ## node-exporter resource limits & requests
        ## Ref: https://kubernetes.io/docs/user-guide/compute-resources/
        ##
        ## Configure node-exporter resource requests and limits
        resources:
          limits:
            cpu: 1
            memory: 128Mi
          requests:
            cpu: 100m
            memory: 30Mi
      
        # Custom DNS configuration to be added to node-exporter pods
        dnsConfig: {}
          # nameservers:
          #   - 1.2.3.4
          # searches:
          #   - ns1.svc.cluster-domain.example
          #   - my.dns.search.suffix
          # options:
          #   - name: ndots
          #     value: "2"
        #   - name: edns0
      
        ## Security context to be added to node-exporter pods
        ##
        securityContext:
          fsGroup: 65534
          runAsGroup: 65534
          runAsNonRoot: true
          runAsUser: 65534
      
        service:
          annotations:
            prometheus.io/scrape: "true"
          labels: {}
      
          # Exposed as a headless service:
          # https://kubernetes.io/docs/concepts/services-networking/service/#headless-services
          clusterIP: None
      
          ## List of IP addresses at which the node-exporter service is available
          ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
          ##
          externalIPs: []
      
          hostPort: 9100
          loadBalancerIP: ""
          loadBalancerSourceRanges: []
          servicePort: 9100
          type: ClusterIP
      
      server:
        ## Prometheus server container name
        ##
        enabled: true
      
        ## Use a ClusterRole (and ClusterRoleBinding)
        ## - If set to false - we define a RoleBinding in the defined namespaces ONLY
        ##
        ## NB: because we need a Role with nonResourceURL's ("/metrics") - you must get someone with Cluster-admin privileges to define this role for you, before running with this setting enabled.
        ##     This makes prometheus work - for users who do not have ClusterAdmin privs, but wants prometheus to operate on their own namespaces, instead of clusterwide.
        ##
        ## You MUST also set namespaces to the ones you have access to and want monitored by Prometheus.
        ##
        # useExistingClusterRoleName: nameofclusterrole
      
        ## namespaces to monitor (instead of monitoring all - clusterwide). Needed if you want to run without Cluster-admin privileges.
        # namespaces:
        #   - yournamespace
      
        ## Configure the Prometheus container name
        name: server
      
        # sidecarContainers - add more containers to prometheus server
        # Key/Value where Key is the sidecar `- name: <Key>`
        # Example:
        #   sidecarContainers:
        #      webserver:
        #        image: nginx
        sidecarContainers: {}
      
        ## Prometheus server container image
        ##
        image:
          repository: quay.io/prometheus/prometheus
          tag: v2.26.0
          pullPolicy: IfNotPresent
      
        ## prometheus server priorityClassName
        ##
        priorityClassName: "monitor-service"
      
        ## EnableServiceLinks indicates whether information about services should be injected
        ## into pod's environment variables, matching the syntax of Docker links.
        ## WARNING: the field is unsupported and will be skipped in K8s prior to v1.13.0.
        ##
        enableServiceLinks: true
      
        ## The URL prefix at which the container can be accessed. Useful in the case the '-web.external-url' includes a slug
        ## so that the various internal URLs are still able to access as they are in the default case.
        ## (Optional)
        prefixURL: ""
      
        ## External URL which can access prometheus
        ## Maybe same with Ingress host name
        baseURL: ""
      
        ## Additional server container environment variables
        ##
        ## You specify this manually like you would a raw deployment manifest.
        ## This means you can bind in environment variables from secrets.
        ##
        ## e.g. static environment variable:
        ##  - name: DEMO_GREETING
        ##    value: "Hello from the environment"
        ##
        ## e.g. secret environment variable:
        ## - name: USERNAME
        ##   valueFrom:
        ##     secretKeyRef:
        ##       name: mysecret
        ##       key: username
        env: []
      
        extraFlags:
          - web.enable-lifecycle
          ## web.enable-admin-api flag controls access to the administrative HTTP API which includes functionality such as
          ## deleting time series. This is disabled by default.
          - web.enable-admin-api
          ##
          ## storage.tsdb.no-lockfile flag controls DB locking
          # - storage.tsdb.no-lockfile
          ##
          ## storage.tsdb.wal-compression flag enables compression of the write-ahead log (WAL)
          # - storage.tsdb.wal-compression
      
        ## Path to a configuration file on prometheus server container FS
        configPath: /etc/config/prometheus.yml
      
        ### The data directory used by prometheus to set --storage.tsdb.path
        ### When empty server.persistentVolume.mountPath is used instead
        ## Configure the persistent storage StorageClass (SC)
        storagePath: ""
      
        ## Custom settings for the Prometheus configuration file
        global:
          ## How frequently to scrape targets by default
          ##
          scrape_interval: 15s
          ## How long until a scrape request times out
          ##
          scrape_timeout: 10s
          ## How frequently to evaluate rules
          ##
          evaluation_interval: 15s
        ## https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write
        ##
        remoteWrite: []
        ## https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_read
        ##
        remoteRead: []
      
        ## Additional Prometheus server container arguments
        ##
        extraArgs: {}
      
        ## Additional InitContainers to initialize the pod
        ##
        extraInitContainers: []
      
        ## Additional Prometheus server Volume mounts
        ##
        extraVolumeMounts: []
      
        ## Additional Prometheus server Volumes
        ##
        extraVolumes: []
      
        ## Additional Prometheus server hostPath mounts
        ##
        extraHostPathMounts: []
          # - name: certs-dir
          #   mountPath: /etc/kubernetes/certs
          #   subPath: ""
          #   hostPath: /etc/kubernetes/certs
          #   readOnly: true
      
        extraConfigmapMounts: []
          # - name: certs-configmap
          #   mountPath: /prometheus
          #   subPath: ""
          #   configMap: certs-configmap
          #   readOnly: true
      
        ## Additional Prometheus server Secret mounts
        # Defines additional mounts with secrets. Secrets must be manually created in the namespace.
        extraSecretMounts: []
          # - name: secret-files
          #   mountPath: /etc/secrets
          #   subPath: ""
          #   secretName: prom-secret-files
          #   readOnly: true
      
        ## ConfigMap override where fullname is {{.Release.Name}}-{{.Values.server.configMapOverrideName}}
        ## Defining configMapOverrideName will cause templates/server-configmap.yaml
        ## to NOT generate a ConfigMap resource
        ##
        configMapOverrideName: ""
      
        ingress:
          ## If true, Prometheus server Ingress will be created
          ##
          enabled: false
      
          ## Prometheus server Ingress annotations
          ##
          annotations: {}
          #   kubernetes.io/ingress.class: nginx
          #   kubernetes.io/tls-acme: 'true'
      
          ## Prometheus server Ingress additional labels
          ##
          extraLabels: {}
      
          ## Prometheus server Ingress hostnames with optional path
          ## Must be provided if Ingress is enabled
          ##
          hosts: []
          #   - prometheus.domain.com
          #   - domain.com/prometheus
      
          ## Extra paths to prepend to every host configuration. This is useful when working with annotation based services.
          extraPaths: []
          # - path: /*
          #   backend:
          #     serviceName: ssl-redirect
          #     servicePort: use-annotation
      
          ## Prometheus server Ingress TLS configuration
          ## Secrets must be manually created in the namespace
          ##
          tls: []
          #   - secretName: prometheus-server-tls
          #     hosts:
          #       - prometheus.domain.com
      
        ## Server Deployment Strategy type
        # strategy:
        #   type: Recreate
      
        ## hostAliases allows adding entries to /etc/hosts inside the containers
        hostAliases: []
        #   - ip: "127.0.0.1"
        #     hostnames:
        #       - "example.com"
      
        ## Node tolerations for server scheduling to nodes with taints
        ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
        ##
        ## Configure taint tolerations for prometheus-server
        tolerations:
          - key: resource
            #   operator: "Equal|Exists"
            value: base
            #   effect: "NoSchedule|PreferNoSchedule|NoExecute(1.6 only)"
            effect: NoExecute
      
        ## Node labels for Prometheus server pod assignment
        ## Ref: https://kubernetes.io/docs/user-guide/node-selection/
        ##
        nodeSelector:
          kubernetes.io/hostname: prod-sys-k8s-wn4
      
        ## Pod affinity
        ##
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: kubernetes.io/resource
                      operator: In
                      values:
                        - base
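        ## Illustrative note (an assumption; the original setup does not show these steps):
        ## the toleration and node affinity above presuppose a node that carries a matching
        ## taint and label. With <node-name> as a placeholder, such a node could be
        ## prepared roughly like this:
        ##   kubectl taint nodes <node-name> resource=base:NoExecute
        ##   kubectl label nodes <node-name> kubernetes.io/resource=base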
      
        ## PodDisruptionBudget settings
        ## ref: https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
        ##
        podDisruptionBudget:
          enabled: false
          maxUnavailable: 1
      
        ## Use an alternate scheduler, e.g. "stork".
        ## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
        ##
        # schedulerName:
      
        persistentVolume:
          ## If true, Prometheus server will create/use a Persistent Volume Claim
          ## If false, use emptyDir
          ##
          enabled: true
      
          ## Prometheus server data Persistent Volume access modes
          ## Must match those of existing PV or dynamic provisioner
          ## Ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
          ##
          accessModes:
            - ReadWriteOnce
      
          ## Prometheus server data Persistent Volume annotations
          ##
          annotations: {}
      
          ## Prometheus server data Persistent Volume existing claim name
          ## Requires server.persistentVolume.enabled: true
          ## If defined, PVC must be created manually before volume will be bound
          existingClaim: ""
      
          ## Prometheus server data Persistent Volume mount root path
          ##
          mountPath: /data
      
          ## Prometheus server data Persistent Volume size
          ##
          size: 500Gi
      
          ## Prometheus server data Persistent Volume Storage Class
          ## If defined, storageClassName: <storageClass>
          ## If set to "-", storageClassName: "", which disables dynamic provisioning
          ## If undefined (the default) or set to null, no storageClassName spec is
          ##   set, choosing the default provisioner.  (gp2 on AWS, standard on
          ##   GKE, AWS & OpenStack)
          ##
          storageClass: "alicloud-disk-essd"
      
          ## Prometheus server data Persistent Volume Binding Mode
          ## If defined, volumeBindingMode: <volumeBindingMode>
          ## If undefined (the default) or set to null, no volumeBindingMode spec is
          ##   set, choosing the default mode.
          ##
          # volumeBindingMode: ""
      
          ## Subdirectory of Prometheus server data Persistent Volume to mount
          ## Useful if the volume's root directory is not empty
          ##
          subPath: ""
      
        emptyDir:
          ## Prometheus server emptyDir volume size limit
          ##
          sizeLimit: ""
      
        ## Annotations to be added to Prometheus server pods
        ##
        podAnnotations: {}
          # iam.amazonaws.com/role: prometheus
      
        ## Labels to be added to Prometheus server pods
        ##
        podLabels: {}
      
        ## Prometheus AlertManager configuration
        ##
        alertmanagers: []
      
        ## Specify if a Pod Security Policy for the Prometheus server must be created
        ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/
        ##
        podSecurityPolicy:
          annotations: {}
            ## Specify pod annotations
            ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#apparmor
            ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#seccomp
            ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#sysctl
            ##
            # seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
            # seccomp.security.alpha.kubernetes.io/defaultProfileName: 'docker/default'
            # apparmor.security.beta.kubernetes.io/defaultProfileName: 'runtime/default'
      
        ## Use a StatefulSet if replicaCount needs to be greater than 1 (see below)
        ##
        replicaCount: 1
      
        ## Annotations to be added to deployment
        ##
        deploymentAnnotations: {}
      
        statefulSet:
          ## If true, use a statefulset instead of a deployment for pod management.
          ## This allows scaling replicas to more than 1 pod
          ##
          enabled: false
      
          annotations: {}
          labels: {}
          podManagementPolicy: OrderedReady
      
          ## Prometheus server headless service to use for the statefulset
          ##
          headless:
            annotations: {}
            labels: {}
            servicePort: 80
            ## Enable gRPC port on service to allow auto discovery with thanos-querier
            gRPC:
              enabled: false
              servicePort: 10901
              # nodePort: 10901
      
        ## Prometheus server readiness and liveness probe initial delay and timeout
        ## Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
        ##
        readinessProbeInitialDelay: 30
        readinessProbePeriodSeconds: 5
        readinessProbeTimeout: 4
        readinessProbeFailureThreshold: 3
        readinessProbeSuccessThreshold: 1
        livenessProbeInitialDelay: 30
        livenessProbePeriodSeconds: 15
        livenessProbeTimeout: 10
        livenessProbeFailureThreshold: 3
        livenessProbeSuccessThreshold: 1
      
        ## Prometheus server resource requests and limits
        ## Ref: http://kubernetes.io/docs/user-guide/compute-resources/
        ##
        ## Resource limits for prometheus-server
        resources:
          limits:
            cpu: 4
            memory: 30Gi
          requests:
            cpu: 500m
            memory: 5Gi
      
        # Required for use in managed kubernetes clusters (such as AWS EKS) with custom CNI (such as calico),
        # because control-plane managed by AWS cannot communicate with pods' IP CIDR and admission webhooks are not working
        ##
        hostNetwork: false
      
        # When hostNetwork is enabled, you probably want to set this to ClusterFirstWithHostNet
        dnsPolicy: ClusterFirst
      
        ## Vertical Pod Autoscaler config
        ## Ref: https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler
        verticalAutoscaler:
          ## If true a VPA object will be created for the controller (either StatefulSet or Deployment, based on above configs)
          enabled: false
          # updateMode: "Auto"
          # containerPolicies:
          # - containerName: 'prometheus-server'
      
        # Custom DNS configuration to be added to prometheus server pods
        dnsConfig: {}
          # nameservers:
          #   - 1.2.3.4
          # searches:
          #   - ns1.svc.cluster-domain.example
          #   - my.dns.search.suffix
          # options:
          #   - name: ndots
          #     value: "2"
          #   - name: edns0
        ## Security context to be added to server pods
        ##
        securityContext:
          runAsUser: 65534
          runAsNonRoot: true
          runAsGroup: 65534
          fsGroup: 65534
      
        service:
          annotations: {}
          labels: {}
          clusterIP: ""
      
          ## List of IP addresses at which the Prometheus server service is available
          ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
          ##
          ## Configure external IP access for the prometheus-server service
          externalIPs:
            - 10.1.0.10
      
          loadBalancerIP: ""
          loadBalancerSourceRanges: []
          servicePort: 9090
          sessionAffinity: None
          type: ClusterIP
      
          ## Enable gRPC port on service to allow auto discovery with thanos-querier
          gRPC:
            enabled: false
            servicePort: 10901
            # nodePort: 10901
      
          ## If using a statefulSet (statefulSet.enabled=true), configure the
          ## service to connect to a specific replica to have a consistent view
          ## of the data.
          statefulsetReplica:
            enabled: false
            replica: 0
      
        ## Prometheus server pod termination grace period
        ##
        terminationGracePeriodSeconds: 300
      
        ## Prometheus data retention period (default if not specified is 15 days)
        ##
        retention: "30d"
      
      pushgateway:
        ## If false, pushgateway will not be installed
        ##
        enabled: true
      
        ## Use an alternate scheduler, e.g. "stork".
        ## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
        ##
        # schedulerName:
      
        ## pushgateway container name
        ##
        name: pushgateway
      
        ## pushgateway container image
        ##
        image:
          repository: prom/pushgateway
          tag: v1.3.1
          pullPolicy: IfNotPresent
      
        ## pushgateway priorityClassName
        ##
        priorityClassName: "monitor-service"
      
        ## Additional pushgateway container arguments
        ##
        ## for example: persistence.file: /data/pushgateway.data
        extraArgs: {}
      
        ## Additional InitContainers to initialize the pod
        ##
        extraInitContainers: []
      
        ingress:
          ## If true, pushgateway Ingress will be created
          ##
          enabled: false
      
          ## pushgateway Ingress annotations
          ##
          annotations: {}
          #   kubernetes.io/ingress.class: nginx
          #   kubernetes.io/tls-acme: 'true'
      
          ## pushgateway Ingress hostnames with optional path
          ## Must be provided if Ingress is enabled
          ##
          hosts: []
          #   - pushgateway.domain.com
          #   - domain.com/pushgateway
      
          ## Extra paths to prepend to every host configuration. This is useful when working with annotation based services.
          extraPaths: []
          # - path: /*
          #   backend:
          #     serviceName: ssl-redirect
          #     servicePort: use-annotation
      
          ## pushgateway Ingress TLS configuration
          ## Secrets must be manually created in the namespace
          ##
          tls: []
          #   - secretName: prometheus-alerts-tls
          #     hosts:
          #       - pushgateway.domain.com
      
        ## Node tolerations for pushgateway scheduling to nodes with taints
        ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
        ##
        ## Configure taint tolerations for the pushgateway
        tolerations:
          - key: resource
            #   operator: "Equal|Exists"
            value: base
            #   effect: "NoSchedule|PreferNoSchedule|NoExecute(1.6 only)"
            effect: NoExecute
      
        ## Node labels for pushgateway pod assignment
        ## Ref: https://kubernetes.io/docs/user-guide/node-selection/
        ##
        nodeSelector: {}
      
        ## Configure node affinity
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: kubernetes.io/resource
                      operator: In
                      values:
                        - base
      
        ## Annotations to be added to pushgateway pods
        ##
        podAnnotations: {}
      
        ## Labels to be added to pushgateway pods
        ##
        podLabels: {}
      
        ## Specify if a Pod Security Policy for the pushgateway must be created
        ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/
        ##
        podSecurityPolicy:
          annotations: {}
            ## Specify pod annotations
            ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#apparmor
            ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#seccomp
            ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#sysctl
            ##
            # seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
            # seccomp.security.alpha.kubernetes.io/defaultProfileName: 'docker/default'
            # apparmor.security.beta.kubernetes.io/defaultProfileName: 'runtime/default'
      
        replicaCount: 1
      
        ## Annotations to be added to deployment
        ##
        deploymentAnnotations: {}
      
        ## PodDisruptionBudget settings
        ## ref: https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
        ##
        podDisruptionBudget:
          enabled: false
          maxUnavailable: 1
      
        ## pushgateway resource requests and limits
        ## Ref: http://kubernetes.io/docs/user-guide/compute-resources/
        ##
        ## Configure resource limits for the pushgateway
        resources:
          limits:
            cpu: 10m
            memory: 32Mi
          requests:
            cpu: 10m
            memory: 32Mi
      
        # Custom DNS configuration to be added to push-gateway pods
        dnsConfig: {}
          # nameservers:
          #   - 1.2.3.4
          # searches:
          #   - ns1.svc.cluster-domain.example
          #   - my.dns.search.suffix
          # options:
          #   - name: ndots
          #     value: "2"
          #   - name: edns0
      
        ## Security context to be added to push-gateway pods
        ##
        securityContext:
          runAsUser: 65534
          runAsNonRoot: true
      
        service:
          annotations:
            prometheus.io/probe: pushgateway
          labels: {}
          clusterIP: ""
      
          ## List of IP addresses at which the pushgateway service is available
          ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
          ##
          externalIPs: []
      
          loadBalancerIP: ""
          loadBalancerSourceRanges: []
          servicePort: 9091
          type: ClusterIP
      
        ## pushgateway Deployment Strategy type
        # strategy:
        #   type: Recreate
      
        persistentVolume:
          ## If true, pushgateway will create/use a Persistent Volume Claim
          ##
          enabled: false
      
          ## pushgateway data Persistent Volume access modes
          ## Must match those of existing PV or dynamic provisioner
          ## Ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
          ##
          accessModes:
            - ReadWriteOnce
      
          ## pushgateway data Persistent Volume Claim annotations
          ##
          annotations: {}
      
          ## pushgateway data Persistent Volume existing claim name
          ## Requires pushgateway.persistentVolume.enabled: true
          ## If defined, PVC must be created manually before volume will be bound
          existingClaim: ""
      
          ## pushgateway data Persistent Volume mount root path
          ##
          mountPath: /data
      
          ## pushgateway data Persistent Volume size
          ##
          size: 2Gi
      
          ## pushgateway data Persistent Volume Storage Class
          ## If defined, storageClassName: <storageClass>
          ## If set to "-", storageClassName: "", which disables dynamic provisioning
          ## If undefined (the default) or set to null, no storageClassName spec is
          ##   set, choosing the default provisioner.  (gp2 on AWS, standard on
          ##   GKE, AWS & OpenStack)
          ##
          # storageClass: "-"
      
          ## pushgateway data Persistent Volume Binding Mode
          ## If defined, volumeBindingMode: <volumeBindingMode>
          ## If undefined (the default) or set to null, no volumeBindingMode spec is
          ##   set, choosing the default mode.
          ##
          # volumeBindingMode: ""
      
          ## Subdirectory of pushgateway data Persistent Volume to mount
          ## Useful if the volume's root directory is not empty
          ##
          subPath: ""
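        ## Illustrative usage (assumptions: <pushgateway-host> is a placeholder for wherever
        ## the pushgateway service on port 9091 is reachable; the metric and job names are
        ## hypothetical). A short-lived job could push a single sample like this:
        ##   echo "some_metric 3.14" | curl --data-binary @- http://<pushgateway-host>:9091/metrics/job/some_job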
      
      
      ## alertmanager ConfigMap entries
      ##
      alertmanagerFiles:
        alertmanager.yml:
          global: {}
            # slack_api_url: ''
      
          receivers:
            - name: default-receiver
              # slack_configs:
              #  - channel: '@you'
              #    send_resolved: true
      
          route:
            group_wait: 10s
            group_interval: 5m
            receiver: default-receiver
            repeat_interval: 1h
      
      ## Prometheus server ConfigMap entries
      ##
      serverFiles:
      
        ## Alerts configuration
        ## Ref: https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
        alerting_rules.yml: {}
        # groups:
        #   - name: Instances
        #     rules:
        #       - alert: InstanceDown
        #         expr: up == 0
        #         for: 5m
        #         labels:
        #           severity: page
        #         annotations:
        #           description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.'
        #           summary: 'Instance {{ $labels.instance }} down'
        ## DEPRECATED DEFAULT VALUE, unless explicitly naming your files, please use alerting_rules.yml
        alerts: {}
      
        ## Records configuration
        ## Ref: https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/
        recording_rules.yml: {}
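        # Illustrative sketch only (not part of the chart defaults; the group and rule
        # names are hypothetical), in the same style as the commented alerting example:
        # groups:
        #   - name: example
        #     rules:
        #       - record: job:up:sum
        #         expr: sum by (job) (up)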
        ## DEPRECATED DEFAULT VALUE, unless explicitly naming your files, please use recording_rules.yml
        rules: {}
      
        prometheus.yml:
          rule_files:
            - /etc/config/recording_rules.yml
            - /etc/config/alerting_rules.yml
          ## The two files below are DEPRECATED and will be removed from this default values file
            - /etc/config/rules
            - /etc/config/alerts
      
          scrape_configs:
            - job_name: prometheus
              static_configs:
                - targets:
                  - localhost:9090
      
            # A scrape configuration for running Prometheus on a Kubernetes cluster.
            # This uses separate scrape configs for cluster components (i.e. API server, node)
            # and services to allow each to use different authentication configs.
            #
            # Kubernetes labels will be added as Prometheus labels on metrics via the
            # `labelmap` relabeling action.
      
            # Scrape config for API servers.
            #
            # Kubernetes exposes API servers as endpoints to the default/kubernetes
            # service so this uses `endpoints` role and uses relabelling to only keep
            # the endpoints associated with the default/kubernetes service using the
            # default named port `https`. This works for single API server deployments as
            # well as HA API server deployments.
            - job_name: 'kubernetes-apiservers'
      
              kubernetes_sd_configs:
                - role: endpoints
      
              # Default to scraping over https. If required, just disable this or change to
              # `http`.
              scheme: https
      
              # This TLS & bearer token file config is used to connect to the actual scrape
              # endpoints for cluster components. This is separate to discovery auth
              # configuration because discovery & scraping are two separate concerns in
              # Prometheus. The discovery auth config is automatic if Prometheus runs inside
              # the cluster. Otherwise, more config options have to be provided within the
              # <kubernetes_sd_config>.
              tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                # If your node certificates are self-signed or use a different CA to the
                # master CA, then disable certificate verification below. Note that
                # certificate verification is an integral part of a secure infrastructure
                # so this should only be disabled in a controlled environment. In this
                # configuration, certificate verification is disabled by the line below.
                #
                insecure_skip_verify: true
              bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      
              # Keep only the default/kubernetes service endpoints for the https port. This
              # will add targets for each API server which Kubernetes adds an endpoint to
              # the default/kubernetes service.
              relabel_configs:
                - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
                  action: keep
                  regex: default;kubernetes;https
      
            - job_name: 'kubernetes-nodes'
      
              # Default to scraping over https. If required, just disable this or change to
              # `http`.
              scheme: https
      
              # This TLS & bearer token file config is used to connect to the actual scrape
              # endpoints for cluster components. This is separate to discovery auth
              # configuration because discovery & scraping are two separate concerns in
              # Prometheus. The discovery auth config is automatic if Prometheus runs inside
              # the cluster. Otherwise, more config options have to be provided within the
              # <kubernetes_sd_config>.
              tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                # If your node certificates are self-signed or use a different CA to the
                # master CA, then disable certificate verification below. Note that
                # certificate verification is an integral part of a secure infrastructure
                # so this should only be disabled in a controlled environment. In this
                # configuration, certificate verification is disabled by the line below.
                #
                insecure_skip_verify: true
              bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      
              kubernetes_sd_configs:
                - role: node
      
              relabel_configs:
                - action: labelmap
                  regex: __meta_kubernetes_node_label_(.+)
                - target_label: __address__
                  replacement: kubernetes.default.svc:443
                - source_labels: [__meta_kubernetes_node_name]
                  regex: (.+)
                  target_label: __metrics_path__
                  replacement: /api/v1/nodes/$1/proxy/metrics
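              # Worked example (hypothetical node name `node-1`): after the relabeling above,
              # the node is scraped through the API server proxy at
              #   https://kubernetes.default.svc:443/api/v1/nodes/node-1/proxy/metrics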
      
      
            - job_name: 'kubernetes-nodes-cadvisor'
      
              # Default to scraping over https. If required, just disable this or change to
              # `http`.
              scheme: https
      
              # This TLS & bearer token file config is used to connect to the actual scrape
              # endpoints for cluster components. This is separate to discovery auth
              # configuration because discovery & scraping are two separate concerns in
              # Prometheus. The discovery auth config is automatic if Prometheus runs inside
              # the cluster. Otherwise, more config options have to be provided within the
              # <kubernetes_sd_config>.
              tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                # If your node certificates are self-signed or use a different CA to the
                # master CA, then disable certificate verification below. Note that
                # certificate verification is an integral part of a secure infrastructure
                # so this should only be disabled in a controlled environment. In this
                # configuration, certificate verification is disabled by the line below.
                #
                insecure_skip_verify: true
              bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      
              kubernetes_sd_configs:
                - role: node
      
              # This configuration will work only on kubelet 1.7.3+
              # As the scrape endpoints for cAdvisor have changed
              # if you are using older version you need to change the replacement to
              # replacement: /api/v1/nodes/$1:4194/proxy/metrics
              # more info here https://github.com/coreos/prometheus-operator/issues/633
              relabel_configs:
                - action: labelmap
                  regex: __meta_kubernetes_node_label_(.+)
                - target_label: __address__
                  replacement: kubernetes.default.svc:443
                - source_labels: [__meta_kubernetes_node_name]
                  regex: (.+)
                  target_label: __metrics_path__
                  replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
      
            # Scrape config for service endpoints.
            #
            # The relabeling allows the actual service scrape endpoint to be configured
            # via the following annotations:
            #
            # * `prometheus.io/scrape`: Only scrape services that have a value of `true`
            # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
            # to set this to `https` & most likely set the `tls_config` of the scrape config.
            # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
            # * `prometheus.io/port`: If the metrics are exposed on a different port to the
            # service then set this appropriately.
            - job_name: 'kubernetes-service-endpoints'
      
              kubernetes_sd_configs:
                - role: endpoints
      
              relabel_configs:
                - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
                  action: keep
                  regex: true
                - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
                  action: replace
                  target_label: __scheme__
                  regex: (https?)
                - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
                  action: replace
                  target_label: __metrics_path__
                  regex: (.+)
                - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
                  action: replace
                  target_label: __address__
                  regex: ([^:]+)(?::\d+)?;(\d+)
                  replacement: $1:$2
                - action: labelmap
                  regex: __meta_kubernetes_service_label_(.+)
                - source_labels: [__meta_kubernetes_namespace]
                  action: replace
                  target_label: kubernetes_namespace
                - source_labels: [__meta_kubernetes_service_name]
                  action: replace
                  target_label: kubernetes_name
                - source_labels: [__meta_kubernetes_pod_node_name]
                  action: replace
                  target_label: kubernetes_node
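              # Illustrative sketch (hypothetical Service `my-app` on port 8080; not part of
              # the chart): annotations that would make this job keep and scrape a Service.
              #
              #   apiVersion: v1
              #   kind: Service
              #   metadata:
              #     name: my-app
              #     annotations:
              #       prometheus.io/scrape: "true"
              #       prometheus.io/port: "8080"
              #       prometheus.io/path: "/metrics"
              #   spec:
              #     selector:
              #       app: my-app
              #     ports:
              #       - port: 8080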
      
            # Scrape config for slow service endpoints; same as above, but with a larger
            # timeout and a larger interval
            #
            # The relabeling allows the actual service scrape endpoint to be configured
            # via the following annotations:
            #
            # * `prometheus.io/scrape-slow`: Only scrape services that have a value of `true`
            # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
            # to set this to `https` & most likely set the `tls_config` of the scrape config.
            # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
            # * `prometheus.io/port`: If the metrics are exposed on a different port to the
            # service then set this appropriately.
            - job_name: 'kubernetes-service-endpoints-slow'
      
              scrape_interval: 5m
              scrape_timeout: 30s
      
              kubernetes_sd_configs:
                - role: endpoints
      
              relabel_configs:
                - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape_slow]
                  action: keep
                  regex: true
                - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
                  action: replace
                  target_label: __scheme__
                  regex: (https?)
                - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
                  action: replace
                  target_label: __metrics_path__
                  regex: (.+)
                - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
                  action: replace
                  target_label: __address__
                  regex: ([^:]+)(?::\d+)?;(\d+)
                  replacement: $1:$2
                - action: labelmap
                  regex: __meta_kubernetes_service_label_(.+)
                - source_labels: [__meta_kubernetes_namespace]
                  action: replace
                  target_label: kubernetes_namespace
                - source_labels: [__meta_kubernetes_service_name]
                  action: replace
                  target_label: kubernetes_name
                - source_labels: [__meta_kubernetes_pod_node_name]
                  action: replace
                  target_label: kubernetes_node
      
            - job_name: 'prometheus-pushgateway'
              honor_labels: true
      
              kubernetes_sd_configs:
                - role: service
      
              relabel_configs:
                - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
                  action: keep
                  regex: pushgateway
      
            # Example scrape config for probing services via the Blackbox Exporter.
            #
            # The relabeling allows the actual service scrape endpoint to be configured
            # via the following annotations:
            #
            # * `prometheus.io/probe`: Only probe services that have a value of `true`
            - job_name: 'kubernetes-services'
      
              metrics_path: /probe
              params:
                module: [http_2xx]
      
              kubernetes_sd_configs:
                - role: service
      
              relabel_configs:
                - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
                  action: keep
                  regex: true
                - source_labels: [__address__]
                  target_label: __param_target
                - target_label: __address__
                  replacement: blackbox
                - source_labels: [__param_target]
                  target_label: instance
                - action: labelmap
                  regex: __meta_kubernetes_service_label_(.+)
                - source_labels: [__meta_kubernetes_namespace]
                  target_label: kubernetes_namespace
                - source_labels: [__meta_kubernetes_service_name]
                  target_label: kubernetes_name
      
            # Example scrape config for pods
            #
            # The relabeling allows the actual pod scrape endpoint to be configured via the
            # following annotations:
            #
            # * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
            # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
            # to set this to `https` & most likely set the `tls_config` of the scrape config.
            # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
            # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the default of `9102`.
            - job_name: 'kubernetes-pods'
      
              kubernetes_sd_configs:
                - role: pod
      
              relabel_configs:
                - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
                  action: keep
                  regex: true
                - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
                  action: replace
                  regex: (https?)
                  target_label: __scheme__
                - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
                  action: replace
                  target_label: __metrics_path__
                  regex: (.+)
                - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
                  action: replace
                  regex: ([^:]+)(?::\d+)?;(\d+)
                  replacement: $1:$2
                  target_label: __address__
                - action: labelmap
                  regex: __meta_kubernetes_pod_label_(.+)
                - source_labels: [__meta_kubernetes_namespace]
                  action: replace
                  target_label: kubernetes_namespace
                - source_labels: [__meta_kubernetes_pod_name]
                  action: replace
                  target_label: kubernetes_pod_name
                - source_labels: [__meta_kubernetes_pod_phase]
                  regex: Pending|Succeeded|Failed
                  action: drop
      
            # Example scrape config for pods which should be scraped slower. A useful example
            # would be stackdriver-exporter, which queries an API on every scrape of the pod
            #
            # The relabeling allows the actual pod scrape endpoint to be configured via the
            # following annotations:
            #
            # * `prometheus.io/scrape-slow`: Only scrape pods that have a value of `true`
            # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
            # to set this to `https` & most likely set the `tls_config` of the scrape config.
            # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
            # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the default of `9102`.
            - job_name: 'kubernetes-pods-slow'
      
              scrape_interval: 5m
              scrape_timeout: 30s
      
              kubernetes_sd_configs:
                - role: pod
      
              relabel_configs:
                - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape_slow]
                  action: keep
                  regex: true
                - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
                  action: replace
                  regex: (https?)
                  target_label: __scheme__
                - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
                  action: replace
                  target_label: __metrics_path__
                  regex: (.+)
                - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
                  action: replace
                  regex: ([^:]+)(?::\d+)?;(\d+)
                  replacement: $1:$2
                  target_label: __address__
                - action: labelmap
                  regex: __meta_kubernetes_pod_label_(.+)
                - source_labels: [__meta_kubernetes_namespace]
                  action: replace
                  target_label: kubernetes_namespace
                - source_labels: [__meta_kubernetes_pod_name]
                  action: replace
                  target_label: kubernetes_pod_name
                - source_labels: [__meta_kubernetes_pod_phase]
                  regex: Pending|Succeeded|Failed
                  action: drop
      
      # adds additional scrape configs to prometheus.yml
      # must be a string so you have to add a | after extraScrapeConfigs:
      # example adds prometheus-blackbox-exporter scrape config
      extraScrapeConfigs:
        # - job_name: 'prometheus-blackbox-exporter'
        #   metrics_path: /probe
        #   params:
        #     module: [http_2xx]
        #   static_configs:
        #     - targets:
        #       - https://example.com
        #   relabel_configs:
        #     - source_labels: [__address__]
        #       target_label: __param_target
        #     - source_labels: [__param_target]
        #       target_label: instance
        #     - target_label: __address__
        #       replacement: prometheus-blackbox-exporter:9115
      
      # Adds option to add alert_relabel_configs to avoid duplicate alerts in alertmanager
      # useful in H/A prometheus with different external labels but the same alerts
      alertRelabelConfigs:
        # alert_relabel_configs:
        # - source_labels: [dc]
        #   regex: (.+)\d+
        #   target_label: dc
      
      networkPolicy:
        ## Enable creation of NetworkPolicy resources.
        ##
        enabled: false
      
      # Force namespace of namespaced resources
      forceNamespace: null
    5. Access the Prometheus and Alertmanager web UIs
      <root@PROD-K8S-CP1 ~># kubectl describe svc prometheus-prometheus-server 
      Name:              prometheus-prometheus-server
      Namespace:         default
      Labels:            app=prometheus
                         app.kubernetes.io/managed-by=Helm
                         chart=prometheus-14.6.0
                         component=prometheus-server
                         heritage=Helm
                         release=prometheus
      Annotations:       meta.helm.sh/release-name: prometheus
                         meta.helm.sh/release-namespace: default
      Selector:          app=prometheus,component=prometheus-server,release=prometheus
      Type:              ClusterIP
      IP:                10.12.0.20
      External IPs:      10.1.0.10
      Port:              http  9090/TCP
      TargetPort:        9090/TCP
      Endpoints:         172.21.3.157:9090
      Session Affinity:  None
      Events:            <none>
      <root@PROD-K8S-CP1 ~># kubectl describe svc prometheus-alertmanager 
      Name:              prometheus-alertmanager
      Namespace:         default
      Labels:            app=prometheus
                         app.kubernetes.io/managed-by=Helm
                         chart=prometheus-14.6.0
                         component=alertmanager
                         heritage=Helm
                         release=prometheus
      Annotations:       meta.helm.sh/release-name: prometheus
                         meta.helm.sh/release-namespace: default
      Selector:          app=prometheus,component=alertmanager,release=prometheus
      Type:              ClusterIP
      IP:                10.12.0.13
      External IPs:      10.1.0.11
      Port:              http  9093/TCP
      TargetPort:        9093/TCP
      Endpoints:         172.21.3.251:9093
      Session Affinity:  None
      Events:            <none>
    6. Access address: 10.1.0.10:80
    7. Change the time zone (edit directly in Lens)
                  - name: localtime
                    mountPath: /etc/localtime
      
      ----------------------------------------------
      
              - name: localtime
                hostPath:
                  path: /etc/localtime
                  type: ''
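    8. Change the rolling update strategy of prometheus-server so that maxUnavailable is 1. Otherwise the new Pod starts while the old one still holds the lock on the data directory, and a DB lock error occurs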
        strategy:
          type: RollingUpdate
          rollingUpdate:
            maxUnavailable: 1
    9. Manually inspecting the metrics exposed by a node; an anonymous request is rejected:
      <root@PRE-K8S-CP1 ~># curl -ik https://10.1.0.233:6443/metrics
      HTTP/1.1 403 Forbidden
      Cache-Control: no-cache, private
      Content-Type: application/json
      X-Content-Type-Options: nosniff
      Date: Wed, 02 Jun 2021 03:40:55 GMT
      Content-Length: 240
      
      {
      "kind": "Status",
      "apiVersion": "v1",
      "metadata": {
      
      },
      "status": "Failure",
      "message": "forbidden: User \"system:anonymous\" cannot get path \"/metrics\"",
      "reason": "Forbidden",
      "details": {
      
      },
      "code": 403
      }
      
      
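      The request fails because, by default, Kubernetes does not let unauthenticated (anonymous) users read cluster resources. The workaround below binds cluster-admin to system:anonymous, which gives every unauthenticated client full control of the cluster, so use it only in an isolated test environment: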
      
      kubectl create clusterrolebinding prometheus-admin --clusterrole cluster-admin --user system:anonymous

    Prometheus configuration file


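    relabel action parameters explained

    • replace: the default. Matches regex against the concatenated source_labels values and writes replacement (which may reference the regex capture groups) to target_label
    • keep: drops targets whose concatenated source_labels values do not match regex
    • drop: drops targets whose concatenated source_labels values match regex
    • labelmap: matches regex against all label names, then copies the values of the matching labels to the label names given by replacement, with capture group references (${1}, ${2}, ...) substituted
    • labeldrop: removes labels whose names match regex
    • labelkeep: removes labels whose names do not match regex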
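
The three label-name actions in the list above can be sketched in a few lines of Python. This is only an illustration of the semantics, not Prometheus code: the function names and the sample label set are hypothetical, `re.fullmatch` stands in for the fully anchored matching that Prometheus applies, and the replacement uses Python-style `\1` instead of `$1`.

```python
import re

def labelmap(labels, regex, replacement=r"\1"):
    """Copy the value of every label whose name fully matches regex to a new
    label whose name is built from replacement (Python-style group refs)."""
    out = dict(labels)
    for name, value in labels.items():
        match = re.fullmatch(regex, name)
        if match:
            out[match.expand(replacement)] = value
    return out

def labeldrop(labels, regex):
    """Remove every label whose name fully matches regex."""
    return {k: v for k, v in labels.items() if not re.fullmatch(regex, k)}

def labelkeep(labels, regex):
    """Remove every label whose name does not fully match regex."""
    return {k: v for k, v in labels.items() if re.fullmatch(regex, k)}

# Hypothetical label set for one discovered target.
labels = {
    "__meta_kubernetes_service_label_app": "prometheus",
    "__meta_kubernetes_namespace": "default",
    "job": "kubernetes-service-endpoints",
}
mapped = labelmap(labels, r"__meta_kubernetes_service_label_(.+)")
print(mapped["app"])
print(labeldrop(mapped, r"__meta_.+"))
print(labelkeep(mapped, r"app"))
```

Note that labelmap adds labels without removing the originals; the `__meta_*` labels only disappear later because Prometheus discards labels starting with `__` once relabeling has finished.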

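    Default configuration file

    job_name generally identifies what kind of metrics a job collects. The jobs below collect the following:

    1. API server performance metrics; the configuration is annotated below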
      - job_name: kubernetes-apiservers
        honor_timestamps: true
        scrape_interval: 1m
        scrape_timeout: 10s
        metrics_path: /metrics
        scheme: https
        authorization:
          type: Bearer
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        follow_redirects: true
        relabel_configs:   ## relabeling rules
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]  ## source labels whose values form the match input
          separator: ; ## separator placed between the concatenated source label values
          regex: default;kubernetes;https ## matched against the joined source label values, i.e. __meta_kubernetes_namespace="default", __meta_kubernetes_service_name="kubernetes", __meta_kubernetes_endpoint_port_name="https". Only the endpoint whose port name is "https" matches. An "endpoint" here is a single scrape target; do not confuse it with the Kubernetes Endpoints object
          replacement: $1 ## default value
          action: keep ## keep targets whose joined source label values match regex and drop all others
        kubernetes_sd_configs:
        - role: endpoints
          follow_redirects: true
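
How the keep rule of this job evaluates a target can be sketched as follows. This is a minimal illustration, assuming the source label values are joined with the separator and the whole joined string must match; the helper name `is_kept` and the two sample targets are hypothetical.

```python
import re

SOURCE_LABELS = [
    "__meta_kubernetes_namespace",
    "__meta_kubernetes_service_name",
    "__meta_kubernetes_endpoint_port_name",
]

def is_kept(labels, regex=r"default;kubernetes;https", separator=";"):
    """Join the source label values with the separator and keep the target
    only if the whole joined string matches regex."""
    joined = separator.join(labels.get(name, "") for name in SOURCE_LABELS)
    return re.fullmatch(regex, joined) is not None

# Hypothetical discovered targets.
apiserver = {
    "__meta_kubernetes_namespace": "default",
    "__meta_kubernetes_service_name": "kubernetes",
    "__meta_kubernetes_endpoint_port_name": "https",
}
kube_dns = {
    "__meta_kubernetes_namespace": "kube-system",
    "__meta_kubernetes_service_name": "kube-dns",
    "__meta_kubernetes_endpoint_port_name": "metrics",
}
print(is_kept(apiserver), is_kept(kube_dns))
```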
    2. Node performance metrics, job_name kubernetes-nodes
      - job_name: kubernetes-nodes
        honor_timestamps: true
        scrape_interval: 1m
        scrape_timeout: 10s
        metrics_path: /metrics
        scheme: https
        authorization:
          type: Bearer
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        follow_redirects: true
        relabel_configs:
        - separator: ;
          regex: __meta_kubernetes_node_label_(.+)
          replacement: $1
          action: labelmap ## copy each node label to a target label named by the (.+) capture
        - separator: ;
          regex: (.*)
          target_label: __address__
          replacement: kubernetes.default.svc:443
          action: replace ## set __address__ to the value of replacement
        - source_labels: [__meta_kubernetes_node_name]
          separator: ;
          regex: (.+) ## matches any non-empty source label value (in practice, the node name)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/$1/proxy/metrics
          action: replace ## $1 in replacement is the node name captured by regex
        kubernetes_sd_configs:
        - role: node
          follow_redirects: true
      
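      Taken together, this configuration rewrites the default __address__ and __metrics_path__. The first replace rule swaps the node IP for the API server's in-cluster DNS name, and the second changes the default /metrics to /api/v1/nodes/$1/proxy/metrics, so the final scrape URL is https://kubernetes.default.svc/api/v1/nodes/pre-k8s-cp3/proxy/metrics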
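
The URL composition described above can be reproduced with a short sketch. The function `node_scrape_url` is hypothetical and only mirrors the two replace rules; the explicit `:443` comes from the replacement value and is equivalent to the default HTTPS port.

```python
import re

def node_scrape_url(node_name, cadvisor=False):
    """Rebuild the scrape URL that the two replace rules produce for one node."""
    # Rule 1: __address__ is overwritten with the API server's in-cluster name.
    address = "kubernetes.default.svc:443"
    # Rule 2: __metrics_path__ is built from the node name captured by (.+).
    template = r"/api/v1/nodes/\1/proxy/metrics"
    if cadvisor:
        template += "/cadvisor"
    match = re.fullmatch(r"(.+)", node_name)
    # If the regex does not match (empty node name), the default path is kept.
    path = match.expand(template) if match else "/metrics"
    return f"https://{address}{path}"

print(node_scrape_url("pre-k8s-cp3"))
print(node_scrape_url("pre-k8s-cp3", cadvisor=True))
```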
    3. Node container (cAdvisor) metrics, job_name kubernetes-nodes-cadvisor
      - job_name: kubernetes-nodes-cadvisor
        honor_timestamps: true
        scrape_interval: 10s
        scrape_timeout: 10s
        metrics_path: /metrics
        scheme: https
        authorization:
          type: Bearer
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        follow_redirects: true
        relabel_configs:
        - separator: ;
          regex: __meta_kubernetes_node_label_(.+)
          replacement: $1
          action: labelmap
        - separator: ;
          regex: (.*)
          target_label: __address__
          replacement: kubernetes.default.svc:443
          action: replace
        - source_labels: [__meta_kubernetes_node_name]
          separator: ;
          regex: (.+)
          target_label: __metrics_path__
          # the only difference from the node performance job is the URI path
          replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
          action: replace
        kubernetes_sd_configs:
        - role: node
          follow_redirects: true
    4. Service endpoints monitoring (enabled through annotations on the Service)
      apiVersion: v1
      kind: Service
      metadata:
        name: prometheus-kube-state-metrics
        namespace: default
        selfLink: /api/v1/namespaces/default/services/prometheus-kube-state-metrics
        uid: 161fff0c-fffe-4561-90ba-0bec0608fbe4
        resourceVersion: '23976299'
        creationTimestamp: '2021-06-01T06:51:21Z'
        labels:
          app.kubernetes.io/instance: prometheus
          app.kubernetes.io/managed-by: Helm
          app.kubernetes.io/name: kube-state-metrics
          helm.sh/chart: kube-state-metrics-3.1.0
        annotations:
          meta.helm.sh/release-name: prometheus
          meta.helm.sh/release-namespace: default
          prometheus.io/scrape: 'true'
      - job_name: kubernetes-service-endpoints
        honor_timestamps: true
        scrape_interval: 1m
        scrape_timeout: 10s
        metrics_path: /metrics
        scheme: http
        follow_redirects: true
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          separator: ;
          regex: "true"
          replacement: $1
          action: keep ## Important: this rule decides which endpoints the job scrapes. It keeps endpoints whose Service carries the annotation prometheus.io/scrape with the value true and drops all others
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          separator: ;
          regex: (https?)
          target_label: __scheme__
          replacement: $1
          action: replace  ## set __scheme__ to the value captured by regex (http or https)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          separator: ;
          regex: (.+)
          target_label: __metrics_path__
          replacement: $1
          action: replace ## copy the value of the prometheus.io/path annotation into __metrics_path__
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          separator: ;
          regex: ([^:]+)(?::\d+)?;(\d+)
          target_label: __address__
          replacement: $1:$2
          action: replace ## join __address__ and the prometheus.io/port annotation, then rewrite __address__ as host:port from the two capture groups
        - separator: ;
          regex: __meta_kubernetes_service_label_(.+)
          replacement: $1
          action: labelmap ## each Service label becomes a target label named by the (.+) capture
        - source_labels: [__meta_kubernetes_namespace]
          separator: ;
          regex: (.*)
          target_label: kubernetes_namespace
          replacement: $1
          action: replace ## copies the label value, as above
        - source_labels: [__meta_kubernetes_service_name]
          separator: ;
          regex: (.*)
          target_label: kubernetes_name
          replacement: $1
          action: replace
        - source_labels: [__meta_kubernetes_pod_node_name]
          separator: ;
          regex: (.*)
          target_label: kubernetes_node
          replacement: $1
          action: replace
        kubernetes_sd_configs:
        - role: endpoints
          follow_redirects: true
    5. Service monitoring configuration (http_2xx probe)
      - job_name: kubernetes-services
        honor_timestamps: true
        params:
          module:
          - http_2xx ## the probe module to use
        scrape_interval: 1m
        scrape_timeout: 10s
        metrics_path: /probe
        scheme: http
        follow_redirects: true
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
          separator: ;
          regex: "true"
          replacement: $1
          action: keep ## keep Services annotated prometheus.io/probe with the value true and drop all others
        - source_labels: [__address__]
          separator: ;
          regex: (.*)
          target_label: __param_target
          replacement: $1
          action: replace ## copy the value captured by regex (the original __address__) into __param_target
        - separator: ;
          regex: (.*)
          target_label: __address__
          replacement: blackbox
          action: replace
        - source_labels: [__param_target]
          separator: ;
          regex: (.*)
          target_label: instance
          replacement: $1
          action: replace ## after the relabeling above, instance ends up holding the original __address__ value
        - separator: ;
          regex: __meta_kubernetes_service_label_(.+)
          replacement: $1
          action: labelmap
        - source_labels: [__meta_kubernetes_namespace]
          separator: ;
          regex: (.*)
          target_label: kubernetes_namespace
          replacement: $1
          action: replace
        - source_labels: [__meta_kubernetes_service_name]
          separator: ;
          regex: (.*)
          target_label: kubernetes_name
          replacement: $1
          action: replace
        kubernetes_sd_configs:
        - role: service
          follow_redirects: true
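
The relabeling chain of this job is easier to follow when traced step by step. The sketch below is a simplified model: the function names and the sample Service address are hypothetical, and the exact query-string ordering and encoding used by Prometheus may differ from what `urlencode` produces.

```python
from urllib.parse import urlencode

def blackbox_relabel(labels, exporter="blackbox"):
    """Apply the three address-related rules of the kubernetes-services job."""
    out = dict(labels)
    # 1. The discovered Service address becomes the probe target parameter.
    out["__param_target"] = out["__address__"]
    # 2. The scrape itself is redirected to the Blackbox Exporter.
    out["__address__"] = exporter
    # 3. instance keeps the probed address, so series stay distinguishable.
    out["instance"] = out["__param_target"]
    return out

def probe_url(labels, module="http_2xx", metrics_path="/probe"):
    """Build the request that is finally sent to the exporter."""
    query = urlencode({"module": module, "target": labels["__param_target"]})
    return f"http://{labels['__address__']}{metrics_path}?{query}"

# Hypothetical discovered Service.
relabeled = blackbox_relabel({"__address__": "my-svc.default.svc:80"})
print(relabeled["instance"])
print(probe_url(relabeled))
```

The design point is that Prometheus never contacts the Service directly in this job: it asks the exporter to probe it and records the probed address as instance.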
    6. Pod monitoring configuration
      - job_name: kubernetes-pods
        honor_timestamps: true
        scrape_interval: 1m
        scrape_timeout: 10s
        metrics_path: /metrics
        scheme: http
        follow_redirects: true
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          separator: ;
          regex: "true"
          replacement: $1
          action: keep
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
          separator: ;
          regex: (https?)
          target_label: __scheme__
          replacement: $1
          action: replace
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          separator: ;
          regex: (.+)
          target_label: __metrics_path__
          replacement: $1
          action: replace
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          separator: ;
          regex: ([^:]+)(?::\d+)?;(\d+)
          target_label: __address__
          replacement: $1:$2
          action: replace
        - separator: ;
          regex: __meta_kubernetes_pod_label_(.+)
          replacement: $1
          action: labelmap
        - source_labels: [__meta_kubernetes_namespace]
          separator: ;
          regex: (.*)
          target_label: kubernetes_namespace
          replacement: $1
          action: replace
        - source_labels: [__meta_kubernetes_pod_name]
          separator: ;
          regex: (.*)
          target_label: kubernetes_pod_name
          replacement: $1
          action: replace
        - source_labels: [__meta_kubernetes_pod_phase]
          separator: ;
          regex: Pending|Succeeded|Failed
          replacement: $1
          action: drop
        kubernetes_sd_configs:
        - role: pod
          follow_redirects: true
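
The address rule shared by the service-endpoints and pod jobs relies on the regex `([^:]+)(?::\d+)?;(\d+)`. The following sketch shows what it does to a few inputs; `rewrite_address` is a hypothetical helper and the IP addresses are sample values.

```python
import re

# Group 1: host without port. Optional non-capturing group: the discovered port.
# Group 2: the port taken from the prometheus.io/port annotation.
ADDRESS_RE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")

def rewrite_address(address, port_annotation):
    """Join __address__ and the port annotation with ';' and, if the whole
    string matches, return host:annotated_port. Otherwise leave it unchanged."""
    joined = f"{address};{port_annotation}"
    match = ADDRESS_RE.fullmatch(joined)
    return match.expand(r"\1:\2") if match else address

print(rewrite_address("172.21.3.157:8080", "8888"))
print(rewrite_address("172.21.3.157", "8888"))
print(rewrite_address("172.21.3.157:8080", ""))
```

When the annotation is missing, the joined string ends in `;` with no digits, the regex does not match, and the discovered address is left as it was.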

    JVM and RabbitMQ monitoring


    1. JVM monitoring (the prometheus.io/scrape annotation must be set in deployment.yaml), as follows

      spec:
        replicas: 1
        selector:
          matchLabels:
            app: pre-common-gateway
            component: spring
            part-of: pre
            tier: backend
        template:
          metadata:
            creationTimestamp: null
            labels:
              app: pre-common-gateway
              component: spring
              part-of: pre
              tier: backend
            annotations: # annotations added for monitoring
              prometheus.io/port: '8888'
              prometheus.io/scrape: 'true'
        #############  JVM monitoring   ##############
        - job_name: kubernetes-pods-jvm
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - action: keep
              regex: true
              source_labels:
                - __meta_kubernetes_pod_annotation_prometheus_io_scrape
            - action: replace
              regex: (https?)
              source_labels:
                - __meta_kubernetes_pod_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
                - __meta_kubernetes_pod_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $1:$2
              source_labels:
                - __address__
                - __meta_kubernetes_pod_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_pod_label_(.+)
            - action: replace
              source_labels: 
              - __meta_kubernetes_pod_container_name
              target_label: kubernetes_container_name
            - action: replace
              source_labels:
                - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - action: replace
              source_labels:
                - __meta_kubernetes_pod_name
              target_label: kubernetes_pod_name
            - action: drop
              regex: Pending|Succeeded|Failed
              source_labels:
                - __meta_kubernetes_pod_phase
          scrape_interval: 10s
          scrape_timeout: 10s
    2. RabbitMQ monitoring
      The following configuration is for reference only; it shows the Service as it exists after creation
      apiVersion: v1
      kind: Service
      metadata:
        name: pre-rabbitmq-monitor
        namespace: pre
        annotations:
          prometheus.io/scrape: rabbitmq
      spec:
        ports:
          - name: rabbitmq-exporter
            protocol: TCP
            port: 9419
            targetPort: 9419
          - name: rabbitmq-prometheus-port
            protocol: TCP
            port: 15692
            targetPort: 15692
        selector:
          app: pre-rabbitmq
        clusterIP: 10.11.0.90
        type: ClusterIP
        sessionAffinity: None
        #############  RabbitMQ monitoring   ##############
        - job_name: kubernetes-service-rabbitmq
          kubernetes_sd_configs:
            - role: endpoints
          relabel_configs:
            # keep only Services that carry the custom annotation value rabbitmq
            - action: keep
              regex: rabbitmq
              source_labels:
                - __meta_kubernetes_service_annotation_prometheus_io_scrape
            - action: replace
              regex: (https?)
              source_labels:
                - __meta_kubernetes_service_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
                - __meta_kubernetes_service_annotation_prometheus_io_path
              target_label: __metrics_path__
            # drop the ports that should not be scraped
            - action: drop
              regex: (5672|15672)
              source_labels:
                - __meta_kubernetes_pod_container_port_number
            - action: labelmap
              regex: __meta_kubernetes_service_label_(.+)
            - action: labelmap
              regex: __meta_kubernetes_pod_label_statefulset_kubernetes_io_(.+)
            - action: replace
              source_labels:
                - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - action: replace
              source_labels:
                - __meta_kubernetes_service_name
              target_label: kubernetes_name
            - action: replace
              source_labels:
                - __meta_kubernetes_pod_node_name
              target_label: kubernetes_node
    3. Reference rabbitmq-exporter.yaml configuration (edited directly in Lens)
              - name: rabbitmq-exporter
                image: hub.qiangyun.com/rabbitmq-exporter
                ports:
                  - name: mq-monitor
                    containerPort: 9419
                    protocol: TCP
                env:
                  - name: RABBIT_USER
                    value: guest
                  - name: RABBIT_PASSWORD
                    value: guest
                  - name: RABBIT_CAPABILITIES
                    value: bert
                resources:
                  limits:
                    cpu: 500m
                    memory: 1Gi
                livenessProbe:
                  httpGet:
                    path: /metrics
                    port: 9419
                    scheme: HTTP
                  initialDelaySeconds: 60
                  timeoutSeconds: 15
                  periodSeconds: 60
                  successThreshold: 1
                  failureThreshold: 3
                readinessProbe:
                  httpGet:
                    path: /metrics
                    port: 9419
                    scheme: HTTP
                  initialDelaySeconds: 20
                  timeoutSeconds: 10
                  periodSeconds: 60
                  successThreshold: 1
                  failureThreshold: 3
                terminationMessagePath: /dev/termination-log
                terminationMessagePolicy: File
                imagePullPolicy: IfNotPresent
    4. rabbitmq-monitor-service.yaml
      apiVersion: v1
      kind: Service
      metadata:
        annotations:
          # Note: this annotation value must match the keep rule in the Prometheus job configuration
          prometheus.io/scrape: rabbitmq
        name: rabbitmq-monitor
        namespace: prod
      spec:
        ports:
          - name: rabbitmq-exporter
            port: 9419
            protocol: TCP
            targetPort: 9419
          - name: rabbitmq-prometheus-port
            port: 15692
            protocol: TCP
            targetPort: 15692
        selector:
          app: rabbitmq
        type: ClusterIP
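
The interplay between the `prometheus.io/scrape: rabbitmq` annotation and the keep and drop rules of the kubernetes-service-rabbitmq job can be checked with a small sketch. The `survives` helper and the list of discovered ports are hypothetical; `re.fullmatch` models the fully anchored matching, which is why port 15692 is not caught by the `(5672|15672)` pattern.

```python
import re

SCRAPE = "__meta_kubernetes_service_annotation_prometheus_io_scrape"
PORT = "__meta_kubernetes_pod_container_port_number"

def survives(target):
    """Return True if a target passes the keep and drop rules of the job."""
    # keep: the Service annotation value must be exactly "rabbitmq".
    if not re.fullmatch(r"rabbitmq", target.get(SCRAPE, "")):
        return False
    # drop: AMQP (5672) and management (15672) ports expose no metrics.
    if re.fullmatch(r"(5672|15672)", target.get(PORT, "")):
        return False
    return True

# Hypothetical targets: one per container port of a RabbitMQ pod.
targets = [{SCRAPE: "rabbitmq", PORT: p} for p in ("5672", "15672", "9419", "15692")]
scraped_ports = [t[PORT] for t in targets if survives(t)]
print(scraped_ports)
```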

    Final reference configuration


    • Complete Prometheus configuration

      global:
        evaluation_interval: 15s
        scrape_interval: 15s
        scrape_timeout: 10s
      rule_files:
      - /etc/config/recording_rules.yml
      - /etc/config/alerting_rules.yml
      - /etc/config/rules
      - /etc/config/alerts
      scrape_configs:
      - job_name: prometheus
        static_configs:
        - targets:
          - localhost:9090
      
      ##############  Kubernetes apiserver monitoring ##############
      - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        job_name: kubernetes-apiservers
        kubernetes_sd_configs:
        - role: endpoints
        relabel_configs:
        - action: keep
          regex: default;kubernetes;https
          source_labels:
          - __meta_kubernetes_namespace
          - __meta_kubernetes_service_name
          - __meta_kubernetes_endpoint_port_name
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
      
      ##############  Kubernetes node performance metrics monitoring ##############
      - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        job_name: kubernetes-nodes
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - replacement: kubernetes.default.svc:443
          target_label: __address__
        - regex: (.+)
          replacement: /api/v1/nodes/$1/proxy/metrics
          source_labels:
          - __meta_kubernetes_node_name
          target_label: __metrics_path__
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
      
      ##############  Kubernetes node cAdvisor (per-Pod performance metrics) scrape config ##############
      - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        job_name: kubernetes-nodes-cadvisor
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - replacement: kubernetes.default.svc:443
          target_label: __address__
        - regex: (.+)
          replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
          source_labels:
          - __meta_kubernetes_node_name
          target_label: __metrics_path__
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
      
      ##############  Kubernetes service endpoints scrape config ##############
      - job_name: kubernetes-service-endpoints
        kubernetes_sd_configs:
        - role: endpoints
        relabel_configs:
        - action: keep
          regex: true
          source_labels:
          - __meta_kubernetes_service_annotation_prometheus_io_scrape
        - action: replace
          regex: (https?)
          source_labels:
          - __meta_kubernetes_service_annotation_prometheus_io_scheme
          target_label: __scheme__
        - action: replace
          regex: (.+)
          source_labels:
          - __meta_kubernetes_service_annotation_prometheus_io_path
          target_label: __metrics_path__
        - action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          source_labels:
          - __address__
          - __meta_kubernetes_service_annotation_prometheus_io_port
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - action: replace
          source_labels:
          - __meta_kubernetes_namespace
          target_label: kubernetes_namespace
        - action: replace
          source_labels:
          - __meta_kubernetes_service_name
          target_label: kubernetes_name
        - action: replace
          source_labels:
          - __meta_kubernetes_pod_node_name
          target_label: kubernetes_node
      
      ##############  Kubernetes service endpoints RabbitMQ scrape config ##############
      - job_name: kubernetes-service-rabbitmq
        kubernetes_sd_configs:
        - role: endpoints
        relabel_configs:
        - action: keep
          regex: rabbitmq
          source_labels:
          - __meta_kubernetes_service_annotation_prometheus_io_scrape
        - action: keep
          regex: (15692|9419)
          source_labels:
          - __meta_kubernetes_pod_container_port_number
        - action: replace
          regex: (https?)
          source_labels:
          - __meta_kubernetes_service_annotation_prometheus_io_scheme
          target_label: __scheme__
        - action: replace
          regex: (.+)
          source_labels:
          - __meta_kubernetes_service_annotation_prometheus_io_path
          target_label: __metrics_path__
        - action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          source_labels:
          - __address__
          - __meta_kubernetes_service_annotation_prometheus_io_port
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - action: replace
          source_labels:
          - __meta_kubernetes_namespace
          target_label: kubernetes_namespace
        - action: replace
          source_labels:
          - __meta_kubernetes_service_name
          target_label: kubernetes_name
        - action: replace
          source_labels:
          - __meta_kubernetes_pod_node_name
          target_label: kubernetes_node
      
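      ##############  5m-interval scrape job for Services flagged via the scrape_slow annotation (author's note: purpose not yet fully understood) ##############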
      - job_name: kubernetes-service-endpoints-slow
        kubernetes_sd_configs:
        - role: endpoints
        relabel_configs:
        - action: keep
          regex: true
          source_labels:
          - __meta_kubernetes_service_annotation_prometheus_io_scrape_slow
        - action: replace
          regex: (https?)
          source_labels:
          - __meta_kubernetes_service_annotation_prometheus_io_scheme
          target_label: __scheme__
        - action: replace
          regex: (.+)
          source_labels:
          - __meta_kubernetes_service_annotation_prometheus_io_path
          target_label: __metrics_path__
        - action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          source_labels:
          - __address__
          - __meta_kubernetes_service_annotation_prometheus_io_port
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - action: replace
          source_labels:
          - __meta_kubernetes_namespace
          target_label: kubernetes_namespace
        - action: replace
          source_labels:
          - __meta_kubernetes_service_name
          target_label: kubernetes_name
        - action: replace
          source_labels:
          - __meta_kubernetes_pod_node_name
          target_label: kubernetes_node
        scrape_interval: 5m
        scrape_timeout: 30s
      
      ##############  Prometheus pushgateway scrape config ##############
      - honor_labels: true
        job_name: prometheus-pushgateway
        kubernetes_sd_configs:
        - role: service
        relabel_configs:
        - action: keep
          regex: pushgateway
          source_labels:
          - __meta_kubernetes_service_annotation_prometheus_io_probe
      
      ##############  Kubernetes Service http_2xx probe config ##############
      - job_name: kubernetes-services
        kubernetes_sd_configs:
        - role: service
        metrics_path: /probe
        params:
          module:
          - http_2xx
        relabel_configs:
        - action: keep
          regex: true
          source_labels:
          - __meta_kubernetes_service_annotation_prometheus_io_probe
        - source_labels:
          - __address__
          target_label: __param_target
        - replacement: blackbox
          target_label: __address__
        - source_labels:
          - __param_target
          target_label: instance
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels:
          - __meta_kubernetes_namespace
          target_label: kubernetes_namespace
        - source_labels:
          - __meta_kubernetes_service_name
          target_label: kubernetes_name
      
      ##############  Kubernetes user application Pod custom scrape config ##############
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - action: keep
          regex: true
          source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scrape
        - action: replace
          regex: (https?)
          source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scheme
          target_label: __scheme__
        - action: replace
          regex: (.+)
          source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_path
          target_label: __metrics_path__
        - action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          source_labels:
          - __address__
          - __meta_kubernetes_pod_annotation_prometheus_io_port
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - action: replace
          source_labels:
          - __meta_kubernetes_namespace
          target_label: kubernetes_namespace
        - action: replace
          source_labels:
          - __meta_kubernetes_pod_name
          target_label: kubernetes_pod_name
        - action: drop
          regex: Pending|Succeeded|Failed
          source_labels:
          - __meta_kubernetes_pod_phase
      
      ##############  Kubernetes user application Pod JVM scrape config ##############
      - job_name: kubernetes-pods-jvm
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - action: keep
          regex: true
          source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_jvm_scrape
        - action: replace
          regex: (https?)
          source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scheme
          target_label: __scheme__
        - action: replace
          regex: (.+)
          source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_path
          target_label: __metrics_path__
        - action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          source_labels:
          - __address__
          - __meta_kubernetes_pod_annotation_prometheus_io_jvm_port
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - action: replace
          source_labels:
          - __meta_kubernetes_namespace
          target_label: kubernetes_namespace
        - action: replace
          source_labels:
          - __meta_kubernetes_pod_name
          target_label: kubernetes_pod_name
        - action: drop
          regex: Pending|Succeeded|Failed
          source_labels:
          - __meta_kubernetes_pod_phase
      
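      ##############  5m-interval scrape job for Pods flagged via the scrape_slow annotation (author's note: purpose not yet fully understood) ##############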
      - job_name: kubernetes-pods-slow
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - action: keep
          regex: true
          source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scrape_slow
        - action: replace
          regex: (https?)
          source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scheme
          target_label: __scheme__
        - action: replace
          regex: (.+)
          source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_path
          target_label: __metrics_path__
        - action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          source_labels:
          - __address__
          - __meta_kubernetes_pod_annotation_prometheus_io_port
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - action: replace
          source_labels:
          - __meta_kubernetes_namespace
          target_label: kubernetes_namespace
        - action: replace
          source_labels:
          - __meta_kubernetes_pod_name
          target_label: kubernetes_pod_name
        - action: drop
          regex: Pending|Succeeded|Failed
          source_labels:
          - __meta_kubernetes_pod_phase
        scrape_interval: 5m
        scrape_timeout: 30s
      alerting:
        alertmanagers:
        - kubernetes_sd_configs:
            - role: pod
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
          - source_labels: [__meta_kubernetes_namespace]
            regex: default
            action: keep
          - source_labels: [__meta_kubernetes_pod_label_app]
            regex: prometheus
            action: keep
          - source_labels: [__meta_kubernetes_pod_label_component]
            regex: alertmanager
            action: keep
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_probe]
            regex: .*
            action: keep
          - source_labels: [__meta_kubernetes_pod_container_port_number]
            regex: "9093"
            action: keep
    • As a ConfigMap
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: prometheus-server
        namespace: default
        labels:
          app: prometheus
          app.kubernetes.io/managed-by: Helm
          chart: prometheus-14.6.0
          component: server
          heritage: Helm
          release: prometheus
      data:
        alerting_rules.yml: |
          {}
        alerts: |
          {}
        prometheus.yml: |
          global:
            evaluation_interval: 15s
            scrape_interval: 15s
            scrape_timeout: 10s
          rule_files:
          - /etc/config/recording_rules.yml
          - /etc/config/alerting_rules.yml
          - /etc/config/rules
          - /etc/config/alerts
          scrape_configs:
          - job_name: prometheus
            static_configs:
            - targets:
              - localhost:9090
      
          ##############  Kubernetes apiserver scrape config ##############
          - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            job_name: kubernetes-apiservers
            kubernetes_sd_configs:
            - role: endpoints
            relabel_configs:
            - action: keep
              regex: default;kubernetes;https
              source_labels:
              - __meta_kubernetes_namespace
              - __meta_kubernetes_service_name
              - __meta_kubernetes_endpoint_port_name
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true
      
          ##############  Kubernetes node performance metrics scrape config ##############
          - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            job_name: kubernetes-nodes
            kubernetes_sd_configs:
            - role: node
            relabel_configs:
            - action: labelmap
              regex: __meta_kubernetes_node_label_(.+)
            - replacement: kubernetes.default.svc:443
              target_label: __address__
            - regex: (.+)
              replacement: /api/v1/nodes/$1/proxy/metrics
              source_labels:
              - __meta_kubernetes_node_name
              target_label: __metrics_path__
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true
      
          ##############  Kubernetes node cAdvisor (per-Pod performance metrics) scrape config ##############
          - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            job_name: kubernetes-nodes-cadvisor
            kubernetes_sd_configs:
            - role: node
            relabel_configs:
            - action: labelmap
              regex: __meta_kubernetes_node_label_(.+)
            - replacement: kubernetes.default.svc:443
              target_label: __address__
            - regex: (.+)
              replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
              source_labels:
              - __meta_kubernetes_node_name
              target_label: __metrics_path__
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true
      
          ##############  Kubernetes service endpoints scrape config ##############
          - job_name: kubernetes-service-endpoints
            kubernetes_sd_configs:
            - role: endpoints
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scrape
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $1:$2
              source_labels:
              - __address__
              - __meta_kubernetes_service_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_service_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_service_name
              target_label: kubernetes_name
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_node_name
              target_label: kubernetes_node
      
          ##############  Kubernetes service endpoints RabbitMQ scrape config ##############
          - job_name: kubernetes-service-rabbitmq
            kubernetes_sd_configs:
            - role: endpoints
            relabel_configs:
            - action: keep
              regex: rabbitmq
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scrape
            - action: keep
              regex: (15692|9419)
              source_labels:
              - __meta_kubernetes_pod_container_port_number
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $1:$2
              source_labels:
              - __address__
              - __meta_kubernetes_service_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_service_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_service_name
              target_label: kubernetes_name
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_node_name
              target_label: kubernetes_node
      
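          ##############  5m-interval scrape job for Services flagged via the scrape_slow annotation (author's note: purpose not yet fully understood) ##############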
          - job_name: kubernetes-service-endpoints-slow
            kubernetes_sd_configs:
            - role: endpoints
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scrape_slow
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $1:$2
              source_labels:
              - __address__
              - __meta_kubernetes_service_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_service_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_service_name
              target_label: kubernetes_name
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_node_name
              target_label: kubernetes_node
            scrape_interval: 5m
            scrape_timeout: 30s
      
          ##############  Prometheus pushgateway scrape config ##############
          - honor_labels: true
            job_name: prometheus-pushgateway
            kubernetes_sd_configs:
            - role: service
            relabel_configs:
            - action: keep
              regex: pushgateway
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_probe
      
          ##############  Kubernetes Service http_2xx probe config ##############
          - job_name: kubernetes-services
            kubernetes_sd_configs:
            - role: service
            metrics_path: /probe
            params:
              module:
              - http_2xx
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_probe
            - source_labels:
              - __address__
              target_label: __param_target
            - replacement: blackbox
              target_label: __address__
            - source_labels:
              - __param_target
              target_label: instance
            - action: labelmap
              regex: __meta_kubernetes_service_label_(.+)
            - source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - source_labels:
              - __meta_kubernetes_service_name
              target_label: kubernetes_name
      
          ##############  Kubernetes user application Pod custom scrape config ##############
          - job_name: kubernetes-pods
            kubernetes_sd_configs:
            - role: pod
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scrape
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $1:$2
              source_labels:
              - __address__
              - __meta_kubernetes_pod_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_pod_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_name
              target_label: kubernetes_pod_name
            - action: drop
              regex: Pending|Succeeded|Failed
              source_labels:
              - __meta_kubernetes_pod_phase
      
          ##############  Kubernetes user application Pod JVM scrape config ##############
          - job_name: kubernetes-pods-jvm
            kubernetes_sd_configs:
            - role: pod
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_jvm_scrape
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $1:$2
              source_labels:
              - __address__
              - __meta_kubernetes_pod_annotation_prometheus_io_jvm_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_pod_label_(.+)
            - action: replace
              source_labels: 
              - __meta_kubernetes_pod_container_name
              target_label: kubernetes_container_name
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_name
              target_label: kubernetes_pod_name
            - action: drop
              regex: Pending|Succeeded|Failed
              source_labels:
              - __meta_kubernetes_pod_phase
      
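          ##############  5m-interval scrape job for Pods flagged via the scrape_slow annotation (author's note: purpose not yet fully understood) ##############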
          - job_name: kubernetes-pods-slow
            kubernetes_sd_configs:
            - role: pod
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scrape_slow
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $1:$2
              source_labels:
              - __address__
              - __meta_kubernetes_pod_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_pod_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_name
              target_label: kubernetes_pod_name
            - action: drop
              regex: Pending|Succeeded|Failed
              source_labels:
              - __meta_kubernetes_pod_phase
            scrape_interval: 5m
            scrape_timeout: 30s
          alerting:
            alertmanagers:
            - kubernetes_sd_configs:
                - role: pod
              tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
              relabel_configs:
              - source_labels: [__meta_kubernetes_namespace]
                regex: default
                action: keep
              - source_labels: [__meta_kubernetes_pod_label_app]
                regex: prometheus
                action: keep
              - source_labels: [__meta_kubernetes_pod_label_component]
                regex: alertmanager
                action: keep
              - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_probe]
                regex: .*
                action: keep
              - source_labels: [__meta_kubernetes_pod_container_port_number]
                regex: "9093"
                action: keep
        recording_rules.yml: |
          {}
        rules: |
          {}
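
Several jobs in the configuration above rewrite `__address__` from two source labels: the discovered address and a port annotation. The Python sketch below is one way to reason about that rule, assuming the pattern is meant to use the digit class `\d` (a port number needs digits, so `([^:]+)(?::\d+)?;(\d+)`). It relies on two Prometheus behaviours: source label values are joined with `;` by default, and the regex must match the whole joined string. The function name `rewrite_address` and the sample addresses are invented for illustration.

```python
import re

# Host, an optional ":port" that gets discarded, then ";" and the annotated port.
ADDRESS_RULE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def rewrite_address(address: str, port_annotation: str) -> str:
    """Mimic the `replace` rule that targets __address__."""
    # Prometheus joins the source_labels values with ";" before matching.
    joined = f"{address};{port_annotation}"
    match = ADDRESS_RULE.fullmatch(joined)
    if match is None:
        # When the regex does not match, the target label is left unchanged.
        return address
    # Equivalent of the replacement "$1:$2".
    return f"{match.group(1)}:{match.group(2)}"


# Annotated port replaces the discovered port.
print(rewrite_address("10.0.0.5:8080", "9419"))
# Address without a port gets the annotated port appended.
print(rewrite_address("10.0.0.5", "15692"))
# No port annotation: the address stays as discovered.
print(rewrite_address("10.0.0.5:8080", ""))
```

With a literal `d` in place of `\d`, the pattern would only match the letter "d", so the rule would never fire and every target would keep its discovered port.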
  • Original article: https://www.cnblogs.com/apink/p/15218835.html