Introduction
- A multi-dimensional data model (time series identified by metric name and key/value label pairs)
- A flexible query language (PromQL)
- No reliance on external distributed storage; single server nodes store data autonomously
- Time series collection over an HTTP pull model
- Targets discovered via service discovery or static configuration files
- Multiple modes of dashboarding and graphing support
Collection methods
Because scrapes can occasionally miss data points, Prometheus is not suitable for cases that require the collected data to be 100% accurate. For recording time series, however, Prometheus offers strong query capabilities, and it is a good fit for microservice architectures.
Pull mode
Prometheus collects data with a pull model: it scrapes metrics over HTTP. Any application that can expose an HTTP endpoint can be hooked into the monitoring system, which is simpler to develop against than a proprietary or binary protocol.
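Anything that serves the Prometheus text exposition format over HTTP can be scraped. As a minimal pure-stdlib sketch (the metric name, help text, and port below are illustrative, not part of this deployment):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def render_metrics(metrics):
    # Render metrics in the Prometheus text exposition format:
    # "# HELP", "# TYPE", then one "name value" sample line each.
    lines = []
    for name, (mtype, help_text, value) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {mtype}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics({
            "app_requests_total": ("counter", "Total HTTP requests served.", 42),
        }).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body)

# To serve (blocks): HTTPServer(("", 8000), MetricsHandler).serve_forever()
```

With this endpoint running, a static `scrape_configs` target pointed at port 8000 is all Prometheus needs to collect the metric.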
Push mode
For short-lived jobs such as scheduled tasks, the pull model can miss metrics entirely: the job may finish before Prometheus gets around to scraping it. In that case, add an intermediary layer: the client pushes its metrics to a Push Gateway, which caches them, and Prometheus pulls the metrics from the gateway. (This requires deploying an extra Push Gateway and adding a job that scrapes the gateway.)
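The push path can be sketched against the Pushgateway's HTTP API (`POST <gateway>/metrics/job/<job>` with a text-format body). The in-cluster service name below assumes the default Helm release name, and the job and metric names are made up for illustration:

```python
import urllib.request

def format_metric(name, value, mtype="gauge"):
    # Body in the Prometheus text exposition format.
    return f"# TYPE {name} {mtype}\n{name} {value}\n"

def push_metric(gateway, job, name, value, mtype="gauge"):
    # POST replaces the metrics previously pushed for this job grouping.
    url = f"{gateway}/metrics/job/{job}"
    req = urllib.request.Request(
        url, data=format_metric(name, value, mtype).encode(), method="POST")
    req.add_header("Content-Type", "text/plain")
    return urllib.request.urlopen(req)

# Hypothetical short-lived job reporting when it last ran:
# push_metric("http://prometheus-pushgateway.default.svc.cluster.local:9091",
#             "nightly-backup", "backup_last_run_seconds", 1234.5)
```

Prometheus then scrapes the gateway like any other target, so the sample survives even though the job has already exited.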
Component architecture
Prometheus components
- Prometheus Server: the time series database; handles data processing, storage, querying, alerting rules, and graphing
- Alertmanager: receives and manages alerts; provides alert template configuration, alert delivery, and alert routing
- Pushgateway: an intermediary collection point; caches pushed metrics
- Exporters: collect and expose metrics
- Client libraries: instrument application code
Architecture diagram
Deployment
Deploying with Helm
Reference documentation:
Official guide to configuring service discovery for monitoring Kubernetes: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config
Kubernetes service discovery exposes the following roles for Prometheus to scrape (Prometheus retrieves these target resources from the Kubernetes REST API and stays synchronized with cluster state):
- node
- endpoints
- service
- pod
- ingress
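Each role maps to a `kubernetes_sd_configs` entry in prometheus.yml. A minimal scrape job using the `pod` role might look like the following; the job name and the annotation-based filter follow a common convention and are illustrative, not required:

```yaml
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```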
Installation
- To install Prometheus outside the cluster, configure the cluster's CA certificate in prometheus.yml
- To install Prometheus inside the cluster, create the RBAC policies first; see the official Helm documentation for details. This guide installs with Helm.
There are two chart sources for a Helm install: the official helm/charts repository, which is now deprecated and redirects to Artifact Hub, and Artifact Hub itself, which hosts the community Kubernetes charts (including CNCF projects).
Prometheus chart on Artifact Hub:
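For the out-of-cluster case, the cluster CA is referenced from `tls_config` in both the service-discovery section and the scrape itself. A sketch, with the API server address and file paths as placeholders:

```yaml
scrape_configs:
  - job_name: kubernetes-apiservers
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/k8s-ca.crt    # cluster CA; path is illustrative
    bearer_token_file: /etc/prometheus/k8s-token
    kubernetes_sd_configs:
      - role: endpoints
        api_server: https://<apiserver-address>:6443
        tls_config:
          ca_file: /etc/prometheus/k8s-ca.crt
        bearer_token_file: /etc/prometheus/k8s-token
```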
https://artifacthub.io/packages/helm/prometheus-community/prometheus
The page above explains in detail how to install Prometheus with Helm, including the prerequisites.
Why does it depend on kube-state-metrics?
Because kube-state-metrics is used to monitor the state of Kubernetes workload objects, for example whether replica sets are running with their desired number of replicas.
- Add the repos:
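Once kube-state-metrics is being scraped, workload health can be queried directly; for example (metric names from the kube-state-metrics documented set):

```promql
# Deployments whose available replicas are below the desired count
kube_deployment_spec_replicas - kube_deployment_status_replicas_available > 0

# Pods currently not in the Running phase
kube_pod_status_phase{phase!="Running"} == 1
```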
```
<root@PROD-K8S-CP1 ~># helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
"prometheus-community" has been added to your repositories
<root@PROD-K8S-CP1 ~># helm repo add kube-state-metrics https://kubernetes.github.io/kube-state-metrics
"kube-state-metrics" has been added to your repositories
<root@PROD-K8S-CP1 ~># helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "kube-state-metrics" chart repository
...Successfully got an update from the "cilium" chart repository
...Successfully got an update from the "prometheus-community" chart repository
Update Complete. ⎈ Happy Helming!⎈
```
- Install Prometheus
```
<root@PROD-K8S-CP1 ~># helm install prometheus prometheus-community/prometheus --version 14.1.0
NAME: prometheus
LAST DEPLOYED: Thu Sep 2 15:57:19 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Prometheus server can be accessed via port 80 on the following DNS name from within your cluster:
prometheus-server.default.svc.cluster.local

Get the Prometheus server URL by running these commands in the same shell:
  export POD_NAME=$(kubectl get pods --namespace default -l "app=prometheus,component=server" -o jsonpath="{.items[0].metadata.name}")
  kubectl --namespace default port-forward $POD_NAME 9090

The Prometheus alertmanager can be accessed via port 80 on the following DNS name from within your cluster:
prometheus-alertmanager.default.svc.cluster.local

Get the Alertmanager URL by running these commands in the same shell:
  export POD_NAME=$(kubectl get pods --namespace default -l "app=prometheus,component=alertmanager" -o jsonpath="{.items[0].metadata.name}")
  kubectl --namespace default port-forward $POD_NAME 9093

#################################################################################
######   WARNING: Pod Security Policy has been moved to a global property. #####
######            use .Values.podSecurityPolicy.enabled with pod-based     #####
######            annotations                                              #####
######            (e.g. .Values.nodeExporter.podSecurityPolicy.annotations)#####
#################################################################################

The Prometheus PushGateway can be accessed via port 9091 on the following DNS name from within your cluster:
prometheus-pushgateway.default.svc.cluster.local

Get the PushGateway URL by running these commands in the same shell:
  export POD_NAME=$(kubectl get pods --namespace default -l "app=prometheus,component=pushgateway" -o jsonpath="{.items[0].metadata.name}")
  kubectl --namespace default port-forward $POD_NAME 9091

For more information on running Prometheus, visit:
https://prometheus.io/
```
- Check the deployment status
```
<root@PROD-K8S-CP1 ~># kubectl get pods --all-namespaces -o wide | grep node
default   prometheus-node-exporter-9rm22   1/1   Running   0   51m   10.1.17.237   prod-be-k8s-wn7   <none>   <none>
default   prometheus-node-exporter-qfxgz   1/1   Running   0   51m   10.1.17.238   prod-be-k8s-wn8   <none>   <none>
default   prometheus-node-exporter-z5znx   1/1   Running   0   51m   10.1.17.236   prod-be-k8s-wn6   <none>   <none>
<root@PROD-K8S-CP1 ~># kubectl get pods --all-namespaces -o wide | grep prometheus
default   prometheus-alertmanager-755d84cf4f-rfz4n         0/2   Pending   0   51m   <none>          <none>            <none>   <none>
default   prometheus-kube-state-metrics-86dc6bb59f-wlpcl   1/1   Running   0   51m   172.21.12.167   prod-be-k8s-wn8   <none>   <none>
default   prometheus-node-exporter-9rm22                   1/1   Running   0   51m   10.1.17.237    prod-be-k8s-wn7   <none>   <none>
default   prometheus-node-exporter-qfxgz                   1/1   Running   0   51m   10.1.17.238    prod-be-k8s-wn8   <none>   <none>
default   prometheus-node-exporter-z5znx                   1/1   Running   0   51m   10.1.17.236    prod-be-k8s-wn6   <none>   <none>
default   prometheus-pushgateway-745d67dd5f-7ckvv          1/1   Running   0   51m   172.21.12.2    prod-be-k8s-wn7   <none>   <none>
default   prometheus-server-867f854484-lcrq6
```
# The default deployment enables persistent storage, but it does not bind to a concrete PVC, so adjust the persistence section after installing. Alternatively, disable persistence, or map a local volume into the Pod.
- Custom installation

```
helm install prometheus prometheus-community/prometheus -f prometheus-values.yaml
```
```
<root@PROD-K8S-CP1 ~># helm show values prometheus-community/prometheus --version 14.1.0 > prometheus-values.yaml
```

Revised values. The exported file contains the full chart defaults; only the sections customized for this deployment (scheduling, persistence, resources, service exposure) are shown below, with untouched defaults omitted:

```yaml
alertmanager:
  enabled: true
  ## Taint tolerations for the alertmanager pod
  tolerations:
    - key: resource
      value: base
      effect: NoExecute
  nodeSelector:
    kubernetes.io/hostname: prod-sys-k8s-wn3
  ## Node affinity for the alertmanager pod
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/resource
                operator: In
                values:
                  - base
  persistentVolume:
    enabled: true
    accessModes:
      - ReadWriteOnce
    mountPath: /data
    size: 20Gi
    ## Persistent storage class for alertmanager
    storageClass: "alicloud-disk-essd"
  ## Resource limits and requests
  resources:
    limits:
      cpu: 2
      memory: 2Gi
    requests:
      cpu: 10m
      memory: 32Mi
  service:
    ## External IP for the alertmanager service
    externalIPs:
      - 10.1.0.11
    servicePort: 9093
    type: ClusterIP

kubeStateMetrics:
  enabled: true
  nodeSelector:
    kubernetes.io/hostname: prod-sys-k8s-wn3
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/resource
                operator: In
                values:
                  - base
  priorityClassName: "monitor-service"
  resources:
    limits:
      cpu: 1
      memory: 128Mi
    requests:
      cpu: 100m
      memory: 30Mi
  tolerations:
    - key: resource
      value: base
      effect: NoExecute

nodeExporter:
  enabled: true
  hostNetwork: true
  hostPID: true
  priorityClassName: "monitor-service"
  ## Rolling update strategy
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  ## Tolerate all taints so node-exporter runs on every node
  tolerations:
    - operator: Exists
  resources:
    limits:
      cpu: 1
      memory: 128Mi
    requests:
      cpu: 100m
      memory: 30Mi

server:
  enabled: true
  name: server
  priorityClassName: "monitor-service"
  extraFlags:
    - web.enable-lifecycle
    ## Enables the administrative HTTP API (e.g. deleting time series); off by default
    - web.enable-admin-api
  global:
    scrape_interval: 15s
    scrape_timeout: 10s
    evaluation_interval: 15s
  ## Taint tolerations for prometheus-server
  tolerations:
    - key: resource
      value: base
      effect: NoExecute
  nodeSelector:
    kubernetes.io/hostname: prod-sys-k8s-wn4
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/resource
                operator: In
                values:
                  - base
  persistentVolume:
    enabled: true
    accessModes:
      - ReadWriteOnce
    mountPath: /data
    size: 500Gi
    storageClass: "alicloud-disk-essd"
  ## prometheus-server resource limits and requests
  resources:
    limits:
      cpu: 4
      memory: 30Gi
    requests:
      cpu: 500m
      memory: 5Gi
  service:
    ## External IP for the prometheus-server service
    externalIPs:
      - 10.1.0.10
    servicePort: 9090
    type: ClusterIP
  ## Data retention period (chart default is 15d)
  retention: "30d"

pushgateway:
  enabled: true
  priorityClassName: "monitor-service"
  ## Taint tolerations for the PushGateway
  tolerations:
    - key: resource
      value: base
      effect: NoExecute
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/resource
                operator: In
                values:
                  - base
```
http://kubernetes.io/docs/user-guide/compute-resources/ ## ## 配置PushGateway资源限额 resources: limits: cpu: 10m memory: 32Mi requests: cpu: 10m memory: 32Mi # Custom DNS configuration to be added to push-gateway pods dnsConfig: {} # nameservers: # - 1.2.3.4 # searches: # - ns1.svc.cluster-domain.example # - my.dns.search.suffix # options: # - name: ndots # value: "2" # - name: edns0 ## Security context to be added to push-gateway pods ## securityContext: runAsUser: 65534 runAsNonRoot: true service: annotations: prometheus.io/probe: pushgateway labels: {} clusterIP: "" ## List of IP addresses at which the pushgateway service is available ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips ## externalIPs: [] loadBalancerIP: "" loadBalancerSourceRanges: [] servicePort: 9091 type: ClusterIP ## pushgateway Deployment Strategy type # strategy: # type: Recreate persistentVolume: ## If true, pushgateway will create/use a Persistent Volume Claim ## enabled: false ## pushgateway data Persistent Volume access modes ## Must match those of existing PV or dynamic provisioner ## Ref: http://kubernetes.io/docs/user-guide/persistent-volumes/ ## accessModes: - ReadWriteOnce ## pushgateway data Persistent Volume Claim annotations ## annotations: {} ## pushgateway data Persistent Volume existing claim name ## Requires pushgateway.persistentVolume.enabled: true ## If defined, PVC must be created manually before volume will be bound existingClaim: "" ## pushgateway data Persistent Volume mount root path ## mountPath: /data ## pushgateway data Persistent Volume size ## size: 2Gi ## pushgateway data Persistent Volume Storage Class ## If defined, storageClassName: <storageClass> ## If set to "-", storageClassName: "", which disables dynamic provisioning ## If undefined (the default) or set to null, no storageClassName spec is ## set, choosing the default provisioner. 
(gp2 on AWS, standard on ## GKE, AWS & OpenStack) ## # storageClass: "-" ## pushgateway data Persistent Volume Binding Mode ## If defined, volumeBindingMode: <volumeBindingMode> ## If undefined (the default) or set to null, no volumeBindingMode spec is ## set, choosing the default mode. ## # volumeBindingMode: "" ## Subdirectory of pushgateway data Persistent Volume to mount ## Useful if the volume's root directory is not empty ## subPath: "" ## alertmanager ConfigMap entries ## alertmanagerFiles: alertmanager.yml: global: {} # slack_api_url: '' receivers: - name: default-receiver # slack_configs: # - channel: '@you' # send_resolved: true route: group_wait: 10s group_interval: 5m receiver: default-receiver repeat_interval: 1h ## Prometheus server ConfigMap entries ## serverFiles: ## Alerts configuration ## Ref: https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/ alerting_rules.yml: {} # groups: # - name: Instances # rules: # - alert: InstanceDown # expr: up == 0 # for: 5m # labels: # severity: page # annotations: # description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.' # summary: 'Instance {{ $labels.instance }} down' ## DEPRECATED DEFAULT VALUE, unless explicitly naming your files, please use alerting_rules.yml alerts: {} ## Records configuration ## Ref: https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/ recording_rules.yml: {} ## DEPRECATED DEFAULT VALUE, unless explicitly naming your files, please use recording_rules.yml rules: {} prometheus.yml: rule_files: - /etc/config/recording_rules.yml - /etc/config/alerting_rules.yml ## Below two files are DEPRECATED will be removed from this default values file - /etc/config/rules - /etc/config/alerts scrape_configs: - job_name: prometheus static_configs: - targets: - localhost:9090 # A scrape configuration for running Prometheus on a Kubernetes cluster. # This uses separate scrape configs for cluster components (i.e. 
API server, node) # and services to allow each to use different authentication configs. # # Kubernetes labels will be added as Prometheus labels on metrics via the # `labelmap` relabeling action. # Scrape config for API servers. # # Kubernetes exposes API servers as endpoints to the default/kubernetes # service so this uses `endpoints` role and uses relabelling to only keep # the endpoints associated with the default/kubernetes service using the # default named port `https`. This works for single API server deployments as # well as HA API server deployments. - job_name: 'kubernetes-apiservers' kubernetes_sd_configs: - role: endpoints # Default to scraping over https. If required, just disable this or change to # `http`. scheme: https # This TLS & bearer token file config is used to connect to the actual scrape # endpoints for cluster components. This is separate to discovery auth # configuration because discovery & scraping are two separate concerns in # Prometheus. The discovery auth config is automatic if Prometheus runs inside # the cluster. Otherwise, more config options have to be provided within the # <kubernetes_sd_config>. tls_config: ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt # If your node certificates are self-signed or use a different CA to the # master CA, then disable certificate verification below. Note that # certificate verification is an integral part of a secure infrastructure # so this should only be disabled in a controlled environment. You can # disable certificate verification by uncommenting the line below. # insecure_skip_verify: true bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token # Keep only the default/kubernetes service endpoints for the https port. This # will add targets for each API server which Kubernetes adds an endpoint to # the default/kubernetes service. 
relabel_configs: - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name] action: keep regex: default;kubernetes;https - job_name: 'kubernetes-nodes' # Default to scraping over https. If required, just disable this or change to # `http`. scheme: https # This TLS & bearer token file config is used to connect to the actual scrape # endpoints for cluster components. This is separate to discovery auth # configuration because discovery & scraping are two separate concerns in # Prometheus. The discovery auth config is automatic if Prometheus runs inside # the cluster. Otherwise, more config options have to be provided within the # <kubernetes_sd_config>. tls_config: ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt # If your node certificates are self-signed or use a different CA to the # master CA, then disable certificate verification below. Note that # certificate verification is an integral part of a secure infrastructure # so this should only be disabled in a controlled environment. You can # disable certificate verification by uncommenting the line below. # insecure_skip_verify: true bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token kubernetes_sd_configs: - role: node relabel_configs: - action: labelmap regex: __meta_kubernetes_node_label_(.+) - target_label: __address__ replacement: kubernetes.default.svc:443 - source_labels: [__meta_kubernetes_node_name] regex: (.+) target_label: __metrics_path__ replacement: /api/v1/nodes/$1/proxy/metrics - job_name: 'kubernetes-nodes-cadvisor' # Default to scraping over https. If required, just disable this or change to # `http`. scheme: https # This TLS & bearer token file config is used to connect to the actual scrape # endpoints for cluster components. This is separate to discovery auth # configuration because discovery & scraping are two separate concerns in # Prometheus. 
The discovery auth config is automatic if Prometheus runs inside # the cluster. Otherwise, more config options have to be provided within the # <kubernetes_sd_config>. tls_config: ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt # If your node certificates are self-signed or use a different CA to the # master CA, then disable certificate verification below. Note that # certificate verification is an integral part of a secure infrastructure # so this should only be disabled in a controlled environment. You can # disable certificate verification by uncommenting the line below. # insecure_skip_verify: true bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token kubernetes_sd_configs: - role: node # This configuration will work only on kubelet 1.7.3+ # As the scrape endpoints for cAdvisor have changed # if you are using older version you need to change the replacement to # replacement: /api/v1/nodes/$1:4194/proxy/metrics # more info here https://github.com/coreos/prometheus-operator/issues/633 relabel_configs: - action: labelmap regex: __meta_kubernetes_node_label_(.+) - target_label: __address__ replacement: kubernetes.default.svc:443 - source_labels: [__meta_kubernetes_node_name] regex: (.+) target_label: __metrics_path__ replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor # Scrape config for service endpoints. # # The relabeling allows the actual service scrape endpoint to be configured # via the following annotations: # # * `prometheus.io/scrape`: Only scrape services that have a value of `true` # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need # to set this to `https` & most likely set the `tls_config` of the scrape config. # * `prometheus.io/path`: If the metrics path is not `/metrics` override this. # * `prometheus.io/port`: If the metrics are exposed on a different port to the # service then set this appropriately. 
- job_name: 'kubernetes-service-endpoints' kubernetes_sd_configs: - role: endpoints relabel_configs: - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape] action: keep regex: true - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme] action: replace target_label: __scheme__ regex: (https?) - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path] action: replace target_label: __metrics_path__ regex: (.+) - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port] action: replace target_label: __address__ regex: ([^:]+)(?::d+)?;(d+) replacement: $1:$2 - action: labelmap regex: __meta_kubernetes_service_label_(.+) - source_labels: [__meta_kubernetes_namespace] action: replace target_label: kubernetes_namespace - source_labels: [__meta_kubernetes_service_name] action: replace target_label: kubernetes_name - source_labels: [__meta_kubernetes_pod_node_name] action: replace target_label: kubernetes_node # Scrape config for slow service endpoints; same as above, but with a larger # timeout and a larger interval # # The relabeling allows the actual service scrape endpoint to be configured # via the following annotations: # # * `prometheus.io/scrape-slow`: Only scrape services that have a value of `true` # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need # to set this to `https` & most likely set the `tls_config` of the scrape config. # * `prometheus.io/path`: If the metrics path is not `/metrics` override this. # * `prometheus.io/port`: If the metrics are exposed on a different port to the # service then set this appropriately. 
- job_name: 'kubernetes-service-endpoints-slow' scrape_interval: 5m scrape_timeout: 30s kubernetes_sd_configs: - role: endpoints relabel_configs: - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape_slow] action: keep regex: true - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme] action: replace target_label: __scheme__ regex: (https?) - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path] action: replace target_label: __metrics_path__ regex: (.+) - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port] action: replace target_label: __address__ regex: ([^:]+)(?::d+)?;(d+) replacement: $1:$2 - action: labelmap regex: __meta_kubernetes_service_label_(.+) - source_labels: [__meta_kubernetes_namespace] action: replace target_label: kubernetes_namespace - source_labels: [__meta_kubernetes_service_name] action: replace target_label: kubernetes_name - source_labels: [__meta_kubernetes_pod_node_name] action: replace target_label: kubernetes_node - job_name: 'prometheus-pushgateway' honor_labels: true kubernetes_sd_configs: - role: service relabel_configs: - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe] action: keep regex: pushgateway # Example scrape config for probing services via the Blackbox Exporter. 
# # The relabeling allows the actual service scrape endpoint to be configured # via the following annotations: # # * `prometheus.io/probe`: Only probe services that have a value of `true` - job_name: 'kubernetes-services' metrics_path: /probe params: module: [http_2xx] kubernetes_sd_configs: - role: service relabel_configs: - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe] action: keep regex: true - source_labels: [__address__] target_label: __param_target - target_label: __address__ replacement: blackbox - source_labels: [__param_target] target_label: instance - action: labelmap regex: __meta_kubernetes_service_label_(.+) - source_labels: [__meta_kubernetes_namespace] target_label: kubernetes_namespace - source_labels: [__meta_kubernetes_service_name] target_label: kubernetes_name # Example scrape config for pods # # The relabeling allows the actual pod scrape endpoint to be configured via the # following annotations: # # * `prometheus.io/scrape`: Only scrape pods that have a value of `true` # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need # to set this to `https` & most likely set the `tls_config` of the scrape config. # * `prometheus.io/path`: If the metrics path is not `/metrics` override this. # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the default of `9102`. - job_name: 'kubernetes-pods' kubernetes_sd_configs: - role: pod relabel_configs: - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape] action: keep regex: true - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme] action: replace regex: (https?) 
target_label: __scheme__ - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path] action: replace target_label: __metrics_path__ regex: (.+) - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port] action: replace regex: ([^:]+)(?::d+)?;(d+) replacement: $1:$2 target_label: __address__ - action: labelmap regex: __meta_kubernetes_pod_label_(.+) - source_labels: [__meta_kubernetes_namespace] action: replace target_label: kubernetes_namespace - source_labels: [__meta_kubernetes_pod_name] action: replace target_label: kubernetes_pod_name - source_labels: [__meta_kubernetes_pod_phase] regex: Pending|Succeeded|Failed action: drop # Example Scrape config for pods which should be scraped slower. An useful example # would be stackriver-exporter which queries an API on every scrape of the pod # # The relabeling allows the actual pod scrape endpoint to be configured via the # following annotations: # # * `prometheus.io/scrape-slow`: Only scrape pods that have a value of `true` # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need # to set this to `https` & most likely set the `tls_config` of the scrape config. # * `prometheus.io/path`: If the metrics path is not `/metrics` override this. # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the default of `9102`. - job_name: 'kubernetes-pods-slow' scrape_interval: 5m scrape_timeout: 30s kubernetes_sd_configs: - role: pod relabel_configs: - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape_slow] action: keep regex: true - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme] action: replace regex: (https?) 
target_label: __scheme__ - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path] action: replace target_label: __metrics_path__ regex: (.+) - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port] action: replace regex: ([^:]+)(?::d+)?;(d+) replacement: $1:$2 target_label: __address__ - action: labelmap regex: __meta_kubernetes_pod_label_(.+) - source_labels: [__meta_kubernetes_namespace] action: replace target_label: kubernetes_namespace - source_labels: [__meta_kubernetes_pod_name] action: replace target_label: kubernetes_pod_name - source_labels: [__meta_kubernetes_pod_phase] regex: Pending|Succeeded|Failed action: drop # adds additional scrape configs to prometheus.yml # must be a string so you have to add a | after extraScrapeConfigs: # example adds prometheus-blackbox-exporter scrape config extraScrapeConfigs: # - job_name: 'prometheus-blackbox-exporter' # metrics_path: /probe # params: # module: [http_2xx] # static_configs: # - targets: # - https://example.com # relabel_configs: # - source_labels: [__address__] # target_label: __param_target # - source_labels: [__param_target] # target_label: instance # - target_label: __address__ # replacement: prometheus-blackbox-exporter:9115 # Adds option to add alert_relabel_configs to avoid duplicate alerts in alertmanager # useful in H/A prometheus with different external labels but the same alerts alertRelabelConfigs: # alert_relabel_configs: # - source_labels: [dc] # regex: (.+)d+ # target_label: dc networkPolicy: ## Enable creation of NetworkPolicy resources. ## enabled: false # Force namespace of namespaced resources forceNamespace: null
- Accessing the Prometheus & Alertmanager web UIs
<root@PROD-K8S-CP1 ~># kubectl describe svc prometheus-prometheus-server
Name:              prometheus-prometheus-server
Namespace:         default
Labels:            app=prometheus
                   app.kubernetes.io/managed-by=Helm
                   chart=prometheus-14.6.0
                   component=prometheus-server
                   heritage=Helm
                   release=prometheus
Annotations:       meta.helm.sh/release-name: prometheus
                   meta.helm.sh/release-namespace: default
Selector:          app=prometheus,component=prometheus-server,release=prometheus
Type:              ClusterIP
IP:                10.12.0.20
External IPs:      10.1.0.10
Port:              http  9090/TCP
TargetPort:        9090/TCP
Endpoints:         172.21.3.157:9090
Session Affinity:  None
Events:            <none>
<root@PROD-K8S-CP1 ~># kubectl describe svc prometheus-alertmanager
Name:              prometheus-alertmanager
Namespace:         default
Labels:            app=prometheus
                   app.kubernetes.io/managed-by=Helm
                   chart=prometheus-14.6.0
                   component=alertmanager
                   heritage=Helm
                   release=prometheus
Annotations:       meta.helm.sh/release-name: prometheus
                   meta.helm.sh/release-namespace: default
Selector:          app=prometheus,component=alertmanager,release=prometheus
Type:              ClusterIP
IP:                10.12.0.13
External IPs:      10.1.0.11
Port:              http  9093/TCP
TargetPort:        9093/TCP
Endpoints:         172.21.3.251:9093
Session Affinity:  None
Events:            <none>
- Access: Prometheus at 10.1.0.10:9090, Alertmanager at 10.1.0.11:9093 (the external IPs and service ports shown above)
- Changing the time zone (can be edited directly in Lens):
# volumeMounts entry
- name: localtime
  mountPath: /etc/localtime
----------------------------------------------
# volumes entry
- name: localtime
  hostPath:
    path: /etc/localtime
    type: ''
- Note: change the Prometheus-server rolling-update strategy so that maxUnavailable is 1; otherwise the old and new pods can hold the data volume at the same time and the TSDB reports a lock error
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1
- Manually inspecting the metrics a node exposes (the API server in this example):
<root@PRE-K8S-CP1 ~># curl -ik https://10.1.0.233:6443/metrics
HTTP/1.1 403 Forbidden
Cache-Control: no-cache, private
Content-Type: application/json
X-Content-Type-Options: nosniff
Date: Wed, 02 Jun 2021 03:40:55 GMT
Content-Length: 240

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/metrics\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}

This happens because, by default, Kubernetes denies unauthenticated access to cluster resources. One workaround is the binding below; be aware it grants full cluster access to anonymous users, so use it only in a controlled environment:

kubectl create clusterrolebinding prometheus-admin --clusterrole cluster-admin --user system:anonymous
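A less permissive alternative (a sketch, not from the original setup; the `metrics-reader` names are hypothetical) is to grant anonymous users only the `/metrics` non-resource URL instead of cluster-admin:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: metrics-reader              # hypothetical name
rules:
- nonResourceURLs: ["/metrics"]     # only the metrics path, nothing else
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metrics-reader-anonymous    # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: metrics-reader
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: system:anonymous
```

After `kubectl apply -f` of this manifest, the same curl command should return 200 instead of 403.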
Prometheus configuration file
Explanation of the relabel_configs action parameter
- replace: the default. Matches the concatenated source_labels values against regex and writes replacement (which may reference capture groups) into target_label
- keep: drops targets whose concatenated source_labels do not match regex
- drop: drops targets whose concatenated source_labels do match regex
- labelmap: matches all label names against regex, then copies the values of matching labels to new label names given by replacement, using capture-group references (${1}, ${2}, …)
- labeldrop: removes labels whose names match regex
- labelkeep: removes labels whose names do not match regex
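Assuming the semantics above, the following is a minimal Python sketch (not Prometheus code) that emulates the keep, drop, and replace actions against a target's label set:

```python
import re

def relabel(labels, rules):
    """Simplified sketch of Prometheus relabeling: keep, drop, replace only."""
    labels = dict(labels)
    for rule in rules:
        sep = rule.get("separator", ";")
        value = sep.join(labels.get(n, "") for n in rule.get("source_labels", []))
        match = re.fullmatch(rule.get("regex", "(.*)"), value)  # Prometheus anchors regexes
        action = rule.get("action", "replace")
        if action == "keep" and match is None:
            return None                      # target is dropped from scraping
        if action == "drop" and match is not None:
            return None
        if action == "replace" and match is not None:
            # translate $1/$2 group references into Python's \1/\2 form
            replacement = rule.get("replacement", "$1").replace("$", "\\")
            labels[rule["target_label"]] = match.expand(replacement)
    return labels

# Example: the keep rule from the kubernetes-apiservers job above.
rule = [{"source_labels": ["__meta_kubernetes_namespace",
                           "__meta_kubernetes_service_name",
                           "__meta_kubernetes_endpoint_port_name"],
         "regex": "default;kubernetes;https", "action": "keep"}]
print(relabel({"__meta_kubernetes_namespace": "default",
               "__meta_kubernetes_service_name": "kubernetes",
               "__meta_kubernetes_endpoint_port_name": "https"}, rule))
```

A target from any other namespace or port would return None (dropped), which is exactly how the apiservers job narrows discovery down to the default/kubernetes https endpoint.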
Default configuration file
A job_name generally identifies what kind of metrics the job collects. The job_names below collect:
- API server performance metrics; the configuration is explained inline below
- job_name: kubernetes-apiservers
  honor_timestamps: true
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  authorization:
    type: Bearer
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  follow_redirects: true
  relabel_configs:  ## label rewriting
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]  ## the source labels used as the match condition
    separator: ;  ## separator placed between the source label values before matching
    regex: default;kubernetes;https  ## matched against the joined source label values. For example, with __meta_kubernetes_service_name="kubernetes", __meta_kubernetes_namespace="default" and __meta_kubernetes_endpoint_port_name="https", the rule keeps exactly the endpoint whose port name is "https". "Endpoint" here means a scrape target, not a Kubernetes Endpoints object
    replacement: $1  ## default
    action: keep  ## drop targets whose source labels do not match regex; keep the ones that do
  kubernetes_sd_configs:
  - role: endpoints
    follow_redirects: true
- Node performance metrics, job_name kubernetes-nodes
- job_name: kubernetes-nodes
  honor_timestamps: true
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  authorization:
    type: Bearer
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  follow_redirects: true
  relabel_configs:
  - separator: ;
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap  ## keep the ".+" part of __meta_kubernetes_node_label_(.+) as the new label name
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: kubernetes.default.svc:443
    action: replace  ## set __address__ to the replacement value
  - source_labels: [__meta_kubernetes_node_name]
    separator: ;
    regex: (.+)  ## matches any source label value (in practice, the node name)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/$1/proxy/metrics
    action: replace  ## $1 in the replacement is the node name captured by regex
  kubernetes_sd_configs:
  - role: node
    follow_redirects: true

Taken together, this configuration rewrites the default __address__: the first replace rule swaps the node IP for the in-cluster API server address, and the second rewrites the default /metrics path to /api/v1/nodes/$1/proxy/metrics. Combined, the final scrape URL is https://kubernetes.default.svc/api/v1/nodes/pre-k8s-cp3/proxy/metrics
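The URL composition described above can be sketched in a few lines of Python; the node name "pre-k8s-cp3" is the example from the text, everything else comes from the relabel rules:

```python
# How the kubernetes-nodes relabel rules combine into the final scrape URL.
scheme = "https"                                   # job-level scheme
address = "kubernetes.default.svc:443"             # __address__ after the replace rule
node_name = "pre-k8s-cp3"                          # from __meta_kubernetes_node_name
metrics_path = "/api/v1/nodes/%s/proxy/metrics" % node_name  # $1 expanded

scrape_url = "%s://%s%s" % (scheme, address, metrics_path)
print(scrape_url)
# https://kubernetes.default.svc:443/api/v1/nodes/pre-k8s-cp3/proxy/metrics
```

Prometheus then proxies the kubelet's metrics through the API server, which is why only the API server's CA and token need to be configured.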
- Per-node container metrics, job_name kubernetes-nodes-cadvisor
- job_name: kubernetes-nodes-cadvisor
  honor_timestamps: true
  scrape_interval: 10s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  authorization:
    type: Bearer
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  follow_redirects: true
  relabel_configs:
  - separator: ;
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: kubernetes.default.svc:443
    action: replace
  - source_labels: [__meta_kubernetes_node_name]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    # the only difference from the node metrics job: the URI path
    replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
    action: replace
  kubernetes_sd_configs:
  - role: node
    follow_redirects: true
- Monitoring a Service's endpoints (configured on the Service itself)
apiVersion: v1
kind: Service
metadata:
  name: prometheus-kube-state-metrics
  namespace: default
  selfLink: /api/v1/namespaces/default/services/prometheus-kube-state-metrics
  uid: 161fff0c-fffe-4561-90ba-0bec0608fbe4
  resourceVersion: '23976299'
  creationTimestamp: '2021-06-01T06:51:21Z'
  labels:
    app.kubernetes.io/instance: prometheus
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: kube-state-metrics
    helm.sh/chart: kube-state-metrics-3.1.0
  annotations:
    meta.helm.sh/release-name: prometheus
    meta.helm.sh/release-namespace: default
    prometheus.io/scrape: 'true'
- job_name: kubernetes-service-endpoints
  honor_timestamps: true
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  follow_redirects: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    separator: ;
    regex: "true"
    replacement: $1
    action: keep  ## Note: this rule is critical, since it decides which endpoints the job scrapes. Targets whose __meta_kubernetes_service_annotation_prometheus_io_scrape value is "true" are kept; all endpoints without that annotation value are dropped
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    separator: ;
    regex: (https?)
    target_label: __scheme__
    replacement: $1
    action: replace  ## replace __scheme__ with the matched value
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: $1
    action: replace  ## assign the value of __meta_kubernetes_service_annotation_prometheus_io_path to __metrics_path__
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    separator: ;
    regex: ([^:]+)(?::\d+)?;(\d+)
    target_label: __address__
    replacement: $1:$2
    action: replace  ## join the host from __address__ with the port annotation according to regex and write the result back to __address__
  - separator: ;
    regex: __meta_kubernetes_service_label_(.+)
    replacement: $1
    action: labelmap  ## keep the ".+" part as a new label name
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: kubernetes_namespace
    replacement: $1
    action: replace  ## copy the label value, as above
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: kubernetes_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_node_name]
    separator: ;
    regex: (.*)
    target_label: kubernetes_node
    replacement: $1
    action: replace
  kubernetes_sd_configs:
  - role: endpoints
    follow_redirects: true
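The kube-state-metrics Service shown earlier is kept by this job because of its prometheus.io/scrape: 'true' annotation. To opt a custom application in, its Service is annotated the same way; this is a sketch, and the name, path, and port here are hypothetical:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app                      # hypothetical service
  annotations:
    prometheus.io/scrape: "true"    # required by the keep rule above
    prometheus.io/scheme: "http"    # optional, default http
    prometheus.io/path: "/metrics"  # optional, default /metrics
    prometheus.io/port: "8080"      # overrides the port in __address__
spec:
  selector:
    app: my-app
  ports:
  - port: 8080
```

The scheme, path, and port annotations are only needed when the defaults do not apply.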
- Service probing configuration (blackbox http_2xx)
- job_name: kubernetes-services
  honor_timestamps: true
  params:
    module:
    - http_2xx  ## the blackbox exporter module to use
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /probe
  scheme: http
  follow_redirects: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
    separator: ;
    regex: "true"
    replacement: $1
    action: keep  ## keep services whose __meta_kubernetes_service_annotation_prometheus_io_probe matches; drop services without that annotation
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    target_label: __param_target
    replacement: $1
    action: replace  ## copy the matched address into the "target" request parameter
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: blackbox
    action: replace
  - source_labels: [__param_target]
    separator: ;
    regex: (.*)
    target_label: instance
    replacement: $1
    action: replace  ## after the rewrites above, the instance label ends up holding the original __address__ value
  - separator: ;
    regex: __meta_kubernetes_service_label_(.+)
    replacement: $1
    action: labelmap
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: kubernetes_namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: kubernetes_name
    replacement: $1
    action: replace
  kubernetes_sd_configs:
  - role: service
    follow_redirects: true
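The net effect of these rules is that Prometheus scrapes the blackbox exporter (rewritten address "blackbox") and passes the original service address as a query parameter. A Python sketch of the resulting probe URL, with a hypothetical service address:

```python
from urllib.parse import urlencode

# The service address starts as __address__, is copied into __param_target,
# and finally becomes the "target" query parameter, while __address__ itself
# is rewritten to "blackbox". "my-svc.default.svc:8080" is a made-up example.
target = "my-svc.default.svc:8080"
probe_url = "http://blackbox/probe?" + urlencode({"module": "http_2xx", "target": target})
print(probe_url)
```

The blackbox exporter then performs the http_2xx check against the target and returns probe_success and timing metrics for Prometheus to record under the instance label.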
- Pod monitoring configuration
- job_name: kubernetes-pods
  honor_timestamps: true
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  follow_redirects: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    separator: ;
    regex: "true"
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
    separator: ;
    regex: (https?)
    target_label: __scheme__
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: $1
    action: replace
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    separator: ;
    regex: ([^:]+)(?::\d+)?;(\d+)
    target_label: __address__
    replacement: $1:$2
    action: replace
  - separator: ;
    regex: __meta_kubernetes_pod_label_(.+)
    replacement: $1
    action: labelmap
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: kubernetes_namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: kubernetes_pod_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_phase]
    separator: ;
    regex: Pending|Succeeded|Failed
    replacement: $1
    action: drop  ## drop targets for pods that are not running
  kubernetes_sd_configs:
  - role: pod
    follow_redirects: true
JVM and RabbitMQ monitoring
- JVM monitoring (requires the prometheus.io/scrape annotation in deployment.yaml), as follows
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pre-common-gateway
      component: spring
      part-of: pre
      tier: backend
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: pre-common-gateway
        component: spring
        part-of: pre
        tier: backend
      annotations:  # monitoring annotations added here
        prometheus.io/port: '8888'
        prometheus.io/scrape: 'true'
############# JVM monitoring ##############
- job_name: kubernetes-pods-jvm
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: keep
    regex: true
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scrape
  - action: replace
    regex: (https?)
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scheme
    target_label: __scheme__
  - action: replace
    regex: (.+)
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_path
    target_label: __metrics_path__
  - action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    source_labels:
    - __address__
    - __meta_kubernetes_pod_annotation_prometheus_io_port
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_container_name
    target_label: kubernetes_container_name
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: kubernetes_namespace
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_name
    target_label: kubernetes_pod_name
  - action: drop
    regex: Pending|Succeeded|Failed
    source_labels:
    - __meta_kubernetes_pod_phase
  scrape_interval: 10s
  scrape_timeout: 10s
- RabbitMQ monitoring
The following configuration is for reference only; it shows the Service state after creation:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: pre-rabbitmq-monitor
  namespace: pre
  annotations:
    prometheus.io/scrape: rabbitmq
spec:
  ports:
    - name: rabbitmq-exporter
      protocol: TCP
      port: 9419
      targetPort: 9419
    - name: rabbitmq-prometheus-port
      protocol: TCP
      port: 15692
      targetPort: 15692
  selector:
    app: pre-rabbitmq
  clusterIP: 10.11.0.90
  type: ClusterIP
  sessionAffinity: None
```
```yaml
############## RabbitMQ scrape config ##############
- job_name: kubernetes-service-rabbitmq
  kubernetes_sd_configs:
    - role: endpoints
  relabel_configs:
    # keep only Services carrying the custom annotation value "rabbitmq"
    - action: keep
      regex: rabbitmq
      source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scrape
    - action: replace
      regex: (https?)
      source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scheme
      target_label: __scheme__
    - action: replace
      regex: (.+)
      source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_path
      target_label: __metrics_path__
    # drop the ports that do not need to be scraped
    - action: drop
      regex: (5672|15672)
      source_labels:
        - __meta_kubernetes_pod_container_port_number
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - action: labelmap
      regex: __meta_kubernetes_pod_label_statefulset_kubernetes_io_(.+)
    - action: replace
      source_labels:
        - __meta_kubernetes_namespace
      target_label: kubernetes_namespace
    - action: replace
      source_labels:
        - __meta_kubernetes_service_name
      target_label: kubernetes_name
    - action: replace
      source_labels:
        - __meta_kubernetes_pod_node_name
      target_label: kubernetes_node
```
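The keep/drop pair in this job selects only Services annotated with the custom value `rabbitmq` and excludes the AMQP (5672) and management UI (15672) ports, leaving the exporter (9419) and built-in Prometheus plugin (15692) ports. A Python sketch of that filtering, with hypothetical endpoint data:

```python
import re

def scraped(scrape_annotation: str, container_port: int) -> bool:
    """Mimic the keep/drop rules of the kubernetes-service-rabbitmq job."""
    # action: keep -- only Services annotated prometheus.io/scrape: rabbitmq survive
    if not re.fullmatch("rabbitmq", scrape_annotation):
        return False
    # action: drop -- AMQP (5672) and management UI (15672) ports are excluded
    if re.fullmatch("(5672|15672)", str(container_port)):
        return False
    return True

# Hypothetical container ports behind the pre-rabbitmq-monitor Service.
print([p for p in (5672, 15672, 9419, 15692) if scraped("rabbitmq", p)])  # [9419, 15692]
```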
- Reference rabbitmq-exporter.yaml configuration (edited directly in Lens)
```yaml
- name: rabbitmq-exporter
  image: hub.qiangyun.com/rabbitmq-exporter
  ports:
    - name: mq-monitor
      containerPort: 9419
      protocol: TCP
  env:
    - name: RABBIT_USER
      value: guest
    - name: RABBIT_PASSWORD
      value: guest
    - name: RABBIT_CAPABILITIES
      value: bert
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
  livenessProbe:
    httpGet:
      path: /metrics
      port: 9419
      scheme: HTTP
    initialDelaySeconds: 60
    timeoutSeconds: 15
    periodSeconds: 60
    successThreshold: 1
    failureThreshold: 3
  readinessProbe:
    httpGet:
      path: /metrics
      port: 9419
      scheme: HTTP
    initialDelaySeconds: 20
    timeoutSeconds: 10
    periodSeconds: 60
    successThreshold: 1
    failureThreshold: 3
  terminationMessagePath: /dev/termination-log
  terminationMessagePolicy: File
  imagePullPolicy: IfNotPresent
```
- rabbitmq-monitor-service.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    # note: this annotation must match the Prometheus scrape configuration below
    prometheus.io/scrape: rabbitmq
  name: rabbitmq-monitor
  namespace: prod
spec:
  ports:
    - name: rabbitmq-exporter
      port: 9419
      protocol: TCP
      targetPort: 9419
    - name: rabbitmq-prometheus-port
      port: 15692
      protocol: TCP
      targetPort: 15692
  selector:
    app: rabbitmq
  type: ClusterIP
```
Final reference configuration
- Complete Prometheus configuration
```yaml
global:
  evaluation_interval: 15s
  scrape_interval: 15s
  scrape_timeout: 10s
rule_files:
  - /etc/config/recording_rules.yml
  - /etc/config/alerting_rules.yml
  - /etc/config/rules
  - /etc/config/alerts
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets:
          - localhost:9090
  ############## Kubernetes apiserver scrape config ##############
  - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    job_name: kubernetes-apiservers
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - action: keep
        regex: default;kubernetes;https
        source_labels:
          - __meta_kubernetes_namespace
          - __meta_kubernetes_service_name
          - __meta_kubernetes_endpoint_port_name
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
  ############## Kubernetes node metrics scrape config ##############
  - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    job_name: kubernetes-nodes
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - replacement: kubernetes.default.svc:443
        target_label: __address__
      - regex: (.+)
        replacement: /api/v1/nodes/$1/proxy/metrics
        source_labels:
          - __meta_kubernetes_node_name
        target_label: __metrics_path__
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
  ############## Per-node pod (cAdvisor) metrics scrape config ##############
  - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    job_name: kubernetes-nodes-cadvisor
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - replacement: kubernetes.default.svc:443
        target_label: __address__
      - regex: (.+)
        replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
        source_labels:
          - __meta_kubernetes_node_name
        target_label: __metrics_path__
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
  ############## Kubernetes service endpoints scrape config ##############
  - job_name: kubernetes-service-endpoints
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - action: keep
        regex: true
        source_labels:
          - __meta_kubernetes_service_annotation_prometheus_io_scrape
      - action: replace
        regex: (https?)
        source_labels:
          - __meta_kubernetes_service_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
          - __meta_kubernetes_service_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
          - __address__
          - __meta_kubernetes_service_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
          - __meta_kubernetes_service_name
        target_label: kubernetes_name
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_node_name
        target_label: kubernetes_node
  ############## Kubernetes service endpoints RabbitMQ scrape config ##############
  - job_name: kubernetes-service-rabbitmq
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - action: keep
        regex: rabbitmq
        source_labels:
          - __meta_kubernetes_service_annotation_prometheus_io_scrape
      - action: keep
        regex: (15692|9419)
        source_labels:
          - __meta_kubernetes_pod_container_port_number
      - action: replace
        regex: (https?)
        source_labels:
          - __meta_kubernetes_service_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
          - __meta_kubernetes_service_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
          - __address__
          - __meta_kubernetes_service_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
          - __meta_kubernetes_service_name
        target_label: kubernetes_name
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_node_name
        target_label: kubernetes_node
  ############## Slow service endpoints scrape config (prometheus.io/scrape_slow annotation, 5m interval) ##############
  - job_name: kubernetes-service-endpoints-slow
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - action: keep
        regex: true
        source_labels:
          - __meta_kubernetes_service_annotation_prometheus_io_scrape_slow
      - action: replace
        regex: (https?)
        source_labels:
          - __meta_kubernetes_service_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
          - __meta_kubernetes_service_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
          - __address__
          - __meta_kubernetes_service_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
          - __meta_kubernetes_service_name
        target_label: kubernetes_name
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_node_name
        target_label: kubernetes_node
    scrape_interval: 5m
    scrape_timeout: 30s
  ############## Prometheus pushgateway scrape config ##############
  - honor_labels: true
    job_name: prometheus-pushgateway
    kubernetes_sd_configs:
      - role: service
    relabel_configs:
      - action: keep
        regex: pushgateway
        source_labels:
          - __meta_kubernetes_service_annotation_prometheus_io_probe
  ############## Kubernetes services blackbox http_2xx probe config ##############
  - job_name: kubernetes-services
    kubernetes_sd_configs:
      - role: service
    metrics_path: /probe
    params:
      module:
        - http_2xx
    relabel_configs:
      - action: keep
        regex: true
        source_labels:
          - __meta_kubernetes_service_annotation_prometheus_io_probe
      - source_labels:
          - __address__
        target_label: __param_target
      - replacement: blackbox
        target_label: __address__
      - source_labels:
          - __param_target
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels:
          - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - source_labels:
          - __meta_kubernetes_service_name
        target_label: kubernetes_name
  ############## Application pod custom scrape config ##############
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - action: keep
        regex: true
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: replace
        regex: (https?)
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
          - __address__
          - __meta_kubernetes_pod_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_name
        target_label: kubernetes_pod_name
      - action: drop
        regex: Pending|Succeeded|Failed
        source_labels:
          - __meta_kubernetes_pod_phase
  ############## Application pod JVM scrape config ##############
  - job_name: kubernetes-pods-jvm
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - action: keep
        regex: true
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_jvm_scrape
      - action: replace
        regex: (https?)
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
          - __address__
          - __meta_kubernetes_pod_annotation_prometheus_io_jvm_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_name
        target_label: kubernetes_pod_name
      - action: drop
        regex: Pending|Succeeded|Failed
        source_labels:
          - __meta_kubernetes_pod_phase
  ############## Slow pod scrape config (prometheus.io/scrape_slow annotation, 5m interval) ##############
  - job_name: kubernetes-pods-slow
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - action: keep
        regex: true
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scrape_slow
      - action: replace
        regex: (https?)
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
          - __address__
          - __meta_kubernetes_pod_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_name
        target_label: kubernetes_pod_name
      - action: drop
        regex: Pending|Succeeded|Failed
        source_labels:
          - __meta_kubernetes_pod_phase
    scrape_interval: 5m
    scrape_timeout: 30s
alerting:
  alertmanagers:
    - kubernetes_sd_configs:
        - role: pod
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
        - source_labels: [__meta_kubernetes_namespace]
          regex: default
          action: keep
        - source_labels: [__meta_kubernetes_pod_label_app]
          regex: prometheus
          action: keep
        - source_labels: [__meta_kubernetes_pod_label_component]
          regex: alertmanager
          action: keep
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_probe]
          regex: .*
          action: keep
        - source_labels: [__meta_kubernetes_pod_container_port_number]
          regex: "9093"
          action: keep
```
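The node jobs in the configuration above do not connect to each node directly: the relabel rules rewrite `__address__` to `kubernetes.default.svc:443` and set `__metrics_path__` to `/api/v1/nodes/<node>/proxy/metrics`, so kubelet and cAdvisor metrics are fetched through the apiserver proxy. A sketch of the URL those rules produce (the node name below is hypothetical):

```python
def node_scrape_url(node_name: str, cadvisor: bool = False) -> str:
    """Build the scrape URL the kubernetes-nodes(-cadvisor) relabel rules yield."""
    path = f"/api/v1/nodes/{node_name}/proxy/metrics"
    if cadvisor:  # the kubernetes-nodes-cadvisor job appends /cadvisor
        path += "/cadvisor"
    # __address__ is always rewritten to the in-cluster apiserver address
    return f"https://kubernetes.default.svc:443{path}"

print(node_scrape_url("prod-k8s-node1", cadvisor=True))
# https://kubernetes.default.svc:443/api/v1/nodes/prod-k8s-node1/proxy/metrics/cadvisor
```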
- In ConfigMap form
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server
  namespace: default
  labels:
    app: prometheus
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-14.6.0
    component: server
    heritage: Helm
    release: prometheus
data:
  alerting_rules.yml: |
    {}
  alerts: |
    {}
  prometheus.yml: |
    # The body repeats the complete Prometheus configuration shown above verbatim,
    # with one addition: the kubernetes-pods-jvm job also maps
    # __meta_kubernetes_pod_container_name to kubernetes_container_name,
    # as in the standalone JVM scrape config earlier in this section.
  recording_rules.yml: |
    {}
  rules: |
    {}
```