  • Prometheus, the Monitoring Powerhouse - Kubernetes (Part 1)

    Manually deploy Prometheus and an Alertmanager cluster as StatefulSets in Kubernetes, and use a StorageClass to persist their data.

    This article uses a StorageClass to persist data and builds a StatefulSet-based federated Prometheus cluster. For long-term storage there are many options, such as Thanos, M3DB, InfluxDB, and VictoriaMetrics; pick one according to your needs. The specifics of data persistence will be covered in detail later.

    To deploy an externally reachable Prometheus: first create the Namespace it will live in, then create the RBAC rules it uses and the ConfigMap that holds its configuration file.
    Next create the Service that gives the Pods stable in-cluster names, create the StatefulSet that runs the Prometheus containers, and finally create an Ingress so Prometheus can be reached through an external domain name.

    If your Kubernetes version is fairly old, consider upgrading it to make testing easier; the sealos deployment tool can bring up a highly available cluster with a single command. Whether to also deploy kuboard is up to your own needs.

    Environment

    My local environment was deployed with sealos in one step, mainly for ease of testing.

    OS Kubernetes HostName IP Service
    Ubuntu 18.04 1.17.7 sealos-k8s-m1 192.168.1.151 node-exporter prometheus-federate-0
    Ubuntu 18.04 1.17.7 sealos-k8s-m2 192.168.1.152 node-exporter grafana alertmanager-0
    Ubuntu 18.04 1.17.7 sealos-k8s-m3 192.168.1.150 node-exporter alertmanager-1
    Ubuntu 18.04 1.17.7 sealos-k8s-node1 192.168.1.153 node-exporter prometheus-0 kube-state-metrics
    Ubuntu 18.04 1.17.7 sealos-k8s-node2 192.168.1.154 node-exporter prometheus-1
    Ubuntu 18.04 1.17.7 sealos-k8s-node3 192.168.1.155 node-exporter prometheus-2
    # Label the master and worker nodes
    # prometheus
    kubectl label node sealos-k8s-node1 k8s-app=prometheus
    kubectl label node sealos-k8s-node2 k8s-app=prometheus
    kubectl label node sealos-k8s-node3 k8s-app=prometheus
    # federate
    kubectl label node sealos-k8s-m1 k8s-app=prometheus-federate
    # alertmanager
    kubectl label node sealos-k8s-m2 k8s-app=alertmanager
    kubectl label node sealos-k8s-m3 k8s-app=alertmanager
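    # optional sanity check: -L prints the label as an extra column
    kubectl get nodes -L k8s-app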
    
    # Create the corresponding deployment directories
    mkdir /data/manual-deploy/ && cd /data/manual-deploy/
    mkdir alertmanager  grafana  ingress-nginx  kube-state-metrics  node-exporter  prometheus
    
    

    Deploy Prometheus

    Create the Prometheus StorageClass configuration file

    cat prometheus-data-storageclass.yaml
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: prometheus-lpv
    provisioner: kubernetes.io/no-provisioner
    volumeBindingMode: WaitForFirstConsumer
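
    Note that WaitForFirstConsumer delays PVC binding until a consuming Pod is scheduled, which lets the scheduler pick the local PV on that Pod's node. A quick check that the class was created:

    kubectl get storageclass prometheus-lpv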
    

    Create the PV configuration file for the Prometheus StorageClass; node affinity pins each volume to a specific node.

    # On each node that will run a Prometheus Pod, create the data directory and set ownership
    mkdir /data/prometheus
    chown -R 65534:65534 /data/prometheus
    
    cat prometheus-data-pv.yaml
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: prometheus-lpv-0
    spec:
      capacity:
        storage: 10Gi
      volumeMode: Filesystem
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: prometheus-lpv
      local:
        path: /data/prometheus
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - sealos-k8s-node1
    ---
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: prometheus-lpv-1
    spec:
      capacity:
        storage: 10Gi
      volumeMode: Filesystem
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: prometheus-lpv
      local:
        path: /data/prometheus
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - sealos-k8s-node2
    ---          
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: prometheus-lpv-2
    spec:
      capacity:
        storage: 10Gi
      volumeMode: Filesystem
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: prometheus-lpv
      local:
        path: /data/prometheus
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - sealos-k8s-node3
    

    Create the Prometheus RBAC file.

    cat prometheus-rbac.yaml
    apiVersion: rbac.authorization.k8s.io/v1 # API version
    kind: ClusterRole # resource kind
    metadata:
      name: prometheus
    rules:
    - apiGroups: [""]
      resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
      verbs: ["get", "list", "watch"] 
    - apiGroups:
      - extensions
      resources:
      - ingresses
      verbs: ["get", "list", "watch"]
    - nonResourceURLs: ["/metrics"]
      verbs: ["get"]
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: prometheus # custom ServiceAccount name
      namespace: kube-system
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: prometheus
    roleRef: # the ClusterRole to bind
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus # bind the ClusterRole defined above rather than the overly broad cluster-admin
    subjects: # subjects the binding applies to
    - kind: ServiceAccount
      name: prometheus
      namespace: kube-system
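
    To confirm the permissions took effect, you can impersonate the ServiceAccount (a quick sketch; it assumes the manifests above have been applied):

    kubectl auth can-i list nodes --as=system:serviceaccount:kube-system:prometheus
    kubectl auth can-i get /metrics --as=system:serviceaccount:kube-system:prometheus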
    

    Create the Prometheus ConfigMap configuration file.

    cat prometheus-configmap.yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-config
      namespace: kube-system
    data:
      prometheus.yml: |
        global:
          scrape_interval:     30s
          evaluation_interval: 30s
          external_labels:
            cluster: "01"
        scrape_configs:
        - job_name: 'kubernetes-apiservers'
          kubernetes_sd_configs:
          - role: endpoints
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
        - job_name: 'kubernetes-nodes'
          kubernetes_sd_configs:
          - role: node
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics
        - job_name: 'kubernetes-cadvisor'
          kubernetes_sd_configs:
          - role: node
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
          metric_relabel_configs:
          - action: replace
            source_labels: [id]
            regex: '^/machine\.slice/machine-rkt\\x2d([^\\]+)\\.+/([^/]+)\.service$'
            target_label: rkt_container_name
            replacement: '${2}-${1}'
          - action: replace
            source_labels: [id]
            regex: '^/system\.slice/(.+)\.service$'
            target_label: systemd_service_name
            replacement: '${1}'
        - job_name: 'kubernetes-pods'
          kubernetes_sd_configs:
          - role: pod
          relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name
        - job_name: 'kubernetes-service-endpoints'
          kubernetes_sd_configs:
          - role: endpoints
          relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name
          - source_labels: [__address__]
            action: replace
            target_label: instance
            regex: (.+):(.+)
            replacement: $1
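
    Once the ConfigMap is applied, the embedded prometheus.yml can be validated with promtool; a sketch assuming Docker is available on the host (the backslash in the jsonpath escapes the dot in the key name):

    kubectl -n kube-system get configmap prometheus-config \
      -o jsonpath='{.data.prometheus\.yml}' > /tmp/prometheus.yml
    docker run --rm -v /tmp/prometheus.yml:/tmp/prometheus.yml \
      --entrypoint /bin/promtool prom/prometheus:v2.20.0 check config /tmp/prometheus.yml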
    

    Create the Prometheus StatefulSet configuration file.

    cat prometheus-statefulset.yaml
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: prometheus
      namespace: kube-system
      labels:
        k8s-app: prometheus
        kubernetes.io/cluster-service: "true"
    spec:
      serviceName: "prometheus"
      podManagementPolicy: "Parallel"
      replicas: 3
      selector:
        matchLabels:
          k8s-app: prometheus
      template:
        metadata:
          labels:
            k8s-app: prometheus
          annotations:
            scheduler.alpha.kubernetes.io/critical-pod: ''
        spec:
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: k8s-app
                    operator: In
                    values:
                    - prometheus
                topologyKey: "kubernetes.io/hostname"
          priorityClassName: system-cluster-critical
          hostNetwork: true
          dnsPolicy: ClusterFirstWithHostNet
          containers:
          - name: prometheus-server-configmap-reload
            image: "jimmidyson/configmap-reload:v0.4.0"
            imagePullPolicy: "IfNotPresent"
            args:
              - --volume-dir=/etc/config
              - --webhook-url=http://localhost:9090/-/reload
            volumeMounts:
              - name: config-volume
                mountPath: /etc/config
                readOnly: true
            resources:
              limits:
                cpu: 10m
                memory: 10Mi
              requests:
                cpu: 10m
                memory: 10Mi
          - image: prom/prometheus:v2.20.0
            imagePullPolicy: IfNotPresent
            name: prometheus
            command:
              - "/bin/prometheus"
            args:
              - "--config.file=/etc/prometheus/prometheus.yml"
              - "--storage.tsdb.path=/prometheus"
              - "--storage.tsdb.retention=24h"
              - "--web.console.libraries=/etc/prometheus/console_libraries"
              - "--web.console.templates=/etc/prometheus/consoles"
              - "--web.enable-lifecycle"
            ports:
              - containerPort: 9090
                protocol: TCP
            volumeMounts:
              - mountPath: "/prometheus"
                name: prometheus-data
              - mountPath: "/etc/prometheus"
                name: config-volume
            readinessProbe:
              httpGet:
                path: /-/ready
                port: 9090
              initialDelaySeconds: 30
              timeoutSeconds: 30
            livenessProbe:
              httpGet:
                path: /-/healthy
                port: 9090
              initialDelaySeconds: 30
              timeoutSeconds: 30
            resources:
              requests:
                cpu: 100m
                memory: 100Mi
              limits:
                cpu: 1000m
                memory: 2500Mi
            securityContext:
                runAsUser: 65534
                privileged: true
          serviceAccountName: prometheus
          volumes:
            - name: config-volume
              configMap:
                name: prometheus-config
      volumeClaimTemplates:
        - metadata:
            name: prometheus-data
          spec:
            accessModes: [ "ReadWriteOnce" ]
            storageClassName: "prometheus-lpv"
            resources:
              requests:
                storage: 5Gi
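
    Because --web.enable-lifecycle is enabled, the configmap-reload sidecar can POST to /-/reload whenever the mounted ConfigMap changes; the same endpoint can be hit by hand. A sketch using a local port-forward:

    kubectl -n kube-system port-forward prometheus-0 9090:9090 &
    curl -X POST http://127.0.0.1:9090/-/reload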
    

    Create the Prometheus Service configuration file (headless)

    cat prometheus-service-statefulset.yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: prometheus
      namespace: kube-system
    spec:
      ports:
        - name: prometheus
          port: 9090
          targetPort: 9090
      selector:
        k8s-app: prometheus
      clusterIP: None
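
    With clusterIP: None this is a headless Service, so every replica gets a stable DNS name of the form <pod>.<service>.<namespace>.svc. A quick check from a throwaway Pod (assuming the default cluster.local domain):

    kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- \
      nslookup prometheus-0.prometheus.kube-system.svc.cluster.local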
    

    Deploy the Prometheus resource files created above

    cd /data/manual-deploy/prometheus
    ls 
    prometheus-configmap.yaml # ConfigMap
    prometheus-data-pv.yaml # PV
    prometheus-data-storageclass.yaml # StorageClass
    prometheus-rbac.yaml # RBAC
    prometheus-service-statefulset.yaml # Service
    prometheus-statefulset.yaml # StatefulSet
    # Deploy everything
    kubectl apply -f .
    

    Verify the PV/PVC binding and the deployment status of Prometheus

    kubectl get pv
    NAME               CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS     REASON   AGE
    prometheus-lpv-0   10Gi       RWO            Retain           Available           prometheus-lpv            6m28s
    prometheus-lpv-1   10Gi       RWO            Retain           Available           prometheus-lpv            6m28s
    prometheus-lpv-2   10Gi       RWO            Retain           Available           prometheus-lpv            6m28s
    kubectl -n kube-system get pvc 
    NAME                           STATUS   VOLUME             CAPACITY   ACCESS MODES   STORAGECLASS     AGE
    prometheus-data-prometheus-0   Bound    prometheus-lpv-0   10Gi       RWO            prometheus-lpv   2m16s
    prometheus-data-prometheus-1   Bound    prometheus-lpv-2   10Gi       RWO            prometheus-lpv   2m16s
    prometheus-data-prometheus-2   Bound    prometheus-lpv-1   10Gi       RWO            prometheus-lpv   2m16s
    
    kubectl -n kube-system get pod prometheus-{0..2}
    NAME           READY   STATUS    RESTARTS   AGE
    prometheus-0   2/2     Running   0          3m16s
    prometheus-1   2/2     Running   0          3m16s
    prometheus-2   2/2     Running   0          3m16s
    
    

    Deploy Node Exporter

    Create the node-exporter DaemonSet file

    cd /data/manual-deploy/node-exporter/
    cat node-exporter.yaml
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: node-exporter
      namespace: kube-system
      labels:
        k8s-app: node-exporter
    spec:
      selector:
        matchLabels:
            k8s-app: node-exporter
      template:
        metadata:
          labels:
            k8s-app: node-exporter
        spec:
          tolerations:
            - effect: NoSchedule
              key: node-role.kubernetes.io/master
          containers:
          - image: quay.io/prometheus/node-exporter:v1.0.0
            imagePullPolicy: IfNotPresent
            name: prometheus-node-exporter
            ports:
            - containerPort: 9100
              hostPort: 9100
              protocol: TCP
              name: metrics
            volumeMounts:
            - mountPath: /host/proc
              name: proc
            - mountPath: /host/sys
              name: sys
            - mountPath: /host
              name: rootfs
            args:
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
            - --path.rootfs=/host
          volumes:
            - name: proc
              hostPath:
                path: /proc
            - name: sys
              hostPath:
                path: /sys
            - name: rootfs
              hostPath:
                path: /
          hostNetwork: true
          hostPID: true
    ---
    apiVersion: v1
    kind: Service
    metadata:
      annotations:
        prometheus.io/scrape: "true"
      labels:
        k8s-app: node-exporter
      name: node-exporter
      namespace: kube-system
    spec:
      ports:
      - name: http
        port: 9100
        protocol: TCP
      selector:
        k8s-app: node-exporter  
    

    Deploy

    cd /data/manual-deploy/node-exporter/
    kubectl apply -f node-exporter.yaml
    

    Verify status

    kubectl -n kube-system get pod |grep node-exporter
    node-exporter-45s2q                    2/2     Running   0          6h43m
    node-exporter-f4rrw                    2/2     Running   0          6h43m
    node-exporter-hvtzj                    2/2     Running   0          6h43m
    node-exporter-nlvfq                    2/2     Running   0          6h43m
    node-exporter-qbd2q                    2/2     Running   0          6h43m
    node-exporter-zjrh4                    2/2     Running   0          6h43m
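
    Since the DaemonSet runs with hostNetwork and hostPort 9100, every node can be scraped directly. A spot check against the first master (IP taken from the environment table above):

    curl -s http://192.168.1.151:9100/metrics | head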
    
    

    Deploy kube-state-metrics

    The kubelet already embeds cAdvisor, which collects system-level metrics such as CPU, memory, network, disk, and container usage, but it cannot collect metrics about Kubernetes resource objects, such as the number and state of Pods.
    That is what kube-state-metrics is for.

    kube-state-metrics polls the Kubernetes API and exposes metrics about resource objects: CronJob, DaemonSet, Deployment, Job, LimitRange, Node, PersistentVolume, PersistentVolumeClaim, Pod, PodDisruptionBudget, ReplicaSet, ReplicationController, ResourceQuota, Service, StatefulSet, Namespace, HorizontalPodAutoscaler, Endpoint, Secret, ConfigMap, Ingress, and CertificateSigningRequest.

    cd /data/manual-deploy/kube-state-metrics/
    cat kube-state-metrics-rbac.yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      namespace: kube-system
      name: kube-state-metrics-resizer
    rules:
    - apiGroups: [""]
      resources:
      - pods
      verbs: ["get"]
    - apiGroups: ["apps"]
      resources:
      - deployments
      resourceNames: ["kube-state-metrics"]
      verbs: ["get", "update"]
    - apiGroups: ["extensions"]
      resources:
      - deployments
      resourceNames: ["kube-state-metrics"]
      verbs: ["get", "update"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: kube-state-metrics
      namespace: kube-system
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: kube-state-metrics-resizer
    subjects:
    - kind: ServiceAccount
      name: kube-state-metrics
      namespace: kube-system
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: kube-state-metrics
    rules:
    - apiGroups: [""]
      resources:
      - configmaps
      - secrets
      - nodes
      - pods
      - services
      - resourcequotas
      - replicationcontrollers
      - limitranges
      - persistentvolumeclaims
      - persistentvolumes
      - namespaces
      - endpoints
      verbs: ["list", "watch"]
    - apiGroups: ["extensions"]
      resources:
      - daemonsets
      - deployments
      - replicasets
      - ingresses
      verbs: ["list", "watch"]
    - apiGroups: ["apps"]
      resources:
      - daemonsets
      - deployments
      - replicasets
      - statefulsets
      verbs: ["list", "watch"]
    - apiGroups: ["batch"]
      resources:
      - cronjobs
      - jobs
      verbs: ["list", "watch"]
    - apiGroups: ["autoscaling"]
      resources:
      - horizontalpodautoscalers
      verbs: ["list", "watch"]
    - apiGroups: ["policy"]
      resources:
      - poddisruptionbudgets
      verbs: ["list", "watch"]
    - apiGroups: ["certificates.k8s.io"]
      resources:
      - certificatesigningrequests
      verbs: ["list", "watch"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: kube-state-metrics
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: kube-state-metrics
    subjects:
    - kind: ServiceAccount
      name: kube-state-metrics
      namespace: kube-system
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: kube-state-metrics
      namespace: kube-system
    

    Create the kube-state-metrics Deployment file

    cat kube-state-metrics-deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: kube-state-metrics
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          k8s-app: kube-state-metrics
      replicas: 1
      template:
        metadata:
          labels:
            k8s-app: kube-state-metrics
        spec:
          serviceAccountName: kube-state-metrics
          containers:
          - name: kube-state-metrics
            image: quay.io/coreos/kube-state-metrics:v1.6.0
            ports:
            - name: http-metrics
              containerPort: 8080
            - name: telemetry
              containerPort: 8081
            readinessProbe:
              httpGet:
                path: /healthz
                port: 8080
              initialDelaySeconds: 5
              timeoutSeconds: 5
          - name: addon-resizer
            image: k8s.gcr.io/addon-resizer:1.8.4
            resources:
              limits:
                cpu: 150m
                memory: 50Mi
              requests:
                cpu: 150m
                memory: 50Mi
            env:
              - name: MY_POD_NAME
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.name
              - name: MY_POD_NAMESPACE
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.namespace
            command:
              - /pod_nanny
              - --container=kube-state-metrics
              - --cpu=100m
              - --extra-cpu=1m
              - --memory=100Mi
              - --extra-memory=2Mi
              - --threshold=5
              - --deployment=kube-state-metrics
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: kube-state-metrics
      namespace: kube-system
      labels:
        k8s-app: kube-state-metrics
      annotations:
        prometheus.io/scrape: 'true'
    spec:
      ports:
      - name: http-metrics
        port: 8080
        targetPort: http-metrics
        protocol: TCP
      - name: telemetry
        port: 8081
        targetPort: telemetry
        protocol: TCP
      selector:
        k8s-app: kube-state-metrics
    

    Deploy

    kubectl apply -f kube-state-metrics-rbac.yaml
    kubectl apply -f kube-state-metrics-deployment.yaml
    

    Verify

    kubectl -n kube-system get pod |grep kube-state-metrics
    kube-state-metrics-657d8d6669-bqbs8        2/2     Running   0          4h
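
    To peek at what it exposes, port-forward the Service and grep for a well-known metric family such as kube_pod_status_phase (a sketch):

    kubectl -n kube-system port-forward svc/kube-state-metrics 8080:8080 &
    curl -s http://127.0.0.1:8080/metrics | grep '^kube_pod_status_phase' | head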
    

    Because the kube-state-metrics Service carries the annotation prometheus.io/scrape: "true", the kubernetes-service-endpoints job discovers it automatically, as the check below shows.
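
    A sketch of that check via the Prometheus targets API (port-forward first, then count active targets per job):

    kubectl -n kube-system port-forward prometheus-0 9090:9090 &
    curl -s 'http://127.0.0.1:9090/api/v1/targets?state=active' \
      | grep -o '"job":"[^"]*"' | sort | uniq -c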

    Deploy the Alertmanager Cluster

    Create directories and set ownership

    # on sealos-k8s-m2
    mkdir /data/alertmanager
    chown -R 65534:65534 /data/alertmanager
    # on sealos-k8s-m3
    mkdir /data/alertmanager
    chown -R 65534:65534 /data/alertmanager
    
    Create the Alertmanager StorageClass configuration file

    cd /data/manual-deploy/alertmanager/
    cat alertmanager-data-storageclass.yaml
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: alertmanager-lpv
    provisioner: kubernetes.io/no-provisioner
    volumeBindingMode: WaitForFirstConsumer
    

    Create the Alertmanager PV configuration file

    cat alertmanager-data-pv.yaml 
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: alertmanager-pv-0
    spec:
      capacity:
        storage: 10Gi
      volumeMode: Filesystem
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: alertmanager-lpv
      local:
        path: /data/alertmanager
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - sealos-k8s-m2
    ---
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: alertmanager-pv-1
    spec:
      capacity:
        storage: 10Gi
      volumeMode: Filesystem
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: alertmanager-lpv
      local:
        path: /data/alertmanager
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - sealos-k8s-m3
    

    Create the Alertmanager ConfigMap configuration file

    cat alertmanager-configmap.yaml 
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: alertmanager-config
      namespace: kube-system
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: EnsureExists
    data:
      alertmanager.yml: |
        global:
          resolve_timeout: 5m
          smtp_smarthost: 'smtp.qq.com:465'
          smtp_from: 'yo@qq.com'
          smtp_auth_username: '345@qq.com'
          smtp_auth_password: 'bhgb'
          smtp_hello: '警报邮件'
          smtp_require_tls: false
        route:
          group_by: ['alertname', 'cluster']
          group_wait: 30s
          group_interval: 30s
          repeat_interval: 12h
          receiver: default
    
          routes:
          - receiver: email
            group_wait: 10s
            match:
              team: ops
        receivers:
        - name: 'default'
          email_configs:
          - to: '9935226@qq.com'
            send_resolved: true
        - name: 'email'
          email_configs:
          - to: '9935226@qq.com'
            send_resolved: true
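
    Once this ConfigMap is applied, amtool (shipped in the Alertmanager image) can validate it; a sketch assuming Docker on the host:

    kubectl -n kube-system get configmap alertmanager-config \
      -o jsonpath='{.data.alertmanager\.yml}' > /tmp/alertmanager.yml
    docker run --rm -v /tmp/alertmanager.yml:/tmp/alertmanager.yml \
      --entrypoint /bin/amtool prom/alertmanager:v0.21.0 check-config /tmp/alertmanager.yml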
    

    Create the Alertmanager StatefulSet file. I deploy it in cluster mode here; if you want a standalone instance, set replicas to 1 and drop the cluster flags.

    cat alertmanager-statefulset-cluster.yaml 
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: alertmanager
      namespace: kube-system
      labels:
        k8s-app: alertmanager
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
        version: v0.21.0
    spec:
      serviceName: "alertmanager-operated"
      replicas: 2
      selector:
        matchLabels:
          k8s-app: alertmanager
          version: v0.21.0
      template:
        metadata:
          labels:
            k8s-app: alertmanager
            version: v0.21.0
          annotations:
            scheduler.alpha.kubernetes.io/critical-pod: ''
        spec:
          tolerations:
            - key: "CriticalAddonsOnly"
              operator: "Exists"
            - effect: NoSchedule
              key: node-role.kubernetes.io/master
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: k8s-app
                    operator: In
                    values:
                    - alertmanager
                topologyKey: "kubernetes.io/hostname"
          containers:
            - name: prometheus-alertmanager
              image: "prom/alertmanager:v0.21.0"
              imagePullPolicy: "IfNotPresent"
              args:
                - "--config.file=/etc/config/alertmanager.yml"
                - "--storage.path=/data"
                - "--cluster.listen-address=${POD_IP}:9094"
                - "--web.listen-address=:9093"
                - "--cluster.peer=alertmanager-0.alertmanager-operated:9094"
                - "--cluster.peer=alertmanager-1.alertmanager-operated:9094"
              env:
                - name: NODE_NAME
                  valueFrom:
                    fieldRef:
                      fieldPath: spec.nodeName
                - name: POD_IP
                  valueFrom:
                    fieldRef:
                      fieldPath: status.podIP
                - name: POD_NAME
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name
              ports:
                - containerPort: 9093
                  name: web
                  protocol: TCP
                - containerPort: 9094
                  name: mesh-tcp
                  protocol: TCP
                - containerPort: 9094
                  name: mesh-udp
                  protocol: UDP
              readinessProbe:
                httpGet:
                  path: /-/ready
                  port: 9093
                initialDelaySeconds: 30
                timeoutSeconds: 60
              volumeMounts:
                - name: config-volume
                  mountPath: /etc/config
                - name: storage-volume
                  mountPath: "/data"
                  subPath: ""
              resources:
                limits:
                  cpu: 1000m
                  memory: 500Mi
                requests:
                  cpu: 10m
                  memory: 50Mi
            - name: prometheus-alertmanager-configmap-reload
              image: "jimmidyson/configmap-reload:v0.4.0"
              imagePullPolicy: "IfNotPresent"
              args:
                - --volume-dir=/etc/config
                - --webhook-url=http://localhost:9093/-/reload
              volumeMounts:
                - name: config-volume
                  mountPath: /etc/config
                  readOnly: true
              resources:
                limits:
                  cpu: 10m
                  memory: 10Mi
                requests:
                  cpu: 10m
                  memory: 10Mi
              securityContext:
                  runAsUser: 0
                  privileged: true
          volumes:
            - name: config-volume
              configMap:
                name: alertmanager-config
      volumeClaimTemplates:
        - metadata:
            name: storage-volume
          spec:
            accessModes: [ "ReadWriteOnce" ]
            storageClassName: "alertmanager-lpv"
            resources:
              requests:
                storage: 5Gi
    

    Create the Alertmanager operated (headless) Service configuration file

    cat alertmanager-operated-service.yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: alertmanager-operated
      namespace: kube-system
      labels:
        app.kubernetes.io/name: alertmanager-operated
        app.kubernetes.io/component: alertmanager
    spec:
      type: ClusterIP
      clusterIP: None
      sessionAffinity: None
      selector:
        k8s-app: alertmanager
      ports:
        - name: web
          port: 9093
          protocol: TCP
          targetPort: web
        - name: tcp-mesh
          port: 9094
          protocol: TCP
          targetPort: mesh-tcp
        - name: udp-mesh
          port: 9094
          protocol: UDP
          targetPort: mesh-udp
    

    Deploy

    cd /data/manual-deploy/alertmanager/
    ls
    alertmanager-configmap.yaml
    alertmanager-data-pv.yaml
    alertmanager-data-storageclass.yaml
    alertmanager-operated-service.yaml
    alertmanager-service-statefulset.yaml
    alertmanager-statefulset-cluster.yaml
    kubectl apply -f .
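
    Once both Pods are Running, the gossip membership can be checked through the v2 API; both peers should appear in the cluster section of the response (a sketch):

    kubectl -n kube-system port-forward alertmanager-0 9093:9093 &
    curl -s http://127.0.0.1:9093/api/v2/status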
    

    OK, at this point we have manually deployed Prometheus and Alertmanager as StatefulSets in the kube-system namespace of Kubernetes. In the next article we will deploy Grafana and ingress-nginx.
