  Prometheus Monitoring Powerhouse - Kubernetes (Part 1)

    Manually deploy StatefulSet-based Prometheus and Alertmanager clusters in Kubernetes, using a StorageClass to persist the data.

    This article uses a StorageClass to persist data and builds a federated Prometheus cluster out of StatefulSets. For long-term storage there are many options, such as Thanos, M3DB, InfluxDB, and VictoriaMetrics; pick whichever fits your needs. The specifics of data persistence will be covered in detail later in this series.

    To deploy an externally reachable Prometheus: first prepare the Namespace Prometheus will run in (this series uses kube-system), then create the RBAC rules Prometheus needs, and create the Prometheus ConfigMap that holds its configuration file.
    Next create a headless Service to give the StatefulSet Pods stable network identities, create the StatefulSet that runs the Prometheus containers, and finally create an Ingress so Prometheus can be reached through an external domain name.

    If your Kubernetes version is fairly old, consider upgrading to make testing easier; the sealos deployment tool can bring up a highly available cluster with a single command. Whether to also deploy kuboard is up to your own needs.

    Environment

    My local environment was deployed with sealos in one shot, mainly for ease of testing.

    OS            Kubernetes   HostName            IP               Service
    Ubuntu 18.04  1.17.7       sealos-k8s-m1       192.168.1.151    node-exporter prometheus-federate-0
    Ubuntu 18.04  1.17.7       sealos-k8s-m2       192.168.1.152    node-exporter grafana alertmanager-0
    Ubuntu 18.04  1.17.7       sealos-k8s-m3       192.168.1.150    node-exporter alertmanager-1
    Ubuntu 18.04  1.17.7       sealos-k8s-node1    192.168.1.153    node-exporter prometheus-0 kube-state-metrics
    Ubuntu 18.04  1.17.7       sealos-k8s-node2    192.168.1.154    node-exporter prometheus-1
    Ubuntu 18.04  1.17.7       sealos-k8s-node3    192.168.1.155    node-exporter prometheus-2
    # Label the masters and nodes
    # prometheus
    kubectl label node sealos-k8s-node1 k8s-app=prometheus
    kubectl label node sealos-k8s-node2 k8s-app=prometheus
    kubectl label node sealos-k8s-node3 k8s-app=prometheus
    # federate
    kubectl label node sealos-k8s-m1 k8s-app=prometheus-federate
    # alertmanager
    kubectl label node sealos-k8s-m2 k8s-app=alertmanager
    kubectl label node sealos-k8s-m3 k8s-app=alertmanager
    
    # Create the matching deployment directories
    mkdir /data/manual-deploy/ && cd /data/manual-deploy/
    mkdir alertmanager  grafana  ingress-nginx  kube-state-metrics  node-exporter  prometheus
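
    To confirm the labels landed where you expect, list the nodes with the label rendered as a column (-L is kubectl's standard label-columns flag):

    kubectl get nodes -L k8s-app
    kubectl get nodes -l k8s-app=prometheus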
    
    

    Deploy Prometheus

    Create the StorageClass configuration file for Prometheus

    cat prometheus-data-storageclass.yaml
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: prometheus-lpv
    provisioner: kubernetes.io/no-provisioner
    volumeBindingMode: WaitForFirstConsumer
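
    Because the provisioner is kubernetes.io/no-provisioner, nothing is provisioned dynamically: the local PVs in the next step must be created by hand. volumeBindingMode: WaitForFirstConsumer delays PV/PVC binding until a Pod is actually scheduled, so the scheduler can pick the PV that lives on the chosen node. A quick check that the class registered:

    kubectl get storageclass prometheus-lpv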
    

    Create the PV configuration file for the Prometheus StorageClass; each PV pins itself to a node through nodeAffinity.

    # On every node that will run a Prometheus replica, create the data directory and give it to UID 65534 (the nobody user Prometheus runs as)
    mkdir /data/prometheus
    chown -R 65534:65534 /data/prometheus
    
    cat prometheus-data-pv.yaml
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: prometheus-lpv-0
    spec:
      capacity:
        storage: 10Gi
      volumeMode: Filesystem
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: prometheus-lpv
      local:
        path: /data/prometheus
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - sealos-k8s-node1
    ---
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: prometheus-lpv-1
    spec:
      capacity:
        storage: 10Gi
      volumeMode: Filesystem
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: prometheus-lpv
      local:
        path: /data/prometheus
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - sealos-k8s-node2
    ---          
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: prometheus-lpv-2
    spec:
      capacity:
        storage: 10Gi
      volumeMode: Filesystem
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: prometheus-lpv
      local:
        path: /data/prometheus
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - sealos-k8s-node3
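
    The directory and ownership step has to be repeated on every node named in the PVs above. If the control node has SSH access to the workers, a small loop saves the repetition (a sketch; it assumes the hostnames from the environment table resolve and that root SSH is permitted):

    for n in sealos-k8s-node1 sealos-k8s-node2 sealos-k8s-node3; do
      ssh root@$n 'mkdir -p /data/prometheus && chown -R 65534:65534 /data/prometheus'
    done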
    

    Create the Prometheus RBAC file.

    cat prometheus-rbac.yaml
    apiVersion: rbac.authorization.k8s.io/v1 # API version
    kind: ClusterRole # resource kind
    metadata:
      name: prometheus
    rules:
    - apiGroups: [""]
      resources: # the core resources Prometheus discovers and scrapes
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
      verbs: ["get", "list", "watch"]
    - apiGroups:
      - extensions
      resources:
      - ingresses
      verbs: ["get", "list", "watch"]
    - nonResourceURLs: ["/metrics"]
      verbs: ["get"]
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: prometheus # the ServiceAccount the StatefulSet runs under
      namespace: kube-system
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: prometheus
    roleRef: # bind the prometheus ClusterRole defined above; cluster-admin would grant far more than the scrape config needs
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus
    subjects: # the ServiceAccount receiving the role
    - kind: ServiceAccount
      name: prometheus
      namespace: kube-system
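
    After applying the RBAC objects, you can confirm the ServiceAccount really has the intended permissions with impersonated auth checks; both of these should print "yes":

    kubectl auth can-i list pods --as=system:serviceaccount:kube-system:prometheus
    kubectl auth can-i get nodes/proxy --as=system:serviceaccount:kube-system:prometheus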
    

    Create the Prometheus ConfigMap configuration file.

    cat prometheus-configmap.yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-config
      namespace: kube-system
    data:
      prometheus.yml: |
        global:
          scrape_interval:     30s
          evaluation_interval: 30s
          external_labels:
            cluster: "01"
        scrape_configs:
        - job_name: 'kubernetes-apiservers'
          kubernetes_sd_configs:
          - role: endpoints
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
        - job_name: 'kubernetes-nodes'
          kubernetes_sd_configs:
          - role: node
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics
        - job_name: 'kubernetes-cadvisor'
          kubernetes_sd_configs:
          - role: node
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
          metric_relabel_configs:
          - action: replace
            source_labels: [id]
            regex: '^/machine\.slice/machine-rkt\\x2d([^\\]+)\\.+/([^/]+)\.service$'
            target_label: rkt_container_name
            replacement: '${2}-${1}'
          - action: replace
            source_labels: [id]
            regex: '^/system\.slice/(.+)\.service$'
            target_label: systemd_service_name
            replacement: '${1}'
        - job_name: 'kubernetes-pods'
          kubernetes_sd_configs:
          - role: pod
          relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name
        - job_name: 'kubernetes-service-endpoints'
          kubernetes_sd_configs:
          - role: endpoints
          relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name
          - source_labels: [__address__]
            action: replace
            target_label: instance
            regex: (.+):(.+)
            replacement: $1
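
    The kubernetes-pods and kubernetes-service-endpoints jobs above only keep targets whose Pod or Service carries the prometheus.io/scrape annotation, and they honor prometheus.io/path and prometheus.io/port overrides. To opt an existing Service in (my-svc is a hypothetical name standing in for your own Service):

    kubectl -n default annotate service my-svc \
      prometheus.io/scrape=true \
      prometheus.io/port=8080 \
      prometheus.io/path=/metrics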
    

    Create the Prometheus StatefulSet configuration file.

    cat prometheus-statefulset.yaml
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: prometheus
      namespace: kube-system
      labels:
        k8s-app: prometheus
        kubernetes.io/cluster-service: "true"
    spec:
      serviceName: "prometheus"
      podManagementPolicy: "Parallel"
      replicas: 3
      selector:
        matchLabels:
          k8s-app: prometheus
      template:
        metadata:
          labels:
            k8s-app: prometheus
          annotations:
            scheduler.alpha.kubernetes.io/critical-pod: ''
        spec:
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: k8s-app
                    operator: In
                    values:
                    - prometheus
                topologyKey: "kubernetes.io/hostname"
          priorityClassName: system-cluster-critical
          hostNetwork: true
          dnsPolicy: ClusterFirstWithHostNet
          containers:
          - name: prometheus-server-configmap-reload
            image: "jimmidyson/configmap-reload:v0.4.0"
            imagePullPolicy: "IfNotPresent"
            args:
              - --volume-dir=/etc/config
              - --webhook-url=http://localhost:9090/-/reload
            volumeMounts:
              - name: config-volume
                mountPath: /etc/config
                readOnly: true
            resources:
              limits:
                cpu: 10m
                memory: 10Mi
              requests:
                cpu: 10m
                memory: 10Mi
          - image: prom/prometheus:v2.20.0
            imagePullPolicy: IfNotPresent
            name: prometheus
            command:
              - "/bin/prometheus"
            args:
              - "--config.file=/etc/prometheus/prometheus.yml"
              - "--storage.tsdb.path=/prometheus"
              - "--storage.tsdb.retention.time=24h"
              - "--web.console.libraries=/etc/prometheus/console_libraries"
              - "--web.console.templates=/etc/prometheus/consoles"
              - "--web.enable-lifecycle"
            ports:
              - containerPort: 9090
                protocol: TCP
            volumeMounts:
              - mountPath: "/prometheus"
                name: prometheus-data
              - mountPath: "/etc/prometheus"
                name: config-volume
            readinessProbe:
              httpGet:
                path: /-/ready
                port: 9090
              initialDelaySeconds: 30
              timeoutSeconds: 30
            livenessProbe:
              httpGet:
                path: /-/healthy
                port: 9090
              initialDelaySeconds: 30
              timeoutSeconds: 30
            resources:
              requests:
                cpu: 100m
                memory: 100Mi
              limits:
                cpu: 1000m
                memory: 2500Mi
            securityContext:
                runAsUser: 65534
                privileged: true
          serviceAccountName: prometheus
          volumes:
            - name: config-volume
              configMap:
                name: prometheus-config
      volumeClaimTemplates:
        - metadata:
            name: prometheus-data
          spec:
            accessModes: [ "ReadWriteOnce" ]
            storageClassName: "prometheus-lpv"
            resources:
              requests:
                storage: 5Gi
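
    The configmap-reload sidecar watches /etc/config and POSTs to Prometheus' /-/reload hook whenever the ConfigMap content changes, which is why --web.enable-lifecycle is set on the server. While testing a config edit you can trigger the same reload by hand:

    kubectl -n kube-system port-forward prometheus-0 9090:9090 &
    curl -X POST http://localhost:9090/-/reload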
    

    Create the Prometheus Service configuration file

    cat prometheus-service-statefulset.yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: prometheus
      namespace: kube-system
    spec:
      ports:
        - name: prometheus
          port: 9090
          targetPort: 9090
      selector:
        k8s-app: prometheus
      clusterIP: None
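
    With clusterIP: None this is a headless Service, so each replica gets a stable DNS name of the form <pod>.<service>.<namespace>.svc.cluster.local. A throwaway Pod can verify the records (tutum/dnsutils is just one convenient image that ships nslookup; any image with DNS tools works):

    kubectl run -it --rm dnsutils --image=tutum/dnsutils --restart=Never -- \
      nslookup prometheus-0.prometheus.kube-system.svc.cluster.local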
    

    Deploy the Prometheus resource files created above

    cd /data/manual-deploy/prometheus
    ls
    prometheus-configmap.yaml           # ConfigMap
    prometheus-data-pv.yaml             # PVs
    prometheus-data-storageclass.yaml   # StorageClass
    prometheus-rbac.yaml                # RBAC
    prometheus-service-statefulset.yaml # Service
    prometheus-statefulset.yaml         # StatefulSet
    # Deploy everything
    kubectl apply -f .
    

    Verify the PV/PVC bindings and the deployment status of Prometheus

    kubectl get pv
    NAME               CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS     REASON   AGE
    prometheus-lpv-0   10Gi       RWO            Retain           Available           prometheus-lpv            6m28s
    prometheus-lpv-1   10Gi       RWO            Retain           Available           prometheus-lpv            6m28s
    prometheus-lpv-2   10Gi       RWO            Retain           Available           prometheus-lpv            6m28s
    kubectl -n kube-system get pvc 
    NAME                           STATUS   VOLUME             CAPACITY   ACCESS MODES   STORAGECLASS     AGE
    prometheus-data-prometheus-0   Bound    prometheus-lpv-0   10Gi       RWO            prometheus-lpv   2m16s
    prometheus-data-prometheus-1   Bound    prometheus-lpv-2   10Gi       RWO            prometheus-lpv   2m16s
    prometheus-data-prometheus-2   Bound    prometheus-lpv-1   10Gi       RWO            prometheus-lpv   2m16s
    
    kubectl -n kube-system get pod prometheus-{0..2}
    NAME           READY   STATUS    RESTARTS   AGE
    prometheus-0   2/2     Running   0          3m16s
    prometheus-1   2/2     Running   0          3m16s
    prometheus-2   2/2     Running   0          3m16s
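
    With the Pods Running, a port-forward plus the built-in HTTP endpoints confirms each replica is serving and scraping (both endpoints ship with Prometheus itself):

    kubectl -n kube-system port-forward prometheus-0 9090:9090 &
    curl -s http://localhost:9090/-/ready
    curl -s 'http://localhost:9090/api/v1/targets?state=active' | head -c 300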
    
    

    Deploy Node Exporter

    Create the node-exporter DaemonSet file

    cd /data/manual-deploy/node-exporter/
    cat node-exporter.yaml
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: node-exporter
      namespace: kube-system
      labels:
        k8s-app: node-exporter
    spec:
      selector:
        matchLabels:
            k8s-app: node-exporter
      template:
        metadata:
          labels:
            k8s-app: node-exporter
        spec:
          tolerations:
            - effect: NoSchedule
              key: node-role.kubernetes.io/master
          containers:
          - image: quay.io/prometheus/node-exporter:v1.0.0
            imagePullPolicy: IfNotPresent
            name: prometheus-node-exporter
            ports:
            - containerPort: 9100
              hostPort: 9100
              protocol: TCP
              name: metrics
            volumeMounts:
            - mountPath: /host/proc
              name: proc
            - mountPath: /host/sys
              name: sys
            - mountPath: /host
              name: rootfs
            args:
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
            - --path.rootfs=/host
          volumes:
            - name: proc
              hostPath:
                path: /proc
            - name: sys
              hostPath:
                path: /sys
            - name: rootfs
              hostPath:
                path: /
          hostNetwork: true
          hostPID: true
    ---
    apiVersion: v1
    kind: Service
    metadata:
      annotations:
        prometheus.io/scrape: "true"
      labels:
        k8s-app: node-exporter
      name: node-exporter
      namespace: kube-system
    spec:
      ports:
      - name: http
        port: 9100
        protocol: TCP
      selector:
        k8s-app: node-exporter  
    

    Deploy

    cd /data/manual-deploy/node-exporter/
    kubectl apply -f node-exporter.yaml
    

    Verify the status

    kubectl -n kube-system get pod |grep node-exporter
    node-exporter-45s2q                    1/1     Running   0          6h43m
    node-exporter-f4rrw                    1/1     Running   0          6h43m
    node-exporter-hvtzj                    1/1     Running   0          6h43m
    node-exporter-nlvfq                    1/1     Running   0          6h43m
    node-exporter-qbd2q                    1/1     Running   0          6h43m
    node-exporter-zjrh4                    1/1     Running   0          6h43m
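
    Because the DaemonSet runs with hostNetwork and hostPort 9100, every node serves metrics directly on its own IP; any address from the environment table will do:

    curl -s http://192.168.1.151:9100/metrics | grep '^node_load'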
    
    

    Deploy kube-state-metrics

    The kubelet already embeds cAdvisor, which collects system-level metrics such as CPU, memory, network, disk, and container statistics, but it cannot collect metrics about Kubernetes resource objects, such as the number and status of Pods.
    That is exactly the gap kube-state-metrics fills.

    kube-state-metrics polls the Kubernetes API and exposes metrics about the resource objects it finds: CronJob, DaemonSet, Deployment, Job, LimitRange, Node, PersistentVolume, PersistentVolumeClaim, Pod, PodDisruptionBudget, ReplicaSet, ReplicationController, ResourceQuota, Service, StatefulSet, Namespace, HorizontalPodAutoscaler, Endpoint, Secret, ConfigMap, Ingress, and CertificateSigningRequest.

    cd /data/manual-deploy/kube-state-metrics/
    cat kube-state-metrics-rbac.yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      namespace: kube-system
      name: kube-state-metrics-resizer
    rules:
    - apiGroups: [""]
      resources:
      - pods
      verbs: ["get"]
    - apiGroups: ["apps"]
      resources:
      - deployments
      resourceNames: ["kube-state-metrics"]
      verbs: ["get", "update"]
    - apiGroups: ["extensions"]
      resources:
      - deployments
      resourceNames: ["kube-state-metrics"]
      verbs: ["get", "update"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: kube-state-metrics
      namespace: kube-system
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: kube-state-metrics-resizer
    subjects:
    - kind: ServiceAccount
      name: kube-state-metrics
      namespace: kube-system
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: kube-state-metrics
    rules:
    - apiGroups: [""]
      resources:
      - configmaps
      - secrets
      - nodes
      - pods
      - services
      - resourcequotas
      - replicationcontrollers
      - limitranges
      - persistentvolumeclaims
      - persistentvolumes
      - namespaces
      - endpoints
      verbs: ["list", "watch"]
    - apiGroups: ["extensions"]
      resources:
      - daemonsets
      - deployments
      - replicasets
      - ingresses
      verbs: ["list", "watch"]
    - apiGroups: ["apps"]
      resources:
      - daemonsets
      - deployments
      - replicasets
      - statefulsets
      verbs: ["list", "watch"]
    - apiGroups: ["batch"]
      resources:
      - cronjobs
      - jobs
      verbs: ["list", "watch"]
    - apiGroups: ["autoscaling"]
      resources:
      - horizontalpodautoscalers
      verbs: ["list", "watch"]
    - apiGroups: ["policy"]
      resources:
      - poddisruptionbudgets
      verbs: ["list", "watch"]
    - apiGroups: ["certificates.k8s.io"]
      resources:
      - certificatesigningrequests
      verbs: ["list", "watch"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: kube-state-metrics
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: kube-state-metrics
    subjects:
    - kind: ServiceAccount
      name: kube-state-metrics
      namespace: kube-system
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: kube-state-metrics
      namespace: kube-system
    

    Create the kube-state-metrics Deployment file

    cat kube-state-metrics-deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: kube-state-metrics
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          k8s-app: kube-state-metrics
      replicas: 1
      template:
        metadata:
          labels:
            k8s-app: kube-state-metrics
        spec:
          serviceAccountName: kube-state-metrics
          containers:
          - name: kube-state-metrics
            image: quay.io/coreos/kube-state-metrics:v1.6.0
            ports:
            - name: http-metrics
              containerPort: 8080
            - name: telemetry
              containerPort: 8081
            readinessProbe:
              httpGet:
                path: /healthz
                port: 8080
              initialDelaySeconds: 5
              timeoutSeconds: 5
          - name: addon-resizer
            image: k8s.gcr.io/addon-resizer:1.8.4
            resources:
              limits:
                cpu: 150m
                memory: 50Mi
              requests:
                cpu: 150m
                memory: 50Mi
            env:
              - name: MY_POD_NAME
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.name
              - name: MY_POD_NAMESPACE
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.namespace
            command:
              - /pod_nanny
              - --container=kube-state-metrics
              - --cpu=100m
              - --extra-cpu=1m
              - --memory=100Mi
              - --extra-memory=2Mi
              - --threshold=5
              - --deployment=kube-state-metrics
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: kube-state-metrics
      namespace: kube-system
      labels:
        k8s-app: kube-state-metrics
      annotations:
        prometheus.io/scrape: 'true'
    spec:
      ports:
      - name: http-metrics
        port: 8080
        targetPort: http-metrics
        protocol: TCP
      - name: telemetry
        port: 8081
        targetPort: telemetry
        protocol: TCP
      selector:
        k8s-app: kube-state-metrics
    

    Deploy

    kubectl apply -f kube-state-metrics-rbac.yaml
    kubectl apply -f kube-state-metrics-deployment.yaml
    

    Verify

    kubectl -n kube-system get pod |grep kube-state-metrics
    kube-state-metrics-657d8d6669-bqbs8        2/2     Running   0          4h
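
    To see it working end to end, port-forward the Service and look for one of kube-state-metrics' own series, for example kube_pod_status_phase:

    kubectl -n kube-system port-forward svc/kube-state-metrics 8080:8080 &
    curl -s http://localhost:8080/metrics | grep '^kube_pod_status_phase' | head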
    

    Because the kube-state-metrics Service carries the annotation prometheus.io/scrape: "true", the kubernetes-service-endpoints job discovers and scrapes it automatically; no extra scrape config is needed.

    Deploy the Alertmanager Cluster

    Create the directories and set ownership

    # On sealos-k8s-m2
    mkdir /data/alertmanager
    chown -R 65534:65534 /data/alertmanager
    # On sealos-k8s-m3
    mkdir /data/alertmanager
    chown -R 65534:65534 /data/alertmanager
    
    cd /data/manual-deploy/alertmanager/
    cat alertmanager-data-storageclass.yaml
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: alertmanager-lpv
    provisioner: kubernetes.io/no-provisioner
    volumeBindingMode: WaitForFirstConsumer
    

    Create the Alertmanager PV configuration file

    cat alertmanager-data-pv.yaml 
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: alertmanager-pv-0
    spec:
      capacity:
        storage: 10Gi
      volumeMode: Filesystem
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: alertmanager-lpv
      local:
        path: /data/alertmanager
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - sealos-k8s-m2
    ---
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: alertmanager-pv-1
    spec:
      capacity:
        storage: 10Gi
      volumeMode: Filesystem
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: alertmanager-lpv
      local:
        path: /data/alertmanager
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - sealos-k8s-m3
    

    Create the Alertmanager ConfigMap configuration file

    cat alertmanager-configmap.yaml 
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: alertmanager-config
      namespace: kube-system
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: EnsureExists
    data:
      alertmanager.yml: |
        global:
          resolve_timeout: 5m
          smtp_smarthost: 'smtp.qq.com:465'
          smtp_from: 'yo@qq.com'
          smtp_auth_username: '345@qq.com'
          smtp_auth_password: 'bhgb'
          smtp_hello: '警报邮件'
          smtp_require_tls: false
        route:
          group_by: ['alertname', 'cluster']
          group_wait: 30s
          group_interval: 30s
          repeat_interval: 12h
          receiver: default
    
          routes:
          - receiver: email
            group_wait: 10s
            match:
              team: ops
        receivers:
        - name: 'default'
          email_configs:
          - to: '9935226@qq.com'
            send_resolved: true
        - name: 'email'
          email_configs:
          - to: '9935226@qq.com'
            send_resolved: true
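
    Before rolling a config change out, amtool can validate the file; it ships inside the prom/alertmanager image, so the check can also run in-cluster once the Pods exist:

    # locally, if you have amtool and the file at hand
    amtool check-config alertmanager.yml
    # or inside a running replica
    kubectl -n kube-system exec alertmanager-0 -c prometheus-alertmanager -- \
      amtool check-config /etc/config/alertmanager.yml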
    

    Create the Alertmanager StatefulSet file. This manifest deploys cluster mode; for a single instance, set replicas to 1 and drop the --cluster.* flags.

    cat alertmanager-statefulset-cluster.yaml 
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: alertmanager
      namespace: kube-system
      labels:
        k8s-app: alertmanager
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
        version: v0.21.0
    spec:
      serviceName: "alertmanager-operated"
      replicas: 2
      selector:
        matchLabels:
          k8s-app: alertmanager
          version: v0.21.0
      template:
        metadata:
          labels:
            k8s-app: alertmanager
            version: v0.21.0
          annotations:
            scheduler.alpha.kubernetes.io/critical-pod: ''
        spec:
          tolerations:
            - key: "CriticalAddonsOnly"
              operator: "Exists"
            - effect: NoSchedule
              key: node-role.kubernetes.io/master
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: k8s-app
                    operator: In
                    values:
                    - alertmanager
                topologyKey: "kubernetes.io/hostname"
          containers:
            - name: prometheus-alertmanager
              image: "prom/alertmanager:v0.21.0"
              imagePullPolicy: "IfNotPresent"
              args:
                - "--config.file=/etc/config/alertmanager.yml"
                - "--storage.path=/data"
                - "--cluster.listen-address=$(POD_IP):9094"
                - "--web.listen-address=:9093"
                - "--cluster.peer=alertmanager-0.alertmanager-operated:9094"
                - "--cluster.peer=alertmanager-1.alertmanager-operated:9094"
              env:
                - name: NODE_NAME
                  valueFrom:
                    fieldRef:
                      fieldPath: spec.nodeName
                - name: POD_IP
                  valueFrom:
                    fieldRef:
                      fieldPath: status.podIP
                - name: POD_NAME
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name
              ports:
                - containerPort: 9093
                  name: web
                  protocol: TCP
                - containerPort: 9094
                  name: mesh-tcp
                  protocol: TCP
                - containerPort: 9094
                  name: mesh-udp
                  protocol: UDP
              readinessProbe:
                httpGet:
                  path: /#/status
                  port: 9093
                initialDelaySeconds: 30
                timeoutSeconds: 60
              volumeMounts:
                - name: config-volume
                  mountPath: /etc/config
                - name: storage-volume
                  mountPath: "/data"
                  subPath: ""
              resources:
                limits:
                  cpu: 1000m
                  memory: 500Mi
                requests:
                  cpu: 10m
                  memory: 50Mi
            - name: prometheus-alertmanager-configmap-reload
              image: "jimmidyson/configmap-reload:v0.4.0"
              imagePullPolicy: "IfNotPresent"
              args:
                - --volume-dir=/etc/config
                - --webhook-url=http://localhost:9093/-/reload
              volumeMounts:
                - name: config-volume
                  mountPath: /etc/config
                  readOnly: true
              resources:
                limits:
                  cpu: 10m
                  memory: 10Mi
                requests:
                  cpu: 10m
                  memory: 10Mi
              securityContext:
                  runAsUser: 0
                  privileged: true
          volumes:
            - name: config-volume
              configMap:
                name: alertmanager-config
      volumeClaimTemplates:
        - metadata:
            name: storage-volume
          spec:
            accessModes: [ "ReadWriteOnce" ]
            storageClassName: "alertmanager-lpv"
            resources:
              requests:
                storage: 5Gi
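
    Once both replicas are up, the v2 status API reports the gossip cluster; cluster.status should be "ready" and cluster.peers should list two members:

    kubectl -n kube-system port-forward alertmanager-0 9093:9093 &
    curl -s http://localhost:9093/api/v2/status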
    

    Create the Alertmanager operated-service (headless Service) configuration file

    cat alertmanager-operated-service.yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: alertmanager-operated
      namespace: kube-system
      labels:
        app.kubernetes.io/name: alertmanager-operated
        app.kubernetes.io/component: alertmanager
    spec:
      type: ClusterIP
      clusterIP: None
      sessionAffinity: None
      selector:
        k8s-app: alertmanager
      ports:
        - name: web
          port: 9093
          protocol: TCP
          targetPort: web
        - name: mesh-tcp
          port: 9094
          protocol: TCP
          targetPort: mesh-tcp
        - name: mesh-udp
          port: 9094
          protocol: UDP
          targetPort: mesh-udp
    

    Deploy

    cd /data/manual-deploy/alertmanager/
    ls
    alertmanager-configmap.yaml
    alertmanager-data-pv.yaml
    alertmanager-data-storageclass.yaml
    alertmanager-operated-service.yaml
    alertmanager-service-statefulset.yaml
    alertmanager-statefulset-cluster.yaml
    kubectl apply -f .
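
    Verify that both replicas landed on the labeled masters and bound their local PVs:

    kubectl -n kube-system get pod alertmanager-{0..1} -o wide
    kubectl -n kube-system get pvc | grep alertmanager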
    

    OK, at this point we have manually deployed Prometheus and Alertmanager as StatefulSets in the kube-system namespace. In the next article we will cover deploying Grafana and ingress-nginx.

  Original article: https://www.cnblogs.com/cloudnative/p/13636613.html