  • Container Orchestration with Kubernetes: Deploying the Prometheus Monitoring System + Grafana

      In the previous post we looked at how the Kubernetes APIService resource, combined with a custom apiserver, can extend the functionality of the native apiserver; for a refresher, see https://www.cnblogs.com/qiuhom-1874/p/14279850.html. Today we'll talk about monitoring a Kubernetes cluster.

      In the previous post we used metrics-server, a custom apiserver, to extend the native apiserver so that the kubectl top node/pod commands can report CPU and memory metrics for nodes and for the pods in a namespace. Those metrics give us a reasonably clear picture of node and pod resource usage, and in essence that is already a form of monitoring; but metrics-server only collects CPU and memory data, which is not enough if we want to understand other aspects of a node or pod. So we need a dedicated monitoring system to watch the cluster's nodes and pods. Prometheus is a high-performance monitoring system with three main internal components: the Retrieval component handles data collection and can work together with external programs to gather metrics; the TSDB component, a time-series storage system, stores the metric data; and the HttpServer component exposes a RESTful API so clients can run queries, listening on port 9090 by default.

      Overall topology of the Prometheus monitoring system (diagram not reproduced here)

      Note: the figure referenced above shows the topology of the Prometheus monitoring system. The Pushgateway component acts as a kind of proxy for the Prometheus retrieval component: it collects metrics from pods that push their data actively. Prometheus has both active and passive monitoring: with active monitoring the monitored target pushes data to the server, while with passive monitoring the target waits for the server to come and pull the data. By default Prometheus works in passive mode, i.e. the server scrapes the targets itself. Node-level metrics can be collected with node-exporter, which can also collect metrics from the pod containers on a node; alertmanager provides alerting for the Prometheus system; and the Prometheus web UI provides a web page for running queries.

      Components of the Prometheus monitoring system

      kube-state-metrics: provides object-state metrics about the Kubernetes cluster, i.e. counts of things such as how many nodes exist, how many pods, and so on;

      node-exporter: collects metrics from the node it runs on;

      alertmanager: provides alerting for the Prometheus monitoring system;

      prometheus-server: stores and processes metric data and exposes a RESTful query API to users;

      Annotations that control whether Prometheus scrapes a pod's metrics

      prometheus.io/scrape: states whether the pod (or service) allows its metrics to be scraped; true means allowed, false means not;

      prometheus.io/path: the URL path used to scrape metrics, usually /metrics;

      prometheus.io/port: the port to scrape metrics from; a minimal annotated example follows.
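
      As a quick illustration (not from the original post; the name and namespace are made up), a Service annotated like this would be discovered by the kubernetes-service-endpoints scrape job configured later in this post:

    apiVersion: v1
    kind: Service
    metadata:
      name: demo-app                     # hypothetical service name
      namespace: default
      annotations:
        prometheus.io/scrape: "true"     # allow Prometheus to scrape this service
        prometheus.io/path: "/metrics"   # metrics endpoint path
        prometheus.io/port: "8080"       # port Prometheus should scrape
    spec:
      selector:
        app: demo-app
      ports:
      - name: http-metrics
        port: 8080
        targetPort: 8080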

      Deploying the Prometheus monitoring system

      1. Deploy kube-state-metrics

      Create the RBAC manifest for kube-state-metrics

    [root@master01 kube-state-metrics]# cat kube-state-metrics-rbac.yaml 
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: kube-state-metrics
      namespace: kube-system
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: kube-state-metrics
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    rules:
    - apiGroups: [""]
      resources:
      - configmaps
      - secrets
      - nodes
      - pods
      - services
      - resourcequotas
      - replicationcontrollers
      - limitranges
      - persistentvolumeclaims
      - persistentvolumes
      - namespaces
      - endpoints
      verbs: ["list", "watch"]
    - apiGroups: ["extensions","apps"]
      resources:
      - daemonsets
      - deployments
      - replicasets
      verbs: ["list", "watch"]
    - apiGroups: ["apps"]
      resources:
      - statefulsets
      verbs: ["list", "watch"]
    - apiGroups: ["batch"]
      resources:
      - cronjobs
      - jobs
      verbs: ["list", "watch"]
    - apiGroups: ["autoscaling"]
      resources:
      - horizontalpodautoscalers
      verbs: ["list", "watch"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: kube-state-metrics-resizer
      namespace: kube-system
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    rules:
    - apiGroups: [""]
      resources:
      - pods
      verbs: ["get"]
    - apiGroups: ["extensions","apps"]
      resources:
      - deployments
      resourceNames: ["kube-state-metrics"]
      verbs: ["get", "update"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: kube-state-metrics
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: kube-state-metrics
    subjects:
    - kind: ServiceAccount
      name: kube-state-metrics
      namespace: kube-system
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: kube-state-metrics
      namespace: kube-system
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: kube-state-metrics-resizer
    subjects:
    - kind: ServiceAccount
      name: kube-state-metrics
      namespace: kube-system
    [root@master01 kube-state-metrics]# 
    

      Note: the manifest above creates a ServiceAccount plus two roles (a ClusterRole and a namespaced Role) and binds the ServiceAccount to them, so that the account ends up with the permissions those roles grant; a quick way to confirm this is sketched below.
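
      To spot-check the binding (optional; the expected answers are noted in comments, not captured from this cluster), kubectl can evaluate the ServiceAccount's permissions directly:

    # should print "yes": the ClusterRole grants list/watch on pods cluster-wide
    kubectl auth can-i list pods --as=system:serviceaccount:kube-system:kube-state-metrics
    # should print "no": the manifest grants no delete verbs
    kubectl auth can-i delete pods --as=system:serviceaccount:kube-system:kube-state-metrics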

      Create the kube-state-metrics Service manifest

    [root@master01 kube-state-metrics]# cat kube-state-metrics-service.yaml 
    apiVersion: v1
    kind: Service
    metadata:
      name: kube-state-metrics
      namespace: kube-system
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
        kubernetes.io/name: "kube-state-metrics"
      annotations:
        prometheus.io/scrape: 'true'
    spec:
      ports:
      - name: http-metrics
        port: 8080
        targetPort: http-metrics
        protocol: TCP
      - name: telemetry
        port: 8081
        targetPort: telemetry
        protocol: TCP
      selector:
        k8s-app: kube-state-metrics
    [root@master01 kube-state-metrics]# 
    

      Create the kube-state-metrics Deployment manifest

    [root@master01 kube-state-metrics]# cat kube-state-metrics-deployment.yaml 
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: kube-state-metrics
      namespace: kube-system
      labels:
        k8s-app: kube-state-metrics
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
        version: v2.0.0-beta
    spec:
      selector:
        matchLabels:
          k8s-app: kube-state-metrics
          version: v2.0.0-beta
      replicas: 1
      template:
        metadata:
          labels:
            k8s-app: kube-state-metrics
            version: v2.0.0-beta
        spec:
          priorityClassName: system-cluster-critical
          serviceAccountName: kube-state-metrics
          containers:
          - name: kube-state-metrics
            image: quay.io/coreos/kube-state-metrics:v2.0.0-beta
            ports:
            - name: http-metrics
              containerPort: 8080
            - name: telemetry
              containerPort: 8081
            readinessProbe:
              httpGet:
                path: /healthz
                port: 8080
              initialDelaySeconds: 5
              timeoutSeconds: 5
          - name: addon-resizer
            image: k8s.gcr.io/addon-resizer:1.8.7
            resources:
              limits:
                cpu: 100m
                memory: 30Mi
              requests:
                cpu: 100m
                memory: 30Mi
            env:
              - name: MY_POD_NAME
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.name
              - name: MY_POD_NAMESPACE
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.namespace
            volumeMounts:
              - name: config-volume
                mountPath: /etc/config
            command:
              - /pod_nanny
              - --config-dir=/etc/config
              - --container=kube-state-metrics
              - --cpu=100m
              - --extra-cpu=1m
              - --memory=100Mi
              - --extra-memory=2Mi
              - --threshold=5
              - --deployment=kube-state-metrics
          volumes:
            - name: config-volume
              configMap:
                name: kube-state-metrics-config
    ---
    # Config map for resource configuration.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: kube-state-metrics-config
      namespace: kube-system
      labels:
        k8s-app: kube-state-metrics
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    data:
      NannyConfiguration: |-
        apiVersion: nannyconfig/v1alpha1
        kind: NannyConfiguration
    
    [root@master01 kube-state-metrics]# 
    

      Apply the three manifests above to deploy the kube-state-metrics component

    [root@master01 kube-state-metrics]# ls
    kube-state-metrics-deployment.yaml  kube-state-metrics-rbac.yaml  kube-state-metrics-service.yaml
    [root@master01 kube-state-metrics]# kubectl apply -f .
    deployment.apps/kube-state-metrics created
    configmap/kube-state-metrics-config created
    serviceaccount/kube-state-metrics created
    clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
    role.rbac.authorization.k8s.io/kube-state-metrics-resizer created
    clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
    rolebinding.rbac.authorization.k8s.io/kube-state-metrics created
    service/kube-state-metrics created
    [root@master01 kube-state-metrics]# 
    

      Verify: were the corresponding pod and service created successfully?
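
      The original post shows a screenshot here; a check along these lines (the pod name suffix will differ per cluster) covers the same ground:

    kubectl get pods -l k8s-app=kube-state-metrics -n kube-system
    kubectl get svc kube-state-metrics -n kube-system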

      Note: the pod and the svc have both been created normally;

      Verify: access port 8080 of the service at the /metrics URL; is metric data returned?
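
      Again the screenshot is omitted; substituting the ClusterIP reported by kubectl get svc (10.110.110.216 in this cluster, as the svc listing further below shows), the check looks like:

    curl -s http://10.110.110.216:8080/metrics | head -n 20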

      Note: /metrics on port 8080 of the service returns data, so the kube-state-metrics component is installed and working;

      2. Deploy node-exporter

      Create the node-exporter Service manifest

    [root@master01 node_exporter]# cat node-exporter-service.yaml 
    apiVersion: v1
    kind: Service
    metadata:
      name: node-exporter
      namespace: kube-system
      annotations:
        prometheus.io/scrape: "true"
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
        kubernetes.io/name: "NodeExporter"
    spec:
      clusterIP: None
      ports:
        - name: metrics
          port: 9100
          protocol: TCP
          targetPort: 9100
      selector:
        k8s-app: node-exporter
    [root@master01 node_exporter]# 
    

      Create the node-exporter DaemonSet manifest

    [root@master01 node_exporter]# cat node-exporter-ds.yml 
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: node-exporter
      namespace: kube-system
      labels:
        k8s-app: node-exporter
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
        version: v1.0.1
    spec:
      selector:
        matchLabels:
          k8s-app: node-exporter
          version: v1.0.1
      updateStrategy:
        type: OnDelete
      template:
        metadata:
          labels:
            k8s-app: node-exporter
            version: v1.0.1
        spec:
          priorityClassName: system-node-critical
          containers:
            - name: prometheus-node-exporter
              image: "prom/node-exporter:v1.0.1"
              imagePullPolicy: "IfNotPresent"
              args:
                - --path.procfs=/host/proc
                - --path.sysfs=/host/sys
              ports:
                - name: metrics
                  containerPort: 9100
                  hostPort: 9100
              volumeMounts:
                - name: proc
                  mountPath: /host/proc
                  readOnly:  true
                - name: sys
                  mountPath: /host/sys
                  readOnly: true
              resources:
                limits:
                  memory: 50Mi
                requests:
                  cpu: 100m
                  memory: 50Mi
          hostNetwork: true
          hostPID: true
          volumes:
            - name: proc
              hostPath:
                path: /proc
            - name: sys
              hostPath:
                path: /sys
          tolerations:
          - key: node-role.kubernetes.io/master
            operator: Exists
            effect: NoSchedule
          
    [root@master01 node_exporter]# 
    

      Note: the manifest above runs the node-exporter pods under a DaemonSet controller; the pods share the host's network and PID namespaces and tolerate the master node's taint, so node-exporter runs one pod on every node of the cluster and uses that pod to collect the node's metrics;

      Apply the two manifests above to deploy node-exporter

    [root@master01 node_exporter]# ls
    node-exporter-ds.yml  node-exporter-service.yaml
    [root@master01 node_exporter]# kubectl apply -f .
    daemonset.apps/node-exporter created
    service/node-exporter created
    [root@master01 node_exporter]# 
    

      Verify: were the pods and svc created normally?

    [root@master01 node_exporter]# kubectl get pods -l "k8s-app=node-exporter" -n kube-system
    NAME                  READY   STATUS    RESTARTS   AGE
    node-exporter-6zgkz   1/1     Running   0          107s
    node-exporter-9mvxr   1/1     Running   0          107s
    node-exporter-jbll7   1/1     Running   0          107s
    node-exporter-s7vvt   1/1     Running   0          107s
    node-exporter-xmrjh   1/1     Running   0          107s
    [root@master01 node_exporter]# kubectl get svc -n kube-system
    NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
    kube-dns             ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   39d
    kube-state-metrics   ClusterIP   10.110.110.216   <none>        8080/TCP,8081/TCP        20m
    metrics-server       ClusterIP   10.98.59.116     <none>        443/TCP                  46h
    node-exporter        ClusterIP   None             <none>        9100/TCP                 116s
    [root@master01 node_exporter]# 
    

      Verify: access port 9100 on any node at the /metrics URL; is metric data returned?
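
      Since the DaemonSet uses hostNetwork, port 9100 is open on every node; a check like this (substitute one of your own node IPs for the placeholder) should dump node-level metrics:

    curl -s http://<node-ip>:9100/metrics | head -n 20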

      Note: /metrics on that port returns data, so the node-exporter component has been deployed successfully;

      3. Deploy alertmanager

      Create the alertmanager PVC manifest

    [root@master01 alertmanager]# cat alertmanager-pvc.yaml
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: alertmanager
      namespace: kube-system
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: EnsureExists
    spec:
    #  storageClassName: standard
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: "2Gi"
    [root@master01 alertmanager]# 
    

      Create NFS-backed PVs for the claim to bind to

    [root@master01 ~]# cat pv-demo.yaml 
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: nfs-pv-v1
    spec:
      capacity:
        storage: 5Gi
      volumeMode: Filesystem
      accessModes: ["ReadWriteOnce","ReadWriteMany","ReadOnlyMany"]
      persistentVolumeReclaimPolicy: Retain
      mountOptions:
      - hard
      - nfsvers=4.1
      nfs:
        path: /data/v1
        server: 192.168.0.99
    ---
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: nfs-pv-v2
    spec:
      capacity:
        storage: 5Gi
      volumeMode: Filesystem
      accessModes: ["ReadWriteOnce","ReadWriteMany","ReadOnlyMany"]
      persistentVolumeReclaimPolicy: Retain
      mountOptions:
      - hard
      - nfsvers=4.1
      nfs:
        path: /data/v2
        server: 192.168.0.99
    ---
    
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: nfs-pv-v3
    spec:
      capacity:
        storage: 5Gi
      volumeMode: Filesystem
      accessModes: ["ReadWriteOnce","ReadWriteMany","ReadOnlyMany"]
      persistentVolumeReclaimPolicy: Retain
      mountOptions:
      - hard
      - nfsvers=4.1
      nfs:
        path: /data/v3
        server: 192.168.0.99
    [root@master01 ~]# 
    

      Apply the manifest to create the PVs

    [root@master01 ~]# kubectl apply -f pv-demo.yaml
    persistentvolume/nfs-pv-v1 created
    persistentvolume/nfs-pv-v2 created
    persistentvolume/nfs-pv-v3 created
    [root@master01 ~]# kubectl get pv
    NAME        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
    nfs-pv-v1   5Gi        RWO,ROX,RWX    Retain           Available                                   4s
    nfs-pv-v2   5Gi        RWO,ROX,RWX    Retain           Available                                   4s
    nfs-pv-v3   5Gi        RWO,ROX,RWX    Retain           Available                                   4s
    [root@master01 ~]# 
    

      Create the alertmanager Service manifest

    [root@master01 alertmanager]# cat alertmanager-service.yaml 
    apiVersion: v1
    kind: Service
    metadata:
      name: alertmanager
      namespace: kube-system
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
        kubernetes.io/name: "Alertmanager"
    spec:
      ports:
        - name: http
          port: 80
          protocol: TCP
          targetPort: 9093
          nodePort: 30093
      selector:
        k8s-app: alertmanager
      type: "NodePort"
    [root@master01 alertmanager]# 
    

      Create the alertmanager ConfigMap manifest

    [root@master01 alertmanager]# cat alertmanager-configmap.yaml 
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: alertmanager-config
      namespace: kube-system
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: EnsureExists
    data:
      alertmanager.yml: |
        global: null
        receivers:
        - name: default-receiver
        route:
          group_interval: 5m
          group_wait: 10s
          receiver: default-receiver
          repeat_interval: 3h
    [root@master01 alertmanager]# 
    

      Create the alertmanager Deployment manifest

    [root@master01 alertmanager]# cat alertmanager-deployment.yaml 
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: alertmanager
      namespace: kube-system
      labels:
        k8s-app: alertmanager
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
        version: v0.14.0
    spec:
      replicas: 1
      selector:
        matchLabels:
          k8s-app: alertmanager
          version: v0.14.0
      template:
        metadata:
          labels:
            k8s-app: alertmanager
            version: v0.14.0
        spec:
          priorityClassName: system-cluster-critical
          containers:
            - name: prometheus-alertmanager
              image: "prom/alertmanager:v0.14.0"
              imagePullPolicy: "IfNotPresent"
              args:
                - --config.file=/etc/config/alertmanager.yml
                - --storage.path=/data
                - --web.external-url=/
              ports:
                - containerPort: 9093
              readinessProbe:
                httpGet:
                  path: /#/status
                  port: 9093
                initialDelaySeconds: 30
                timeoutSeconds: 30
              volumeMounts:
                - name: config-volume
                  mountPath: /etc/config
                - name: storage-volume
                  mountPath: "/data"
                  subPath: ""
              resources:
                limits:
                  cpu: 10m
                  memory: 50Mi
                requests:
                  cpu: 10m
                  memory: 50Mi
    #        - name: prometheus-alertmanager-configmap-reload
    #          image: "jimmidyson/configmap-reload:v0.1"
    #          imagePullPolicy: "IfNotPresent"
    #          args:
    #            - --volume-dir=/etc/config
    #            - --webhook-url=http://localhost:9093/-/reload
    #          volumeMounts:
    #            - name: config-volume
    #              mountPath: /etc/config
    #              readOnly: true
    #          resources:
    #            limits:
    #              cpu: 10m
    #              memory: 10Mi
    #            requests:
    #              cpu: 10m
    #              memory: 10Mi
          volumes:
            - name: config-volume
              configMap:
                name: alertmanager-config
            - name: storage-volume
              persistentVolumeClaim:
                claimName: alertmanager
    [root@master01 alertmanager]# 
    

      Apply the four manifests above to deploy alertmanager

    [root@master01 alertmanager]# ls
    alertmanager-configmap.yaml  alertmanager-deployment.yaml  alertmanager-pvc.yaml  alertmanager-service.yaml
    [root@master01 alertmanager]# kubectl apply -f .
    configmap/alertmanager-config created
    deployment.apps/alertmanager created
    persistentvolumeclaim/alertmanager created
    service/alertmanager created
    [root@master01 alertmanager]# 
    

      Verify: were the pod and svc created normally?

    [root@master01 alertmanager]# kubectl get pods -l "k8s-app=alertmanager" -n kube-system
    NAME                            READY   STATUS    RESTARTS   AGE
    alertmanager-6546bf7676-lt9jq   1/1     Running   0          85s
    [root@master01 alertmanager]# kubectl get svc -n kube-system
    NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
    alertmanager         NodePort    10.99.246.148    <none>        80:30093/TCP             92s
    kube-dns             ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   39d
    kube-state-metrics   ClusterIP   10.110.110.216   <none>        8080/TCP,8081/TCP        31m
    metrics-server       ClusterIP   10.98.59.116     <none>        443/TCP                  47h
    node-exporter        ClusterIP   None             <none>        9100/TCP                 13m
    [root@master01 alertmanager]# 
    

      Verify: access port 30093 on any node; is alertmanager reachable?
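
      The UI screenshot is omitted; since the Service is type NodePort, a simple HTTP header check against any node IP is enough to confirm the service answers:

    curl -sI http://<node-ip>:30093/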

      Note: the Alertmanager page loads at that port, so alertmanager has been deployed successfully;

      4. Deploy prometheus-server

      Create the RBAC manifest for Prometheus

    [root@master01 prometheus-server]# cat prometheus-rbac.yaml 
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: prometheus
      namespace: kube-system
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: prometheus
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    rules:
      - apiGroups:
          - ""
        resources:
          - nodes
          - nodes/metrics
          - services
          - endpoints
          - pods
        verbs:
          - get
          - list
          - watch
      - apiGroups:
          - ""
        resources:
          - configmaps
        verbs:
          - get
      - nonResourceURLs:
          - "/metrics"
        verbs:
          - get
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: prometheus
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus
    subjects:
    - kind: ServiceAccount
      name: prometheus
      namespace: kube-system
    [root@master01 prometheus-server]# 
    

      Create the Prometheus Service manifest

    [root@master01 prometheus-server]# cat prometheus-service.yaml 
    kind: Service
    apiVersion: v1
    metadata:
      name: prometheus
      namespace: kube-system
      labels:
        kubernetes.io/name: "Prometheus"
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    spec:
      ports:
        - name: http
          port: 9090
          protocol: TCP
          targetPort: 9090
          nodePort: 30090
      selector:
        k8s-app: prometheus
      type: NodePort
    [root@master01 prometheus-server]# 
    

      Create the Prometheus ConfigMap manifest

    [root@master01 prometheus-server]# cat prometheus-configmap.yaml 
    # Prometheus configuration format https://prometheus.io/docs/prometheus/latest/configuration/configuration/
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-config
      namespace: kube-system
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: EnsureExists
    data:
      prometheus.yml: |
        scrape_configs:
        - job_name: prometheus
          static_configs:
          - targets:
            - localhost:9090
    
        - job_name: kubernetes-apiservers
          kubernetes_sd_configs:
          - role: endpoints
          relabel_configs:
          - action: keep
            regex: default;kubernetes;https
            source_labels:
            - __meta_kubernetes_namespace
            - __meta_kubernetes_service_name
            - __meta_kubernetes_endpoint_port_name
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: true
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    
        - job_name: kubernetes-nodes-kubelet
          kubernetes_sd_configs:
          - role: node
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: true
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    
        - job_name: kubernetes-nodes-cadvisor
          kubernetes_sd_configs:
          - role: node
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __metrics_path__
            replacement: /metrics/cadvisor
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: true
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    
        - job_name: kubernetes-service-endpoints
          kubernetes_sd_configs:
          - role: endpoints
          relabel_configs:
          - action: keep
            regex: true
            source_labels:
            - __meta_kubernetes_service_annotation_prometheus_io_scrape
          - action: replace
            regex: (https?)
            source_labels:
            - __meta_kubernetes_service_annotation_prometheus_io_scheme
            target_label: __scheme__
          - action: replace
            regex: (.+)
            source_labels:
            - __meta_kubernetes_service_annotation_prometheus_io_path
            target_label: __metrics_path__
          - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            source_labels:
            - __address__
            - __meta_kubernetes_service_annotation_prometheus_io_port
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - action: replace
            source_labels:
            - __meta_kubernetes_namespace
            target_label: kubernetes_namespace
          - action: replace
            source_labels:
            - __meta_kubernetes_service_name
            target_label: kubernetes_name
    
        - job_name: kubernetes-services
          kubernetes_sd_configs:
          - role: service
          metrics_path: /probe
          params:
            module:
            - http_2xx
          relabel_configs:
          - action: keep
            regex: true
            source_labels:
            - __meta_kubernetes_service_annotation_prometheus_io_probe
          - source_labels:
            - __address__
            target_label: __param_target
          - replacement: blackbox
            target_label: __address__
          - source_labels:
            - __param_target
            target_label: instance
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels:
            - __meta_kubernetes_namespace
            target_label: kubernetes_namespace
          - source_labels:
            - __meta_kubernetes_service_name
            target_label: kubernetes_name
    
        - job_name: kubernetes-pods
          kubernetes_sd_configs:
          - role: pod
          relabel_configs:
          - action: keep
            regex: true
            source_labels:
            - __meta_kubernetes_pod_annotation_prometheus_io_scrape
          - action: replace
            regex: (.+)
            source_labels:
            - __meta_kubernetes_pod_annotation_prometheus_io_path
            target_label: __metrics_path__
          - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            source_labels:
            - __address__
            - __meta_kubernetes_pod_annotation_prometheus_io_port
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - action: replace
            source_labels:
            - __meta_kubernetes_namespace
            target_label: kubernetes_namespace
          - action: replace
            source_labels:
            - __meta_kubernetes_pod_name
            target_label: kubernetes_pod_name
        alerting:
          alertmanagers:
          - kubernetes_sd_configs:
              - role: pod
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            relabel_configs:
            - source_labels: [__meta_kubernetes_namespace]
              regex: kube-system
              action: keep
            - source_labels: [__meta_kubernetes_pod_label_k8s_app]
              regex: alertmanager
              action: keep
            - source_labels: [__meta_kubernetes_pod_container_port_number]
              regex:
              action: drop
    [root@master01 prometheus-server]# 
    

      Create the Prometheus StatefulSet manifest

    [root@master01 prometheus-server]# cat prometheus-statefulset.yaml 
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: prometheus
      namespace: kube-system
      labels:
        k8s-app: prometheus
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
        version: v2.24.0
    spec:
      serviceName: "prometheus"
      replicas: 1
      podManagementPolicy: "Parallel"
      updateStrategy:
       type: "RollingUpdate"
      selector:
        matchLabels:
          k8s-app: prometheus
      template:
        metadata:
          labels:
            k8s-app: prometheus
        spec:
          priorityClassName: system-cluster-critical
          serviceAccountName: prometheus
          initContainers:
          - name: "init-chown-data"
            image: "busybox:latest"
            imagePullPolicy: "IfNotPresent"
            command: ["chown", "-R", "65534:65534", "/data"]
            volumeMounts:
            - name: prometheus-data
              mountPath: /data
              subPath: ""
          containers:
    #        - name: prometheus-server-configmap-reload
    #          image: "jimmidyson/configmap-reload:v0.1"
    #          imagePullPolicy: "IfNotPresent"
    #          args:
    #            - --volume-dir=/etc/config
    #            - --webhook-url=http://localhost:9090/-/reload
    #          volumeMounts:
    #            - name: config-volume
    #              mountPath: /etc/config
    #              readOnly: true
    #          resources:
    #            limits:
    #              cpu: 10m
    #              memory: 10Mi
    #            requests:
    #              cpu: 10m
    #              memory: 10Mi
    
            - name: prometheus-server
              image: "prom/prometheus:v2.24.0"
              imagePullPolicy: "IfNotPresent"
              args:
                - --config.file=/etc/config/prometheus.yml
                - --storage.tsdb.path=/data
                - --web.console.libraries=/etc/prometheus/console_libraries
                - --web.console.templates=/etc/prometheus/consoles
                - --web.enable-lifecycle
              ports:
                - containerPort: 9090
              readinessProbe:
                httpGet:
                  path: /-/ready
                  port: 9090
                initialDelaySeconds: 30
                timeoutSeconds: 30
              livenessProbe:
                httpGet:
                  path: /-/healthy
                  port: 9090
                initialDelaySeconds: 30
                timeoutSeconds: 30
              # based on 10 running nodes with 30 pods each
              resources:
                limits:
                  cpu: 200m
                  memory: 1000Mi
                requests:
                  cpu: 200m
                  memory: 1000Mi
    
              volumeMounts:
                - name: config-volume
                  mountPath: /etc/config
                - name: prometheus-data
                  mountPath: /data
                  subPath: ""
          terminationGracePeriodSeconds: 300
          volumes:
            - name: config-volume
              configMap:
                name: prometheus-config
      volumeClaimTemplates:
      - metadata:
          name: prometheus-data
        spec:
    #      storageClassName: standard
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: "5Gi"
    [root@master01 prometheus-server]# 
    

      Note: before applying the manifest above, make sure an available PV has enough capacity for the 5Gi volumeClaimTemplates request;

      Apply the four manifests above to deploy the Prometheus server

    [root@master01 prometheus-server]# ls
    prometheus-configmap.yaml  prometheus-rbac.yaml  prometheus-service.yaml  prometheus-statefulset.yaml
    [root@master01 prometheus-server]# kubectl apply -f .
    configmap/prometheus-config created
    serviceaccount/prometheus created
    clusterrole.rbac.authorization.k8s.io/prometheus created
    clusterrolebinding.rbac.authorization.k8s.io/prometheus created
    service/prometheus created
    statefulset.apps/prometheus created
    [root@master01 prometheus-server]# 
    

      Verify: were the pod and svc created successfully?

    [root@master01 prometheus-server]# kubectl get pods -l "k8s-app=prometheus" -n kube-system
    NAME           READY   STATUS    RESTARTS   AGE
    prometheus-0   1/1     Running   0          2m20s
    [root@master01 prometheus-server]# kubectl get svc -n kube-system
    NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
    alertmanager         NodePort    10.99.246.148    <none>        80:30093/TCP             10m
    kube-dns             ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   39d
    kube-state-metrics   ClusterIP   10.110.110.216   <none>        8080/TCP,8081/TCP        40m
    metrics-server       ClusterIP   10.98.59.116     <none>        443/TCP                  47h
    node-exporter        ClusterIP   None             <none>        9100/TCP                 22m
    prometheus           NodePort    10.111.155.1     <none>        9090:30090/TCP           2m27s
    [root@master01 prometheus-server]# 
    

      Verify: access port 30090 on any node; is Prometheus reachable?

      Note: the page loads, so the Prometheus server deployment is fine;

      Viewing metric data through the web UI

      Note: select the metric you want to inspect and click Execute, and the corresponding graph is rendered; a couple of example queries follow. At this point the Prometheus monitoring system is fully deployed; next we deploy Grafana and configure it to use Prometheus as its data source for displaying the monitoring data.
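
      For instance, expressions like these (typed into the query box; the metric names come from the node-exporter and kube-state-metrics versions deployed above, so adjust them if your versions differ) chart per-node CPU usage and per-namespace pod counts:

    # fraction of CPU time each node spends busy (node-exporter metric)
    1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

    # number of pods known to kube-state-metrics, grouped by namespace
    count by (namespace) (kube_pod_info)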

      Deploying Grafana

      Create the Grafana deployment manifest

    [root@master01 grafana]# cat grafana.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: monitoring-grafana
      namespace: kube-system
    spec:
      replicas: 1
      selector:
        matchLabels:
          task: monitoring
          k8s-app: grafana
      template:
        metadata:
          labels:
            task: monitoring
            k8s-app: grafana
        spec:
          containers:
          - name: grafana
            image: k8s.gcr.io/heapster-grafana-amd64:v5.0.4
            ports:
            - containerPort: 3000
              protocol: TCP
            volumeMounts:
            - mountPath: /etc/ssl/certs
              name: ca-certificates
              readOnly: true
            - mountPath: /var
              name: grafana-storage
            env:
    #        - name: INFLUXDB_HOST
    #          value: monitoring-influxdb
            - name: GF_SERVER_HTTP_PORT
              value: "3000"
            - name: GF_AUTH_BASIC_ENABLED
              value: "false"
            - name: GF_AUTH_ANONYMOUS_ENABLED
              value: "true"
            - name: GF_AUTH_ANONYMOUS_ORG_ROLE
              value: Admin
            - name: GF_SERVER_ROOT_URL
              value: /
          volumes:
          - name: ca-certificates
            hostPath:
              path: /etc/ssl/certs
          - name: grafana-storage
            emptyDir: {}
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        kubernetes.io/cluster-service: 'true'
        kubernetes.io/name: monitoring-grafana
      name: monitoring-grafana
      namespace: kube-system
    spec:
      ports:
      - port: 80
        targetPort: 3000
      selector:
        k8s-app: grafana
      type: "NodePort"
    [root@master01 grafana]# 
    

      Apply the manifest to deploy Grafana

    [root@master01 grafana]# ls
    grafana.yaml
    [root@master01 grafana]# kubectl apply -f .
    deployment.apps/monitoring-grafana created
    service/monitoring-grafana created
    [root@master01 grafana]# 
    

      Verify: were the pod and svc created?

    [root@master01 grafana]# kubectl get pods -l "k8s-app=grafana" -n kube-system
    NAME                                  READY   STATUS    RESTARTS   AGE
    monitoring-grafana-6c74ccc5dd-grjzf   1/1     Running   0          87s
    [root@master01 grafana]# kubectl get svc -n kube-system
    NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
    alertmanager         NodePort    10.99.246.148    <none>        80:30093/TCP             82m
    kube-dns             ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   39d
    kube-state-metrics   ClusterIP   10.110.110.216   <none>        8080/TCP,8081/TCP        112m
    metrics-server       ClusterIP   10.98.59.116     <none>        443/TCP                  2d
    monitoring-grafana   NodePort    10.100.230.71    <none>        80:30196/TCP             92s
    node-exporter        ClusterIP   None             <none>        9100/TCP                 94m
    prometheus           NodePort    10.111.155.1     <none>        9090:30090/TCP           74m
    [root@master01 grafana]# 
    

      Note: the grafana svc is exposed on node port 30196;

      Verify: access the node port exposed by the grafana service; is the pod reachable?

      Note: the page loads, so Grafana has been deployed successfully;

      Configuring Grafana

      1. Configure Prometheus as Grafana's data source
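
      The configuration screenshots are omitted; since Grafana and Prometheus both run in kube-system, the data source URL can simply point at the Prometheus Service by its in-cluster DNS name (the name below follows the standard service DNS convention; the NodePort URL http://<node-ip>:30090 would also work):

    # Grafana data source settings (Add data source -> Prometheus)
    URL: http://prometheus.kube-system.svc.cluster.local:9090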

      2. Create a monitoring dashboard

      Note: go to the grafana.com website and download a dashboard template;

      After downloading the template file, import it into Grafana

      Note: select the downloaded template file, choose the corresponding data source, and click Import. If the imported panels show no data, it is because the metric names used in the template differ from the metric names in your Prometheus; edit the template file's queries to match the names your environment actually exposes, as in the example below.
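
      A common instance of this mismatch, assuming an older dashboard template: node-exporter renamed node_cpu to node_cpu_seconds_total in v0.16, so a panel query written for the old name has to be rewritten for the v1.0.1 exporter deployed here:

    # old template query (node-exporter < 0.16), shows no data here
    sum(rate(node_cpu{mode!="idle"}[5m])) by (instance)

    # equivalent query for node-exporter >= 0.16
    sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (instance)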

    Author: Linux-1874
    Original post: https://www.cnblogs.com/qiuhom-1874/p/14287942.html