  • K8S From Beginner to Giving Up Series - (16) Kubernetes Cluster Prometheus Operator Monitoring Deployment

    Prometheus Operator is not the same thing as Prometheus itself: it is a controller open-sourced by CoreOS for managing Prometheus on Kubernetes clusters, built to simplify deploying, managing, and running Prometheus and Alertmanager clusters on Kubernetes.

    The official architecture diagram (see the kube-prometheus repository):

     Kubernetes also publishes recommendations on monitoring with Prometheus in its official GitHub repository:

    URL: https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/prometheus

      

    Component overview:

      Operator: The Operator deploys and manages Prometheus Server based on Custom Resource Definitions (CRDs), and watches for changes to these custom resources so it can react accordingly. It is the control center of the whole system.
      Prometheus: The Prometheus resource declaratively describes the desired state of a Prometheus deployment.
      Prometheus Server: The Prometheus Server cluster that the Operator deploys according to the Prometheus custom resource; the custom resource can be thought of as a StatefulSet-like handle for managing the Prometheus Server cluster.
      ServiceMonitor: ServiceMonitor is also a custom resource; it describes a list of targets to be scraped by Prometheus. It selects the corresponding Service Endpoints via labels, so that Prometheus Server collects metrics through the selected Services (a minimal example follows this list).
      Service: Service resources map to the metrics-exposing Pods in the Kubernetes cluster and are what a ServiceMonitor selects for Prometheus Server to scrape. Put simply, they are the objects Prometheus monitors, e.g. a Node Exporter Service, a MySQL Exporter Service, and so on.
    Alertmanager: Alertmanager is also a custom resource type; the Operator deploys an Alertmanager cluster according to its spec.
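
    For illustration, here is a minimal ServiceMonitor sketch showing how Services are selected by label. The name and label value are hypothetical; only the field layout matters:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: example-app          # hypothetical name
      namespace: monitoring
    spec:
      selector:
        matchLabels:
          k8s-app: example-app   # Services carrying this label are scraped
      namespaceSelector:
        matchNames:
        - default                # namespaces in which to look for the Service
      endpoints:
      - port: http-metrics       # named port on the Service
        interval: 30s            # scrape interval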

    1. Download the configuration files

    Official repository: https://github.com/coreos/kube-prometheus

    Since the project is not large, I clone the whole repository here. Alternatively, you can download individual files: fetch everything under https://github.com/coreos/kube-prometheus/tree/master/manifests to your local machine.

    [root@k8s-master01 k8s]# git clone https://github.com/coreos/kube-prometheus.git

     2. Deployment

       2.1 Organize the manifest YAML files

    ## The upstream project keeps all resource manifests in one directory; for convenience we file each service's manifests into its own subdirectory
    [root@k8s-master01 ~]# cd /opt/k8s/kube-prometheus/manifests
    [root@k8s-master01 manifests]# mkdir serviceMonitor operator grafana kube-state-metrics alertmanager node-exporter adapter prometheus
    [root@k8s-master01 manifests]# mv *-serviceMonitor* serviceMonitor/
    [root@k8s-master01 manifests]# mv 0prometheus-operator* operator/
    [root@k8s-master01 manifests]# mv grafana-* grafana/
    [root@k8s-master01 manifests]# mv kube-state-metrics-* kube-state-metrics/
    [root@k8s-master01 manifests]# mv alertmanager-* alertmanager/
    [root@k8s-master01 manifests]# mv node-exporter-* node-exporter/
    [root@k8s-master01 manifests]# mv prometheus-adapter-* adapter/
    [root@k8s-master01 manifests]# mv prometheus-* prometheus/

       2.2 Deploy the operator

    ## First create the dedicated namespace for Prometheus monitoring
    [root@k8s-master01 manifests]# kubectl apply -f 00namespace-namespace.yaml
    ## Deploy the operator
    [root@k8s-master01 manifests]# kubectl apply -f operator/
    ## Check the pod status; the manifests pull images from quay.io, so no further action is needed
    [root@k8s-master01 manifests]# kubectl get pods -n monitoring
    NAME                                   READY   STATUS    RESTARTS   AGE
    prometheus-operator-69bd579bf9-mjsxz   1/1     Running   0          20s

     2.3 Deploy metrics

    Before deploying the metrics components, make sure that kube-apiserver has the aggregation layer enabled (so third-party APIs can be registered with the cluster) and that the other component flags are correct; otherwise no data can be collected. For details, see the earlier article K8S From Beginner to Giving Up Series - (13) Kubernetes Cluster metrics-server Deployment.
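
    A quick way to sanity-check the aggregation layer from the command line (a sketch; the APIService name assumes metrics-server is already registered as in article 13):

    ## Verify that the aggregated metrics API is registered and available
    [root@k8s-master01 manifests]# kubectl get apiservice v1beta1.metrics.k8s.io
    ## AVAILABLE should show True; then confirm node metrics actually come back
    [root@k8s-master01 manifests]# kubectl top nodes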

    ## The addon-resizer container image comes from Google's registry; change it to the Aliyun mirror
    [root@k8s-master01 manifests]# vim kube-state-metrics/kube-state-metrics-deployment.yaml
        #image: k8s.gcr.io/addon-resizer:1.8.4 ## original value, commented out and replaced with the address below
           image: registry.aliyuncs.com/google_containers/addon-resizer:1.8.4
    [root@k8s-master01 kube-state-metrics]# kubectl apply -f kube-state-metrics/
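
    If you prefer not to edit the file by hand, an equivalent one-liner (assuming the image line matches the upstream manifest exactly):

    ## Swap the registry prefix in place; verify the change with git diff afterwards
    [root@k8s-master01 manifests]# sed -i 's|k8s.gcr.io/addon-resizer|registry.aliyuncs.com/google_containers/addon-resizer|' kube-state-metrics/kube-state-metrics-deployment.yaml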

    2.4 Deploy the remaining components

    The remaining components are deployed the same way; all of their images can be pulled normally without a proxy. Download speed depends on your local network.

    [root@k8s-master01 kube-state-metrics]# kubectl apply -f adapter/
    [root@k8s-master01 kube-state-metrics]# kubectl apply -f alertmanager/
    [root@k8s-master01 kube-state-metrics]# kubectl apply -f node-exporter/
    [root@k8s-master01 kube-state-metrics]# kubectl apply -f grafana/
    [root@k8s-master01 kube-state-metrics]# kubectl apply -f prometheus/
    [root@k8s-master01 kube-state-metrics]# kubectl apply -f serviceMonitor/

    ## After deployment, review the status of all the resources
    [root@k8s-master01 manifests]# kubectl get all -n monitoring
    NAME                                       READY   STATUS    RESTARTS   AGE
    pod/grafana-558647b59-bhqmq                1/1     Running   0          94m
    pod/kube-state-metrics-79d4446fb5-5mj7d    4/4     Running   0          98m
    pod/node-exporter-4xq5t                    2/2     Running   0          111m
    pod/node-exporter-9b88m                    2/2     Running   0          111m
    pod/node-exporter-fdntx                    2/2     Running   0          111m
    pod/node-exporter-mwbxj                    2/2     Running   0          111m
    pod/node-exporter-tn7tl                    2/2     Running   0          111m
    pod/prometheus-adapter-57c497c557-vbgxd    1/1     Running   0          144m
    pod/prometheus-k8s-0                       3/3     Running   0          96m
    pod/prometheus-k8s-1                       3/3     Running   0          96m
    pod/prometheus-operator-69bd579bf9-mjsxz   1/1     Running   0          155m
    NAME                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
    service/grafana               ClusterIP   10.254.42.208   <none>        3000/TCP            94m
    service/kube-state-metrics    ClusterIP   None            <none>        8443/TCP,9443/TCP   104m
    service/node-exporter         ClusterIP   None            <none>        9100/TCP            111m
    service/prometheus-adapter    ClusterIP   10.254.107.95   <none>        443/TCP             144m
    service/prometheus-k8s        ClusterIP   10.254.82.246   <none>        9090/TCP            96m
    service/prometheus-operated   ClusterIP   None            <none>        9090/TCP            96m
    service/prometheus-operator   ClusterIP   None            <none>        8080/TCP            156m
    NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
    daemonset.apps/node-exporter   5         5         5       5            5           beta.kubernetes.io/os=linux   111m
    NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/grafana               1/1     1            1           94m
    deployment.apps/kube-state-metrics    1/1     1            1           98m
    deployment.apps/prometheus-adapter    1/1     1            1           144m
    deployment.apps/prometheus-operator   1/1     1            1           156m
    NAME                                             DESIRED   CURRENT   READY   AGE
    replicaset.apps/grafana-558647b59                1         1         1       94m
    replicaset.apps/kube-state-metrics-5b86559fd5    0         0         0       98m
    replicaset.apps/kube-state-metrics-79d4446fb5    1         1         1       98m
    replicaset.apps/prometheus-adapter-57c497c557    1         1         1       144m
    replicaset.apps/prometheus-operator-69bd579bf9   1         1         1       156m
    NAME                              READY   AGE
    statefulset.apps/prometheus-k8s   2/2     96m

     3. Create the Ingress services

    Instead of exposing the services via NodePort, I use Ingress here. For Ingress installation and deployment, see the earlier article K8S From Beginner to Giving Up Series - (15) Kubernetes Cluster Ingress Deployment.

      3.1 Edit the Ingress configuration file

    ### Expose the prometheus, grafana, and alertmanager web UIs via Ingress
    [root@k8s-master01 manifests]# cat ingress-all-svc.yml
    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: prometheus-ing
      namespace: monitoring
    spec:
      rules:
      - host: prometheus.monitoring.k8s.local
        http:
          paths:
          - backend:
              serviceName: prometheus-k8s
              servicePort: 9090
    ---
    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: grafana-ing
      namespace: monitoring
    spec:
      rules:
      - host: grafana.monitoring.k8s.local
        http:
          paths:
          - backend:
              serviceName: grafana
              servicePort: 3000
    ---
    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: alertmanager-ing
      namespace: monitoring
    spec:
      rules:
      - host: alertmanager.monitoring.k8s.local
        http:
          paths:
          - backend:
              serviceName: alertmanager-main
              servicePort: 9093
    [root@k8s-master01 manifests]# kubectl apply -f ingress-all-svc.yml
    ## The domain names for the three services have now been created
    [root@k8s-master01 manifests]# kubectl get ingress -n monitoring
    NAME               HOSTS                               ADDRESS   PORTS   AGE
    alertmanager-ing   alertmanager.monitoring.k8s.local             80      3d21h
    grafana-ing        grafana.monitoring.k8s.local                  80      3d21h
    prometheus-ing     prometheus.monitoring.k8s.local               80      3d21h
    ## Check the NodePort the ingress controller exposes; you will need to append this port when accessing the URLs
    [root@k8s-master01 manifests]# kubectl get svc -n ingress-nginx
    NAME            TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
    ingress-nginx   NodePort   10.254.102.184   <none>        80:33848/TCP,443:45891/TCP   4d3h
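
    Before opening a browser, you can verify the routing from the command line (a sketch; any node IP works, here using the one from the hosts entries below, with the HTTP NodePort from the Service above):

    ## Send a request through the ingress controller with the expected Host header
    [root@k8s-master01 manifests]# curl -s -H 'Host: prometheus.monitoring.k8s.local' http://172.16.11.123:33848/graph | head -n 5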

     4. Access

    Configure local hosts resolution; on Windows the file is C:\Windows\System32\drivers\etc\hosts:

    172.16.11.123 prometheus.monitoring.k8s.local
    172.16.11.123 grafana.monitoring.k8s.local
    172.16.11.123 alertmanager.monitoring.k8s.local

     Open http://prometheus.monitoring.k8s.local:33848 in a browser; you can see the monitored hosts and pods.

      

      4.2 Troubleshooting

      After deployment, the Prometheus Targets page shows 0 target hosts for both kube-controller-manager and kube-scheduler.

     Root cause:

      A ServiceMonitor selects Services by label, and in the specified namespace (kube-system) there is no Service carrying the expected labels. kube-apiserver works because it is served by the default kubernetes Service in the default namespace; the other components live in kube-system and need Services created for them manually.

    ## Inspect the label-selection rules in the ServiceMonitors
    [root@k8s-master01 manifests]# grep -2 selector serviceMonitor/prometheus-serviceMonitorKube*
    serviceMonitor/prometheus-serviceMonitorKubeControllerManager.yaml-    matchNames:
    serviceMonitor/prometheus-serviceMonitorKubeControllerManager.yaml-    - kube-system
    serviceMonitor/prometheus-serviceMonitorKubeControllerManager.yaml:  selector:
    serviceMonitor/prometheus-serviceMonitorKubeControllerManager.yaml-    matchLabels:
    serviceMonitor/prometheus-serviceMonitorKubeControllerManager.yaml-      k8s-app: kube-controller-manager
    --
    serviceMonitor/prometheus-serviceMonitorKubelet.yaml-    matchNames:
    serviceMonitor/prometheus-serviceMonitorKubelet.yaml-    - kube-system
    serviceMonitor/prometheus-serviceMonitorKubelet.yaml:  selector:
    serviceMonitor/prometheus-serviceMonitorKubelet.yaml-    matchLabels:
    serviceMonitor/prometheus-serviceMonitorKubelet.yaml-      k8s-app: kubelet
    --
    serviceMonitor/prometheus-serviceMonitorKubeScheduler.yaml-    matchNames:
    serviceMonitor/prometheus-serviceMonitorKubeScheduler.yaml-    - kube-system
    serviceMonitor/prometheus-serviceMonitorKubeScheduler.yaml:  selector:
    serviceMonitor/prometheus-serviceMonitorKubeScheduler.yaml-    matchLabels:
    serviceMonitor/prometheus-serviceMonitorKubeScheduler.yaml-      k8s-app: kube-scheduler
    ## List the Services in kube-system; note that kube-scheduler and kube-controller-manager are missing
    [root@k8s-master01 manifests]# kubectl -n kube-system get svc
    NAME                      TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
    kube-dns                  ClusterIP   10.254.0.2   <none>        53/UDP,53/TCP,9153/TCP   49d
    kubelet                   ClusterIP   None         <none>        10250/TCP                4d2h

    However, the corresponding Endpoints objects do exist (carrying no labels). Also note: if your cluster was deployed with kubeadm there is no kubelet Endpoints object; binary deployments have one.

    [root@k8s-master01 manifests]# kubectl get ep -n kube-system
    NAME                      ENDPOINTS                                                                  AGE
    kube-controller-manager   <none>                                                                     6m38s
    kube-dns                  10.254.88.22:53,10.254.96.207:53,10.254.88.22:53 + 3 more...               49d
    kube-scheduler            <none>                                                                     6m39s
    kubelet                   172.16.11.120:10255,172.16.11.121:10255,172.16.11.122:10255 + 12 more...   4d3h

    Solution:

     1) Create cluster Services for kube-controller-manager and kube-scheduler, labelled so that the ServiceMonitors can select them.

    ## Service manifests
    [root@k8s-master01 manifests]# cat controller-scheduler-svc.yml
    apiVersion: v1
    kind: Service
    metadata:
      namespace: kube-system
      name: kube-controller-manager
      labels:
        k8s-app: kube-controller-manager
    spec:
      type: ClusterIP
      clusterIP: None
      ports:
      - name: http-metrics
        port: 10252
        targetPort: 10252
        protocol: TCP
    ---
    apiVersion: v1
    kind: Service
    metadata:
      namespace: kube-system
      name: kube-scheduler
      labels:
        k8s-app: kube-scheduler
    spec:
      type: ClusterIP
      clusterIP: None
      ports:
      - name: http-metrics
        port: 10251
        targetPort: 10251
        protocol: TCP
    #####################Note##################################
    Notice that these Services define no selector to match pods by
    label: kube-controller-manager and kube-scheduler do not run as
    pods here, so no selector is needed, but matching Endpoints
    objects must be created by hand and mapped to the Services.
    Official documentation: https://kubernetes.io/zh/docs/concepts/services-networking/service/
    ##########################################################
    ## Endpoints: for a binary deployment, edit the Endpoints for each Service and change the IP addresses to your own cluster IPs
    [root@k8s-master01 manifests]# cat controller-scheduler-ep.yml
    apiVersion: v1
    kind: Endpoints
    metadata:
      labels:
        k8s-app: kube-controller-manager
      name: kube-controller-manager
      namespace: kube-system
    subsets:
    - addresses:
      - ip: 172.16.11.120
      - ip: 172.16.11.121
      - ip: 172.16.11.122
      ports:
      - name: http-metrics
        port: 10252
        protocol: TCP
    ---
    apiVersion: v1
    kind: Endpoints
    metadata:
      labels:
        k8s-app: kube-scheduler
      name: kube-scheduler
      namespace: kube-system
    subsets:
    - addresses:
      - ip: 172.16.11.120
      - ip: 172.16.11.121
      - ip: 172.16.11.122
      ports:
      - name: http-metrics
        port: 10251
        protocol: TCP
    ## Inspect the created resources
    [root@k8s-master01 manifests]# kubectl get ep,svc -n kube-system
    NAME                                ENDPOINTS                                                                  AGE
    endpoints/kube-controller-manager   172.16.11.120:10252,172.16.11.121:10252,172.16.11.122:10252                18m
    endpoints/kube-dns                  10.254.88.22:53,10.254.96.207:53,10.254.88.22:53 + 3 more...               49d
    endpoints/kube-scheduler            172.16.11.120:10251,172.16.11.121:10251,172.16.11.122:10251                18m
    endpoints/kubelet                   172.16.11.120:10255,172.16.11.121:10255,172.16.11.122:10255 + 12 more...   4d3h
    
    
    NAME                              TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
    service/kube-controller-manager   ClusterIP   None         <none>        10252/TCP                12m
    service/kube-dns                  ClusterIP   10.254.0.2   <none>        53/UDP,53/TCP,9153/TCP   49d
    service/kube-scheduler            ClusterIP   None         <none>        10251/TCP                12m
    service/kubelet                   ClusterIP   None         <none>        10250/TCP                4d3h
     

       2) Change the listen address of kube-controller-manager and kube-scheduler so that their metrics endpoints are reachable.

    ## Change the address flag to 0.0.0.0
    --address=0.0.0.0
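
    For a binary deployment this typically means editing the flag in the systemd unit files; a sketch assuming the units live under /etc/systemd/system (hypothetical paths, adjust to wherever your flags are defined):

    ## Hypothetical unit file locations; adjust to your installation
    [root@k8s-master01 ~]# sed -i 's/--address=127.0.0.1/--address=0.0.0.0/' /etc/systemd/system/kube-controller-manager.service
    [root@k8s-master01 ~]# sed -i 's/--address=127.0.0.1/--address=0.0.0.0/' /etc/systemd/system/kube-scheduler.service
    [root@k8s-master01 ~]# systemctl daemon-reload
    [root@k8s-master01 ~]# systemctl restart kube-controller-manager kube-scheduler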

    After the change, restart the services and check the Prometheus Targets page again; all target hosts are now scraped normally.
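
    You can also confirm from the shell that the metrics endpoints now answer on the node IPs (ports as defined in the Services above):

    ## Each endpoint should return Prometheus-format metrics
    [root@k8s-master01 ~]# curl -s http://172.16.11.120:10252/metrics | head -n 3
    [root@k8s-master01 ~]# curl -s http://172.16.11.120:10251/metrics | head -n 3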

    5. Viewing dashboards in Grafana

      1) Grafana already contains multiple built-in dashboard pages.

      2) Monitoring data displays as expected.
