  • Full-stack k8s monitoring: metrics-server and Prometheus

    I. Overview

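    • Use metrics-server to collect data for in-cluster consumers such as kubectl, the HPA and the scheduler
    • Use prometheus-operator to deploy Prometheus and store the monitoring data
    • Use kube-state-metrics to collect data about the resource objects in the cluster
    • Use node_exporter to collect data from every node in the cluster
    • Use Prometheus to collect data from the apiserver, scheduler, controller-manager and kubelet components
    • Use Alertmanager for alerting
    • Use Grafana for data visualization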

    1. Deploy metrics-server

     

    git  clone  https://github.com/cuishuaigit/k8s-monitor.git
    
    cd  k8s-monitor  

     

    I always deploy services like this on the master node, which means editing metrics-server-deployment.yaml:

    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: metrics-server
      namespace: kube-system
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: metrics-server
      namespace: kube-system
      labels:
        k8s-app: metrics-server
    spec:
      selector:
        matchLabels:
          k8s-app: metrics-server
      template:
        metadata:
          name: metrics-server
          labels:
            k8s-app: metrics-server
        spec:
          serviceAccountName: metrics-server
          tolerations:
            - effect: NoSchedule
              key: node.kubernetes.io/unschedulable
              operator: Exists
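            # tolerate the master taint so the Pod can run on the labelled master
            # (newer releases use node-role.kubernetes.io/control-plane instead)
            - key: node-role.kubernetes.io/master
              operator: Exists
              effect: NoSchedule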
          volumes:
          # mount in tmp so we can safely use from-scratch images and/or read-only containers
          - name: tmp-dir
            emptyDir: {}
          containers:
          - name: metrics-server
            image: k8s.gcr.io/metrics-server-amd64:v0.3.1
            imagePullPolicy: Always
            command:
            - /metrics-server
            - --kubelet-insecure-tls
            - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
            volumeMounts:
            - name: tmp-dir
              mountPath: /tmp
          nodeSelector:
            metrics: "yes"

    Label the master node (ku is my master's hostname; substitute your own):

    kubectl label nodes ku metrics=yes

    Deploy:

    kubectl create -f metrics-server/deploy/1.8+/
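    kubectl top and the HPA do not talk to metrics-server directly: the manifests in metrics-server/deploy/1.8+/ presumably also register it with the API aggregation layer, so that the API server proxies requests for the metrics.k8s.io group to it. A sketch of that registration, modelled on the upstream metrics-server manifests of this era (check the actual file in the repo; older clusters use apiregistration.k8s.io/v1beta1):

    ```yaml
    apiVersion: apiregistration.k8s.io/v1
    kind: APIService
    metadata:
      name: v1beta1.metrics.k8s.io
    spec:
      service:
        name: metrics-server        # the Service in front of the metrics-server Pods
        namespace: kube-system
      group: metrics.k8s.io         # API group served by metrics-server
      version: v1beta1
      insecureSkipTLSVerify: true   # the API server does not verify metrics-server's self-signed cert
      groupPriorityMinimum: 100
      versionPriority: 100
    ```

    kubectl get apiservice v1beta1.metrics.k8s.io shows whether the registration is Available.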

     

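    Verify, for example with kubectl top nodes (or kubectl top pods -n kube-system). If CPU and memory usage is listed for every node:

    it's cool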
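    Once kubectl top works, the HPA can read the same metrics. A minimal sketch, assuming a hypothetical Deployment named my-app whose containers declare CPU requests (CPU utilization is computed against the requests). autoscaling/v2 needs a reasonably recent cluster; older ones offer autoscaling/v1 with targetCPUUtilizationPercentage:

    ```yaml
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: my-app              # hypothetical name
      namespace: default
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app            # hypothetical Deployment to scale
      minReplicas: 1
      maxReplicas: 5
      metrics:
        - type: Resource
          resource:
            name: cpu           # served by metrics-server through metrics.k8s.io
            target:
              type: Utilization
              averageUtilization: 50   # aim for 50% of the requested CPU on average
    ```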

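    Note: by default metrics-server reaches each kubelet by the node's hostname, but CoreDNS has no records for the physical hosts' names. One fix is to add this flag at deployment time, so that the node's InternalIP is tried first:

    - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP

    The other is to build an upstream DNS service with dnsmasq; see https://www.cnblogs.com/cuishuai/p/9856843.html.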
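    A third option, not the one used in this article, is to let CoreDNS itself resolve the node hostnames with its hosts plugin. A sketch, assuming the coredns ConfigMap that kubeadm creates in kube-system; the IP addresses and the node1 entry are placeholders, and the remaining plugins only mirror a typical default Corefile, so keep whatever yours already contains and add just the hosts block:

    ```yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: coredns
      namespace: kube-system
    data:
      # Only the hosts block is new; IPs and node1 are placeholders for your own nodes.
      Corefile: |
        .:53 {
            errors
            health
            hosts {
                192.168.1.10 ku
                192.168.1.11 node1
                fallthrough
            }
            kubernetes cluster.local in-addr.arpa ip6.arpa {
                pods insecure
                fallthrough in-addr.arpa ip6.arpa
            }
            prometheus :9153
            forward . /etc/resolv.conf
            cache 30
            loop
            reload
            loadbalance
        }
    ```

    The fallthrough line passes names not listed in the block on to the next plugins, and the reload plugin picks up the edited ConfigMap without restarting CoreDNS.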

     

     

    2. Deploy Prometheus

    Get the files:

    The whole repository was already cloned when deploying metrics-server, so just use it:
    cd k8s-monitor
    
    

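    1. Set up an NFS service to provide dynamically provisioned persistent storage

    1. Install NFS on the server
    sudo apt-get install -y nfs-kernel-server
    sudo apt-get install -y nfs-common
    sudo mkdir -p /data/opv
    sudo vi /etc/exports
    /data/opv *(rw,sync,no_root_squash,no_subtree_check)
    Replace * with your own IP range (e.g. 192.168.1.0/24); on a purely internal network * (any host) also works.
    sudo /etc/init.d/rpcbind restart
    sudo /etc/init.d/nfs-kernel-server restart
    sudo systemctl enable rpcbind nfs-kernel-server

    Mount it on the clients (ku13-1 is my NFS server)
    sudo apt-get install -y nfs-common
    sudo mkdir -p /data/opv
    sudo mount -t nfs ku13-1:/data/opv /data/opv -o proto=tcp -o nolock
    For convenience I put the mount command above into .bashrc; an /etc/fstab entry is the more conventional way to mount it at boot.
    2. Create the namespace
    kubectl create -f nfs/monitoring-namepsace.yaml
    3. Create the RBAC objects for NFS
    kubectl create -f nfs/rbac.yaml
    4. Create the Deployment (replace the NFS server address in it with your own)
    kubectl create -f nfs/nfs-deployment.yaml
    5. Create the StorageClass
    kubectl create -f nfs/storageClass.yaml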
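    The provisioner started by nfs/nfs-deployment.yaml only serves StorageClasses whose provisioner field matches its own name. A sketch, assuming the repo uses the common nfs-client-provisioner (configured through its PROVISIONER_NAME, NFS_SERVER and NFS_PATH environment variables); the names example.com/nfs, managed-nfs-storage and test-claim are hypothetical:

    ```yaml
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: managed-nfs-storage      # hypothetical name
    provisioner: example.com/nfs     # must equal PROVISIONER_NAME in nfs/nfs-deployment.yaml
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: test-claim               # hypothetical claim to test dynamic provisioning
      namespace: monitoring
    spec:
      storageClassName: managed-nfs-storage
      accessModes:
        - ReadWriteMany              # NFS volumes can be mounted read-write by many nodes
      resources:
        requests:
          storage: 1Mi
    ```

    If kubectl get pvc -n monitoring shows test-claim as Bound and a new subdirectory appears under /data/opv, dynamic provisioning works and the claim can be deleted again.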
    

    2. Install Prometheus

    cd k8s-monitor/Promutheus/prometheus

    1. Create the RBAC permissions
    kubectl create -f rbac.yaml
    2. Create node-exporter
    kubectl create -f prometheus-node-exporter-daemonset.yaml
    kubectl create -f prometheus-node-exporter-service.yaml
    3. Create kube-state-metrics
    kubectl create -f kube-state-metrics-deployment.yaml
    kubectl create -f kube-state-metrics-service.yaml
    4. Create node-directory-size-metrics
    kubectl create -f node-directory-size-metrics-daemonset.yaml
    5. Create Prometheus
    kubectl create -f prometheus-pvc.yaml
    kubectl create -f prometheus-core-configmap.yaml
    kubectl create -f prometheus-core-deployment.yaml
    kubectl create -f prometheus-core-service.yaml
    kubectl create -f prometheus-rules-configmap.yaml
    6. Change the etcd address in the core ConfigMap (prometheus-core-configmap.yaml) to your own. This is easiest before step 5 creates it; otherwise re-apply the ConfigMap and restart the Prometheus Pod.
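    On a kubeadm cluster etcd serves /metrics on its client port over TLS, so the etcd job needs the etcd address plus client certificates. A sketch of such a job; the address is a placeholder, and the certificate paths assume that /etc/kubernetes/pki/etcd from the master has been mounted into the Prometheus Pod at /etc/prometheus/etcd (a hypothetical mount, not something the repo's manifests are shown to do here):

    ```yaml
    # Excerpt of the Prometheus config held in prometheus-core-configmap.yaml (hypothetical layout)
    scrape_configs:
      - job_name: etcd
        scheme: https
        tls_config:
          # kubeadm generates these under /etc/kubernetes/pki/etcd/ on the master
          ca_file: /etc/prometheus/etcd/ca.crt
          cert_file: /etc/prometheus/etcd/healthcheck-client.crt
          key_file: /etc/prometheus/etcd/healthcheck-client.key
        static_configs:
          - targets: ['192.168.1.10:2379']   # placeholder: your etcd member(s)
    ```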
    

    3. Install Grafana

    cd k8s-monitor/Promutheus/grafana

    1. Create the Grafana Service
    kubectl create -f grafana-svc.yaml
    2. Create the ConfigMap
    kubectl create -f grafana-configmap.yaml
    3. Create the PVC
    kubectl create -f grafana-pvc.yaml
    4. Create the Grafana Deployment
    kubectl create -f grafana-deployment.yaml
    5. Create the dashboard ConfigMap
    kubectl create configmap "grafana-import-dashboards" --from-file=dashboards/ --namespace=monitoring
    6. Create the Job that imports the dashboards and other data
    kubectl create -f grafana-job.yaml
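    The Job above imports the dashboards and related data after Grafana starts. Grafana 5.0 and later can also load data sources from provisioning files at startup (in the official Docker image, from /etc/grafana/provisioning/datasources/). A sketch, assuming a Prometheus Service named prometheus on port 9090 in the monitoring namespace (hypothetical; check prometheus-core-service.yaml):

    ```yaml
    # /etc/grafana/provisioning/datasources/prometheus.yaml, e.g. mounted from a ConfigMap
    apiVersion: 1
    datasources:
      - name: prometheus
        type: prometheus
        access: proxy                                # Grafana's backend sends the queries
        url: http://prometheus.monitoring.svc:9090   # hypothetical Service name and port
        isDefault: true
    ```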
     

     

    Check the deployment, for example with kubectl get pods,svc -n monitoring.

     

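    Prometheus and Grafana are both exposed as NodePort Services, so they can be accessed directly at http://<node IP>:<nodePort>; kubectl get svc -n monitoring shows the assigned ports.

    Grafana's default username/password: admin/admin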
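    With type NodePort, Kubernetes picks a port from the default 30000-32767 range unless one is set explicitly, so the URL can change when the Service is recreated. A sketch of a Grafana Service that pins the port; the selector label app: grafana and nodePort 30300 are assumptions, so match them to grafana-svc.yaml and the Deployment's Pod labels:

    ```yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: grafana
      namespace: monitoring
    spec:
      type: NodePort
      selector:
        app: grafana          # assumption: must match the Grafana Pod labels
      ports:
        - port: 3000          # Service port inside the cluster
          targetPort: 3000    # Grafana's default HTTP port
          nodePort: 30300     # assumption: any free port in 30000-32767
    ```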

     

     

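    Q&A:

    1. The cluster was deployed with kubeadm, and controller-manager and scheduler listen only on 127.0.0.1, so Prometheus cannot collect their metrics?

    Change their listen address before initializing the cluster: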

    apiVersion: kubeadm.k8s.io/v1beta1
    kind: ClusterConfiguration
    controllerManager:
      extraArgs:
        address: 0.0.0.0
    scheduler:
      extraArgs:
        address: 0.0.0.0

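    If the cluster is already running, edit the static Pod manifests on each master instead; the kubelet notices the change and recreates the Pods by itself:

    sudo sed -e "s/- --address=127.0.0.1/- --address=0.0.0.0/" -i /etc/kubernetes/manifests/kube-controller-manager.yaml
    sudo sed -e "s/- --address=127.0.0.1/- --address=0.0.0.0/" -i /etc/kubernetes/manifests/kube-scheduler.yaml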
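    After the change, the scheduler and controller-manager expose plain-HTTP metrics on their legacy insecure ports (10251 and 10252 respectively) on every master. A sketch of Prometheus jobs that find them through the component label kubeadm puts on its static Pods; it assumes the Prometheus service account may list Pods in kube-system. Newer Kubernetes releases removed these insecure ports, in which case the metrics are only served over HTTPS on 10259 (scheduler) and 10257 (controller-manager) and the jobs need TLS and a bearer token instead:

    ```yaml
    scrape_configs:
      - job_name: kube-scheduler
        kubernetes_sd_configs:
          - role: pod
            namespaces:
              names: [kube-system]
        relabel_configs:
          # keep only the kubeadm static Pods labelled component=kube-scheduler
          - source_labels: [__meta_kubernetes_pod_label_component]
            regex: kube-scheduler
            action: keep
          # these Pods use the host network, so the Pod IP is the master's IP
          - source_labels: [__meta_kubernetes_pod_ip]
            target_label: __address__
            replacement: '${1}:10251'
      - job_name: kube-controller-manager
        kubernetes_sd_configs:
          - role: pod
            namespaces:
              names: [kube-system]
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_label_component]
            regex: kube-controller-manager
            action: keep
          - source_labels: [__meta_kubernetes_pod_ip]
            target_label: __address__
            replacement: '${1}:10252'
    ```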

     

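    2. metrics-server does not work and reports that it cannot resolve the nodes' hostnames?

    Add this flag to the metrics-server Deployment:

    - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP

    3. metrics-server fails with an x509 error saying the certificate is not trusted?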

     

     

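    Add --kubelet-insecure-tls to the container command. It skips verification of the kubelets' serving certificates, so it is meant for test environments:

    command:
    - /metrics-server
    - --kubelet-insecure-tls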
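    --kubelet-insecure-tls turns certificate checking off altogether. A different approach, not used in this article, is to let each kubelet obtain a serving certificate signed by the cluster CA. A sketch of the relevant KubeletConfiguration field (on kubeadm nodes the file is /var/lib/kubelet/config.yaml; restart the kubelet afterwards). The resulting CSRs are not approved automatically, so approve them with kubectl get csr and kubectl certificate approve <name>. Whether metrics-server then verifies them without extra flags such as --kubelet-certificate-authority depends on its version, so treat this as a starting point:

    ```yaml
    # /var/lib/kubelet/config.yaml on each node (kubeadm layout); only the last line is added
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    serverTLSBootstrap: true   # request a serving certificate from the cluster CA via a CSR
    ```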

     

    4. The complete configuration (containers section)

    containers:
    - name: metrics-server
      image: k8s.gcr.io/metrics-server-amd64:v0.3.1
      command:
      - /metrics-server
      - --metric-resolution=30s
      - --kubelet-insecure-tls
      - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP

     

  • Original article: https://www.cnblogs.com/cuishuai/p/9857120.html