  • k8s series --- Resource Metrics API and Custom Metrics API

     https://www.linuxea.com/2112.html

     Previously you had to rely on Heapster to collect resource metrics; now Heapster is being deprecated.

        Starting with k8s v1.8, a new mechanism was introduced: resource metrics are exposed through an API. 

        Resource metrics: metrics-server 

        Custom metrics: prometheus, k8s-prometheus-adapter 

        The new-generation architecture is therefore: 

        1) Core metrics pipeline: made up of the kubelet, metrics-server, and the API exposed through the API server; it covers cumulative CPU usage, real-time memory usage, pod resource usage, and container disk usage. 

        2) Monitoring pipeline: collects all kinds of metrics from the system and serves them to end users, storage systems, and the HPA; it contains core metrics plus many non-core metrics. Non-core metrics cannot be interpreted by k8s itself. 

        metrics-server is itself an API server, but it only collects CPU usage, memory usage, and similar resource metrics.
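
    Once metrics-server is deployed (done below) and aggregated behind the main API server, the core metrics can be read straight from the Metrics API. A quick sanity check, assuming kubectl is already pointed at the cluster:

    kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
    kubectl top nodes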

    [root@master ~]# kubectl api-versions
    admissionregistration.k8s.io/v1beta1
    apiextensions.k8s.io/v1beta1
    apiregistration.k8s.io/v1
    apiregistration.k8s.io/v1beta1
    apps/v1
    apps/v1beta1
    apps/v1beta2
    authentication.k8s.io/v1
    authentication.k8s.io/v1beta1
    authorization.k8s.io/v1
    

      

     Get the yaml files from https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/metrics-server , but note that the yaml files there have since been updated and differ from the ones shown in the video.
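
    A sketch for fetching those addon manifests from the raw GitHub URLs (assuming these file names still exist on the master branch):

    for f in auth-delegator.yaml auth-reader.yaml metrics-apiservice.yaml \
             metrics-server-deployment.yaml metrics-server-service.yaml resource-reader.yaml; do
        wget https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/metrics-server/$f
    done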

    Below are my modified yaml files, kept here for future reference.

    [root@master metrics-server]# cat auth-delegator.yaml 
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: metrics-server:system:auth-delegator
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: system:auth-delegator
    subjects:
    - kind: ServiceAccount
      name: metrics-server
      namespace: kube-system
    [root@master metrics-server]# cat auth-reader.yaml 
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: metrics-server-auth-reader
      namespace: kube-system
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: extension-apiserver-authentication-reader
    subjects:
    - kind: ServiceAccount
      name: metrics-server
      namespace: kube-system
    [root@master metrics-server]# cat metrics-apiservice.yaml 
    apiVersion: apiregistration.k8s.io/v1beta1
    kind: APIService
    metadata:
      name: v1beta1.metrics.k8s.io
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    spec:
      service:
        name: metrics-server
        namespace: kube-system
      group: metrics.k8s.io
      version: v1beta1
      insecureSkipTLSVerify: true
      groupPriorityMinimum: 100
      versionPriority: 100

    The key file is this one:

    [root@master metrics-server]# cat metrics-server-deployment.yaml
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: metrics-server
      namespace: kube-system
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: metrics-server-config
      namespace: kube-system
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: EnsureExists
    data:
      NannyConfiguration: |-
        apiVersion: nannyconfig/v1alpha1
        kind: NannyConfiguration
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: metrics-server-v0.3.1
      namespace: kube-system
      labels:
        k8s-app: metrics-server
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
        version: v0.3.1
    spec:
      selector:
        matchLabels:
          k8s-app: metrics-server
          version: v0.3.1
      template:
        metadata:
          name: metrics-server
          labels:
            k8s-app: metrics-server
            version: v0.3.1
          annotations:
            scheduler.alpha.kubernetes.io/critical-pod: ''
            seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
        spec:
          priorityClassName: system-cluster-critical
          serviceAccountName: metrics-server
          containers:
          - name: metrics-server
            image: mirrorgooglecontainers/metrics-server-amd64:v0.3.1
            command:
            - /metrics-server
            - --metric-resolution=30s
            - --kubelet-insecure-tls
            - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
            # These are needed for GKE, which doesn't support secure communication yet.
            # Remove these lines for non-GKE clusters, and when GKE supports token-based auth.
            #- --kubelet-port=10250
            #- --deprecated-kubelet-completely-insecure=true
    
            ports:
            - containerPort: 443
              name: https
              protocol: TCP
          - name: metrics-server-nanny
            image: mirrorgooglecontainers/addon-resizer:1.8.4
            resources:
              limits:
                cpu: 100m
                memory: 300Mi
              requests:
                cpu: 5m
                memory: 50Mi
            env:
              - name: MY_POD_NAME
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.name
              - name: MY_POD_NAMESPACE
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.namespace
            volumeMounts:
            - name: metrics-server-config-volume
              mountPath: /etc/config
            command:
              - /pod_nanny
              - --config-dir=/etc/config
              - --cpu=100m
              - --extra-cpu=0.5m
              - --memory=100Mi
              - --extra-memory=50Mi
              - --threshold=5
              - --deployment=metrics-server-v0.3.1
              - --container=metrics-server
              - --poll-period=300000
              - --estimator=exponential
              # Specifies the smallest cluster (defined in number of nodes)
              #           # resources will be scaled to.
              - --minClusterSize=10
    
          volumes:
            - name: metrics-server-config-volume
              configMap:
                name: metrics-server-config
          tolerations:
            - key: "CriticalAddonsOnly"
              operator: "Exists"
    [root@master metrics-server]# cat metrics-server-service.yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: metrics-server
      namespace: kube-system
      labels:
        addonmanager.kubernetes.io/mode: Reconcile
        kubernetes.io/cluster-service: "true"
        kubernetes.io/name: "Metrics-server"
    spec:
      selector:
        k8s-app: metrics-server
      ports:
      - port: 443
        protocol: TCP
        targetPort: https
    [root@master metrics-server]# cat resource-reader.yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: system:metrics-server
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    rules:
    - apiGroups:
      - ""
      resources:
      - pods
      - nodes
      - namespaces
      - nodes/stats
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - "extensions"
      resources:
      - deployments
      verbs:
      - get
      - list
      - update
      - watch
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: system:metrics-server
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: system:metrics-server
    subjects:
    - kind: ServiceAccount
      name: metrics-server
      namespace: kube-system

    If applying the files downloaded from GitHub fails, use the metrics-server-deployment.yaml above instead: delete the old resources and apply again.

    [root@master metrics-server]# kubectl apply -f ./
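
    After the apply it is worth confirming that the aggregated API actually registered and that the pod came up; a minimal check (the APIService name comes from metrics-apiservice.yaml above, the label from the Deployment):

    kubectl get apiservice v1beta1.metrics.k8s.io
    kubectl get pods -n kube-system -l k8s-app=metrics-server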
    

      

    [root@master ~]#  kubectl proxy --port=8080
    

      

    Make sure metrics-server-v0.3.1-76b796b-4xgvp is in the Running state. I initially got an Error caused by a problem in the yaml; after a few rounds of changes it finally reached Running with the final version posted above.

    [root@master metrics-server]# kubectl get pods -n kube-system
    NAME                                    READY   STATUS    RESTARTS   AGE
    canal-mgbc2                             3/3     Running   12         3d23h
    canal-s4xgb                             3/3     Running   23         3d23h
    canal-z98bc                             3/3     Running   15         3d23h
    coredns-78d4cf999f-5shdq                1/1     Running   0          6m4s
    coredns-78d4cf999f-xj5pj                1/1     Running   0          5m53s
    etcd-master                             1/1     Running   13         17d
    kube-apiserver-master                   1/1     Running   13         17d
    kube-controller-manager-master          1/1     Running   19         17d
    kube-flannel-ds-amd64-8xkfn             1/1     Running   0          <invalid>
    kube-flannel-ds-amd64-t7jpc             1/1     Running   0          <invalid>
    kube-flannel-ds-amd64-vlbjz             1/1     Running   0          <invalid>
    kube-proxy-ggcbf                        1/1     Running   11         17d
    kube-proxy-jxksd                        1/1     Running   11         17d
    kube-proxy-nkkpc                        1/1     Running   12         17d
    kube-scheduler-master                   1/1     Running   19         17d
    kubernetes-dashboard-76479d66bb-zr4dd   1/1     Running   0          <invalid>
    metrics-server-v0.3.1-76b796b-4xgvp     2/2     Running   0          9s
    

      

    To view the error logs, use -c to specify the container name; this pod has two containers and metrics-server is only one of them. The other container is queried the same way, just change the name.

    [root@master metrics-server]# kubectl logs metrics-server-v0.3.1-76b796b-4xgvp   -c metrics-server -n kube-system
    

      

    The errors in the logs looked roughly like the following:

    403 Forbidden", response: "Forbidden (user=system:anonymous, verb=get, resource=nodes, subresource=stats)
    
    E0903  1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:<hostname>: unable to fetch metrics from Kubelet <hostname> (<hostname>): Get https://<hostname>:10250/stats/summary/: dial tcp: lookup <hostname> on 10.96.0.10:53: no such host
    
    
    no response from https://10.101.248.96:443: Get https://10.101.248.96:443: Proxy Error ( Connection refused )
    
    
    E1109 09:54:49.509521       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:linuxea.node-2.com: unable to fetch metrics from Kubelet linuxea.node-2.com (10.10.240.203): Get https://10.10.240.203:10255/stats/summary/: dial tcp 10.10.240.203:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.node-3.com: unable to fetch metrics from Kubelet linuxea.node-3.com (10.10.240.143): Get https://10.10.240.143:10255/stats/summary/: dial tcp 10.10.240.143:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.node-4.com: unable to fetch metrics from Kubelet linuxea.node-4.com (10.10.240.142): Get https://10.10.240.142:10255/stats/summary/: dial tcp 10.10.240.142:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.master-1.com: unable to fetch metrics from Kubelet linuxea.master-1.com (10.10.240.161): Get https://10.10.240.161:10255/stats/summary/: dial tcp 10.10.240.161:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.node-1.com: unable to fetch metrics from Kubelet linuxea.node-1.com (10.10.240.202): Get https://10.10.240.202:10255/stats/summary/: dial tcp 10.10.240.202:10255: connect: connection refused]
    

      

    At the time I tried modifying the coredns configuration following instructions found online, which only made the logs report "unable" for every pod, as above. So I reverted the change and deleted the coredns pods, letting the cluster regenerate two fresh coredns pods on its own.

    - --kubelet-insecure-tls disables TLS verification and is generally not recommended in production. Since DNS cannot resolve those hostnames, - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP is used to work around that. Another option is to modify coredns, but I do not recommend doing so.

    See: https://github.com/kubernetes-incubator/metrics-server/issues/131

    metrics-server unable to fetch pod metrics for pod
    

      

    Those are the problems I ran into; using my yaml above resolves all of them. There is also the question of why flannel loses its DirectRouting setting every time the cluster machines are rebooted, forcing me to delete flannel and recreate it; that issue was covered in an earlier post.

    At this point the following commands succeed, and the items list is populated as well:

    [root@master ~]# curl http://localhost:8080/apis/metrics.k8s.io/v1beta1
    {
      "kind": "APIResourceList",
      "apiVersion": "v1",
      "groupVersion": "metrics.k8s.io/v1beta1",
      "resources": [
        {
          "name": "nodes",
          "singularName": "",
          "namespaced": false,
          "kind": "NodeMetrics",
          "verbs": [
            "get",
            "list"
          ]
        },
        {
          "name": "pods",
          "singularName": "",
          "namespaced": true,
          "kind": "PodMetrics",
          "verbs": [
            "get",
            "list"
          ]
        }
      ]
    

      

    [root@master metrics-server]# curl http://localhost:8080/apis/metrics.k8s.io/v1beta1/pods | more
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 14868    0 14868    0     0  1521k      0 --:--:-- --:--:-- --:--:-- 1613k
    {
      "kind": "PodMetricsList",
      "apiVersion": "metrics.k8s.io/v1beta1",
      "metadata": {
        "selfLink": "/apis/metrics.k8s.io/v1beta1/pods"
      },
      "items": [
        {
          "metadata": {
            "name": "pod1",
            "namespace": "prod",
            "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/prod/pods/pod1",
            "creationTimestamp": "2019-01-29T02:39:12Z"
          },
    

      

    [root@master metrics-server]# kubectl top pods
    NAME                CPU(cores)   MEMORY(bytes)   
    filebeat-ds-4llpp   1m           2Mi             
    filebeat-ds-dv49l   1m           5Mi             
    myapp-0             0m           1Mi             
    myapp-1             0m           2Mi             
    myapp-2             0m           1Mi             
    myapp-3             0m           1Mi             
    myapp-4             0m           2Mi    
    

      

    [root@master metrics-server]# kubectl top nodes
    NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
    master   206m         5%     1377Mi          72%       
    node1    88m          8%     534Mi           28%       
    node2    78m          7%     935Mi           49% 
    

      

    Custom metrics (prometheus)

        As you can see, our metrics pipeline now works. However, metrics-server only covers CPU and memory; it cannot capture other metrics, such as user-defined ones. That is where another component, prometheus, comes in.

        Deploying prometheus is fairly involved.

        node_exporter is the agent;

        PromQL is prometheus's query language, used much like SQL to pull data out (see the example query after this list); 

        k8s-prometheus-adapter: k8s cannot consume prometheus metrics directly, so k8s-prometheus-adapter is needed to turn them into an API

        kube-state-metrics is used to aggregate cluster object state data.
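
    As an illustration of PromQL (queried here through the Prometheus HTTP API; the node IP is a placeholder and the NodePort 30090 comes from the prometheus service created below):

    curl -G 'http://<node-ip>:30090/api/v1/query' --data-urlencode 'query=sum(up) by (job)'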

        Now let's deploy all of this.

        Go to https://github.com/ikubernetes/k8s-prom

    [root@master pro]# git clone https://github.com/iKubernetes/k8s-prom.git
    

      

    First create a namespace called prom: 

    [root@master k8s-prom]# kubectl apply -f namespace.yaml 
    namespace/prom created
    

      

     Deploy node_exporter: 

    [root@master k8s-prom]# cd node_exporter/
    [root@master node_exporter]# ls
    node-exporter-ds.yaml  node-exporter-svc.yaml
    [root@master node_exporter]# kubectl apply -f .
    daemonset.apps/prometheus-node-exporter created
    service/prometheus-node-exporter created
    

      

    [root@master node_exporter]# kubectl get pods -n prom
    NAME                             READY     STATUS    RESTARTS   AGE
    prometheus-node-exporter-dmmjj   1/1       Running   0          7m
    prometheus-node-exporter-ghz2l   1/1       Running   0          7m
    prometheus-node-exporter-zt2lw   1/1       Running   0          7m
    

      

        Deploy prometheus: 

    [root@master k8s-prom]# cd prometheus/
    [root@master prometheus]# ls
    prometheus-cfg.yaml  prometheus-deploy.yaml  prometheus-rbac.yaml  prometheus-svc.yaml
    [root@master prometheus]# kubectl apply -f .
    configmap/prometheus-config created
    deployment.apps/prometheus-server created
    clusterrole.rbac.authorization.k8s.io/prometheus created
    serviceaccount/prometheus created
    clusterrolebinding.rbac.authorization.k8s.io/prometheus created
    service/prometheus created
    

      

    Looking at the resources in the prom namespace, pod/prometheus-server-76dc8df7b-hw8xc was stuck in Pending; the FailedScheduling event showed insufficient memory:

     [root@master prometheus]# kubectl logs prometheus-server-556b8896d6-dfqkp -n prom  
    Warning  FailedScheduling  2m52s (x2 over 2m52s)  default-scheduler  0/3 nodes are available: 3 Insufficient memory.
    

      

    Edit prometheus-deploy.yaml and delete the three lines that set the memory limit:

            resources:
              limits:
                memory: 2Gi
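
    Deleting the limit is the quick fix used here; an alternative sketch is to keep a limit but lower it to something your nodes can actually satisfy, for example:

            resources:
              limits:
                memory: 512Mi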
    

      

    Apply it again:

    [root@master prometheus]# kubectl apply -f prometheus-deploy.yaml
    

      

    [root@master prometheus]# kubectl get all -n prom
    NAME                                     READY     STATUS    RESTARTS   AGE
    pod/prometheus-node-exporter-dmmjj       1/1       Running   0          10m
    pod/prometheus-node-exporter-ghz2l       1/1       Running   0          10m
    pod/prometheus-node-exporter-zt2lw       1/1       Running   0          10m
    pod/prometheus-server-65f5d59585-6l8m8   1/1       Running   0          55s
    NAME                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
    service/prometheus                 NodePort    10.111.127.64   <none>        9090:30090/TCP   56s
    service/prometheus-node-exporter   ClusterIP   None            <none>        9100/TCP         10m
    NAME                                      DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    daemonset.apps/prometheus-node-exporter   3         3         3         3            3           <none>          10m
    NAME                                DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/prometheus-server   1         1         1            1           56s
    NAME                                           DESIRED   CURRENT   READY     AGE
    replicaset.apps/prometheus-server-65f5d59585   1         1         1         56s
    

      

    Above we can see that, through the NodePort service, the prometheus application inside the container can be reached on port 30090 of the host. 

        It is best to mount PVC-backed storage, otherwise the monitoring data is lost after a while; a rough sketch follows. 
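
    A minimal sketch of PVC-backed storage for prometheus-server (the PVC name, size, and the /prometheus mount path are assumptions, not taken from the repo's prometheus-deploy.yaml):

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: prometheus-data
      namespace: prom
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
    # then reference it in the prometheus-server pod template:
    #   volumes:
    #   - name: prometheus-data
    #     persistentVolumeClaim:
    #       claimName: prometheus-data
    #   volumeMounts:
    #   - name: prometheus-data
    #     mountPath: /prometheus   # point this at the TSDB data path used by your image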

        Deploy kube-state-metrics, which aggregates object state data:  

    [root@master k8s-prom]# cd kube-state-metrics/
    [root@master kube-state-metrics]# ls
    kube-state-metrics-deploy.yaml  kube-state-metrics-rbac.yaml  kube-state-metrics-svc.yaml
    [root@master kube-state-metrics]# kubectl apply -f .
    deployment.apps/kube-state-metrics created
    serviceaccount/kube-state-metrics created
    clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
    clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
    service/kube-state-metrics created
    

      

    [root@master kube-state-metrics]# kubectl get all -n prom
    NAME                                      READY     STATUS    RESTARTS   AGE
    pod/kube-state-metrics-58dffdf67d-v9klh   1/1       Running   0          14m
    NAME                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
    service/kube-state-metrics         ClusterIP   10.111.41.139   <none>        8080/TCP         14m
    

      

    Deploy k8s-prometheus-adapter; this needs a self-signed certificate:

    [root@master k8s-prometheus-adapter]# cd /etc/kubernetes/pki/
    [root@master pki]# (umask 077; openssl genrsa -out serving.key 2048)
    Generating RSA private key, 2048 bit long modulus
    ...........................................................................................+++
    ...............+++
    e is 65537 (0x10001)
    

      

        Generate the certificate signing request: 

    [root@master pki]#  openssl req -new -key serving.key -out serving.csr -subj "/CN=serving"
    

      

        Sign the certificate: 

    [root@master pki]# openssl  x509 -req -in serving.csr -CA ./ca.crt -CAkey ./ca.key -CAcreateserial -out serving.crt -days 3650
    Signature ok
    subject=/CN=serving
    Getting CA Private Key
    

      

        Create the secret holding the cert and key: 

    [root@master pki]# kubectl create secret generic cm-adapter-serving-certs --from-file=serving.crt=./serving.crt --from-file=serving.key=./serving.key  -n prom
    secret/cm-adapter-serving-certs created
    

      

        Note: cm-adapter-serving-certs is the secret name referenced inside custom-metrics-apiserver-deployment.yaml.
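
    Inside that deployment the secret is consumed roughly like the snippet below (the volume name and mount path shown are typical for this adapter but may differ slightly in your copy of the manifest):

    volumeMounts:
    - name: volume-serving-cert
      mountPath: /var/run/serving-cert
      readOnly: true
    volumes:
    - name: volume-serving-cert
      secret:
        secretName: cm-adapter-serving-certs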

    [root@master pki]# kubectl get secrets -n prom
    NAME                             TYPE                                  DATA      AGE
    cm-adapter-serving-certs         Opaque                                2         51s
    default-token-knsbg              kubernetes.io/service-account-token   3         4h
    kube-state-metrics-token-sccdf   kubernetes.io/service-account-token   3         3h
    prometheus-token-nqzbz           kubernetes.io/service-account-token   3         3h
    

      

      Deploy k8s-prometheus-adapter:

    [root@master k8s-prom]# cd k8s-prometheus-adapter/
    [root@master k8s-prometheus-adapter]# ls
    custom-metrics-apiserver-auth-delegator-cluster-role-binding.yaml   custom-metrics-apiserver-service.yaml
    custom-metrics-apiserver-auth-reader-role-binding.yaml              custom-metrics-apiservice.yaml
    custom-metrics-apiserver-deployment.yaml                            custom-metrics-cluster-role.yaml
    custom-metrics-apiserver-resource-reader-cluster-role-binding.yaml  custom-metrics-resource-reader-cluster-role.yaml
    custom-metrics-apiserver-service-account.yaml                       hpa-custom-metrics-cluster-role-binding.yaml
    

      

     Because the k8s-prometheus-adapter manifests shipped in the repo do not work with k8s v1.11.2 (nor with 1.13), the fix is to go to https://github.com/DirectXMan12/k8s-prometheus-adapter/tree/master/deploy/manifests, download the latest custom-metrics-apiserver-deployment.yaml and change the namespace inside it to prom; also download custom-metrics-config-map.yaml locally and change its namespace to prom as well.
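
    A sketch of those two steps (assuming the upstream manifests still live at these paths and use the custom-metrics namespace):

    wget https://raw.githubusercontent.com/DirectXMan12/k8s-prometheus-adapter/master/deploy/manifests/custom-metrics-apiserver-deployment.yaml
    wget https://raw.githubusercontent.com/DirectXMan12/k8s-prometheus-adapter/master/deploy/manifests/custom-metrics-config-map.yaml
    sed -i 's/namespace: custom-metrics/namespace: prom/' custom-metrics-apiserver-deployment.yaml custom-metrics-config-map.yaml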

    [root@master k8s-prometheus-adapter]# kubectl apply -f .
    clusterrolebinding.rbac.authorization.k8s.io/custom-metrics:system:auth-delegator created
    rolebinding.rbac.authorization.k8s.io/custom-metrics-auth-reader created
    deployment.apps/custom-metrics-apiserver created
    clusterrolebinding.rbac.authorization.k8s.io/custom-metrics-resource-reader created
    serviceaccount/custom-metrics-apiserver created
    service/custom-metrics-apiserver created
    apiservice.apiregistration.k8s.io/v1beta1.custom.metrics.k8s.io created
    clusterrole.rbac.authorization.k8s.io/custom-metrics-server-resources created
    clusterrole.rbac.authorization.k8s.io/custom-metrics-resource-reader created
    clusterrolebinding.rbac.authorization.k8s.io/hpa-controller-custom-metrics created
    

      

    [root@master k8s-prometheus-adapter]# kubectl get all -n prom
    NAME                                           READY     STATUS    RESTARTS   AGE
    pod/custom-metrics-apiserver-65f545496-64lsz   1/1       Running   0          6m
    pod/kube-state-metrics-58dffdf67d-v9klh        1/1       Running   0          4h
    pod/prometheus-node-exporter-dmmjj             1/1       Running   0          4h
    pod/prometheus-node-exporter-ghz2l             1/1       Running   0          4h
    pod/prometheus-node-exporter-zt2lw             1/1       Running   0          4h
    pod/prometheus-server-65f5d59585-6l8m8         1/1       Running   0          4h
    NAME                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
    service/custom-metrics-apiserver   ClusterIP   10.103.87.246   <none>        443/TCP          36m
    service/kube-state-metrics         ClusterIP   10.111.41.139   <none>        8080/TCP         4h
    service/prometheus                 NodePort    10.111.127.64   <none>        9090:30090/TCP   4h
    service/prometheus-node-exporter   ClusterIP   None            <none>        9100/TCP         4h
    NAME                                      DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    daemonset.apps/prometheus-node-exporter   3         3         3         3            3           <none>          4h
    NAME                                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/custom-metrics-apiserver   1         1         1            1           36m
    deployment.apps/kube-state-metrics         1         1         1            1           4h
    deployment.apps/prometheus-server          1         1         1            1           4h
    NAME                                                  DESIRED   CURRENT   READY     AGE
    replicaset.apps/custom-metrics-apiserver-5f6b4d857d   0         0         0         36m
    replicaset.apps/custom-metrics-apiserver-65f545496    1         1         1         6m
    replicaset.apps/custom-metrics-apiserver-86ccf774d5   0         0         0         17m
    replicaset.apps/kube-state-metrics-58dffdf67d         1         1         1         4h
    replicaset.apps/prometheus-server-65f5d59585          1         1         1         4h
    

      

      Finally, every resource in the prom namespace is in the Running state. 

    [root@master k8s-prometheus-adapter]# kubectl api-versions
    custom.metrics.k8s.io/v1beta1
    

      

      Now the custom.metrics.k8s.io/v1beta1 API is visible. In my own case it did not actually show up in the list, but that did not affect usage.

      Open a proxy: 

    [root@master k8s-prometheus-adapter]# kubectl proxy --port=8080
    

      

         Now the metric data is visible:

    [root@master pki]# curl  http://localhost:8080/apis/custom.metrics.k8s.io/v1beta1/
     {
          "name": "pods/ceph_rocksdb_submit_transaction_sync",
          "singularName": "",
          "namespaced": true,
          "kind": "MetricValueList",
          "verbs": [
            "get"
          ]
        },
        {
          "name": "jobs.batch/kube_deployment_created",
          "singularName": "",
          "namespaced": true,
          "kind": "MetricValueList",
          "verbs": [
            "get"
          ]
        },
        {
          "name": "jobs.batch/kube_pod_owner",
          "singularName": "",
          "namespaced": true,
          "kind": "MetricValueList",
          "verbs": [
            "get"
          ]
        },
    

      

      Now we can happily create HPAs (horizontal pod autoscaling).

        In addition, prometheus can be integrated with grafana, as follows.

        First download grafana.yaml; see https://github.com/kubernetes/heapster/blob/master/deploy/kube-config/influxdb/grafana.yaml

    [root@master pro]# wget https://raw.githubusercontent.com/kubernetes-retired/heapster/master/deploy/kube-config/influxdb/grafana.yaml
    

      

        Modify grafana.yaml as follows:

    Change namespace: kube-system to prom (there are two occurrences);
     comment out the following two entries under env:
            - name: INFLUXDB_HOST
              value: monitoring-influxdb
     add type: NodePort at the end of the Service spec:
     ports:
      - port: 80
        targetPort: 3000
      selector:
        k8s-app: grafana
      type: NodePort
    

      

    [root@master pro]# kubectl apply -f grafana.yaml 
    deployment.extensions/monitoring-grafana created
    service/monitoring-grafana created
    

      

    [root@master pro]# kubectl get pods -n prom
    NAME                                       READY     STATUS    RESTARTS   AGE
    monitoring-grafana-ffb4d59bd-gdbsk         1/1       Running   0          5s
    

      

    If there are still problems, delete those resources and apply them again.

        The grafana pod is now up and running. 

    [root@master pro]# kubectl get svc -n prom
    NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
    monitoring-grafana         NodePort    10.106.164.205   <none>        80:32659/TCP     19m
    

      

     We can now browse to the master host IP: http://172.16.1.100:32659

     

    In the data source screenshot (not reproduced here) the port is 9090; fill in whatever port your own svc actually uses. Apart from changing 80 to 9090, nothing else changes. That URL format works because grafana and prometheus live in the same namespace, so the service can be reached by its service name.
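
    For example, with the services listed below, the Prometheus data source URL in grafana would be something like (grafana and prometheus share the prom namespace):

    http://prometheus.prom.svc:9090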

    [root@master pro]# kubectl get svc -n prom     
    NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
    custom-metrics-apiserver   ClusterIP   10.109.58.249   <none>        443/TCP          52m
    kube-state-metrics         ClusterIP   10.103.52.45    <none>        8080/TCP         69m
    monitoring-grafana         NodePort    10.110.240.31   <none>        80:31128/TCP     17m
    prometheus                 NodePort    10.110.19.171   <none>        9090:30090/TCP   145m
    prometheus-node-exporter   ClusterIP   None            <none>        9100/TCP         146m
    

      

        After that, the corresponding data shows up in the grafana UI. 

        Download a grafana dashboard template for monitoring k8s with prometheus from: https://grafana.com/dashboards/6417

        Then import the downloaded template in the grafana UI: 

        Once the template is imported, the monitoring data is visible: 

     I did not re-run the HPA part hands-on because I had already done it before; the content below is copied over as-is, so resolve any issues there on your own.

    HPA (horizontal pod autoscaling) 

        When pods come under load, the number of pods is scaled out automatically to spread the pressure. 

        Currently HPA comes in two versions; v1 only supports core metrics, i.e. it can scale pods based solely on CPU utilization; 

    [root@master pro]# kubectl explain hpa.spec.scaleTargetRef
    scaleTargetRef: specifies which workload (e.g. a Deployment) the autoscaler scales
    

      

    [root@master pro]# kubectl api-versions |grep auto
    autoscaling/v1
    autoscaling/v2beta1
    

      

        Above we can see that both hpa v1 and hpa v2 are supported. 

        Next, let's recreate a pod called myapp from the command line, this time with resource limits: 

    [root@master ~]# kubectl run myapp --image=ikubernetes/myapp:v1 --replicas=1 --requests='cpu=50m,memory=256Mi' --limits='cpu=50m,memory=256Mi' --labels='app=myapp' --expose --port=80
    service/myapp created
    deployment.apps/myapp created
    

      

    [root@master ~]# kubectl get pods
    NAME                     READY     STATUS    RESTARTS   AGE
    myapp-6985749785-fcvwn   1/1       Running   0          58s
    

      

        Now let's enable automatic horizontal scaling for myapp with kubectl autoscale, which simply creates an HPA controller for it. 

    [root@master ~]# kubectl autoscale deployment myapp --min=1 --max=8 --cpu-percent=60
    horizontalpodautoscaler.autoscaling/myapp autoscaled
    

      

     --min: the minimum number of pods 

        --max: the maximum number of pods to scale out to 

        --cpu-percent: the target CPU utilization (a roughly equivalent autoscaling/v1 manifest is sketched below) 
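
    For reference, the kubectl autoscale command above creates an object roughly equivalent to this autoscaling/v1 manifest (a sketch, not dumped from the cluster):

    apiVersion: autoscaling/v1
    kind: HorizontalPodAutoscaler
    metadata:
      name: myapp
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: myapp
      minReplicas: 1
      maxReplicas: 8
      targetCPUUtilizationPercentage: 60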

    [root@master ~]# kubectl get hpa
    NAME      REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
    myapp     Deployment/myapp   0%/60%    1         8         1          4m
    

      

    [root@master ~]# kubectl get svc
    NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
    myapp        ClusterIP   10.105.235.197   <none>        80/TCP              19
    

      

        Next, let's switch the service to NodePort:

    [root@master ~]# kubectl patch svc myapp -p '{"spec":{"type": "NodePort"}}'
    service/myapp patched
    

      

    [root@master ~]# kubectl get svc
    NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
    myapp        NodePort    10.105.235.197   <none>        80:31990/TCP        22m
    

      

    [root@master ~]# yum install httpd-tools # mainly to get the ab benchmarking tool
    

      

    [root@master ~]# kubectl get pods -o wide
    NAME                     READY     STATUS    RESTARTS   AGE       IP            NODE
    myapp-6985749785-fcvwn   1/1       Running   0          25m       10.244.2.84   node2
    

      

        Start load testing with ab: 

    [root@master ~]# ab -c 1000 -n 5000000 http://172.16.1.100:31990/index.html
    This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/
    Benchmarking 172.16.1.100 (be patient)
    

      

        Wait a while and the pod's CPU utilization reaches 98%, so it needs to scale out to 2 pods: 

    [root@master ~]# kubectl describe hpa
    resource cpu on pods  (as a percentage of request):  98% (49m) / 60%
    Deployment pods:                                       1 current / 2 desired
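
    This matches the HPA scaling rule desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue): here ceil(1 * 98 / 60) = 2.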
    

      

    [root@master ~]# kubectl top pods
    NAME                     CPU(cores)   MEMORY(bytes)   
    myapp-6985749785-fcvwn   49m (the total cpu we set was 50m)         3Mi
    

      

    [root@master ~]#  kubectl get pods -o wide
    NAME                     READY     STATUS    RESTARTS   AGE       IP             NODE
    myapp-6985749785-fcvwn   1/1       Running   0          32m       10.244.2.84    node2
    myapp-6985749785-sr4qv   1/1       Running   0          2m        10.244.1.105   node1
    

      

        Above we see it has automatically scaled out to 2 pods. Wait a bit longer and, as the CPU pressure keeps rising, it will scale out to 4 or more pods: 

    [root@master ~]#  kubectl get pods -o wide
    NAME                     READY     STATUS    RESTARTS   AGE       IP             NODE
    myapp-6985749785-2mjrd   1/1       Running   0          1m        10.244.1.107   node1
    myapp-6985749785-bgz6p   1/1       Running   0          1m        10.244.1.108   node1
    myapp-6985749785-fcvwn   1/1       Running   0          35m       10.244.2.84    node2
    myapp-6985749785-sr4qv   1/1       Running   0          5m        10.244.1.105   node1
    

      

        Once the load test stops, the pod count shrinks back to the normal number.

        Above we used hpa v1 for horizontal pod autoscaling; as mentioned earlier, hpa v1 can only scale pods horizontally based on CPU utilization. 

        Next let's look at hpa v2, which can scale pods horizontally based on custom metric utilization. 

        Before using hpa v2, delete the hpa v1 object created earlier so it does not conflict with the hpa v2 we are about to test: 

    [root@master hpa]# kubectl delete hpa myapp
    horizontalpodautoscaler.autoscaling "myapp" deleted
    

      

    OK, now let's create an hpa v2: 

    [root@master hpa]# cat hpa-v2-demo.yaml 
    apiVersion: autoscaling/v2beta1   # this apiVersion marks it as hpa v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: myapp-hpa-v2
    spec:
      scaleTargetRef: # the workload whose replica count will be scaled
        apiVersion: apps/v1
        kind: Deployment
        name: myapp
      minReplicas: 1 # minimum number of replicas
      maxReplicas: 10
      metrics: # which metrics are evaluated
      - type: Resource # evaluate a resource metric
        resource: 
          name: cpu
          targetAverageUtilization: 55 # scale out when average pod CPU utilization exceeds 55%
      - type: Resource
        resource:
          name: memory # hpa v1 could only evaluate cpu; hpa v2 can also evaluate memory
          targetAverageValue: 50Mi # scale out when average pod memory usage exceeds 50Mi
    

      

    [root@master hpa]# kubectl apply -f hpa-v2-demo.yaml 
    horizontalpodautoscaler.autoscaling/myapp-hpa-v2 created
    

      

    [root@master hpa]# kubectl get hpa
    NAME           REFERENCE          TARGETS                MINPODS   MAXPODS   REPLICAS   AGE
    myapp-hpa-v2   Deployment/myapp   3723264/50Mi, 0%/55%   1         10        1          37s
    

      

        We can see there is still only one pod (the memory reading above, 3723264 bytes, is roughly 3.6Mi, well below the 50Mi target): 

    [root@master hpa]# kubectl get pods -o wide
    NAME                     READY     STATUS    RESTARTS   AGE       IP            NODE
    myapp-6985749785-fcvwn   1/1       Running   0          57m       10.244.2.84   node2
    

      

        Start the load test: 

    [root@master ~]# ab -c 100 -n 5000000 http://172.16.1.100:31990/index.html
    

      

        Check what hpa v2 observes: 

    [root@master hpa]# kubectl describe hpa
    Metrics:                                               ( current / target )
      resource memory on pods:                             3756032 / 50Mi
      resource cpu on pods  (as a percentage of request):  82% (41m) / 55%
    Min replicas:                                          1
    Max replicas:                                          10
    Deployment pods:                                       1 current / 2 desired
    

      

    [root@master hpa]# kubectl get pods -o wide
    NAME                     READY     STATUS    RESTARTS   AGE       IP             NODE
    myapp-6985749785-8frq4   1/1       Running   0          1m        10.244.1.109   node1
    myapp-6985749785-fcvwn   1/1       Running   0          1h        10.244.2.84    node2
    

      

      We can see it has automatically scaled out to 2 pods. Once the load test stops, the pod count shrinks back to the normal number. 

        Going forward, hpa v2 lets us scale the pod count not only on CPU and memory usage, but also on things like HTTP request rate. 

        For example: 

    [root@master hpa]# cat hpa-v2-custom.yaml 
    apiVersion: autoscaling/v2beta1  # this apiVersion marks it as hpa v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: myapp-hpa-v2
    spec:
      scaleTargetRef: # the workload whose replica count will be scaled
        apiVersion: apps/v1
        kind: Deployment
        name: myapp
      minReplicas: 1 # minimum number of replicas
      maxReplicas: 10
      metrics: # which metrics are evaluated
      - type: Pods # evaluate a custom per-pod metric served through the adapter
        pods: 
          metricName: http_requests # the custom metric name
          targetAverageValue: 800m # m is a milli-unit, so 800m = 0.8; scale out when the per-pod average of http_requests exceeds this value
    

      

    For an HPA driven by request rate, see the image at https://hub.docker.com/r/ikubernetes/metrics-app/ for a concrete example.
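
    Broadly, any workload that exposes a Prometheus /metrics endpoint and carries the scrape annotations that the prometheus-cfg.yaml config in this repo looks for can feed such a metric. A generic sketch only (the port and annotation keys here are assumptions, not taken from the source):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: metrics-app
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: metrics-app
      template:
        metadata:
          labels:
            app: metrics-app
          annotations:
            prometheus.io/scrape: "true"   # assumed scrape convention used by the bundled prometheus config
            prometheus.io/port: "80"       # assumed metrics port
            prometheus.io/path: "/metrics"
        spec:
          containers:
          - name: metrics-app
            image: ikubernetes/metrics-app  # see the hub.docker.com link above
            ports:
            - containerPort: 80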

  • Original post: https://www.cnblogs.com/dribs/p/10332957.html