  • kspan: a cluster measurement solution

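    This post is not original work. It draws on earlier articles; compared with them, the steps and explanations here stay closer to day-to-day operations.

    Background

    As cluster administrators, especially when we manage a large number of clusters, we want to know which stages a Pod goes through between creation and startup and how long each stage takes. With that information we can work out where the cluster is slow.

    Without a visualization tool, we can inspect the events to see how long each step takes, as follows: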

    $ kubectl create deploy nginx --image=nginx
    deployment.apps/nginx created
    $ kubectl get event
    LAST SEEN   TYPE     REASON              OBJECT                       MESSAGE
    7s          Normal   Scheduled           pod/nginx-f89759699-whcxz    Successfully assigned default/nginx-f89759699-whcxz to hd-k8s-master003
    7s          Normal   Pulling             pod/nginx-f89759699-whcxz    Pulling image "nginx"
    7s          Normal   SuccessfulCreate    replicaset/nginx-f89759699   Created pod: nginx-f89759699-whcxz
    7s          Normal   ScalingReplicaSet   deployment/nginx             Scaled up replica set nginx-f89759699 to 1
    

    The events cover the Pod's whole path (scheduling, image pull, container create, and start) and give a rough idea of how long each step takes.
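    The LAST SEEN column only shows a relative age, so per-step durations are easier to read from absolute timestamps, for example those in the output of `kubectl get events -o json`. Below is a minimal sketch, assuming the events have already been reduced to (reason, timestamp) pairs. The sample timestamps are made up for illustration and are not taken from a real cluster.

```python
from datetime import datetime

# Hypothetical sample: (reason, UTC timestamp) pairs for one Pod.
# Real values would come from the timestamp fields of the event objects.
SAMPLE_EVENTS = [
    ("Scheduled", "2021-06-29T08:00:00Z"),
    ("Pulling", "2021-06-29T08:00:01Z"),
    ("Pulled", "2021-06-29T08:00:16Z"),
    ("Created", "2021-06-29T08:00:16Z"),
    ("Started", "2021-06-29T08:00:17Z"),
]


def parse_ts(value):
    """Parse a UTC timestamp such as 2021-06-29T08:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def step_durations(events):
    """Return (from_reason, to_reason, seconds) for consecutive events."""
    # sorted() is stable, so events with equal timestamps keep their order.
    ordered = sorted(events, key=lambda e: parse_ts(e[1]))
    result = []
    for (prev_reason, prev_ts), (reason, ts) in zip(ordered, ordered[1:]):
        seconds = (parse_ts(ts) - parse_ts(prev_ts)).total_seconds()
        result.append((prev_reason, reason, seconds))
    return result


if __name__ == "__main__":
    for prev_reason, reason, seconds in step_durations(SAMPLE_EVENTS):
        print(f"{prev_reason} -> {reason}: {seconds:.0f}s")
```

    In this made-up sample, the gap between Pulling and Pulled dominates, which would point at the image pull as the slow step.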

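    A more elegant approach

    Each of these events in K8S corresponds to an operation of ours. In the example above we created a deployment, which produced several events, including Scheduled, Pulled, and Created. If we abstract this a little, doesn't it look a lot like distributed tracing?

    Here we will use Jaeger[1], a CNCF graduated project that I have covered several times in K8S生态周报 (K8S Ecosystem Weekly). Jaeger is an open-source, end-to-end distributed tracing system. OpenTelemetry[2], a CNCF sandbox-level project at the time of writing, is an observability framework for cloud native software, and we can combine it with Jaeger. Neither project is the focus of this post, so we simply follow the Jaeger documentation to get an instance deployed quickly and leave the details aside.

    The main project used in this post is kspan, an open-source project from Weaveworks. Its core idea is to organize K8S events as spans in a tracing system.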
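    To make the idea concrete, here is a minimal sketch of treating events as spans. It is not kspan's actual code: the `Span` class, the `owners` map, and the helper functions are assumptions made for illustration. Parent links are derived from the owner chain (Deployment -> ReplicaSet -> Pod), using the object names from the event listing above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Span:
    name: str              # event reason, e.g. "Scheduled"
    obj: str               # object the event is about, e.g. "pod/nginx-..."
    parent: Optional[str]  # owning object, or None for the trace root


def events_to_spans(events: List[Tuple[str, str]],
                    owners: Dict[str, str]) -> List[Span]:
    """Turn (object, reason) events into spans linked through their owners."""
    return [Span(name=reason, obj=obj, parent=owners.get(obj))
            for obj, reason in events]


def depth(obj: str, owners: Dict[str, str]) -> int:
    """Count the owner hops from obj up to the root object."""
    hops = 0
    while obj in owners:
        obj = owners[obj]
        hops += 1
    return hops


# Hypothetical owner chain for the objects in the event listing.
OWNERS = {
    "pod/nginx-f89759699-whcxz": "replicaset/nginx-f89759699",
    "replicaset/nginx-f89759699": "deployment/nginx",
}
EVENTS = [
    ("deployment/nginx", "ScalingReplicaSet"),
    ("replicaset/nginx-f89759699", "SuccessfulCreate"),
    ("pod/nginx-f89759699-whcxz", "Scheduled"),
    ("pod/nginx-f89759699-whcxz", "Pulling"),
]

if __name__ == "__main__":
    # Print the spans as an indented tree, one level per owner hop.
    for span in events_to_spans(EVENTS, OWNERS):
        indent = "  " * depth(span.obj, OWNERS)
        print(f"{indent}{span.name} ({span.obj})")
```

    The Deployment event becomes the root of the trace, and the ReplicaSet and Pod events nest beneath it, which is the shape a tracing UI expects.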

    Deploying kspan
    Create the RBAC authorization first, because kspan needs to watch events and the objects they refer to.

    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: kspan
      
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: kspan-admin
    rules:
    - apiGroups:
      - ""
      resources:
      - configmaps
      - endpoints
      - persistentvolumeclaims
      - persistentvolumeclaims/status
      - pods
      - replicationcontrollers
      - replicationcontrollers/scale
      - serviceaccounts
      - services
      - services/status
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - ""
      resources:
      - bindings
      - events
      - limitranges
      - namespaces/status
      - pods/log
      - pods/status
      - replicationcontrollers/status
      - resourcequotas
      - resourcequotas/status
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - ""
      resources:
      - pods/exec
      verbs:
      - create
    - apiGroups:
      - ""
      resources:
      - namespaces
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - apps
      resources:
      - controllerrevisions
      - daemonsets
      - daemonsets/status
      - deployments
      - deployments/scale
      - deployments/status
      - replicasets
      - replicasets/scale
      - replicasets/status
      - statefulsets
      - statefulsets/scale
      - statefulsets/status
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - autoscaling
      resources:
      - horizontalpodautoscalers
      - horizontalpodautoscalers/status
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - batch
      resources:
      - cronjobs
      - cronjobs/status
      - jobs
      - jobs/status
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - extensions
      resources:
      - daemonsets
      - daemonsets/status
      - deployments
      - deployments/scale
      - deployments/status
      - ingresses
      - ingresses/status
      - networkpolicies
      - replicasets
      - replicasets/scale
      - replicasets/status
      - replicationcontrollers/scale
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - policy
      resources:
      - poddisruptionbudgets
      - poddisruptionbudgets/status
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - networking.k8s.io
      resources:
      - ingresses
      - ingresses/status
      - networkpolicies
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - metrics.k8s.io
      resources:
      - pods
      - nodes
      verbs:
      - get
      - list
      - watch
    
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      creationTimestamp: null
      name: kspan-admin
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: kspan-admin
    subjects:
    - kind: ServiceAccount
      name: kspan
      namespace: default
    

    Create the kspan Pod

    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        run: kspan
      name: kspan
    spec:
      containers:
      - image: docker.io/weaveworks/kspan:v0.0
        name: kspan
        resources: {}
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      serviceAccountName: kspan
    

    Deploy Jaeger

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: jaeger
      name: jaeger
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: jaeger
      strategy: {}
      template:
        metadata:
          labels:
            app: jaeger
        spec:
          containers:
          - image: jaegertracing/opentelemetry-all-in-one
            name: opentelemetry-all-in-one
            resources: {}
            ports:
            - containerPort: 16685
            - containerPort: 16686
            # OTLP receiver port targeted by the otlp-collector Service
            - containerPort: 55680
            - containerPort: 5775
              protocol: UDP
            - containerPort: 6831
              protocol: UDP
            - containerPort: 6832
              protocol: UDP
            - containerPort: 5778
              protocol: TCP
    

    Create the Jaeger Service. By default, kspan sends its spans to otlp-collector.default:55680.

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: jaeger
      name: otlp-collector
    spec:
      ports:
      - port: 55680
        protocol: TCP
        targetPort: 55680
      selector:
        app: jaeger
    

    Once all the Pods are running, we can test access.
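    One quick check at this point is whether the collector port accepts TCP connections. Below is a minimal sketch, assuming it runs somewhere that can resolve the Service name (for example inside a Pod in the cluster). The `can_connect` helper is a name introduced here for illustration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

    Calling `can_connect("otlp-collector.default", 55680)` from inside the cluster should return True once the Jaeger Pod is up. A False result suggests checking the Service selector and the Pod status first.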

    Result

    Create a namespace and a deployment

    $ kubectl create ns moelove
    namespace/moelove created
    $ kubectl -n moelove create deploy nginx --image=nginx
    deployment.apps/nginx created
    

    Open the Jaeger UI and look at the resulting trace.
    Timing details for the Pod creation
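    The same timing details can also be pulled out of a trace exported as JSON from the Jaeger UI. Below is a minimal sketch, assuming the export has Jaeger's usual shape: a `data` list of traces, each with `spans` that carry an `operationName` and a `duration` in microseconds. The sample payload and its operation names are made up for illustration.

```python
import json

# Made-up sample in the shape of a Jaeger trace JSON export.
SAMPLE = json.loads("""
{"data": [{"traceID": "abc", "spans": [
  {"operationName": "Deployment.ScalingReplicaSet", "duration": 20000000},
  {"operationName": "Pod.Scheduled", "duration": 1000000},
  {"operationName": "Pod.Pulling", "duration": 15000000},
  {"operationName": "Pod.Started", "duration": 500000}
]}]}
""")


def slowest_spans(export, top=3):
    """Return the `top` longest spans as (operation, seconds), longest first."""
    spans = [s for trace in export.get("data", [])
             for s in trace.get("spans", [])]
    spans.sort(key=lambda s: s["duration"], reverse=True)
    # Jaeger reports durations in microseconds; convert to seconds.
    return [(s["operationName"], s["duration"] / 1_000_000)
            for s in spans[:top]]


if __name__ == "__main__":
    for name, seconds in slowest_spans(SAMPLE):
        print(f"{seconds:8.1f}s  {name}")
```

    Sorting by duration puts the enclosing root span first, so the entries right after it are usually the ones worth investigating.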

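    Conclusion

    The kspan repository does not currently offer a customizable deployment option, or at least I could not find detailed documentation for one. For that reason I do not recommend running kspan as a permanent Kubernetes component. Deploy it when the need arises, look at how long each rollout step takes, and find the bottleneck.

    If you run a multi-tenant environment and need alerting on problems such as slow scheduling, OpenTelemetry is worth investigating.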
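    The kind of check such an alert would encode can be sketched in a few lines. This is not OpenTelemetry code. The step names, the thresholds, and the `slow_steps` helper are all assumptions for illustration, and the limits would need tuning per cluster.

```python
from typing import Dict, List, Optional

# Hypothetical per-step limits in seconds; tune these for your own clusters.
THRESHOLDS = {"Scheduled": 5.0, "Pulled": 60.0, "Started": 10.0}


def slow_steps(durations: Dict[str, float],
               thresholds: Optional[Dict[str, float]] = None) -> List[str]:
    """Return one alert message per step that exceeded its limit."""
    limits = THRESHOLDS if thresholds is None else thresholds
    alerts = []
    for step, seconds in sorted(durations.items()):
        limit = limits.get(step)
        # Steps without a configured limit are ignored.
        if limit is not None and seconds > limit:
            alerts.append(f"{step} took {seconds:.1f}s (limit {limit:.1f}s)")
    return alerts


if __name__ == "__main__":
    for message in slow_steps({"Scheduled": 12.0, "Pulled": 30.0}):
        print(message)
```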

    Learn a little every day; what counts is the accumulation!
  • Original article: https://www.cnblogs.com/GXLo/p/14950806.html