  • Kubernetes Jobs (short-lived tasks) in practice

    I. What is a Job?

    A Job handles short-lived, one-off batch tasks: each container runs its task once and then exits, and the Job guarantees that one or more Pods in the batch terminate successfully.

    II. Use cases for Jobs

    A Job fits work that runs once and then stops rather than running continuously, for example AI model training, batch computation, and data analysis.
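    As a reference before diving in, here is a minimal sketch of a Job manifest showing the batch/v1 spec fields exercised throughout this article; the name, image, and command are illustrative only:

    ```yaml
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: job-sketch
    spec:
      completions: 5              # total Pods that must finish successfully
      parallelism: 2              # how many Pods may run at the same time
      backoffLimit: 3             # retries before the Job is marked failed
      activeDeadlineSeconds: 600  # hard deadline for the whole Job, in seconds
      template:
        spec:
          restartPolicy: Never    # Jobs only allow Never or OnFailure
          containers:
          - name: worker
            image: busybox
            command: ["/bin/sh", "-c", "echo done"]
    ```

    Each of these fields is explored with a concrete experiment below.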

    III. Jobs in practice

    1. Non-parallel Pods

     A non-parallel Job creates only one Pod, and the Pod template's restartPolicy only supports Never and OnFailure, not Always. If you set Always, the apiserver rejects the manifest outright, as shown below:

    [root@k8s-master job]# cat job-one.yaml
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: job-demo-onepod
    spec:
      template:
        metadata:
          name: job-demo
        spec:
          restartPolicy: Always
          containers:
          - name: counter
            image: busybox
            command:
            - "bin/sh"
            - "-c"
            - "for i in 9 8 7 6 5 4 3 2 1; do echo $i; done"
    [root@k8s-master job]# kubectl create -f  job-one.yaml 
    The Job "job-demo-onepod" is invalid: spec.template.spec.restartPolicy: Unsupported value: "Always": supported values: "OnFailure", "Never"   

     If you delete a running Pod before the Job has finished, the Job automatically starts a replacement Pod and keeps doing so until the task completes:

    [root@k8s-master job]# cat job-one-testrestart.yaml 
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: job-demo-onepod-testrestart
    spec:
      template:
        metadata:
          name: job-demo
        spec:
          restartPolicy: Never
          containers:
          - name: counter-restart
            image: busybox
            command:
            - "bin/sh"
            - "-c"
            - "touch /tmp/healthy;sleep 1000000"                     # 让pod较长时间处于运行状态
    [root@k8s-master job]# kubectl get pod
    NAME                                    READY   STATUS             RESTARTS   AGE
    job-demo-onepod-testrestart-rpvhh       1/1     Running            0          3m23s

    Delete the Pod now: the Job terminates the old Pod and starts a new one in its place, as shown below:

    [root@k8s-master job]# kubectl get pod
    NAME                                    READY   STATUS             RESTARTS   AGE
    job-demo-onepod-testrestart-5js9m       1/1     Running            0          6s
    job-demo-onepod-testrestart-rpvhh       1/1     Terminating        0          4m36s

    Would a Pod that has already finished be recreated after deletion? Let's test:

    [root@k8s-master job]# kubectl get pod
    NAME                                    READY   STATUS             RESTARTS   AGE
    job-demo-onepod-kmm74                   0/1     Completed          0          22m   
    job-demo-onepod-testrestart-5js9m       1/1     Running            0          107s
    [root@k8s-master job]# kubectl delete pod job-demo-onepod-kmm74
    pod "job-demo-onepod-kmm74" deleted
    [root@k8s-master job]# kubectl get pod
    NAME                                    READY   STATUS             RESTARTS   AGE
    job-demo-onepod-testrestart-5js9m       1/1     Running            0          3m27s

    Pod job-demo-onepod-kmm74 had finished its task and reached the Completed status; after deletion it is gone for good and no replacement is started. In short: while a Job is unfinished, deleting its Pod triggers a new Pod until the Job reaches Completed; once Completed, deleted Pods are not recreated.

    2. Jobs with a fixed completion count

    Set .spec.completions without setting .spec.parallelism: Pods are created one after another until .spec.completions Pods have finished successfully.

    [root@k8s-master job]# cat completions-pod-job.yaml 
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: job-demo-comppod
    spec:
      completions: 5    # run 5 Pods in total
      template:
        metadata:
          name: job-demoi
        spec:
          restartPolicy: Never
          containers:
          - name: counter
            image: busybox
            command:
            - "bin/sh"
            - "-c"
            - "for i in 9 8 7 6 5 4 3 2 1; do echo $i; done
    [root@k8s-master job]# kubectl create -f completions-pod-job.yaml 
    job.batch/job-demo-comppod created
    [root@k8s-master job]# kubectl  get pod | grep job-demo
    job-demo-comppod-b2v2n                  0/1     Completed           0          11s
    job-demo-comppod-ptfqp                  0/1     Completed           0          18s
    job-demo-comppod-xrkm9                  0/1     ContainerCreating   0          3s
    [root@k8s-master job]# kubectl  get pod | grep job-demo   
    job-demo-comppod-b2v2n                  0/1     Completed          0          69s
    job-demo-comppod-jxhqd                  0/1     Completed          0          54s
    job-demo-comppod-p8q7t                  0/1     Completed          0          42s
    job-demo-comppod-ptfqp                  0/1     Completed          0          76s
    job-demo-comppod-xrkm9                  0/1     Completed          0          61s

    Five Pods ran and all exited successfully. In some scenarios, however, you want a fixed number of Pods processing in parallel at once; that is where fixed-completion parallel Jobs come in.

    With .spec.completions set, can the Job still run if .spec.parallelism is set to 0? Let's verify:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: job-demo-comppod
    spec:
      completions: 1
      parallelism: 0     # set to 0
      template:
        metadata:
          name: job-demoi
        spec:
          restartPolicy: Never
          containers:
          - name: counter
            image: busybox
            command:
            - "bin/sh"
            - "-c"
            - "for i in 9 8 7 6 5 4 3 2 1; do echo $i; done"
    [root@k8s-master job]# kubectl get job 
    NAME               COMPLETIONS   DURATION   AGE
    job-demo-comppod   0/1                      5m59s
    [root@k8s-master job]# kubectl describe job job-demo-comppod 
    Name:           job-demo-comppod
    Namespace:      default
    Selector:       controller-uid=a0b677be-6ea1-4e09-8b1c-d45048cb9f57
    Labels:         controller-uid=a0b677be-6ea1-4e09-8b1c-d45048cb9f57
                    job-name=job-demo-comppod
    Annotations:    <none>
    Parallelism:    0
    Completions:    1
    Pods Statuses:  0 Running / 0 Succeeded / 0 Failed               
    Pod Template:
      Labels:  controller-uid=a0b677be-6ea1-4e09-8b1c-d45048cb9f57
               job-name=job-demo-comppod
      Containers:
       counter:
        Image:      busybox
        Port:       <none>
        Host Port:  <none>
        Command:
          bin/sh
          -c
          for i in 9 8 7 6 5 4 3 2 1; do echo $i; done
        Environment:  <none>
        Mounts:       <none>
      Volumes:        <none>
    Events:           <none>

    No Pod ran the task. Remove parallelism and verify again:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: job-demo-comppod
    spec:
      completions: 1
      template:
        metadata:
          name: job-demoi
        spec:
          restartPolicy: Never
          containers:
          - name: counter
            image: busybox
            command:
            - "bin/sh"
            - "-c"
            - "for i in 9 8 7 6 5 4 3 2 1; do echo $i; done"
    [root@k8s-master job]# kubectl create -f completions-pod-job.yaml 
    job.batch/job-demo-comppod created
    [root@k8s-master job]# kubectl get pod |grep demo
    job-demo-comppod-gm9xs                  0/1     Completed          0          39s

    This time the Pod finished the task in 39s. So setting parallelism to 0 suspends the Job: no Pods run until the value is raised above 0.
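    Setting parallelism to 0 is effectively a crude way to pause a Job. Recent Kubernetes releases (the field became stable in 1.24) provide a dedicated spec.suspend field for this purpose; a sketch, with illustrative names:

    ```yaml
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: job-demo-suspend
    spec:
      suspend: true        # no Pods are created while true; set to false to resume
      completions: 1
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: counter
            image: busybox
            command: ["/bin/sh", "-c", "for i in 9 8 7 6 5 4 3 2 1; do echo $i; done"]
    ```

    Flipping suspend back to false lets the controller create Pods again, which is cleaner than editing parallelism.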

    3. Parallel Pods with a work queue

     Set .spec.parallelism without setting .spec.completions: the Job is considered successful once all Pods have finished and at least one of them succeeded.

    [root@k8s-master job]# cat parallelism_pod.yaml 
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: job-demo-parall
    spec:
      parallelism: 3   # run 3 Pods in parallel
      template:
        metadata:
          name: job-demo-para
        spec:
          restartPolicy: Never
          containers:
          - name: counter
            image: busybox
            command:
            - "bin/sh"
            - "-c"
            - "for i in 9 8 7 6 5 4 3 2 1; do echo $i; done"
    [root@k8s-master job]# kubectl create -f parallelism_pod.yaml 
    job.batch/job-demo-parall created
    [root@k8s-master job]# kubectl get pod| grep para
    job-demo-parall-76jtn                   0/1     ContainerCreating   0          41s
    job-demo-parall-b7x27                   0/1     Completed           0          41s
    job-demo-parall-rqmtk                   0/1     Completed           0          41s
    [root@k8s-master job]# kubectl get pod| grep para
    job-demo-parall-76jtn                   0/1     ContainerCreating   0          51s
    job-demo-parall-b7x27                   0/1     Completed           0          51s
    job-demo-parall-rqmtk                   0/1     Completed           0          51s
    [root@k8s-master job]# kubectl get pod| grep para
    job-demo-parall-76jtn                   0/1     ContainerCreating   0          54s
    job-demo-parall-b7x27                   0/1     Completed           0          54s
    job-demo-parall-rqmtk                   0/1     Completed           0          54s

     All three Pods share the same AGE, confirming that they were created and ran in parallel.

    Next, what if .spec.completions is set to 0? Let's verify whether the task can still complete: delete job-demo-comppod first, then recreate it:

    [root@k8s-master job]# cat completions-pod-job.yaml 
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: job-demo-comppod
    spec:
      completions: 0
      parallelism: 1
      template:
        metadata:
          name: job-demoi
        spec:
          restartPolicy: Never
          containers:
          - name: counter
            image: busybox
            command:
            - "bin/sh"
            - "-c"
            - "for i in 9 8 7 6 5 4 3 2 1; do echo $i; done"
    [root@k8s-master job]# kubectl create -f completions-pod-job.yaml
    job.batch/job-demo-comppod created
    [root@k8s-master job]# kubectl get job
    NAME               COMPLETIONS   DURATION   AGE
    job-demo-comppod   0/0           0s         119s

    After 119s no Pod had run the task (COMPLETIONS stayed at 0/0). Remove .spec.completions and verify again: delete job-demo-comppod first, then recreate it:

    [root@k8s-master job]# cat completions-pod-job.yaml 
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: job-demo-comppod
    spec:
      parallelism: 1
      template:
        metadata:
          name: job-demoi
        spec:
          restartPolicy: Never
          containers:
          - name: counter
            image: busybox
            command:
            - "bin/sh"
            - "-c"
            - "for i in 9 8 7 6 5 4 3 2 1; do echo $i; done"
    [root@k8s-master job]# kubectl create -f completions-pod-job.yaml
    job.batch/job-demo-comppod created
    [root@k8s-master job]# kubectl get job
    NAME               COMPLETIONS   DURATION   AGE
    job-demo-comppod   1/1           19s        20s

    This time the Pod finished the task in 20s, so we can conclude that setting .spec.completions to 0 likewise leaves the Job with no Pods executing the task.

    4. Parallel Jobs with a fixed completion count

    Set both .spec.completions and .spec.parallelism so that multiple Pods work through the queue at once. Delete the old Job, then create a new one that also specifies how many Pods run in parallel at a time:

    [root@k8s-master job]# kubectl delete job job-demo-comppod 
    job.batch "job-demo-comppod" deleted
    [root@k8s-master job]# cat completions-pod-job.yaml 
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: job-demo-comppod
    spec:
      completions: 5     # this Job needs 5 Pods to finish in total
      parallelism: 2     # run 2 Pods in parallel at a time
      template:
        metadata:
          name: job-demoi
        spec:
          restartPolicy: Never
          containers:
          - name: counter
            image: busybox
            command:
            - "bin/sh"
            - "-c"
            - "for i in 9 8 7 6 5 4 3 2 1; do echo $i; done"
    job.batch/job-demo-comppod created
    [root@k8s-master job]# kubectl get pod |grep job-demo
    job-demo-comppod-5svj4                  0/1     ContainerCreating   0          2s
    job-demo-comppod-8tfch                  0/1     ContainerCreating   0          2s
    [root@k8s-master job]# kubectl get pod |grep job-demo
    job-demo-comppod-5svj4                  0/1     ContainerCreating   0          5s
    job-demo-comppod-8tfch                  0/1     ContainerCreating   0          5s
    [root@k8s-master job]# kubectl get pod |grep job-demo
    job-demo-comppod-5svj4                  0/1     ContainerCreating   0          7s
    job-demo-comppod-8tfch                  0/1     ContainerCreating   0          7s
    [root@k8s-master job]# kubectl get pod |grep job-demo
    job-demo-comppod-5svj4                  0/1     ContainerCreating   0          10s
    job-demo-comppod-8tfch                  0/1     Completed           0          10s
    job-demo-comppod-9dwwh                  0/1     ContainerCreating   0          2s
    [root@k8s-master job]# kubectl get pod |grep job-demo
    job-demo-comppod-5svj4                  0/1     Completed           0          14s
    job-demo-comppod-8tfch                  0/1     Completed           0          14s
    job-demo-comppod-9dwwh                  0/1     ContainerCreating   0          6s
    job-demo-comppod-w67g7                  0/1     ContainerCreating   0          0s
    [root@k8s-master job]# kubectl get pod |grep job-demo
    job-demo-comppod-5svj4                  0/1     Completed           0          17s
    job-demo-comppod-8tfch                  0/1     Completed           0          17s
    job-demo-comppod-9dwwh                  0/1     ContainerCreating   0          9s
    job-demo-comppod-w67g7                  0/1     ContainerCreating   0          3s
    [root@k8s-master job]# kubectl get pod |grep job-demo
    job-demo-comppod-5svj4                  0/1     Completed           0          20s
    job-demo-comppod-6xzj4                  0/1     ContainerCreating   0          2s
    job-demo-comppod-8tfch                  0/1     Completed           0          20s
    job-demo-comppod-9dwwh                  0/1     Completed           0          12s
    job-demo-comppod-w67g7                  0/1     ContainerCreating   0          6s
    [root@k8s-master job]# kubectl get pod |grep job-demo
    job-demo-comppod-5svj4                  0/1     Completed           0          25s
    job-demo-comppod-6xzj4                  0/1     ContainerCreating   0          7s
    job-demo-comppod-8tfch                  0/1     Completed           0          25s
    job-demo-comppod-9dwwh                  0/1     Completed           0          17s
    job-demo-comppod-w67g7                  0/1     Completed           0          11s
    [root@k8s-master job]# kubectl get pod |grep job-demo
    job-demo-comppod-5svj4                  0/1     Completed           0          27s
    job-demo-comppod-6xzj4                  0/1     ContainerCreating   0          9s
    job-demo-comppod-8tfch                  0/1     Completed           0          27s
    job-demo-comppod-9dwwh                  0/1     Completed           0          19s
    job-demo-comppod-w67g7                  0/1     Completed           0          13s
    [root@k8s-master job]# kubectl get pod |grep job-demo
    job-demo-comppod-5svj4                  0/1     Completed           0          29s
    job-demo-comppod-6xzj4                  0/1     ContainerCreating   0          11s
    job-demo-comppod-8tfch                  0/1     Completed           0          29s
    job-demo-comppod-9dwwh                  0/1     Completed           0          21s
    job-demo-comppod-w67g7                  0/1     Completed           0          15s
    [root@k8s-master job]# kubectl get pod |grep job-demo
    job-demo-comppod-5svj4                  0/1     Completed          0          30s
    job-demo-comppod-6xzj4                  0/1     Completed          0          12s
    job-demo-comppod-8tfch                  0/1     Completed          0          30s
    job-demo-comppod-9dwwh                  0/1     Completed          0          22s
    job-demo-comppod-w67g7                  0/1     Completed          0          16s

    The run above shows that at any moment the Job keeps at most 2 Pods executing, with Pods in the Completed state not counted toward that limit; in general the controller runs at most min(parallelism, completions - succeeded) Pods at a time.

    Can the task still complete when parallelism is greater than completions? Let's verify: delete job-demo-comppod, then recreate it with parallelism: 6 and completions: 5:

    [root@k8s-master job]# kubectl delete job job-demo-comppod
    job.batch "job-demo-comppod" deleted
    [root@k8s-master job]# kubectl get job
    NAME               COMPLETIONS   DURATION   AGE
    job-demo-comppod   0/5           8h         8h
    [root@k8s-master job]# kubectl describe job job-demo-comppod 
    Name:           job-demo-comppod
    Namespace:      default
    Selector:       controller-uid=dfea8576-9f3e-4b00-bd72-1c884f9e420c
    Labels:         controller-uid=dfea8576-9f3e-4b00-bd72-1c884f9e420c
                    job-name=job-demo-comppod
    Annotations:    <none>
    Parallelism:    6
    Completions:    5
    Start Time:     Mon, 06 Jul 2020 20:29:07 +0800
    Pods Statuses:  5 Running / 0 Succeeded / 0 Failed
    Pod Template:
      Labels:  controller-uid=dfea8576-9f3e-4b00-bd72-1c884f9e420c
               job-name=job-demo-comppod
      Containers:
       counter:
        Image:      busybox
        Port:       <none>
        Host Port:  <none>
        Command:
          bin/sh
          -c
          for i in 9 8 7 6 5 4 3 2 1; do echo $i; done
        Environment:  <none>
        Mounts:       <none>
      Volumes:        <none>
    Events:
      Type    Reason            Age   From            Message
      ----    ------            ----  ----            -------
      Normal  SuccessfulCreate  8h    job-controller  Created pod: job-demo-comppod-lqgdw
      Normal  SuccessfulCreate  8h    job-controller  Created pod: job-demo-comppod-hw5ql
      Normal  SuccessfulCreate  8h    job-controller  Created pod: job-demo-comppod-tn5nz
      Normal  SuccessfulCreate  8h    job-controller  Created pod: job-demo-comppod-8qfv8
      Normal  SuccessfulCreate  8h    job-controller  Created pod: job-demo-comppod-pprd7
    [root@k8s-master job]# kubectl get pod |grep demo
    job-demo-comppod-8qfv8                  0/1     ContainerCreating   0          8h
    job-demo-comppod-hw5ql                  0/1     ContainerCreating   0          8h
    job-demo-comppod-lqgdw                  0/1     ContainerCreating   0          8h
    job-demo-comppod-pprd7                  0/1     ContainerCreating   0          8h
    job-demo-comppod-tn5nz                  0/1     ContainerCreating   0 

    In this run the Pods were created but the task never completed; they sat in ContainerCreating for hours. Note, though, that the API accepts parallelism greater than completions and the controller simply caps concurrent Pods at the remaining completions, so this outcome is more likely an environment problem than a consequence of the setting itself.

     5. Setting a deadline on a Job

    To put a time limit on a Job so that its Pods are terminated even if they have not finished, set activeDeadlineSeconds:

    [root@k8s-master job]# cat job-one-testrestart.yaml 
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: job-demo-onepod-timesecondtest
    spec:
      activeDeadlineSeconds: 10   # terminate the Job after 10 seconds
      template:
        metadata:
          name: job-demo
        spec:
          restartPolicy: Never
          containers:
          - name: counter-time
            image: busybox
            command:
            - "bin/sh"
            - "-c"
            - "touch /tmp/healthy;sleep 1000000"
    [root@k8s-master job]# kubectl get pod |grep one 
    job-demo-onepod-timesecondtest-pwjrg    1/1     Terminating        0          26s

    [root@k8s-master job]# kubectl get pod |grep one

    After creation, because the container sleeps far longer than 10 seconds, the Pod enters the Terminating state once 10 seconds have elapsed and is then deleted; since restartPolicy is Never, the container is not restarted. activeDeadlineSeconds fits any scenario that needs a hard timeout.

    6. Limiting how many times a failed Job is retried

    [root@k8s-master job]# cat job-one-testrestart.yaml 
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: job-demo-onepod-timesecondtest
    spec:
      backoffLimit: 3      # number of retries after the Job fails
      template:
        metadata:
          name: job-demo
        spec:
          restartPolicy: Never
          containers:
          - name: counter-time
            image: busybox
            command:
            - "bin/sh"
            - "*********"               # 将启动命名设置为异常
            - "touch /tmp/healthy;sleep 1000000"
    [root@k8s-master job]# kubectl get pod |grep one
    job-demo-onepod-timesecondtest-6sttm    0/1     Error              0          4m1s
    job-demo-onepod-timesecondtest-hpnz6    0/1     Error              0          4m22s
    job-demo-onepod-timesecondtest-jcfgn    0/1     Error              0          3m17s

    Note that each retry creates a brand-new Pod rather than restarting the existing one, which is why three failed Pods are listed above.
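    backoffLimit and activeDeadlineSeconds can also be combined. Per the Kubernetes documentation, activeDeadlineSeconds takes precedence: once the deadline passes the Job is terminated even if retries remain. A sketch with illustrative values:

    ```yaml
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: job-demo-retry-deadline
    spec:
      backoffLimit: 3              # allow up to 3 retries on failure...
      activeDeadlineSeconds: 120   # ...but never run longer than 120s overall
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: worker
            image: busybox
            command: ["/bin/sh", "-c", "exit 1"]   # always fails, to exercise retries
    ```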

    IV. Summary

    This article covered Job usage in Kubernetes. In production, workloads that run continuously (not one-off) should use a stateless workload (Deployment) or a stateful workload (StatefulSet); for tasks that only need to run once, a Job is the right workload type.

  • Original article: https://www.cnblogs.com/gdut1425/p/13195654.html