I. Kubernetes scheduling flow
1. (Predicates) First filter out all nodes that cannot possibly run the pod.
2. (Priorities) Score the remaining nodes with a series of priority functions; if a single node has the highest score, it is selected directly.
3. If several nodes tie for the highest score, one of them is picked at random.
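The three steps above can be sketched in a few lines of Python. This is a simplified illustration of the filter/score/tie-break flow, not the scheduler's actual code; the predicate and priority functions are stand-ins:

```python
import random

def schedule(pod, nodes, predicates, priorities):
    """Two-phase scheduling: filter with predicates, then score with priorities."""
    # Phase 1 (predicates): drop nodes that cannot run the pod at all.
    feasible = [n for n in nodes if all(p(pod, n) for p in predicates)]
    if not feasible:
        return None  # no feasible node: the pod stays Pending
    # Phase 2 (priorities): each node's score is the sum over all priority functions.
    scores = {n: sum(f(pod, n) for f in priorities) for n in feasible}
    best = max(scores.values())
    # Tie-break: pick one of the top-scoring nodes at random.
    return random.choice([n for n, s in scores.items() if s == best])
```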
II. Scheduling controls
1. Node selection (restrict which nodes the pod may run on)
2. Pod selection (run the pod on the same node as some other pod (pod affinity), or keep it away from some other pod (pod anti-affinity))
3. Taints (a pod can be scheduled onto a node only if it tolerates the node's taints; if it cannot, it will not be scheduled there, and with NoExecute an already-running pod is evicted; a toleration time can also be defined)
III. Common predicates (filtering)
Scheduler predicate policies (a subset):
  CheckNodeCondition:       # check that the node itself is healthy (network, disk, etc.)
  GeneralPredicates:
    HostName:               # check whether the Pod defines pod.spec.hostname and the node's hostname matches it
    PodFitsHostPorts:       # the node must have free the host ports the pod requests via pods.spec.containers.ports.hostPort (bind to a port on the node)
    MatchNodeSelector:      # check the node's labels against pods.spec.nodeSelector
    PodFitsResources:       # check whether the node can satisfy the Pod's resource requests
  NoDiskConflict:           # check whether the volumes the Pod depends on can be satisfied (not enabled by default)
  PodToleratesNodeTaints:   # check whether the Pod's spec.tolerations cover all of the node's taints
  PodToleratesNodeNoExecuteTaints: # the same check restricted to NoExecute taints (not enabled by default)
  CheckNodeLabelPresence:   # check whether specified labels exist on the node
  CheckServiceAffinity:     # try to place pods belonging to the same Service together (not enabled by default)
  MaxEBSVolumeCount:        # check the maximum number of EBS (AWS) volumes
  MaxGCEPDVolumeCount:      # check the maximum number of GCE PD volumes
  MaxAzureDiskVolumeCount:  # check the maximum number of AzureDisk volumes
  CheckVolumeBinding:       # check the node's bound and unbound PVCs
  NoVolumeZoneConflict:     # check for zone conflicts between the pod and its volumes
  CheckNodeMemoryPressure:  # check whether the node is under memory pressure
  CheckNodePIDPressure:     # check whether the node is under PID pressure
  CheckNodeDiskPressure:    # check whether the node is under disk pressure
  MatchInterPodAffinity:    # check whether the node satisfies the pod's affinity / anti-affinity rules
IV. Common priority functions (scoring)
  LeastRequested:            # the more idle capacity, the higher the score:
                             # (cpu((capacity-sum(requested))*10/capacity) + memory((capacity-sum(requested))*10/capacity))/2
  BalancedResourceAllocation:# nodes whose CPU and memory utilization rates are closest to each other win
  NodePreferAvoidPods:       # based on the node annotation "scheduler.alpha.kubernetes.io/preferAvoidPods"
  TaintToleration:           # match the Pod's spec.tolerations against the node's taints; the more matching entries, the lower the score
  SelectorSpreading:         # label-selector spreading: nodes already running more pods matched by the same selectors score lower
  InterPodAffinity:          # iterate over the pod's affinity terms; the more terms satisfied, the higher the score
  NodeAffinity:              # node affinity
  MostRequested:             # the less idle capacity, the higher the score; opposite of LeastRequested (not enabled by default)
  NodeLabel:                 # whether the node carries specified labels (not enabled by default)
  ImageLocality:             # based on the total size of the pod's required images already present on the node (not enabled by default)
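The LeastRequested formula above can be checked with a small calculation. This is a direct transcription of the formula; the capacity and request numbers are made up for illustration:

```python
def least_requested_score(cpu_capacity, cpu_requested, mem_capacity, mem_requested):
    # (cpu((capacity-sum(requested))*10/capacity) + memory((capacity-sum(requested))*10/capacity)) / 2
    cpu_score = (cpu_capacity - cpu_requested) * 10 / cpu_capacity
    mem_score = (mem_capacity - mem_requested) * 10 / mem_capacity
    return (cpu_score + mem_score) / 2

# Hypothetical node: 4000m CPU (1000m requested), 8192Mi memory (4096Mi requested)
print(least_requested_score(4000, 1000, 8192, 4096))  # 6.25 (CPU score 7.5, memory score 5.0)
```

A fully idle node scores the maximum of 10; a fully requested node scores 0.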
V. Advanced scheduling configuration
1. nodeSelector
# Show node labels
[root@k8s-m ~]# kubectl get nodes --show-labels
NAME    STATUS  ROLES   AGE   VERSION  LABELS
k8s-m   Ready   master  120d  v1.11.2  beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=k8s-m,node-role.kubernetes.io/master=
node1   Ready   <none>  120d  v1.11.2  app=myapp,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disk=ssd,disktype=ssd,kubernetes.io/hostname=node1,test_node=k8s-node1

# Use nodeSelector to pick nodes labeled disk=ssd
[root@k8s-m schedule]# cat my-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  nodeSelector:
    disk: ssd

# Check the result
[root@k8s-m schedule]# kubectl get pod -o wide
NAME       READY  STATUS   RESTARTS  AGE  IP           NODE   NOMINATED NODE
nginx-pod  1/1    Running  0         49s  10.244.1.92  node1  <none>

# If no node carries the labels given in nodeSelector, the pod stays Pending (predicate failure)
2. affinity
2.1 nodeAffinity with preferredDuringSchedulingIgnoredDuringExecution (soft affinity: nodes matching more conditions are preferred, but the pod is still scheduled even if no node matches)
# Example
[root@k8s-m schedule]# cat my-affinity-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: test_node1    # label key
            operator: In       # In: the label value must be in the list below
            values:
            - k8s-node1        # candidate value of the test_node1 label
            - test1            # candidate value of the test_node1 label
        weight: 60             # weight of this nodeSelectorTerm, 1-100

# Check: no node has this label, but the pod is still created and running
[root@k8s-m schedule]# kubectl get pod
NAME          READY  STATUS   RESTARTS  AGE
affinity-pod  1/1    Running  0         16s
2.2 requiredDuringSchedulingIgnoredDuringExecution (hard affinity, similar to nodeSelector: a hard requirement; if no node satisfies it, the pod is not scheduled and stays Pending)
[root@k8s-m schedule]# cat my-affinity-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: test_node1    # label key
            operator: In       # In: the label value must be in the list below
            values:
            - k8s-node1        # candidate value of the test_node1 label
            - test1            # candidate value of the test_node1 label

# Check: no node has the test_node1 label, so the pod stays Pending
[root@k8s-m schedule]# kubectl get pod
NAME          READY  STATUS   RESTARTS  AGE
affinity-pod  0/1    Pending  0         4s
VI. Pod affinity and anti-affinity
1. podAffinity (place a pod in the same location as another pod; "same location" is not necessarily the same node, it depends on the label used as topologyKey)
# Example: place affinity-pod together with my-pod1
[root@k8s-m schedule]# cat my-affinity-pod2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod1
  labels:
    app1: my-pod1
spec:
  containers:
  - name: my-pod1
    image: nginx
    ports:
    - name: http
      containerPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app1          # label key defined on the pod above
            operator: In       # In: the label value must be in the list below
            values:
            - my-pod1          # value of the app1 label
        topologyKey: kubernetes.io/hostname  # nodes with the same value for this label count as the same location

# This pod is co-located with (affinity) or kept away from (anti-affinity) the pods matching the
# labelSelector in the specified namespaces; "co-located" means running on a node whose value for
# the topologyKey label matches that of a node running any of the selected pods.

# Check the result
[root@k8s-m schedule]# kubectl get pod -o wide
NAME          READY  STATUS   RESTARTS  AGE  IP           NODE   NOMINATED NODE
affinity-pod  1/1    Running  0         54s  10.244.1.98  node1  <none>
my-pod1       1/1    Running  0         54s  10.244.1.97  node1  <none>
2. podAntiAffinity (keep the pod away from another pod's location; the opposite of the above)
[root@k8s-m schedule]# cat my-affinity-pod3.yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod1
  labels:
    app1: my-pod1
spec:
  containers:
  - name: my-pod1
    image: nginx
    ports:
    - name: http
      containerPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    podAntiAffinity:           # the only change is here
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app1          # label key defined on the pod above
            operator: In       # In: the label value must be in the list below
            values:
            - my-pod1          # value of the app1 label
        topologyKey: kubernetes.io/hostname  # the pod must NOT share this label value with the selected pods

# Check: since I only have one worker node, affinity-pod stays Pending
[root@k8s-m schedule]# kubectl get pod
NAME          READY  STATUS   RESTARTS  AGE
affinity-pod  0/1    Pending  0         1m
my-pod1       1/1    Running  0         1m
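The topologyKey comparison behind both podAffinity and podAntiAffinity boils down to a label-equality check between nodes; a minimal sketch (the function name is mine, not a Kubernetes API):

```python
def same_location(node_a_labels, node_b_labels, topology_key):
    """Two nodes count as the 'same location' when both carry the topologyKey
    label and their values for it are equal."""
    return (topology_key in node_a_labels
            and node_a_labels.get(topology_key) == node_b_labels.get(topology_key))
```

With topologyKey kubernetes.io/hostname every node is its own location; with a zone label like failure-domain.beta.kubernetes.io/zone, all nodes in one zone count as one location.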
VII. Taint-based scheduling
The taint effect defines how the node repels Pods:
NoSchedule:       # affects scheduling only; existing Pods on the node are not touched
NoExecute:        # affects both scheduling and existing Pods; Pods that do not tolerate the taint are evicted
PreferNoSchedule: # a soft NoSchedule: the scheduler tries to avoid the node, but will still place pods on it when no better node is available
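The toleration check behind the PodToleratesNodeTaints predicate can be sketched roughly as follows (a simplified illustration that ignores empty-key wildcard tolerations):

```python
def tolerates(toleration, taint):
    """A toleration matches a taint when key, effect, and (for Equal) value line up."""
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration["key"] != taint["key"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return True  # Exists: a matching key is enough, the value is ignored
    return toleration.get("value") == taint["value"]

def pod_tolerates_node(tolerations, taints):
    # Every taint on the node must be covered by at least one toleration.
    return all(any(tolerates(tol, t) for tol in tolerations) for t in taints)
```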
1. Viewing and managing taints
# View a node's taints (Taints)
[root@k8s-m schedule]# kubectl describe node k8s-m | grep Taints
Taints:             node-role.kubernetes.io/master:NoSchedule
[root@k8s-m schedule]# kubectl describe node node1 | grep Taints
Taints:             <none>

# Manage taints
kubectl taint node -h

# Add a taint (similar to labeling a node)
kubectl taint node node1 node-type=PreferNoSchedule:NoSchedule

# Check
[root@k8s-m schedule]# kubectl describe node node1 | grep Taints
Taints:             node-type=PreferNoSchedule:NoSchedule

# Remove the taint
[root@k8s-m ~]# kubectl taint node node1 node-type-
node/node1 untainted

# Check
[root@k8s-m ~]# kubectl describe node node1 | grep Taints
Taints:             <none>
2. Using tolerations
# Create a pod without tolerations
[root@k8s-m ~]# cat mypod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80

# Check: the pod is Pending because it cannot tolerate the taints
[root@k8s-m ~]# kubectl get pod
NAME       READY  STATUS   RESTARTS  AGE
nginx-pod  0/1    Pending  0         32s
[root@k8s-m ~]# kubectl describe pod nginx-pod | tail -1
  Warning  FailedScheduling  3s (x22 over 1m)  default-scheduler  0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.

# Add a toleration
[root@k8s-m ~]# cat mypod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  tolerations:                 # taints this pod tolerates
  - key: "node-type"           # the taint key set earlier
    operator: "Equal"          # Equal: key and value must both match; Exists: a matching key is enough
    value: "PreferNoSchedule"  # the taint value
    effect: "NoSchedule"       # the taint effect
    #tolerationSeconds: 3600   # how long to keep running before eviction; only valid when effect is NoExecute

# Check: the pod has been scheduled
[root@k8s-m ~]# kubectl get pod -o wide
NAME       READY  STATUS   RESTARTS  AGE  IP            NODE   NOMINATED NODE
nginx-pod  1/1    Running  0         3m   10.244.1.100  node1  <none>