I. Kubernetes scheduling flow
1. Predicates (filtering): first exclude the nodes that cannot run the pod at all.
2. Priorities (scoring): score the remaining nodes with a set of priority functions; if one node has the single highest score, it is selected directly.
3. If several nodes tie for the highest score, one of them is picked at random.
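The three steps can be sketched as follows. This is illustrative Python only, not the real kube-scheduler; the node dictionaries and the predicate/priority callables are made up for the example:

```python
import random

# Sketch of the predicate -> priority -> tie-break flow described above.
def schedule(pod, nodes, predicates, priorities):
    # 1. Predicates: drop nodes that cannot run the pod at all.
    feasible = [n for n in nodes if all(p(pod, n) for p in predicates)]
    if not feasible:
        return None  # no feasible node: the pod stays Pending
    # 2. Priorities: sum the score of every priority function per node.
    scored = [(sum(f(pod, n) for f in priorities), n) for n in feasible]
    best = max(score for score, _ in scored)
    # 3. Tie-break: pick one of the top-scoring nodes at random.
    return random.choice([n for score, n in scored if score == best])
```

With two made-up nodes and a "does the node have enough CPU" predicate, a pod requesting more CPU than any node offers comes back as `None` (Pending), otherwise the highest-capacity feasible node wins.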
II. Scheduling methods
1. Node selection: restrict which nodes the pod may run on (nodeSelector, node affinity).
2. Pod selection: run a pod in the same place as another pod (pod affinity), or keep it away from another pod (pod anti-affinity).
3. Taints: a pod can be scheduled onto a tainted node only if it tolerates the taint; if it cannot tolerate the taint it is not scheduled there, and with the NoExecute effect an intolerant pod already on the node is evicted. A toleration time (tolerationSeconds) can also be defined.
III. Common predicates (filtering policies)
Scheduler predicates (a subset):
- CheckNodeCondition: checks whether the node itself is healthy (network, disk, etc.)
- GeneralPredicates, which include:
  - HostName: if pod.spec.hostname is set, checks that it matches the node
  - PodFitsHostPorts: host ports requested via pods.spec.containers.ports.hostPort (ports bound directly on the node) must be free on the node
  - MatchNodeSelector: checks the node's labels against pods.spec.nodeSelector
  - PodFitsResources: checks whether the node can satisfy the pod's resource requests
- NoDiskConflict: checks whether the volumes the pod depends on can be satisfied on the node (not enabled by default)
- PodToleratesNodeTaints: checks whether the pod's spec.tolerations cover all of the node's taints
- PodToleratesNodeNoExecuteTaints: the same check for NoExecute taints (not enabled by default)
- CheckNodeLabelPresence: checks whether specified labels exist on the node (not enabled by default)
- CheckServiceAffinity: tries to place pods belonging to the same Service together (not enabled by default)
- MaxEBSVolumeCount: checks the maximum number of EBS (AWS) volumes
- MaxGCEPDVolumeCount: checks the maximum number of GCE PD volumes
- MaxAzureDiskVolumeCount: checks the maximum number of AzureDisk volumes
- CheckVolumeBinding: checks bound and unbound PVCs on the node
- NoVolumeZoneConflict: checks for conflicts between the pod and its volumes' zones
- CheckNodeMemoryPressure: checks whether the node is under memory pressure
- CheckNodePIDPressure: checks whether the node is under PID pressure
- CheckNodeDiskPressure: checks whether the node is under disk pressure
- MatchInterPodAffinity: checks whether the node satisfies the pod's affinity/anti-affinity rules
IV. Common priority functions
- LeastRequested: the more free capacity, the higher the score: (cpu((capacity-sum(requested))*10/capacity) + memory((capacity-sum(requested))*10/capacity))/2
- BalancedResourceAllocation: nodes whose CPU and memory utilization rates are closest to each other win
- NodePreferAvoidPods: honors the node annotation "scheduler.alpha.kubernetes.io/preferAvoidPods"
- TaintToleration: matches the pod's spec.tolerations list against the node's taints list; the more unmatched entries, the lower the score
- SelectorSpreading: label-selector spreading; nodes already running more pods selected by the same selectors as the current pod score lower
- InterPodAffinity: iterates over the pod's affinity terms; the more terms matched, the higher the score
- NodeAffinity: node affinity
- MostRequested: the less free capacity, the higher the score, the opposite of LeastRequested (not enabled by default)
- NodeLabel: whether the node has a given label (not enabled by default)
- ImageLocality: scores by the total size of the pod's required images already present on the node (not enabled by default)
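As a worked example of the LeastRequested formula above (a sketch with made-up capacity and request numbers, not the scheduler's actual code):

```python
# LeastRequested: per-resource score is (capacity - requested) * 10 / capacity,
# and the node score is the average of the CPU and memory scores.
def least_requested_score(cpu_capacity, cpu_requested, mem_capacity, mem_requested):
    cpu_score = (cpu_capacity - cpu_requested) * 10 / cpu_capacity
    mem_score = (mem_capacity - mem_requested) * 10 / mem_capacity
    return (cpu_score + mem_score) / 2

# Node with 4000m CPU and 8192Mi memory, pods requesting 1000m and 2048Mi:
print(least_requested_score(4000, 1000, 8192, 2048))  # 7.5
```

Both resources are 25% requested here, so each scores (1 - 0.25) * 10 = 7.5 and the average is 7.5; an emptier node scores closer to 10.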
V. Advanced scheduling settings
1. nodeSelector
# List node labels
[root@k8s-m ~]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8s-m Ready master 120d v1.11.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=k8s-m,node-role.kubernetes.io/master=
node1 Ready <none> 120d v1.11.2 app=myapp,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disk=ssd,disktype=ssd,kubernetes.io/hostname=node1,test_node=k8s-node1
# Use a nodeSelector to select nodes labeled disk=ssd
# Check the result
[root@k8s-m schedule]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
nginx-pod 1/1 Running 0 49s 10.244.1.92 node1 <none>
[root@k8s-m schedule]# cat my-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  nodeSelector:
    disk: ssd
# If no node carries the label specified in nodeSelector, the pod stays Pending (predicate failure)
2. affinity
2.1 nodeAffinity with preferredDuringSchedulingIgnoredDuringExecution (soft affinity: the node matching the most conditions is preferred, but the pod is still created even if no node matches)
# Example
[root@k8s-m schedule]# cat my-affinity-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: test_node1      # label key
            operator: In         # In: the node's label value must be in the list below
            values:
            - k8s-node1          # candidate value of the test_node1 label
            - test1              # candidate value of the test_node1 label
        weight: 60               # weight of this nodeSelectorTerm, range 1-100
# Check: no node has this label, but the pod was still created and is running
[root@k8s-m schedule]# kubectl get pod
NAME READY STATUS RESTARTS AGE
affinity-pod 1/1 Running 0 16s
2.2 requiredDuringSchedulingIgnoredDuringExecution (hard affinity, similar to nodeSelector: a hard requirement; if no node satisfies the conditions the pod is not scheduled and stays Pending)
[root@k8s-m schedule]# cat my-affinity-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: test_node1      # label key
            operator: In         # In: the node's label value must be in the list below
            values:
            - k8s-node1          # candidate value of the test_node1 label
            - test1              # candidate value of the test_node1 label
# Check: no node has the test_node1 label, so the pod is Pending
[root@k8s-m schedule]# kubectl get pod
NAME READY STATUS RESTARTS AGE
affinity-pod 0/1 Pending 0 4s
VI. Pod affinity and anti-affinity
1. podAffinity: place a pod in the same location as another pod ("the same location" is not necessarily the same node; it depends on which label you use as the topologyKey)
# Example: place affinity-pod in the same location as my-pod1
[root@k8s-m schedule]# cat my-affinity-pod2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod1
  labels:
    app1: my-pod1
spec:
  containers:
  - name: my-pod1
    image: nginx
    ports:
    - name: http
      containerPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app1            # label key, defined on the pod above
            operator: In         # In: the label value must be in the list below
            values:
            - my-pod1            # value of the app1 label
        topologyKey: kubernetes.io/hostname   # nodes sharing the same kubernetes.io/hostname value count as "the same location"; this pod must be co-located (affinity) with the pods matched by the labelSelector
# Check the result
[root@k8s-m schedule]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
affinity-pod 1/1 Running 0 54s 10.244.1.98 node1 <none>
my-pod1 1/1 Running 0 54s 10.244.1.97 node1 <none>
2. podAntiAffinity: keep a pod away from the node where another pod runs, the opposite of the above
[root@k8s-m schedule]# cat my-affinity-pod3.yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod1
  labels:
    app1: my-pod1
spec:
  containers:
  - name: my-pod1
    image: nginx
    ports:
    - name: http
      containerPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    podAntiAffinity:             # the only change from the previous example
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app1            # label key, defined on the pod above
            operator: In         # In: the label value must be in the list below
            values:
            - my-pod1            # value of the app1 label
        topologyKey: kubernetes.io/hostname   # the pod must NOT land where kubernetes.io/hostname matches that of the pods selected above
# Check: I only have one worker node, so affinity-pod is Pending
[root@k8s-m schedule]# kubectl get pod
NAME READY STATUS RESTARTS AGE
affinity-pod 0/1 Pending 0 1m
my-pod1 1/1 Running 0 1m
VII. Taint-based scheduling
The effect of a taint defines how it repels pods:
- NoSchedule: affects only the scheduling process; existing pods on the node are not affected
- NoExecute: affects both scheduling and existing pods; pods that do not tolerate the taint are evicted
- PreferNoSchedule: a soft version of NoSchedule; the scheduler avoids the node, but will still place a pod there when no better node is available
1. Viewing and managing taints
# View a node's taints (Taints field)
[root@k8s-m schedule]# kubectl describe node k8s-m |grep Taints
Taints:             node-role.kubernetes.io/master:NoSchedule
[root@k8s-m schedule]# kubectl describe node node1 |grep Taints
Taints:             <none>
# Managing taints
kubectl taint node -h
# Add a taint (similar to labeling a node)
kubectl taint node node1 node-type=PreferNoSchedule:NoSchedule
# Check
[root@k8s-m schedule]# kubectl describe node node1 |grep Taints
Taints:             node-type=PreferNoSchedule:NoSchedule
# Remove the taint
[root@k8s-m ~]# kubectl taint node node1 node-type-
node/node1 untainted
# Check
[root@k8s-m ~]# kubectl describe node node1 |grep Taints
Taints:             <none>
2. Using tolerations
# Create a pod
[root@k8s-m ~]# cat mypod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
# Check the pod (it is Pending)
[root@k8s-m ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx-pod 0/1 Pending 0 32s
# The pod cannot tolerate the taints
[root@k8s-m ~]# kubectl describe pod nginx-pod|tail -1
Warning FailedScheduling 3s (x22 over 1m) default-scheduler 0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.
### Add a toleration
[root@k8s-m ~]# cat mypod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  tolerations:                   # taints this pod tolerates
  - key: "node-type"             # the taint key defined earlier
    operator: "Equal"            # Equal: the value must match exactly; Exists: any value of this key is tolerated
    value: "PreferNoSchedule"    # taint value
    effect: "NoSchedule"         # taint effect
    #tolerationSeconds: 3600     # how long an evicted pod may keep running; only valid when effect is NoExecute
# Check: the pod has been scheduled
[root@k8s-m ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
nginx-pod 1/1 Running 0 3m 10.244.1.100 node1 <none>
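The tolerationSeconds field commented out above only applies to the NoExecute effect. A minimal sketch, assuming node1 were tainted with a made-up NoExecute taint (`kubectl taint node node1 node-type=test:NoExecute`):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tolerate-pod
spec:
  containers:
  - name: tolerate-pod
    image: nginx
  tolerations:
  - key: "node-type"
    operator: "Exists"          # Exists: tolerate any value of the node-type taint
    effect: "NoExecute"
    tolerationSeconds: 3600     # stay on the node at most 3600s after the taint appears, then get evicted
```

Without tolerationSeconds the pod would stay on the tainted node indefinitely; with it, eviction is only delayed.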