Scheduler workflow:
predicate --> priority --> select
(filter nodes)   (score nodes)   (bind to the best node)
Scheduling mechanisms:
1. Node affinity scheduling: nodeAffinity
2. Pod affinity and pod anti-affinity scheduling: podAffinity / podAntiAffinity
3. Taint and toleration scheduling: taints (on nodes), tolerations (on pods)
Source code: https://github.com/kubernetes/kubernetes/tree/master/pkg/scheduler/algorithm
Commonly used default predicates:
Predicates work as a one-vote veto: a node failing any single predicate is filtered out.
CheckNodeCondition: checks whether the node's own conditions allow it to accept pods
GeneralPredicates:
    HostName: if pod.spec.hostname is set, checks whether the node can satisfy it
    PodFitsHostPorts: checks pods.spec.containers.ports.hostPort for port conflicts on the node
    MatchNodeSelector: checks pods.spec.nodeSelector against the node's labels
    PodFitsResources: checks whether the node can satisfy the pod's resource requests (cf. kubectl describe nodes node1)
NoDiskConflict: checks whether the storage volumes the pod depends on can be satisfied without conflict (disabled by default)
PodToleratesNodeTaints: checks whether the pod's spec.tolerations cover all of the node's taints
PodToleratesNodeNoExecuteTaints: checks whether the pod tolerates the node's NoExecute taints (disabled by default)
CheckNodeLabelPresence: checks for the presence of given labels on the node, i.e. scheduling by node label (disabled by default)
CheckServiceAffinity: tries to place pods belonging to the same Service on the same node (disabled by default)
MaxEBSVolumeCount: limits the number of AWS EBS volumes mounted on a node
MaxGCEPDVolumeCount: limits the number of GCE persistent disks mounted on a node
MaxAzureDiskVolumeCount: limits the number of Azure disks mounted on a node
CheckVolumeBinding: checks whether the node satisfies the pod's PVC bindings
NoVolumeZoneConflict: checks the remaining volume capacity of the zone
CheckNodeMemoryPressure: checks whether the node is under memory pressure
CheckNodePIDPressure: checks whether the node is under process-count (PID) pressure
CheckNodeDiskPressure: checks whether the node is under disk pressure
MatchInterPodAffinity: checks the node against inter-pod affinity/anti-affinity rules
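When every node is vetoed by some predicate, the pod stays Pending and the scheduler records the failing reasons as pod events. A quick way to inspect them (the exact event text varies by Kubernetes version):
kubectl describe pods <pod-name>
# Events:
#   Warning  FailedScheduling  ...  0/3 nodes are available: 3 Insufficient cpu.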
Default priority functions (for scoring candidate nodes):
All enabled priority functions run for every candidate node; each function's score is summed per node, and the node with the highest total wins.
LeastRequested: the higher the proportion of unrequested resources, the higher the score (a worked example follows this list):
    priority = (cpu((capacity - sum(requested)) * 10 / capacity) + memory((capacity - sum(requested)) * 10 / capacity)) / 2
balanced_resource_allocation: the closer the CPU utilization rate is to the memory utilization rate, the higher the score
node_prefer_avoid_pods: nodes whose annotation "scheduler.alpha.kubernetes.io/preferAvoidPods" matches the pod score far lower; the annotation lets a node declare pods it would rather not host
taint_toleration: checks the pod's spec.tolerations entries against the node's taints; the more of the node's taints the pod fails to tolerate, the lower the node's score
selector_spreading: nodes already running more pods that match the same Service/controller selector as the current pod score lower, spreading replicas apart
interpod_affinity: iterates over the pod's affinity terms; the more terms the node satisfies, the higher its score
node_affinity: checks the node against the pod's nodeSelector; the more matches, the higher the score
most_requested: the smaller the amount of free resources, the higher the score (disabled by default)
node_label: scores nodes by the presence of given node labels (disabled by default)
image_locality: the larger the total size of the pod's images already present on the node, the higher the score (disabled by default)
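As a worked example of the LeastRequested formula, assume a hypothetical node with 4 CPU cores and 8Gi of memory where already-scheduled pods request a total of 2 cores and 2Gi:
    cpu score    = (4 - 2) * 10 / 4 = 5
    memory score = (8 - 2) * 10 / 8 = 7.5
    priority     = (5 + 7.5) / 2    = 6.25
The emptier of two candidate nodes always wins this particular function, though the final ranking still sums all enabled priority functions.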
Advanced scheduling mechanisms
Node selectors: nodeSelector, nodeName
Node affinity scheduling: nodeAffinity
nodeSelector is a hard constraint
kubectl explain pods.spec.nodeSelector
mkdir schedule
cp ../pod-sa-demo.yaml ./
vim pod-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
  namespace: default
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
    ports:
    - name: myapp
      containerPort: 80
  nodeSelector:
    disktype: ssd
kubectl apply -f pod-demo.yaml
kubectl label nodes node01 disktype=ssd
kubectl get nodes --show-labels
kubectl get pods    # the new pod runs on the node that carries the label
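To confirm the placement, the wide output adds a NODE column (node names below assume this walkthrough's cluster):
kubectl get pods -o wide
# NAME       READY   STATUS    RESTARTS   AGE   IP    NODE
# pod-demo   1/1     Running   0          10s   ...   node01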
affinity
kubectl explain pods.spec.affinity
kubectl explain pods.spec.affinity.nodeAffinity
preferredDuringSchedulingIgnoredDuringExecution <[]Object>   soft affinity: satisfied when possible, but the pod still runs if it cannot be
requiredDuringSchedulingIgnoredDuringExecution <Object>      hard affinity: the pod only runs on a node that satisfies it
Example
cp pod-demo.yaml pod-nodeaffinity-demo.yaml
vim pod-nodeaffinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-affinity-demo
  namespace: default
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
    ports:
    - name: myapp
      containerPort: 80
  affinity:                  # affinity preferences
    nodeAffinity:            # node affinity
      requiredDuringSchedulingIgnoredDuringExecution:   # hard affinity: must be satisfied
        nodeSelectorTerms:   # node label terms
        - matchExpressions:  # match expressions
          - key: zone
            operator: In
            values:
            - foo
            - bar
kubectl apply -f pod-nodeaffinity-demo.yaml
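With no node carrying a zone label, this pod stays Pending under the hard affinity rule; labeling any node with one of the listed values (a label invented for this demo) lets it schedule:
kubectl label nodes node02 zone=foo
kubectl get pods pod-node-affinity-demo -o wide    # now scheduled onto node02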
vim pod-nodeaffinity-demo-2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-affinity-demo-2
  namespace: default
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
    ports:
    - name: myapp
      containerPort: 80
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:  # nodes matching the preference below are favored, but the pod runs even if none match
      - preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - foo
            - bar
        weight: 60
kubectl apply -f pod-nodeaffinity-demo-2.yaml
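Unlike the hard-affinity pod above, this one runs even when no node carries a zone label; the preference weight (a value from 1 to 100) only biases scoring among otherwise eligible nodes:
kubectl get pods pod-node-affinity-demo-2 -o wide
# STATUS is Running even with no zone label anywhere in the cluster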
podAffinity scheduling: pod affinity
podAntiAffinity: pod anti-affinity. The node chosen for the first pod becomes the reference point for placing later pods, so the scheduler must work out which pods belong on the same node and which must be kept apart.
kubectl explain pods.spec.affinity.podAffinity    # pod affinity also has hard and soft variants
preferredDuringSchedulingIgnoredDuringExecution <[]Object>   soft affinity
requiredDuringSchedulingIgnoredDuringExecution <[]Object>    hard affinity
topologyKey <string>      defines what counts as the same location (see the sketch after this list)
labelSelector <Object>    selects which pod(s) this pod is affine to
namespaces <[]string>     namespaces of the pods to be affine to; defaults to the namespace of the pod being created, and cross-namespace references are rarely used
kubectl explain pods.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution.labelSelector
matchExpressions <[]Object>        set-based selector
matchLabels <map[string]string>    equality-based selector
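topologyKey is not limited to per-host labels: any node label can define a topology domain. A minimal sketch, assuming the nodes carry a hypothetical rack label, so that "same location" means "same rack" rather than "same host":
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: myapp
      topologyKey: rack    # hypothetical node label; nodes sharing a rack value count as one location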
Example
cp pod-demo.yaml pod-required-affinity-demo.yaml
vim pod-required-affinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  namespace: default
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  namespace: default
  labels:
    app: backend
    tier: db
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["sh","-c","sleep 3600"]
  affinity:
    podAffinity:             # pod affinity preference
      requiredDuringSchedulingIgnoredDuringExecution:   # hard affinity
      - labelSelector:       # selects the pods to be affine to by their labels
          matchExpressions:  # match expression on pod labels
          - {key: app, operator: In, values: ["myapp"]}   # label app=myapp
        topologyKey: kubernetes.io/hostname   # which nodes count as "together": nodes sharing the same value of this node-label key form one location, so this pod lands on the same node (or node group) as the pods it is affine to
kubectl delete -f pod-required-affinity-demo.yaml
kubectl apply -f pod-required-affinity-demo.yaml
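With the hard pod affinity in place, both pods land on the same node, whichever node pod-first was scheduled to (node names assume this walkthrough's cluster):
kubectl get pods -o wide
# NAME         READY   STATUS    ...   NODE
# pod-first    1/1     Running   ...   node01
# pod-second   1/1     Running   ...   node01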
podAntiAffinity: pod anti-affinity
Example
cp pod-required-affinity-demo.yaml pod-required-Antiaffinity-demo.yaml
vim pod-required-Antiaffinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-three
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-four
  labels:
    app: backend
    tier: db
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["sh","-c","sleep 3600"]
  affinity:
    podAntiAffinity:   # pod anti-affinity; the remaining fields mirror pod affinity
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: app, operator: In, values: ["myapp"]}
        topologyKey: kubernetes.io/hostname
kubectl delete -f pod-required-affinity-demo.yaml
kubectl apply -f pod-required-Antiaffinity-demo.yaml
Because the cluster has only one schedulable node and the rule is a hard anti-affinity, pod-four can only sit in Pending.
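The unsatisfied anti-affinity shows up in the pod's events (exact wording varies by Kubernetes version):
kubectl describe pods pod-four
# Warning  FailedScheduling  ...  node(s) didn't match pod affinity/anti-affinity rules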
Taint-based scheduling gives the node a say in what it accepts; taints are node attributes.
kubectl get nodes node01 -o yaml
kubectl explain nodes.spec
taints
kubectl explain nodes.spec.taints    # taints
kubectl explain nodes.spec.taints.effect
effect <string> -required-   what happens to pods that do not tolerate the taint; defines how strongly pods are repelled
NoExecute: affects not only scheduling but also existing pods; pods that do not tolerate the taint are evicted from the node
NoSchedule: affects only scheduling, not existing pods; pods that do not tolerate it are not scheduled here
PreferNoSchedule: affects only scheduling, not existing pods; pods that do not tolerate it are avoided when possible, but may still be placed here as a last resort
Master taint
kubectl describe nodes master
Taints: node-role.kubernetes.io/master:NoSchedule
        (taint key)                    (effect)
Pods that do not tolerate this taint are never scheduled onto the master.
Pod tolerations
kubectl get pods -n kube-system
kubectl describe pods -n kube-system kube-apiserver-master
Tolerations: :NoExecute
kubectl describe pods -n kube-system kube-flannel-ds-amd64-99ccn
Tolerations: :NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Managing node taints
kubectl taint --help                                        # apply a taint
kubectl taint node node01 node-type=production:NoSchedule
kubectl taint node node01 node-type-                        # remove the taint(s) with this key
Format: <taint key>=<taint value>:<effect>; pods that cannot tolerate the taint are not scheduled to node01
Taints: node-type=production:NoSchedule
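The taint can also be read straight from the node object instead of the describe output (jsonpath shown as one option):
kubectl get nodes node01 -o jsonpath='{.spec.taints}'
# prints the taint list: key node-type, value production, effect NoSchedule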
vim deploy-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
      release: canary
  template:
    metadata:
      labels:
        app: myapp
        release: canary
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v1
        ports:
        - name: http
          containerPort: 80
kubectl apply -f deploy-demo.yaml    # the pod template carries no tolerations, so the pods cannot land on the tainted node
kubectl taint node node02 node-type=qa:NoExecute
Format: <taint key>=<taint value>:NoExecute; pods that cannot tolerate the taint are evicted, so with both nodes now tainted the deployment's pods end up Pending
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
    ports:
    - name: http
      containerPort: 80
  tolerations:            # the pod tolerates the taints below, so it may run on nodes that carry them
  - key: "node-type"      # the node taint's key
    operator: "Equal"     # Equal: key, value and effect must match the node's taint exactly; Exists: the key merely has to exist on the node
    value: "production"   # the node taint's value
    effect: "NoExecute"   # the effect the pod can tolerate
    tolerationSeconds: 60 # how long the pod may keep running after a matching NoExecute taint appears before it is evicted
kubectl apply -f deploy-demo.yaml
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
    ports:
    - name: http
      containerPort: 80
  tolerations:           # tolerating these taints lets the pod run on nodes that carry them
  - key: "node-type"
    operator: "Exists"   # existence check: the taint key only has to exist; value must be left empty
    value: ""
    effect: "NoSchedule" # the effect the pod can tolerate
kubectl apply -f deploy-demo.yaml
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
    ports:
    - name: http
      containerPort: 80
  tolerations:           # tolerating these taints lets the pod run on nodes that carry them
  - key: "node-type"
    operator: "Exists"   # existence check: the taint key only has to exist
    value: ""
    effect: ""           # an empty effect tolerates every effect of this key
kubectl apply -f deploy-demo.yaml
Strength of the effects: NoExecute > NoSchedule > PreferNoSchedule
Strongest effect a pod can tolerate: NoExecute
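For completeness, the softest effect can be exercised the same way; pods without a matching toleration are merely steered away from the node, not blocked (the value dev is made up for this example):
kubectl taint node node01 node-type-                        # clear the node-type taints first
kubectl taint node node01 node-type=dev:PreferNoSchedule
kubectl apply -f deploy-demo.yaml                           # even without tolerations the pods may still land on node01 if no better node exists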