I. Basic Concepts
1. The three steps the scheduler takes to choose a node
- Predicates: first filter out every node that cannot possibly fit, based on resource requests, port conflicts, taints, and so on
- Priorities: compute a priority score for each remaining node and sort them, to find the best match (the highest score)
- Select: if more than one node ties for the highest score, pick one of them at random; scheduling is done
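The result of each step can be inspected after the fact; for example (the pod name is a placeholder):
# kubectl describe pod <pod-name>   #the Events section records the Scheduled event and the chosen node
# kubectl get events --field-selector reason=FailedScheduling   #pods that failed the predicate step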
2. Common scheduling predicates
- CheckNodeCondition: checks whether the node's own condition is healthy
- GeneralPredicates: a bundle of general-purpose predicates, covering the next few checks
- HostName: if pod.spec.hostname is defined, checks that no pod with the same name already exists on the node
- PodFitsHostPorts: if pod.spec.containers.ports.hostPort is defined (binds a port on the node), checks that the port is not already in use on the node
- MatchNodeSelector: checks pods.spec.nodeSelector against the node's labels
- PodFitsResources: checks whether the node can satisfy the pod's resource requests (see the sketch after this list)
- NoDiskConflict: checks whether the volumes the pod depends on can be satisfied; not enabled by default
- PodToleratesNodeTaints: checks pods.spec.tolerations, i.e. whether the pod can tolerate the taints on the node
- PodToleratesNodeNoExecuteTaints: checks whether a running pod should be evicted once it can no longer tolerate the node's NoExecute taints; not enabled by default
- CheckNodeLabelPresence: checks for the presence of specific labels on the node
- CheckServiceAffinity: tries to place pods belonging to the same Service on the same node; not enabled by default
- CheckVolumeBinding: checks whether the pod's PVCs can be bound on the node
- NoVolumeZoneConflict: checks the zone constraints of the pod's volumes against the node
- CheckNodeMemoryPressure: checks whether the node is under memory pressure
- CheckNodePIDPressure: checks whether the node is under PID pressure (too many processes)
- CheckNodeDiskPressure: checks whether the node is under disk I/O pressure
- MatchInterPodAffinity: checks whether the node can satisfy the pod's inter-pod affinity rules
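A minimal sketch of the fields that PodFitsHostPorts and PodFitsResources inspect (the pod name and request sizes are illustrative assumptions):
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo-predicates   #hypothetical name
spec:
  containers:
  - name: myapp
    image: dongfeimg/myapp:v1
    ports:
    - containerPort: 80
      hostPort: 8080          #PodFitsHostPorts: rejects nodes where port 8080 is already taken
    resources:
      requests:
        cpu: "500m"           #PodFitsResources: rejects nodes with less than 0.5 CPU unreserved
        memory: "256Mi"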
3. Priority functions
- LeastRequested: prefers nodes with a higher proportion of idle resources (a scoring sketch follows this list)
- MostRequested: the opposite of LeastRequested; tries to use one node up before spilling onto another
- BalancedResourceAllocation: prefers nodes whose CPU and memory utilization are close to each other
- NodePreferAvoidPods: judged by the node annotation scheduler.alpha.kubernetes.io/preferAvoidPods; nodes with a higher preference score are preferred
- TaintToleration: matches the entries of the pod's spec.tolerations list against the node's taints; the more that match, the higher the priority
- SelectorSpreading: prefers nodes that run fewer pods matched by the same selector, so that pods belonging to one selector are spread across nodes as much as possible
- InterPodAffinity: iterates over the pod's affinity terms; the more a node matches, the higher its priority
- NodeAffinity: scores nodes by the pod's node-affinity preferences
- NodeLabel: scores a node by whether it carries specific labels; carrying them earns points
- ImageLocality: scores a node by the total size of the pod-required images it already holds locally
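As a rough sketch, the classic LeastRequested score averages the CPU and memory idle fractions on a 0-10 scale (exact weighting may differ between versions; treat this as an assumption):
score = ( (cpu_capacity - cpu_requested) * 10 / cpu_capacity
        + (mem_capacity - mem_requested) * 10 / mem_capacity ) / 2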
II. Scheduling Manifest Examples
1. nodeSelector: node selector
# kubectl label nodes node01 disktype=ssd
# kubectl label nodes node02 disktype=harddisk
# kubectl get nodes --show-labels
# vim pod-demo-ssd.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo-ssd
  namespace: default
  labels:
    app: myapp
    tier: frontend
  annotations:
    dongfei.tech/created-by: "cluster admin"
spec:
  containers:
  - name: myapp
    image: dongfeimg/myapp:v1
    imagePullPolicy: IfNotPresent
  nodeSelector:      #node selector
    disktype: ssd    #schedule onto a node carrying the disktype=ssd label
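To verify the selector (using the labels added above):
# kubectl apply -f pod-demo-ssd.yaml
# kubectl get pods -o wide   #the pod should land on node01, which carries disktype=ssd
If no node carries the requested label, the pod stays Pending.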
2. nodeAffinity: node affinity
# kubectl label nodes master zone=Beijing
# kubectl label nodes node01 zone=Shanghai
# kubectl label nodes node02 zone=Guangzhou
# cat pod-demo-nodeaffinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo-nodeaffinity
  namespace: default
  labels:
    app: myapp
    tier: frontend
  annotations:
    dongfei.tech/created-by: "cluster admin"
spec:
  containers:
  - name: myapp
    image: dongfeimg/myapp:v1
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:   #hard affinity
        nodeSelectorTerms:
        - matchExpressions:   #select nodes whose zone is Beijing or Shanghai
          - key: zone
            operator: In
            values:
            - Beijing
            - Shanghai
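A hedged variant: with preferredDuringSchedulingIgnoredDuringExecution (soft affinity) the zone label becomes a preference instead of a requirement; the weight value below is an arbitrary assumption:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:   #soft affinity
      - weight: 60            #1-100; added to the node's score when the term matches
        preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - Beijing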
3. podAffinity: pod affinity
# cat pod-demo-podaffinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo-podaffinity-1
  namespace: default
  labels:
    app: myapp
    tier: frontend
  annotations:
    dongfei.tech/created-by: "cluster admin"
spec:
  containers:
  - name: myapp
    image: dongfeimg/myapp:v1
    imagePullPolicy: IfNotPresent
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo-podaffinity-2
  namespace: default
  labels:
    app: db
    tier: backend
  annotations:
    dongfei.tech/created-by: "cluster admin"
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["sh","-c","sleep 3600"]
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:   #hard affinity: run next to pods labeled app=myapp
      - labelSelector:
          matchExpressions:
          - {key: app, operator: In, values: ["myapp"]}
        topologyKey: kubernetes.io/hostname   #same hostname = same topology domain, i.e. the same node
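For the opposite behavior, podAntiAffinity keeps matching pods apart; a minimal sketch reusing the labels above:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: app, operator: In, values: ["myapp"]}
        topologyKey: kubernetes.io/hostname   #never share a node with an app=myapp pod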
4. taints: node taints
- taints: key/value pairs set on a node, each with an effect
  - effect: how the taint treats pods that cannot tolerate it
    - NoSchedule: intolerant pods cannot be scheduled onto the node; only scheduling is affected, pods already running are untouched
    - PreferNoSchedule: intolerant pods should preferably not be scheduled onto the node
    - NoExecute: intolerant pods cannot be scheduled onto the node, and intolerant pods already running on it are evicted; both scheduling and running pods are affected
# kubectl taint node node01 node-type=production:NoSchedule #add a taint
# kubectl describe node node01 |grep Taints
# cat deploy-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      release: canary
  template:
    metadata:
      labels:
        app: myapp
        release: canary
    spec:
      containers:
      - name: myapp-container
        image: dongfeimg/myapp:v2
        ports:
        - name: http
          containerPort: 80
      tolerations:
      - key: "node-type"
        operator: "Equal"    #Equal: key and value must both match exactly
        value: "production"
        effect: "NoSchedule"
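Two related sketches (the values are assumptions): a toleration that matches any value of the key, and removing the taint afterwards:
      tolerations:
      - key: "node-type"
        operator: "Exists"        #tolerate the key regardless of its value; no value field allowed
        effect: "NoExecute"
        tolerationSeconds: 3600   #NoExecute only: stay bound for 1h after the taint appears, then be evicted
# kubectl taint node node01 node-type-   #the trailing "-" removes the taint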