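Kubernetes scheduler, predicates, and priority functions
Node selection process
- Predicate phase (predicate): filter out the nodes that cannot run the pod
- Priority phase (priority): score the nodes that remain
- Selection (select): pick the node with the best score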
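The three phases above can be sketched as a small pipeline. This is only an illustration, not the scheduler's actual code: the `schedule` function, the dictionary shapes for pods and nodes, and the name-based tie-break are all assumptions made for the example.

```python
def schedule(pod, nodes, predicates, priorities):
    """Return the name of the chosen node, or None if no node is feasible.

    nodes: dict mapping node name -> node record (hypothetical shape).
    predicates: functions (pod, node) -> bool; every one must pass.
    priorities: functions (pod, node) -> number; scores are summed.
    """
    # 1. Predicate phase: keep only nodes that pass every predicate.
    feasible = [name for name, node in nodes.items()
                if all(p(pod, node) for p in predicates)]
    if not feasible:
        return None  # nothing fits; the pod would stay Pending

    # 2. Priority phase: sum the scores of all priority functions.
    scores = {name: sum(f(pod, nodes[name]) for f in priorities)
              for name in feasible}

    # 3. Select: highest score wins. Ties are broken by name here only to
    #    keep the sketch deterministic (an assumption of this example).
    best = max(scores.values())
    return sorted(name for name, s in scores.items() if s == best)[0]


# Hypothetical predicate and priority functions for the example.
def fits(pod, node):
    return node["free_cpu"] >= pod["cpu"] and node["free_mem"] >= pod["mem"]


def most_free_cpu(pod, node):
    return node["free_cpu"] - pod["cpu"]


nodes = {
    "node-a": {"free_cpu": 2, "free_mem": 4},
    "node-b": {"free_cpu": 8, "free_mem": 16},
    "node-c": {"free_cpu": 1, "free_mem": 1},
}
chosen = schedule({"cpu": 2, "mem": 2}, nodes, [fits], [most_free_cpu])
print(chosen)
```

Here `node-c` is removed in the predicate phase, and `node-b` outscores `node-a` in the priority phase.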
Scheduler
Predicates
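- CheckNodeCondition: checks whether the node is healthy
- GeneralPredicates:
  - HostName: checks whether the pod names a target node (pod.spec.host in the source; the field is pod.spec.nodeName in current API versions)
  - PodFitsHostPorts: checks the host ports the pod requests in pod.spec.containers.ports.hostPort
  - MatchNodeSelector: checks pod.spec.nodeSelector against the node's labels
  - PodFitsResources: checks whether the node can satisfy the pod's resource requests
- (not enabled by default) NoDiskConflict: checks whether the volumes the pod depends on can be satisfied
- PodToleratesNodeTaints: checks taints against tolerations (pod.spec.tolerations)
- (not enabled by default) PodToleratesNodeNoExecuteTaints: checks taints with the NoExecute (eviction) effect
- (not enabled by default) CheckNodeLabelPresence: checks whether given labels are present
- (not enabled by default) CheckServiceAffinity: tries to place pods that belong to the same Service on the same node
- MaxEBSVolumeCount
- MaxGCEPDVolumeCount
- MaxAzureDiskVolumeCount
- CheckVolumeBinding
- NoVolumeZoneConflict
- CheckNodeMemoryPressure: checks memory pressure
- CheckNodePIDPressure: checks PID (process) pressure
- CheckNodeDiskPressure
- MatchInterPodAffinity: inter-pod affinity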
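To make the idea of a predicate concrete, here is a minimal sketch of three of the GeneralPredicates listed above. The function names mirror the predicate names, but the record layout (`allocatable`, `requested`, `used_host_ports`, `labels`, and so on) is hypothetical and much simpler than the real scheduler's data structures.

```python
def pod_fits_resources(pod, node):
    """The node's unreserved capacity must cover every resource request."""
    return all(
        node["allocatable"].get(res, 0) - node["requested"].get(res, 0) >= amount
        for res, amount in pod["requests"].items()
    )


def pod_fits_host_ports(pod, node):
    """None of the pod's host ports may already be in use on the node."""
    return not (set(pod.get("host_ports", [])) & set(node.get("used_host_ports", [])))


def match_node_selector(pod, node):
    """Every key/value pair in the node selector must appear in the node labels."""
    return all(node["labels"].get(k) == v
               for k, v in pod.get("node_selector", {}).items())


def passes_general_predicates(pod, node):
    """A node is feasible only if every predicate returns True."""
    checks = (pod_fits_resources, pod_fits_host_ports, match_node_selector)
    return all(check(pod, node) for check in checks)


# Example data (hypothetical): CPU in millicores, memory in MiB.
node = {
    "allocatable": {"cpu": 4000, "memory": 8192},
    "requested": {"cpu": 3000, "memory": 2048},
    "used_host_ports": [80],
    "labels": {"disktype": "ssd"},
}
small_pod = {"requests": {"cpu": 500, "memory": 1024},
             "host_ports": [8080],
             "node_selector": {"disktype": "ssd"}}
big_pod = {"requests": {"cpu": 2000, "memory": 1024}}
port_clash_pod = {"requests": {"cpu": 100}, "host_ports": [80]}

print(passes_general_predicates(small_pod, node))       # fits
print(passes_general_predicates(big_pod, node))         # only 1000m CPU left
print(passes_general_predicates(port_clash_pod, node))  # port 80 is taken
```

A single failing predicate is enough to remove the node from consideration, which is why predicates act as hard filters.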
Priority functions
https://github.com/kubernetes/kubernetes/tree/master/pkg/scheduler/algorithm/priorities
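- LeastRequested: scores nodes by resource usage; the more resources are still free, the higher the score
- BalancedResourceAllocation: nodes whose CPU and memory utilization are closest to each other win; this balances resource usage
- NodePreferAvoidPods: based on the node annotation "scheduler.alpha.kubernetes.io/preferAvoidPods", which marks pods the node would prefer not to run
- TaintToleration: checks the pod's spec.tolerations against the node's taints; the more taints the pod cannot tolerate (the scheduler counts PreferNoSchedule taints here), the lower the score
- SelectorSpreading: spreads pods matched by the same label selector across different nodes as far as possible
- InterPodAffinity: scores matching inter-pod affinity terms
- NodeAffinity: node affinity
- (not enabled by default) MostRequested: the less idle a node is, the higher its priority
- (not enabled by default) NodeLabel: scores nodes by their labels
- (not enabled by default) ImageLocality: whether the node already has the required images, scored by the total size of those images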
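The resource-based priorities in the list above can be written as short formulas. The sketch below assumes the classic 0 to 10 scoring scale and the commonly documented formulas for these functions; the function names and the `requested`/`capacity` dictionaries are illustrative, and the real scheduler uses integer arithmetic and more resource types.

```python
RESOURCES = ("cpu", "memory")


def least_requested(requested, capacity):
    """Average over resources of (capacity - requested) * 10 / capacity."""
    per_resource = [max(0, capacity[r] - requested[r]) * 10 / capacity[r]
                    for r in RESOURCES]
    return sum(per_resource) / len(per_resource)


def most_requested(requested, capacity):
    """Mirror image of least_requested: fuller nodes score higher."""
    per_resource = [min(requested[r], capacity[r]) * 10 / capacity[r]
                    for r in RESOURCES]
    return sum(per_resource) / len(per_resource)


def balanced_resource_allocation(requested, capacity):
    """10 - |cpu_fraction - memory_fraction| * 10; 0 if a resource is full."""
    cpu_fraction = requested["cpu"] / capacity["cpu"]
    mem_fraction = requested["memory"] / capacity["memory"]
    if cpu_fraction >= 1 or mem_fraction >= 1:
        return 0
    return 10 - abs(cpu_fraction - mem_fraction) * 10


# Requests on the node *after* the pod is placed (hypothetical numbers).
requested = {"cpu": 2000, "memory": 2000}
capacity = {"cpu": 4000, "memory": 8000}

print(least_requested(requested, capacity))               # 6.25
print(most_requested(requested, capacity))                # 3.75
print(balanced_resource_allocation(requested, capacity))  # 7.5
```

In this example CPU is 50% requested and memory 25%, so the node scores 5 and 7.5 on the two resources for LeastRequested (average 6.25) and loses 2.5 points for imbalance under BalancedResourceAllocation.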
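Predicates and priorities together determine which node a pod lands on. The main levers for influencing that choice are taints, pod affinity, and node affinity.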
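Advanced scheduling mechanisms
- Node selector / node affinity scheduling: nodeSelector, nodeName, nodeAffinity
Node selector / node affinity scheduling
- pod.spec.nodeName: selects a node by name
- pod.spec.nodeSelector: selects nodes by label
These are hard constraints: if no node satisfies them, the pod stays Pending.
- pod.spec.affinity.nodeAffinity
  - preferredDuringSchedulingIgnoredDuringExecution: soft preference; multiple conditions, each with a weight
  - requiredDuringSchedulingIgnoredDuringExecution: hard requirement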
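The difference between the required and preferred forms of nodeAffinity can be sketched as follows. The dictionary keys follow the field names of the Kubernetes API (`nodeSelectorTerms`, `matchExpressions`, `weight`, `preference`), but the evaluation functions are a simplified illustration that handles only the `In`, `NotIn`, `Exists`, and `DoesNotExist` operators; the label names are made up.

```python
def expr_matches(labels, expr):
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op, values = expr["key"], expr["operator"], expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return labels.get(key) not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator in this sketch: {op}")


def term_matches(labels, term):
    """All expressions inside one term must match (logical AND)."""
    return all(expr_matches(labels, e) for e in term.get("matchExpressions", []))


def required_ok(labels, node_affinity):
    """Hard rule: at least one nodeSelectorTerm must match (logical OR)."""
    required = node_affinity.get("requiredDuringSchedulingIgnoredDuringExecution")
    if not required:
        return True
    return any(term_matches(labels, t) for t in required["nodeSelectorTerms"])


def preferred_score(labels, node_affinity):
    """Soft rule: add up the weights of every preference that matches."""
    preferred = node_affinity.get("preferredDuringSchedulingIgnoredDuringExecution", [])
    return sum(p["weight"] for p in preferred if term_matches(labels, p["preference"]))


node_affinity = {
    "requiredDuringSchedulingIgnoredDuringExecution": {
        "nodeSelectorTerms": [
            {"matchExpressions": [
                {"key": "zone", "operator": "In", "values": ["foo", "bar"]}]}
        ]
    },
    "preferredDuringSchedulingIgnoredDuringExecution": [
        {"weight": 60, "preference": {"matchExpressions": [
            {"key": "disktype", "operator": "In", "values": ["ssd"]}]}},
        {"weight": 30, "preference": {"matchExpressions": [
            {"key": "gpu", "operator": "Exists"}]}},
    ],
}

node1 = {"zone": "foo", "disktype": "ssd"}
node2 = {"zone": "baz", "disktype": "ssd"}
node3 = {"zone": "bar", "disktype": "ssd", "gpu": "true"}

for labels in (node1, node2, node3):
    print(required_ok(labels, node_affinity), preferred_score(labels, node_affinity))
```

The required rule behaves like a predicate (node2 is excluded outright), while the preferred rule only contributes to the score (node3 collects both weights and ranks above node1).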
Pod affinity
- pod.spec.affinity.podAffinity
  - preferredDuringSchedulingIgnoredDuringExecution: soft preference
  - requiredDuringSchedulingIgnoredDuringExecution: hard requirement
- Fields of an affinity term:
  - labelSelector
  - namespaces
  - topologyKey: required, for both affinity and anti-affinity
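The role of topologyKey is easiest to see in code. The sketch below checks one required pod affinity (or anti-affinity) term: it finds the topology domains in which matching pods already run and tests whether the candidate node lies in one of them. The helper names and the flat data layout are assumptions for the example, and only `matchLabels` selectors are handled.

```python
def matching_domains(term, node_labels, pods):
    """Topology domains (values of topologyKey) that host a matching pod."""
    selector = term["labelSelector"]["matchLabels"]
    key = term["topologyKey"]
    domains = set()
    for pod in pods:
        if pod["namespace"] not in term["namespaces"]:
            continue
        if all(pod["labels"].get(k) == v for k, v in selector.items()):
            domain = node_labels[pod["node"]].get(key)
            if domain is not None:
                domains.add(domain)
    return domains


def satisfies_affinity(candidate, term, node_labels, pods):
    """Affinity: the candidate must share a domain with a matching pod."""
    domain = node_labels[candidate].get(term["topologyKey"])
    return domain is not None and domain in matching_domains(term, node_labels, pods)


def satisfies_anti_affinity(candidate, term, node_labels, pods):
    """Anti-affinity: the candidate must not share a domain with a matching pod."""
    domain = node_labels[candidate].get(term["topologyKey"])
    return domain not in matching_domains(term, node_labels, pods)


node_labels = {
    "n1": {"kubernetes.io/hostname": "n1", "zone": "a"},
    "n2": {"kubernetes.io/hostname": "n2", "zone": "a"},
    "n3": {"kubernetes.io/hostname": "n3", "zone": "b"},
}
pods = [{"node": "n1", "namespace": "default", "labels": {"app": "db"}}]

same_zone = {"labelSelector": {"matchLabels": {"app": "db"}},
             "namespaces": ["default"], "topologyKey": "zone"}
same_host = dict(same_zone, topologyKey="kubernetes.io/hostname")

print(satisfies_affinity("n2", same_zone, node_labels, pods))       # same zone as n1
print(satisfies_affinity("n2", same_host, node_labels, pods))       # different host
print(satisfies_anti_affinity("n1", same_host, node_labels, pods))  # db pod is on n1
```

With `zone` as the topologyKey, any node in the same zone as the `app=db` pod qualifies; with `kubernetes.io/hostname`, only the exact node does.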
Taint-based scheduling: Taints and Tolerations
Taints are set on nodes and determine which pods may run there.
**Pods use tolerations to declare which taints they tolerate.**
node.spec.taints
FIELDS:
effect <string> -required-
Required. The effect of the taint on pods that do not tolerate the taint.
Valid effects are NoSchedule, PreferNoSchedule and NoExecute.
key <string> -required-
Required. The taint key to be applied to a node.
timeAdded <string>
TimeAdded represents the time at which the taint was added. It is only
written for NoExecute taints.
value <string>
Required. The taint value corresponding to the taint key.
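effect: defines how the taint repels pods:
- NoSchedule: affects scheduling only; pods already running on the node are not affected.
- PreferNoSchedule: the scheduler tries to avoid the node, but may still place the pod there.
- NoExecute: affects both scheduling and existing pods; pods that do not tolerate the taint are evicted.
Managing node taints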
kubectl taint NODE NAME KEY_1=VAL_1:TAINT_EFFECT_1 ... KEY_N=VAL_N:TAINT_EFFECT_N [options]
pod.spec.tolerations
Tolerations let a pod tolerate taints on a node.
FIELDS:
effect <string>
Effect indicates the taint effect to match. Empty means match all taint
effects. When specified, allowed values are NoSchedule, PreferNoSchedule
and NoExecute.
key <string>
Key is the taint key that the toleration applies to. Empty means match all
taint keys. If the key is empty, operator must be Exists; this combination
means to match all values and all keys.
operator <string>
Operator represents a key's relationship to the value. Valid operators are
Exists and Equal. Defaults to Equal. Exists is equivalent to wildcard for
value, so that a pod can tolerate all taints of a particular category.
tolerationSeconds <integer>
TolerationSeconds represents the period of time the toleration (which must
be of effect NoExecute, otherwise this field is ignored) tolerates the
taint. By default, it is not set, which means tolerate the taint forever
(do not evict). Zero and negative values will be treated as 0 (evict
immediately) by the system.
value <string>
Value is the taint value the toleration matches to. If the operator is
Exists, the value should be empty, otherwise just a regular string.
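The matching rules in the field descriptions above can be condensed into a small function. This is a sketch built only from those descriptions (empty effect matches all effects, empty key with Exists matches everything, operator defaults to Equal); it ignores tolerationSeconds, and the function names and example taint are made up for illustration.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches every taint effect.
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False

    operator = toleration.get("operator", "Equal")  # Equal is the default
    key = toleration.get("key", "")
    if not key:
        # An empty key is only valid with Exists and matches all keys and values.
        return operator == "Exists"
    if key != taint["key"]:
        return False
    if operator == "Exists":
        return True  # wildcard for the value
    return toleration.get("value", "") == taint.get("value", "")


def untolerated(taints, tolerations, effects):
    """Taints with one of the given effects that no toleration matches."""
    return [t for t in taints
            if t["effect"] in effects
            and not any(tolerates(tol, t) for tol in tolerations)]


def can_schedule(taints, tolerations):
    """NoSchedule and NoExecute taints block scheduling unless tolerated."""
    return not untolerated(taints, tolerations, {"NoSchedule", "NoExecute"})


def must_evict(taints, tolerations):
    """Only untolerated NoExecute taints affect pods that are already running."""
    return bool(untolerated(taints, tolerations, {"NoExecute"}))


taints = [{"key": "node-type", "value": "production", "effect": "NoSchedule"}]

exact = [{"key": "node-type", "operator": "Equal",
          "value": "production", "effect": "NoSchedule"}]
any_value = [{"key": "node-type", "operator": "Exists"}]
wrong_value = [{"key": "node-type", "value": "dev", "effect": "NoSchedule"}]

print(can_schedule(taints, exact))        # True
print(can_schedule(taints, any_value))    # True
print(can_schedule(taints, wrong_value))  # False
print(must_evict(taints, []))             # False: NoSchedule never evicts
```

A PreferNoSchedule taint would not appear in either check, since it only lowers the node's score rather than excluding it.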