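- Affinity: applications A and B interact frequently, so it is worth using affinity to place them as close together as possible, even on the same node, to reduce the performance cost of network communication.
- Anti-affinity: when an application is deployed with multiple replicas, anti-affinity can spread its instances across different nodes to improve high availability (HA).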
Node
Node affinity lets you constrain which nodes the scheduler may place a pod on, based on node labels.
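Consider the following scenario:
There are two zones, az1 and az2, and we want pod instances to be deployed only in az1. The example also adds a soft preference for nodes labeled another-node-label-key=another-node-label-value.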
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - az1
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0
```
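There are two types:
- requiredDuringSchedulingIgnoredDuringExecution: hard. The rule must be met for the pod to be scheduled; otherwise the pod is not scheduled. It is evaluated during the filtering (predicates) phase, so a node that violates a hard rule is never chosen.
- preferredDuringSchedulingIgnoredDuringExecution: soft. The scheduler tries to satisfy the rule but does not guarantee it. It is evaluated during the scoring (priorities) phase, so if no node satisfies it, the pod is still scheduled.

The IgnoredDuringExecution suffix means that if node labels change after the pod is running, so that the pod no longer satisfies the rule, the pod ignores the change and keeps running.
A third type, requiredDuringSchedulingRequiredDuringExecution, is planned but not yet implemented. It is the same as the hard type except for the suffix: if node labels change, pods that no longer satisfy the rule would be evicted.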
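Note: the supported operators are In, NotIn, Exists, DoesNotExist, Gt and Lt. NotIn and DoesNotExist can be used to get node anti-affinity behavior. Gt and Lt take a single value, which is interpreted as an integer.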
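To make the operator list concrete, here is a minimal Python sketch of how one matchExpression could be evaluated against a node's labels. It is a simplified model, not the scheduler's code. The helper name `match_expression`, the dict representation of labels, and the example label keys `cpu-count` and `gpu` are hypothetical.

```python
from typing import Dict, List, Optional


def match_expression(labels: Dict[str, str], key: str, operator: str,
                     values: Optional[List[str]] = None) -> bool:
    """Simplified model of one node matchExpression (hypothetical helper)."""
    values = values or []
    present = key in labels
    if operator == "In":
        return present and labels[key] in values
    if operator == "NotIn":
        # A node that lacks the label also satisfies NotIn
        return not present or labels[key] not in values
    if operator == "Exists":
        return present
    if operator == "DoesNotExist":
        return not present
    if operator in ("Gt", "Lt"):
        # Gt/Lt: exactly one value, compared as integers
        if not present or len(values) != 1:
            return False
        try:
            node_val, ref = int(labels[key]), int(values[0])
        except ValueError:
            return False
        return node_val > ref if operator == "Gt" else node_val < ref
    raise ValueError(f"unsupported operator: {operator}")


node = {"kubernetes.io/e2e-az-name": "az1", "cpu-count": "8"}
print(match_expression(node, "kubernetes.io/e2e-az-name", "In", ["az1"]))  # True
print(match_expression(node, "cpu-count", "Gt", ["4"]))                    # True
print(match_expression(node, "gpu", "NotIn", ["nvidia"]))                  # True: label absent
```

Note that NotIn and DoesNotExist also match nodes that do not carry the label at all. That is why they are useful for keeping pods away from a labeled group of nodes.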
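Note: weight ranges from 1 to 100 and takes part in the scheduler's scoring phase. For each node, the weights of all preferred terms the node satisfies are summed. This sum is combined with the node's other scores, and the pod is bound to the node with the highest total score.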
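The weight mechanism can be illustrated with a simplified Python model of the soft node-affinity score. For each candidate node, add up the weights of the preferred terms it satisfies, then pick the highest-scoring node. In the real scheduler this sum is normalized and combined with other priority scores, so this sketch shows the idea, not the exact arithmetic. The names `preferred_score` and `pick_node`, the labels `disktype` and `zone`, and the restriction to the `In` operator are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# One preferred term: (weight, [(label key, allowed values), ...]); only "In" is modeled
PreferredTerm = Tuple[int, List[Tuple[str, List[str]]]]


def preferred_score(labels: Dict[str, str], terms: List[PreferredTerm]) -> int:
    """Sum the weights of the preferred terms that a node's labels satisfy."""
    score = 0
    for weight, exprs in terms:
        if not 1 <= weight <= 100:
            raise ValueError(f"weight must be in 1..100, got {weight}")
        # All matchExpressions of one preference must hold (AND)
        if all(labels.get(key) in values for key, values in exprs):
            score += weight
    return score


def pick_node(nodes: Dict[str, Dict[str, str]], terms: List[PreferredTerm]) -> str:
    """Return the highest-scoring node (ties: alphabetically first name in this sketch)."""
    return max(sorted(nodes), key=lambda name: preferred_score(nodes[name], terms))


nodes = {
    "node-a": {"disktype": "ssd"},
    "node-b": {"disktype": "hdd", "zone": "az1"},
}
terms = [(80, [("disktype", ["ssd"])]), (20, [("zone", ["az1"])])]
print(pick_node(nodes, terms))  # node-a (score 80 vs 20)
```

Only nodes that match a preference receive its weight. A node matching none of them scores 0 here but remains schedulable, which is what makes the rule "soft".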
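Constraints
- If both nodeSelector and nodeAffinity are specified, the pod is scheduled only onto nodes that satisfy both.
- If nodeAffinity has multiple nodeSelectorTerms, a node needs to satisfy only one of them.
- If a nodeSelectorTerm has multiple matchExpressions, a node must satisfy all of them.
- Because of IgnoredDuringExecution, changing a node's labels does not affect pods already running on it.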
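The AND/OR rules above can be summarized in a short Python sketch of the filtering check. Labels are plain dicts, only the `In` operator is modeled, and the names `node_fits` and `term_matches` and the labels `zone` and `disktype` are hypothetical.

```python
from typing import Dict, List, Tuple

Expr = Tuple[str, List[str]]  # (label key, allowed values), "In" only
Term = List[Expr]             # one nodeSelectorTerm


def term_matches(labels: Dict[str, str], term: Term) -> bool:
    # An empty term matches nothing; otherwise its matchExpressions are ANDed
    return bool(term) and all(labels.get(key) in values for key, values in term)


def node_fits(labels: Dict[str, str],
              node_selector: Dict[str, str],
              required_terms: List[Term]) -> bool:
    # nodeSelector: every key/value pair must match exactly
    if any(labels.get(k) != v for k, v in node_selector.items()):
        return False
    # Empty list here models "no required node affinity set"
    if not required_terms:
        return True
    # nodeSelectorTerms are ORed: one matching term is enough
    return any(term_matches(labels, term) for term in required_terms)


node = {"zone": "az1", "disktype": "ssd"}
print(node_fits(node, {"disktype": "ssd"}, [[("zone", ["az1"])]]))         # True: both satisfied
print(node_fits(node, {}, [[("zone", ["az2"])], [("disktype", ["ssd"])]]))  # True: second term matches
print(node_fits(node, {}, [[("zone", ["az1"]), ("disktype", ["hdd"])]]))    # False: one expression fails
```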
Overall, node affinity is similar to nodeSelector and can be seen as a more expressive extension of it.
Inter-pod
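In Kubernetes, you can also decide which node a pod is scheduled onto based on the labels of pods already running on the nodes.
For example, the pod should (affinity) or should not (anti-affinity) run in X if X is already running one or more pods that satisfy rule Y.
- Rule Y is a labelSelector.
- X is a topology domain, such as a node, rack, availability zone, or region. It is expressed with topologyKey, which is the key of a node label; nodes with the same value for that label belong to the same domain.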
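Commonly used built-in node labels that can serve as topologyKey:
- kubernetes.io/hostname
- failure-domain.beta.kubernetes.io/zone
- failure-domain.beta.kubernetes.io/region
- beta.kubernetes.io/instance-type
- beta.kubernetes.io/os
- beta.kubernetes.io/arch
(Newer Kubernetes releases replace the beta labels with topology.kubernetes.io/zone, topology.kubernetes.io/region, node.kubernetes.io/instance-type, kubernetes.io/os and kubernetes.io/arch.)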
Note: this feature requires substantial computation and can noticeably slow down scheduling, especially in large clusters.
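```yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: failure-domain.beta.kubernetes.io/zone
  containers:
  - name: with-pod-affinity
    image: k8s.gcr.io/pause:2.0  # container added so that the Pod spec is valid
```
With this rule, the pod can only be scheduled onto a node in a zone where at least one running pod has the label security=S1.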
Note: the valid operators for labelSelector are In, NotIn, Exists and DoesNotExist.
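The topology-domain idea can be sketched in Python as a simplified required podAffinity check with one `In` expression. The candidate's domain is its value for topologyKey, and the check passes if any node in that domain runs a matching pod. The helper `satisfies_pod_affinity` and the node/pod data are hypothetical, and edge cases the real scheduler handles are omitted. The loop over every node and pod for each candidate hints at why the performance note above applies.

```python
from typing import Dict, List

NodeLabels = Dict[str, Dict[str, str]]         # node name -> node labels
RunningPods = Dict[str, List[Dict[str, str]]]  # node name -> labels of pods on it


def satisfies_pod_affinity(candidate: str, node_labels: NodeLabels,
                           running_pods: RunningPods, topology_key: str,
                           key: str, values: List[str]) -> bool:
    """Simplified required podAffinity check (hypothetical helper)."""
    domain = node_labels[candidate].get(topology_key)
    if domain is None:
        return False  # candidate lacks the topology label, so it is in no domain
    for node, labels in node_labels.items():
        # Nodes sharing the candidate's topology label value form one domain
        if labels.get(topology_key) != domain:
            continue
        if any(pod.get(key) in values for pod in running_pods.get(node, [])):
            return True
    return False


ZONE = "failure-domain.beta.kubernetes.io/zone"
node_labels = {"n1": {ZONE: "az1"}, "n2": {ZONE: "az1"}, "n3": {ZONE: "az2"}}
running_pods = {"n2": [{"security": "S1"}]}
print(satisfies_pod_affinity("n1", node_labels, running_pods, ZONE, "security", ["S1"]))  # True
print(satisfies_pod_affinity("n3", node_labels, running_pods, ZONE, "security", ["S1"]))  # False
```

Node n1 qualifies even though it runs no S1 pod itself, because n2 in the same zone does.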
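Constraints
For topologyKey:
1. For affinity and for hard (required) anti-affinity, an empty topologyKey is not allowed.
2. For hard anti-affinity, the LimitPodHardAntiAffinityTopology admission controller, when enabled, restricts topologyKey to kubernetes.io/hostname.
3. For soft (preferred) anti-affinity, an empty topologyKey is interpreted as the combination of kubernetes.io/hostname, failure-domain.beta.kubernetes.io/zone and failure-domain.beta.kubernetes.io/region.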
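The three topologyKey rules can be modeled as a small validation function in Python. This is a sketch of the rules as listed, not Kubernetes source code. The function `effective_topology_keys` and its `limit_hard_anti_affinity` parameter (standing in for whether the admission controller is enabled) are hypothetical. Newer releases reject an empty topologyKey in every case, which this sketch does not model.

```python
from typing import List

HOSTNAME = "kubernetes.io/hostname"
# Rule 3: what an empty topologyKey means for soft anti-affinity
ALL_TOPOLOGIES = [
    HOSTNAME,
    "failure-domain.beta.kubernetes.io/zone",
    "failure-domain.beta.kubernetes.io/region",
]


def effective_topology_keys(kind: str, hard: bool, topology_key: str,
                            limit_hard_anti_affinity: bool = True) -> List[str]:
    """kind is "affinity" or "anti-affinity"; returns the keys the rule applies to."""
    if kind == "anti-affinity" and not hard:
        # Rule 3: empty key on soft anti-affinity expands to all three topologies
        return [topology_key] if topology_key else list(ALL_TOPOLOGIES)
    if not topology_key:
        # Rule 1: affinity and hard anti-affinity need an explicit key
        raise ValueError("empty topologyKey is not allowed")
    if kind == "anti-affinity" and limit_hard_anti_affinity and topology_key != HOSTNAME:
        # Rule 2: with the admission controller enabled, only hostname is allowed
        raise ValueError("hard anti-affinity is limited to kubernetes.io/hostname")
    return [topology_key]


print(effective_topology_keys("anti-affinity", False, ""))  # all three keys
print(effective_topology_keys("anti-affinity", True, HOSTNAME))  # ['kubernetes.io/hostname']
```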
Examples
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
spec:
  selector:
    matchLabels:
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: redis-server
        image: redis:3.2-alpine
```
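This deploys three redis instances, and to improve HA no two of them are placed on the same node. Because the rule is hard, if the cluster has fewer than three schedulable nodes, the remaining replicas stay Pending.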
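The effect of this hard anti-affinity rule can be simulated with a toy Python placement loop. Each replica is filtered away from nodes that already run an app=store pod (topologyKey is the hostname, so each node is its own domain). The function `schedule_with_anti_affinity` and the pod names `store-0`, `store-1`, and so on are hypothetical; a real Deployment generates random pod names and the scheduler scores feasible nodes instead of taking the first one.

```python
from typing import Dict, List, Optional


def schedule_with_anti_affinity(nodes: List[str], replicas: int,
                                app: str) -> Dict[str, Optional[str]]:
    """Toy placement with hard podAntiAffinity on kubernetes.io/hostname."""
    placed: Dict[str, List[str]] = {n: [] for n in nodes}  # node -> app labels running there
    result: Dict[str, Optional[str]] = {}
    for i in range(replicas):
        pod = f"{app}-{i}"
        # Filter: drop nodes already running a pod with the same app label
        feasible = [n for n in nodes if app not in placed[n]]
        if not feasible:
            result[pod] = None  # no feasible node: the pod stays Pending
            continue
        node = feasible[0]  # simplification: no scoring, take the first feasible node
        placed[node].append(app)
        result[pod] = node
    return result


print(schedule_with_anti_affinity(["n1", "n2", "n3"], 3, "store"))
# {'store-0': 'n1', 'store-1': 'n2', 'store-2': 'n3'}
print(schedule_with_anti_affinity(["n1", "n2"], 3, "store"))
# {'store-0': 'n1', 'store-1': 'n2', 'store-2': None}
```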
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  selector:
    matchLabels:
      app: web-store
  replicas: 3
  template:
    metadata:
      labels:
        app: web-store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-store
            topologyKey: "kubernetes.io/hostname"
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-app
        image: nginx:1.12-alpine
```
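This deploys three web instances. To improve HA, no two of them run on the same node. To make it easy to talk to redis, each one must also run on a node that already runs a redis (app=store) pod. Because this podAffinity rule is hard, co-location is required rather than best-effort; use preferredDuringSchedulingIgnoredDuringExecution if it should only be a preference.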
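The combination of the two hard rules can be simulated in Python in the same toy style. A node is feasible for a web replica only if it already runs an app=store pod (podAffinity) and no app=web-store pod (podAntiAffinity). The function `place_web`, the node names, and the pod names are hypothetical, and the first feasible node is taken instead of scoring.

```python
from typing import Dict, List, Optional


def place_web(nodes: List[str], running: Dict[str, List[str]],
              replicas: int) -> Dict[str, Optional[str]]:
    """Place app=web-store replicas (hypothetical helper).

    running maps node name -> app labels of pods already on that node.
    """
    running = {n: list(apps) for n, apps in running.items()}  # don't mutate caller's data
    result: Dict[str, Optional[str]] = {}
    for i in range(replicas):
        pod = f"web-store-{i}"
        feasible = [
            n for n in nodes
            if "store" in running.get(n, [])            # podAffinity: an app=store pod on the node
            and "web-store" not in running.get(n, [])   # podAntiAffinity: no web-store pod yet
        ]
        if not feasible:
            result[pod] = None  # hard rules cannot be met: the pod stays Pending
            continue
        node = feasible[0]
        running.setdefault(node, []).append("web-store")
        result[pod] = node
    return result


cluster = {"n1": ["store"], "n2": ["store"], "n3": ["store"]}
print(place_web(["n1", "n2", "n3", "n4"], cluster, 3))
# {'web-store-0': 'n1', 'web-store-1': 'n2', 'web-store-2': 'n3'}
```

Node n4 is never used because it runs no redis pod. If redis were on only two nodes, the third web replica would stay Pending.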