资源预留必要性
以常见的kubeadm安装的k8s集群来说,默认情况下kubelet没有配置kube-reserverd和system-reserverd资源预留。worker node上的pod负载,理论上可以使用该节点服务器上的所有cpu和内存资源。比如某个deployment controller管理的pod存在bug,运行时无法正常释放内存,那么该worker node上的kubelet进程最终会抢占不到足够的内存,无法向kube-apiserver同步心跳状态,该worker node节点的状态进而被标记为NotReady。随后deployment controller会在另外一个worker节点上创建一个pod副本,又重复前述过程,压垮第二个worker node,最终整个k8s集群将面临“雪崩”危险。
资源分类
Node capacity:节点总的资源
kube-reserved:预留给k8s进程的资源(如kubelet, container runtime, node problem detector等)
system-reserved:预留给操作系统的资源(如sshd、udev等)
eviction-threshold:kubelet eviction的阀值
allocatable:留给pod的可用资源=Node capacity - kube-reserved - system-reserved - eviction-threshold
配置方法
以ubuntu 16.04+k8s v1.14的环境举例,配置步骤如下。
1.修改/var/lib/kubelet/config.yaml
enforceNodeAllocatable: - pods - kube-reserved - system-reserved systemReserved: cpu: "1" memory: "2Gi" kubeReserved: cpu: "1" memory: "2Gi" systemReservedCgroup: /system.slice kubeReservedCgroup: /system.slice/kubelet.service
参数解释:
enforce-node-allocatable=pods,kube-reserved,system-reserved #默认为pod设置,这里要给kube进程和system预留所以要加上。
kube-reserved-cgroup=/system.slice/kubelet.service #k8s组件对应的cgroup目录
system-reserved-cgroup=/system.slice #系统组件对应的cgroup目录
kube-reserved=cpu=1,memory=2Gi #k8s组件资源预留大小
system-reserved=cpu=2,memory=4Gi #系统组件资源预留大小。结合主机配置和系统空载占用资源量的监控,实际测试确定。
注:根据实际需求,也可以对ephemeral-storage做预留,例如“kube-reserved=cpu=1,memory=2Gi, ephemeral-storage=10Gi"
2.修改/lib/systemd/system/kubelet.service
由于cpuset和hugetlb这两个cgroup subsystem默认没有初始化system.slice,需要在启动进程前指定创建。
[Unit] Description=kubelet: The Kubernetes Node Agent Documentation=https://kubernetes.io/docs/home/ [Service] ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/cpuset/system.slice/kubelet.service ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/hugetlb/system.slice/kubelet.service ExecStart=/usr/bin/kubelet Restart=always StartLimitInterval=0 RestartSec=10 [Install] WantedBy=multi-user.target
3.重启kubelet进程
systemctl restart kubelet
systemctl status kubelet
4.查看worker node的可用资源
kubectl describe node [Your-NodeName]
Capacity: cpu: 40 ephemeral-storage: 197608716Ki hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 131595532Ki pods: 110 Allocatable: cpu: 37 ephemeral-storage: 182116192365 hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 125201676Ki pods: 110
参考文档:
https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/