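Environment
The installation runs on VMware virtual machines with 64-bit CentOS 7. The plan is to build a Kubernetes cluster from three virtual machines on a NAT network. The three IP addresses are: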
- k8s-master:192.168.91.132
- k8s-node1:192.168.91.130
- k8s-node2:192.168.91.131
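Docker is version 18 or later. I used the CE edition, so the actual version is 18.06.3-ce.
Kubernetes is v1.15.0.
Kubernetes base environment configuration
Docker itself places few demands on the environment, so the steps below are all for Kubernetes.
# Set SELinux to permissive mode (effectively disabling it)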
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
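# Disable swap
swapoff -a
cp -p /etc/fstab /etc/fstab.bak$(date '+%Y%m%d%H%M%S')
# Use | as the sed delimiter because the pattern itself contains slashes
sed -i 's|^/dev/mapper/centos-swap|#&|' /etc/fstab
systemctl daemon-reload
# Disable the firewall. This is the simplest approach; when the machines sit inside a VPC, the internal network is generally treated as trusted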
systemctl stop firewalld
systemctl disable firewalld
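# Set the kernel network forwarding parameters
# Load br_netfilter first; the net.bridge.* keys do not exist until the module is loaded
modprobe br_netfilter
echo "modprobe br_netfilter" >> /etc/rc.local
# On CentOS 7, rc.local only runs at boot if it is executable
chmod +x /etc/rc.d/rc.local
echo "net.bridge.bridge-nf-call-ip6tables = 1" >> /etc/sysctl.conf
echo "net.bridge.bridge-nf-call-iptables = 1" >> /etc/sysctl.conf
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf
sysctl -p
# Configure the Kubernetes yum repository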
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
       http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
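The kernel parameter step above appends to /etc/sysctl.conf with `>>`, so running it twice leaves duplicate entries. As a minimal sketch of an alternative, the same three settings can be written to a drop-in file that is overwritten on each run. The function name `write_k8s_sysctl` and the file name `k8s.conf` are assumptions made for illustration, not part of the original setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes the three forwarding settings used in this guide
# to a sysctl drop-in file. The default path is an assumed name.
write_k8s_sysctl() {
    local conf_file="${1:-/etc/sysctl.d/k8s.conf}"
    # '>' truncates the file, so repeated runs never duplicate entries.
    cat > "$conf_file" <<'EOF'
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
}

# Usage on a real host (as root):
#   modprobe br_netfilter && write_k8s_sysctl && sysctl --system
```

`sysctl --system` reloads every configuration file, including those under /etc/sysctl.d, whereas `sysctl -p` with no argument reads only /etc/sysctl.conf.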
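Installing Docker and Kubernetes
# Pin the Kubernetes packages to v1.15.0, the version used in this guide; an unpinned install pulls the newest release
yum install -y docker-ce kubelet-1.15.0 kubeadm-1.15.0 kubectl-1.15.0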
systemctl enable docker && systemctl restart docker
systemctl enable kubelet && systemctl start kubelet
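Note: the Docker version that CentOS installs by default is old (mine was 1.13), so if you need the latest version you have to uninstall it and install a newer Docker.
After March 1, 2017, Docker changed its version naming scheme and split the product into CE and EE editions.
The differences are:
Docker Community Edition (CE): for developers and small teams building container-based applications, sharing them with teammates, and automating development pipelines. docker-ce is simple and quick to install, so you can start developing right away. (Free)
Docker Enterprise Edition (EE): built for enterprise development and IT teams. docker-ee is positioned as a secure, application-centric container platform for enterprises. (Paid)
# Stop the service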
systemctl stop docker
# Remove the Docker packages
yum erase docker 'docker-*'
# Add the Aliyun mirror repository; downloading from docker.io is slow
yum install -y yum-utils device-mapper-persistent-data lvm2
yum-config-manager \
    --add-repo \
    http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# Install and start Docker
yum install docker-ce -y
systemctl start docker
systemctl enable docker
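After the reinstall it is worth confirming that the Docker version really is 18 or later, as this guide assumes. A minimal sketch of such a check follows; the function names `docker_major` and `docker_at_least` are hypothetical, and the check only compares the major version number.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the major version of a Docker version string
# such as "18.06.3-ce" (everything before the first dot).
docker_major() {
    local version="$1"
    echo "${version%%.*}"
}

# Hypothetical helper: succeeds when the major version is at least the
# given minimum (18 by default, the minimum stated in this guide).
docker_at_least() {
    local version="$1" min_major="${2:-18}"
    [ "$(docker_major "$version")" -ge "$min_major" ]
}

# Usage on a real host (requires a running Docker daemon):
#   docker_at_least "$(docker version --format '{{.Server.Version}}')" \
#       || echo "Docker is older than 18, reinstall docker-ce"
```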
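Downloading the Kubernetes images
Kubernetes comes from Google, and by default kubeadm pulls its images from Google's registry (k8s.gcr.io). That registry is not reachable from my network, so we need another way to get the images.
We wrote a script, pull_images.sh, that pulls the images from hub.docker.com.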
#!/bin/bash
gcr_name=k8s.gcr.io
hub_name=mirrorgooglecontainers
# Images to mirror
images=(
    kubernetes-dashboard-amd64:v1.10.1
    kube-apiserver:v1.15.0
    kube-controller-manager:v1.15.0
    kube-scheduler:v1.15.0
    kube-proxy:v1.15.0
    pause:3.1
    etcd:3.3.10
)
# Pull each image from the Docker Hub mirror, retag it under k8s.gcr.io, then drop the mirror tag
for image in "${images[@]}"; do
    docker pull "$hub_name/$image"
    docker tag "$hub_name/$image" "$gcr_name/$image"
    docker rmi "$hub_name/$image"
done
# CoreDNS is published under its own Docker Hub organization
docker pull coredns/coredns:1.3.1
docker tag coredns/coredns:1.3.1 k8s.gcr.io/coredns:1.3.1
docker rmi coredns/coredns:1.3.1
The idea is simple: after pulling each image, retag it so that it appears under the k8s.gcr.io domain.
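To make the pull-and-retag idea easier to inspect before touching Docker, here is a small dry-run sketch that only prints the three commands for one image. The function name `retag_commands` is hypothetical; the default mirror and target names are the ones used in pull_images.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not execute) the pull, retag, and cleanup
# commands for a single image. Arguments: image[:tag] [mirror] [target].
retag_commands() {
    local image="$1"
    local mirror="${2:-mirrorgooglecontainers}"
    local target="${3:-k8s.gcr.io}"
    printf 'docker pull %s/%s\n' "$mirror" "$image"
    printf 'docker tag %s/%s %s/%s\n' "$mirror" "$image" "$target" "$image"
    printf 'docker rmi %s/%s\n' "$mirror" "$image"
}

# Usage: review the output first, then pipe it to a shell if it looks right.
#   retag_commands pause:3.1
#   retag_commands pause:3.1 | sh
```

Printing the commands instead of running them makes it easy to check the image names and tags, which matters because a wrong tag only shows up later as a failed pull on a node.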
When it finishes, the local images look like this:
[root@k8s-master ~]# docker images
REPOSITORY                              TAG       IMAGE ID       CREATED         SIZE
k8s.gcr.io/kube-proxy                   v1.15.0   d235b23c3570   4 weeks ago     82.4MB
k8s.gcr.io/kube-apiserver               v1.15.0   201c7a840312   4 weeks ago     207MB
k8s.gcr.io/kube-controller-manager      v1.15.0   8328bb49b652   4 weeks ago     159MB
k8s.gcr.io/kube-scheduler               v1.15.0   2d3813851e87   4 weeks ago     81.1MB
k8s.gcr.io/coredns                      1.3.1     eb516548c180   6 months ago    40.3MB
k8s.gcr.io/etcd                         3.3.10    2c4adeb21b4f   7 months ago    258MB
k8s.gcr.io/kubernetes-dashboard-amd64   v1.10.0   0dab2435c100   10 months ago   122MB
k8s.gcr.io/pause                        3.1       da86e6ba6ca1   19 months ago   742kB
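Initializing k8s-master
At this point the Kubernetes installation itself is complete, and we can initialize the master node by running the command below.
# Our virtual machines mostly have a single CPU, while Kubernetes recommends at least 2, so we ignore this preflight error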
kubeadm init --kubernetes-version v1.15.0 --pod-network-cidr 10.244.0.0/16 --ignore-preflight-errors=NumCPU
After it runs, pay attention to the following output, which describes the next steps for building the cluster:
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.91.132:6443 --token zf26v4.3u5z3g09ekm4owt3 \
    --discovery-token-ca-cert-hash sha256:fce98cb6779dbcc73408d1faad50c9d8f86f154ed88a5380c08cece5e08aba58
Adding worker nodes
Just run the corresponding join command on each node:
kubeadm join 192.168.91.132:6443 --token zf26v4.3u5z3g09ekm4owt3 \
    --discovery-token-ca-cert-hash sha256:fce98cb6779dbcc73408d1faad50c9d8f86f154ed88a5380c08cece5e08aba58
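The join command is easy to lose once the kubeadm init output scrolls away. Assuming the init output was saved to a file (for example with `tee`), the sketch below pulls the join command back out as a single line. The function name `extract_join_command` and the log file are hypothetical, and the sketch assumes the file contains exactly one join command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the "kubeadm join" command found in a saved
# copy of the kubeadm init output, collapsed onto one line.
extract_join_command() {
    local log_file="$1"
    # Take the "kubeadm join" line and the line after it, drop a trailing
    # backslash, trim surrounding whitespace, and join the two lines.
    grep -A1 '^[[:space:]]*kubeadm join' "$log_file" \
        | sed -e 's/\\$//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | paste -sd' ' -
}

# Usage on the master, assuming the output was saved as kubeadm-init.log:
#   extract_join_command kubeadm-init.log
```

Bootstrap tokens expire after 24 hours by default, so for a node added later, running `kubeadm token create --print-join-command` on the master prints a fresh join command.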
Afterwards, run kubectl get nodes on the master node and you should see:
NAME         STATUS   ROLES    AGE    VERSION
k8s-master   Ready    master   21h    v1.15.0
k8s-node1    Ready    <none>   127m   v1.15.0
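With more nodes the table gets harder to scan by eye. A minimal sketch of a filter that reads the `kubectl get nodes` table on standard input and prints only the nodes whose STATUS is not exactly Ready follows; the function name `not_ready_nodes` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "kubectl get nodes" output on stdin and prints
# the NAME of every node whose STATUS column is not exactly "Ready".
not_ready_nodes() {
    # NR > 1 skips the header row; $1 is NAME and $2 is STATUS.
    awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage on the master:
#   kubectl get nodes | not_ready_nodes
```

Because the comparison is exact, a cordoned node reported as Ready,SchedulingDisabled is also listed, which is usually what you want when checking a fresh cluster.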
Starting the Kubernetes dashboard
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml
kubectl proxy
http://127.0.0.1:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/
Done.
Troubleshooting
Kubernetes fails to start after a reboot.
Check the logs with journalctl -f:
-- The start-up result is done.
Jul 19 10:27:34 k8s-node1 kubelet[9831]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 19 10:27:34 k8s-node1 kubelet[9831]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 19 10:27:34 k8s-node1 kubelet[9831]: I0130 10:27:34.877299 9831 server.go:407] Version: v1.15.0
Jul 19 10:27:34 k8s-node1 kubelet[9831]: I0130 10:27:34.877538 9831 plugins.go:103] No cloud provider specified.
Jul 19 10:27:34 k8s-node1 kubelet[9831]: I0130 10:27:34.892361 9831 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
Jul 19 10:27:34 k8s-node1 kubelet[9831]: I0130 10:27:34.926248 9831 server.go:666] --cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to /
Jul 19 10:27:34 k8s-node1 kubelet[9831]: F0130 10:27:34.926665 9831 server.go:261] failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: [Filename Type Size Used Priority /swapfile file 2097148 0 -2]
Jul 19 10:27:34 k8s-node1 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Jul 19 10:27:34 k8s-node1 systemd[1]: Unit kubelet.service entered failed state.
Jul 19 10:27:34 k8s-node1 systemd[1]: kubelet.service failed.
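At first I thought the problem was the "Flag --cgroup-driver has been deprecated" message, but that is only a warning. The real cause is the fatal line "failed to run Kubelet: Running with swap on is not supported, please disable swap!", which makes the problem obvious.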
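The fatal line is easy to miss among the informational ones. The kubelet log prefix starts with a severity letter (I, W, E, or F) followed by the date, so a filter on the F prefix isolates the message that actually stopped the service. This is a rough sketch; the function name `kubelet_fatal_lines` is hypothetical and the pattern assumes the journal line format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads journal output on stdin and keeps only kubelet
# lines logged at fatal severity (prefix "F" followed by a 4-digit date).
kubelet_fatal_lines() {
    grep -E 'kubelet\[[0-9]+\]: F[0-9]{4} '
}

# Usage on the affected node:
#   journalctl -u kubelet --no-pager | kubelet_fatal_lines
```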
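The fix is to disable swap and then restart kubelet:
swapoff -a
cp -p /etc/fstab /etc/fstab.bak$(date '+%Y%m%d%H%M%S')
# Adjust the pattern to your own swap entry; the log above shows /swapfile on this node
sed -i 's|^/dev/mapper/centos-swap|#&|' /etc/fstab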
systemctl daemon-reload
systemctl restart kubelet
That is why I added these commands to the setup section above.
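The swap entry is not named the same on every machine: the setup commands target /dev/mapper/centos-swap, while the log from k8s-node1 reports /swapfile. As a sketch of a more general approach, the function below comments out every active fstab line that has swap as a field, whatever the device is called. The function name `comment_swap_entries` is hypothetical, and it relies on GNU sed (`-i -E`), which is what CentOS 7 ships.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comments out every active swap entry in an
# fstab-style file, keeping a timestamped backup next to it.
comment_swap_entries() {
    local fstab_file="$1"
    cp -p "$fstab_file" "$fstab_file.bak$(date '+%Y%m%d%H%M%S')"
    # Only lines that are not already commented and contain "swap" as a
    # whitespace-delimited field get a leading '#'.
    sed -i -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$fstab_file"
}

# Usage on a real host (as root):
#   swapoff -a && comment_swap_entries /etc/fstab
```

Lines that are already commented are skipped, so running it again does not stack extra # characters.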
Some Kubernetes pods never reach the Running state.
For example:
[root@k8s-master ~]# kubectl get pods -A
NAMESPACE     NAME                                    READY   STATUS         RESTARTS   AGE
kube-system   coredns-5c98db65d4-b2rgr                1/1     Running        0          20h
kube-system   coredns-5c98db65d4-l6x97                1/1     Running        0          20h
kube-system   etcd-k8s-master                         1/1     Running        4          20h
kube-system   kube-apiserver-k8s-master               1/1     Running        15         20h
kube-system   kube-controller-manager-k8s-master      1/1     Running        27         20h
kube-system   kube-flannel-ds-amd64-k5kjg             1/1     Running        2          110m
kube-system   kube-flannel-ds-amd64-z7lcn             1/1     Running        20         88m
kube-system   kube-proxy-992ql                        1/1     Running        4          20h
kube-system   kube-proxy-ss9r6                        1/1     Running        0          27m
kube-system   kube-scheduler-k8s-master               1/1     Running        29         20h
kube-system   kubernetes-dashboard-7d75c474bb-s7fwq   0/1     ErrImagePull   0          102s
The last pod, kubernetes-dashboard-7d75c474bb-s7fwq, shows ErrImagePull.
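Spotting the one bad row works for a dozen pods but not for many more. A small sketch of a filter over the `kubectl get pods -A` table that prints only pods whose STATUS is neither Running nor Completed follows; the function name `unhealthy_pods` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "kubectl get pods -A" output on stdin and prints
# "namespace/name: status" for every pod that is not Running or Completed.
unhealthy_pods() {
    # With -A the columns are: NAMESPACE NAME READY STATUS RESTARTS AGE
    awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2 ": " $4 }'
}

# Usage on the master:
#   kubectl get pods -A | unhealthy_pods
```

The printed namespace is the value to pass to -n when describing the pod.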
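We run kubectl describe pod kubernetes-dashboard-7d75c474bb-s7fwq -n kube-system
(Be sure to pass -n to specify the namespace. Otherwise kubectl looks in the default namespace and reports Error from server (NotFound): pods "xxxxxxxx" not found.)
The Events section at the end looks like this: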
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 119s default-scheduler Successfully assigned kube-system/kubernetes-dashboard-7d75c474bb-s7fwq to k8s-node1
Normal Pulling 50s (x3 over 118s) kubelet, k8s-node1 Pulling image "k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.1"
Warning Failed 33s (x3 over 103s) kubelet, k8s-node1 Failed to pull image "k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.1": rpc error: code = Unknown desc = Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning Failed 33s (x3 over 103s) kubelet, k8s-node1 Error: ErrImagePull
Normal BackOff 6s (x4 over 103s) kubelet, k8s-node1 Back-off pulling image "k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.1"
Warning Failed 6s (x4 over 103s) kubelet, k8s-node1 Error: ImagePullBackOff
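The events show that the pod needs kubernetes-dashboard-amd64:v1.10.1, which is newer than the image present in the local Docker (v1.10.0). The kubelet on k8s-node1, where the pod is scheduled, therefore tries to pull it from k8s.gcr.io and times out. Pulling the v1.10.1 image again on that node fixes the problem.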
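To find out exactly which image and tag a node is missing, the image names can be extracted from the failure events rather than read by eye. The sketch below assumes the event message format shown above; the function name `failed_images` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "kubectl describe pod" output on stdin and
# prints each distinct image named in a 'Failed to pull image "..."' event.
failed_images() {
    grep -o 'Failed to pull image "[^"]*"' | cut -d'"' -f2 | sort -u
}

# Usage on the master:
#   kubectl describe pod kubernetes-dashboard-7d75c474bb-s7fwq -n kube-system | failed_images
```

Comparing the printed tag with the `docker images` output on the node named in the events shows whether the image is missing entirely or only present under a different tag, as was the case here.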