step1:
Download the deployment files from the official repository:
https://github.com/kubernetes-retired/heapster/tree/master/deploy/kube-config/influxdb
To deploy only heapster, applying heapster.yaml is enough:
kubectl apply -f heapster.yaml
The heapster log then shows the following errors:
E0312 09:39:32.275587 1 reflector.go:190] k8s.io/heapster/metrics/util/util.go:30: Failed to list *v1.Node: nodes is forbidden: User "system:serviceaccount:kube-system:heapster" cannot list resource "nodes" in API group "" at the cluster scope
E0312 09:39:32.276426 1 reflector.go:190] k8s.io/heapster/metrics/util/util.go:30: Failed to list *v1.Node: nodes is forbidden: User "system:serviceaccount:kube-system:heapster" cannot list resource "nodes" in API group "" at the cluster scope
I0312 09:39:32.282892 1 heapster.go:112] Starting heapster on port 8082
E0312 09:39:33.264051 1 reflector.go:190] k8s.io/heapster/metrics/util/util.go:30: Failed to list *v1.Node: nodes is forbidden: User "system:serviceaccount:kube-system:heapster" cannot list resource "nodes" in API group "" at the cluster scope
E0312 09:39:33.281099 1 reflector.go:190] k8s.io/heapster/metrics/heapster.go:328: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:kube-system:heapster" cannot list resource "pods" in API group "" at the cluster scope
E0312 09:39:33.281196 1 reflector.go:190] k8s.io/heapster/metrics/processors/namespace_based_enricher.go:89: Failed to list *v1.Namespace: namespaces is forbidden: User "system:serviceaccount:kube-system:heapster" cannot list resource "namespaces" in API group "" at the cluster scope
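These errors mean that the heapster service account has no permission to list nodes, pods and namespaces. The repository provides RBAC manifests for this at https://github.com/kubernetes-retired/heapster/tree/master/deploy/kube-config/rbac
Grant the service account its permissions:
kubectl apply -f heapster-rbac.yaml
The same errors still appeared afterwards, so the workaround used here is to create an account with the highest privileges and change the serviceaccount in heapster.yaml to it.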
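Before falling back to a cluster-admin account, it can help to confirm what the service account is actually allowed to do. The following is a minimal sketch, not part of the original steps: check_sa_can_list is a hypothetical helper name, and it assumes the current kubectl user is allowed to impersonate service accounts (a cluster administrator normally is).

```shell
#!/bin/sh
# Sketch: ask the API server whether a service account may list the
# resources named in the "forbidden" log lines.
# check_sa_can_list is a hypothetical helper, not a kubectl feature.
check_sa_can_list() {
    namespace="${1:-kube-system}"   # namespace of the service account
    account="${2:-heapster}"        # service account name
    failed=0
    for resource in nodes pods namespaces; do
        # "kubectl auth can-i" prints "yes" or "no".
        # --as impersonates the service account; --all-namespaces asks
        # about the cluster scope, matching the errors in the log.
        answer=$(kubectl auth can-i list "$resource" --all-namespaces \
            --as="system:serviceaccount:${namespace}:${account}" 2>/dev/null)
        echo "list ${resource}: ${answer:-unknown}"
        [ "$answer" = "yes" ] || failed=1
    done
    return "$failed"
}
```

Running check_sa_can_list with no arguments checks the heapster account in kube-system; a non-zero exit status means at least one of the three list permissions is still missing.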
step2:
Create the admin account:
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: admin
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: admin
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
First remove the existing deployment:
kubectl delete -f heapster.yaml
Then edit heapster.yaml: serviceAccountName: heapster -> serviceAccountName: admin
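The serviceAccountName change can also be scripted instead of edited by hand. This is only a sketch: set_service_account is a hypothetical helper, and it assumes heapster.yaml contains a single serviceAccountName line written in the usual "key: value" form.

```shell
#!/bin/sh
# Sketch: replace the value of serviceAccountName in a manifest.
# Usage: set_service_account FILE NAME   (hypothetical helper)
set_service_account() {
    file="$1"
    account="$2"
    tmp="${file}.tmp"
    # Keep the original indentation and key, replace only the value.
    # Writing to a temp file avoids relying on a particular "sed -i" dialect.
    sed "s/^\([[:space:]]*serviceAccountName:\).*/\1 ${account}/" "$file" > "$tmp" &&
        mv "$tmp" "$file"
}
```

For the change described above the call would be set_service_account heapster.yaml admin.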
Then apply it again:
kubectl apply -f heapster.yaml
Check the status with the command below; it still fails.
kubectl top node
error: metrics not available yet
Check the log again:
kubectl -n kube-system logs heapster-xxx -f
E0312 09:51:05.007570 1 manager.go:101] Error in scraping containers from kubelet:10.13.145.21:10255: failed to get all container stats from Kubelet URL "http://10.13.145.21:10255/stats/container/": Post http://10.13.145.21:10255/stats/container/: dial tcp 10.13.145.21:10255: getsockopt: connection refused
E0312 09:51:05.015293 1 manager.go:101] Error in scraping containers from kubelet:10.13.89.52:10255: failed to get all container stats from Kubelet URL "http://10.13.89.52:10255/stats/container/": Post http://10.13.89.52:10255/stats/container/: dial tcp 10.13.89.52:10255: getsockopt: connection refused
E0312 09:51:05.023599 1 manager.go:101] Error in scraping containers from kubelet:10.13.89.53:10255: failed to get all container stats from Kubelet URL "http://10.13.89.53:10255/stats/container/": Post http://10.13.89.53:10255/stats/container/: dial tcp 10.13.89.53:10255: getsockopt: connection refused
E0312 09:51:05.029772 1 manager.go:101] Error in scraping containers from kubelet:10.13.89.51:10255: failed to get all container stats from Kubelet URL "http://10.13.89.51:10255/stats/container/": Post http://10.13.89.51:10255/stats/container/: dial tcp 10.13.89.51:10255: getsockopt: connection refused
W0312 09:51:25.001639 1 manager.go:152] Failed to get all responses in time (got 0/4)
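There are still errors: the permission problem is gone, but every connection to the kubelets over plain HTTP on port 10255 is refused. After some searching, the cause turned out to be the source URL heapster uses to reach the kubelets. The fix is to scrape them over HTTPS on port 10250: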
# in heapster.yaml, change
- --source=kubernetes:https://kubernetes.default
# to
- --source=kubernetes:https://kubernetes.default?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250&insecure=true
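If the flag is rewritten with sed rather than by hand, note that "&" has a special meaning in a sed replacement and must be escaped. The sketch below shows one way to do it; set_heapster_source is a hypothetical helper, and it assumes heapster.yaml has exactly one line containing --source=.

```shell
#!/bin/sh
# Sketch: point heapster's --source flag at the kubelet HTTPS port.
# Usage: set_heapster_source FILE   (hypothetical helper)
set_heapster_source() {
    file="$1"
    new_source='kubernetes:https://kubernetes.default?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250&insecure=true'
    # In a sed replacement "&" means "the matched text", so escape it
    # (along with "|" and "\", which the substitution below also treats specially).
    escaped=$(printf '%s\n' "$new_source" | sed 's/[&|\\]/\\&/g')
    tmp="${file}.tmp"
    # Keep everything up to and including "--source=", replace the rest of the line.
    sed "s|\(--source=\).*|\1${escaped}|" "$file" > "$tmp" &&
        mv "$tmp" "$file"
}
```

After running set_heapster_source heapster.yaml, grep -- '--source=' heapster.yaml should show the new value on the original line with its indentation intact.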
As before, delete the deployment first:
kubectl delete -f heapster.yaml
and then create it again:
kubectl apply -f heapster.yaml
Then check the log:
I0312 09:55:05.272254 1 influxdb.go:274] Created database "k8s" on influxDB server at "monitoring-influxdb.kube-system.svc:8086"
Finally, use the command below to check whether metrics are being collected. Output like the following means it worked:
root@n1:~# kubectl top node
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
master   347m         8%     2253Mi          58%
n1       34m          0%     909Mi           5%
n2       28m          0%     917Mi           5%
n3       28m          0%     870Mi           5%
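Right after heapster is recreated, kubectl top node may still report "metrics not available yet" until the first scrape has completed, so a single failed attempt is not conclusive. A small retry loop can cover that gap. This is a sketch only: wait_for_metrics is a hypothetical helper, and the default of 10 attempts with 15 seconds between them is an arbitrary assumption.

```shell
#!/bin/sh
# Sketch: retry "kubectl top node" until it succeeds or attempts run out.
# Usage: wait_for_metrics [ATTEMPTS] [DELAY_SECONDS]   (hypothetical helper)
wait_for_metrics() {
    attempts="${1:-10}"   # assumed default, adjust as needed
    delay="${2:-15}"      # assumed default, adjust as needed
    i=1
    while [ "$i" -le "$attempts" ]; do
        # kubectl exits non-zero while metrics are unavailable.
        if kubectl top node; then
            return 0
        fi
        echo "metrics not available yet (attempt ${i}/${attempts})" >&2
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

For example, wait_for_metrics 20 10 polls for a little over three minutes before giving up with a non-zero exit status.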