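A Prometheus pitfall
Prometheus is a monitoring tool, used here to watch the state of a k8s cluster. While setting it up on a host today I ran into a pitfall that took some time to track down, so I am writing it up.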

First, according to tutorials found online, installing it with helm is convenient and takes only three commands (ref: https://itnext.io/kubernetes-monitoring-with-prometheus-in-15-minutes-8e54d1de2e13)
```
$ helm repo add coreos https://s3-eu-west-1.amazonaws.com/coreos-charts/stable/
```
```
$ helm install coreos/prometheus-operator --name prometheus-operator --namespace monitoring
```
```
$ helm install coreos/kube-prometheus --name kube-prometheus --set global.rbacEnable=true --namespace monitoring
```
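However, the monitoring system did not start correctly. After some investigation I found that two pods had crashed. Looking inside their containers, I further found that the crashed containers all showed the same log message.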
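The pod inspection described above can be sketched roughly as follows. This is only an illustration, not the exact commands used in the post: `failing_pods` is a hypothetical helper, and it assumes the default `kubectl get pods` table layout, in which STATUS is the third column.

```shell
# Print the names of pods whose STATUS is neither "Running" nor "Completed".
# Reads the table printed by `kubectl get pods` on stdin and skips the header row.
# (failing_pods is a hypothetical helper name, not part of kubectl.)
failing_pods() {
    awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Example usage, which needs access to the cluster:
#   kubectl get pods --namespace monitoring | failing_pods
#   kubectl logs --namespace monitoring <pod-name>
```

The `monitoring` namespace matches the one passed to `helm install` above; `<pod-name>` stands for whichever pod names the first command prints.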

After more digging, I found the following passage in the fsnotify README that is vendored in the prometheus-operator repository:

github.com/coreos/prometheus-operator/vendor/github.com/fsnotify/fsnotify/README.md

How many files can be watched at once?

There are OS-specific limits as to how many watches can be created:
- Linux: /proc/sys/fs/inotify/max_user_watches contains the limit, reaching this limit results in a "no space left on device" error.
- BSD / OSX: sysctl variables "kern.maxfiles" and "kern.maxfilesperproc", reaching these limits results in a "too many open files" error.
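
So the cause was that the system's limit on the number of watched files had been reached. After raising the value in /proc/sys/fs/inotify/max_user_watches and deploying again, everything started successfully.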
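A minimal sketch of checking and raising the limit is shown below. The helper names `read_watch_limit` and `raise_watch_limit` are hypothetical, and the post does not say which value was actually used, so any number passed in is only an example.

```shell
# Default path, taken from the fsnotify README quoted above.
WATCH_LIMIT_FILE=/proc/sys/fs/inotify/max_user_watches

# Print the limit stored in the given file (defaults to the real /proc entry).
read_watch_limit() {
    cat "${1:-$WATCH_LIMIT_FILE}"
}

# Raise the limit to $1 if the current value is lower; an existing higher
# value is left alone. Writing to the real /proc file requires root.
raise_watch_limit() {
    wanted=$1
    file=${2:-$WATCH_LIMIT_FILE}
    current=$(read_watch_limit "$file")
    if [ "$current" -lt "$wanted" ]; then
        echo "$wanted" > "$file"
    fi
}

# Example usage (as root; 524288 is an illustrative value, not from the post):
#   read_watch_limit
#   raise_watch_limit 524288
```

A value written to /proc this way is lost on reboot. To keep it, the usual approach is to add a `fs.inotify.max_user_watches=...` line to /etc/sysctl.conf.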

Original article: https://www.cnblogs.com/elnino/p/9707890.html