[Repost] Monitoring Server Status with Prometheus and Grafana

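What is Prometheus?

Prometheus is an open-source monitoring and alerting system and time-series database (TSDB), originally developed at SoundCloud.

Prometheus is written in Go. Its design was inspired by Borgmon, Google's internal monitoring system, and it is often described as an open-source counterpart to it.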

Installing Prometheus and node_exporter

$ wget https://github.com/prometheus/prometheus/releases/download/v2.0.0/prometheus-2.0.0.linux-amd64.tar.gz
$ tar xvfz prometheus-*.tar.gz
$ mv prometheus-2.0.0.linux-amd64 prometheus # rename the directory
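Both downloads in this post, Prometheus here and node_exporter further down, follow the same GitHub release naming pattern. As a small sketch, the hypothetical helper `release_url` below prints that URL for a given project and version. It assumes the pattern and the linux-amd64 build still apply to whatever version you pick.

```shell
# Hypothetical helper: print the linux-amd64 release tarball URL for a
# project under github.com/prometheus, assuming the naming pattern
#   <project>/releases/download/v<version>/<project>-<version>.linux-amd64.tar.gz
release_url() {
  project=$1
  version=$2
  printf 'https://github.com/prometheus/%s/releases/download/v%s/%s-%s.linux-amd64.tar.gz\n' \
    "$project" "$version" "$project" "$version"
}

# Usage (produces the same URLs as the wget commands in this post):
#   wget "$(release_url prometheus 2.0.0)"
#   wget "$(release_url node_exporter 0.15.1)"
```

Bumping a version then means changing one argument instead of editing the URL in three places.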

The prometheus directory contains the main configuration file, prometheus.yml. View its contents:

$ cd prometheus

$ vim prometheus.yml

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']

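The scrape_configs section is the part we mainly care about: it determines which data Prometheus scrapes. The default configuration contains a single job, prometheus, which ships with Prometheus and exposes the Prometheus process's own runtime metrics (at localhost:9090/metrics) for scraping.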
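To see which jobs a configuration file defines without opening an editor, a rough sketch like the one below extracts the job_name values. `list_jobs` is a hypothetical helper, not part of Prometheus, and it assumes each `job_name:` sits on its own line as in the file above. For real validation, the bundled promtool (`./promtool check config prometheus.yml`) is the better tool.

```shell
# Hypothetical helper: print every job_name in a Prometheus config file.
# Assumes "job_name: <name>" is on a single line; skips commented-out lines
# and strips surrounding single or double quotes.
list_jobs() {
  grep -v '^[[:space:]]*#' "$1" \
    | grep 'job_name:' \
    | sed -e 's/.*job_name:[[:space:]]*//' -e 's/[[:space:]]*#.*$//' \
    | tr -d "\"'"
}

# Usage, inside the prometheus directory:
#   list_jobs prometheus.yml    # the default file above prints: prometheus
```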

Apart from the prometheus job, we don't yet have any program serving data to Prometheus over HTTP. So we set the Prometheus configuration aside for now; the next step is to install node_exporter on the four servers.

On each of the 4 servers to be monitored, download and extract node_exporter: https://prometheus.io/download/#node_exporter

$ wget https://github.com/prometheus/node_exporter/releases/download/v0.15.1/node_exporter-0.15.1.linux-amd64.tar.gz
$ tar -zxvf node_exporter-0.15.1.linux-amd64.tar.gz
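$ mkdir -p prometheus # needed on hosts where Prometheus itself is not installed
$ mv node_exporter-0.15.1.linux-amd64 prometheus/node_exporter # rename the directory and move it into the prometheus directory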
$ ls prometheus
console_libraries consoles data LICENSE node_exporter NOTICE prometheus prometheus.yml promtool

Since node_exporter ships as a precompiled binary, we can run it directly as a systemd service.

On each of the 4 servers to be monitored, create the node_exporter service:

$ sudo vim /etc/systemd/system/node_exporter.service

With the following contents:

[Unit]
Description=Node Exporter

[Service]
User=admin
ExecStart=/home/admin/prometheus/node_exporter/node_exporter

[Install]
WantedBy=default.target
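The unit above can also be written non-interactively, which helps when repeating the setup on all four servers. The sketch below is a hypothetical `write_unit` helper that reproduces the same three sections. The output path is a parameter, so the file can be written to a temporary location first and then copied into /etc/systemd/system with sudo.

```shell
# Hypothetical helper: write a minimal systemd unit with the same layout as
# the node_exporter unit above.
# Arguments: $1 description, $2 user, $3 ExecStart command, $4 output file.
write_unit() {
  cat > "$4" <<EOF
[Unit]
Description=$1

[Service]
User=$2
ExecStart=$3

[Install]
WantedBy=default.target
EOF
}

# Usage on each monitored server:
#   write_unit "Node Exporter" admin \
#     /home/admin/prometheus/node_exporter/node_exporter /tmp/node_exporter.service
#   sudo cp /tmp/node_exporter.service /etc/systemd/system/node_exporter.service
```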

Reload systemd, then enable and start the service:

$ sudo systemctl daemon-reload
$ sudo systemctl enable node_exporter.service
$ sudo systemctl start node_exporter.service
$ systemctl status node_exporter
● node_exporter.service - Node Exporter
   Loaded: loaded (/etc/systemd/system/node_exporter.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2017-12-05 17:39:32 CST; 1 day 2h ago
 Main PID: 4752 (node_exporter)
   CGroup: /system.slice/node_exporter.service
           └─4752 /home/admin/prometheus/node_exporter/node_exporter
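Rather than reading the full status output, a script can check the unit's state directly. The sketch below, a hypothetical `ensure_running` helper, relies on `systemctl is-active --quiet`, which exits 0 only when the unit is active. When the unit is not active, it shows the unit's most recent journal lines instead.

```shell
# Hypothetical helper: report whether a systemd unit is active.
# On failure, print the unit's 20 most recent journal lines to stderr
# and return 1 so calling scripts can stop early.
ensure_running() {
  if systemctl is-active --quiet "$1"; then
    echo "$1 is running"
  else
    echo "$1 is NOT running; recent log lines:" >&2
    journalctl -u "$1" -n 20 --no-pager >&2
    return 1
  fi
}

# Usage:
#   ensure_running node_exporter
```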

By default, node_exporter serves its HTTP endpoint on port 9100. Open that port in the firewall:

$ sudo firewall-cmd --zone=public --add-port=9100/tcp --permanent ; sudo firewall-cmd --reload
# test
$ curl http://localhost:9100
<html>
<head><title>Node Exporter</title></head>
<body>
<h1>Node Exporter</h1>
<p><a href="/metrics">Metrics</a></p>
</body>
</html>
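The root page only confirms that the exporter is up. The data itself is served at /metrics in Prometheus's plain-text exposition format: `name{labels} value` lines, plus `# HELP` and `# TYPE` comment lines. Below is a rough sketch for pulling a single metric out of that text with awk. `metric_value` is a hypothetical helper. It assumes no `}` appears inside label values, and it ignores any optional timestamp column.

```shell
# Hypothetical helper: read Prometheus text exposition format on stdin and
# print the value of every sample whose metric name is exactly $1
# (with or without labels). Comment lines (# HELP / # TYPE) are skipped.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }
    {
      metric = $0
      sub(/[{ ].*/, "", metric)          # the name ends at "{" or the first space
      if (metric != name) next
      rest = $0
      if (index(rest, "}") > 0)
        sub(/.*[}]/, "", rest)           # drop the {label="..."} block
      else
        sub(/^[^ ]+/, "", rest)          # drop the bare metric name
      split(rest, field, " ")
      print field[1]                     # the sample value
    }'
}

# Usage:
#   curl -s http://localhost:9100/metrics | metric_value node_load1
```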

Back on the server machine, edit prometheus.yml again:

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']

  # Added section

  - job_name: "node1"
    static_configs:
      - targets: ['111.x.xxx.20:9100']

  - job_name: "node2"
    static_configs:
      - targets: ['111.x.xxx.20:9100']

  - job_name: "node3"
    static_configs:
      - targets: ['111.x.xxx.20:9100']

  - job_name: "node4"
    static_configs:
      - targets: ['111.x.xxx.21:9100']
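Writing one near-identical job per server by hand makes it easy to slip when copying target addresses. One option is to generate the added section from a list of `host:port` targets. `node_jobs` below is a hypothetical helper that emits the same `nodeN` job layout shown above, indented to sit under `scrape_configs:`.

```shell
# Hypothetical helper: print one scrape job per node_exporter target,
# named node1, node2, ... in argument order, indented for scrape_configs:.
node_jobs() {
  n=1
  for target in "$@"; do
    printf '  - job_name: "node%d"\n' "$n"
    printf '    static_configs:\n'
    printf "      - targets: ['%s']\n" "$target"
    n=$((n + 1))
  done
}

# Usage (HOST1..HOST4 are placeholders for your servers' addresses).
# Appending works here because scrape_configs is the last section of the file:
#   node_jobs HOST1:9100 HOST2:9100 HOST3:9100 HOST4:9100 >> prometheus.yml
#   ./promtool check config prometheus.yml
```

A common alternative is a single job whose `targets` list holds all four addresses. The per-node jobs keep the layout this post uses.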

Set up Prometheus as a service and start it:

$ sudo vim /etc/systemd/system/prometheus.service
$ cat /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus
[Service]
User=root
# keep the default storage path (data/) inside the install directory
WorkingDirectory=/home/admin/prometheus
ExecStart=/home/admin/prometheus/prometheus --config.file=/home/admin/prometheus/prometheus.yml
[Install]
WantedBy=default.target
$ sudo systemctl daemon-reload
$ sudo systemctl enable prometheus.service
$ sudo systemctl start prometheus.service
$ systemctl status prometheus
● prometheus.service - Prometheus
   Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2017-12-06 20:43:35 CST; 3s ago
 Main PID: 5918 (prometheus)
   CGroup: /system.slice/prometheus.service
           └─5918 /home/admin/prometheus/prometheus --config.file=/home/admin/prometheus/prometheus.yml

# Open the firewall port for Prometheus
$ sudo firewall-cmd --zone=public --add-port=9090/tcp --permanent ; sudo firewall-cmd --reload

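To test, open http://yourserverip:9090/graph and run the query node_load1 to plot each node's 1-minute load average. You can also check the state of every job at http://yourserverip:9090/targets, and view Prometheus's own metrics at http://yourserverip:9090/metrics.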
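The same queries can also be run from a shell through Prometheus's HTTP API, whose instant-query endpoint is `/api/v1/query`. Here is a minimal sketch, assuming curl is installed. `prom_query` and the `PROM_URL` variable are hypothetical names, and the response is printed as raw JSON.

```shell
# Hypothetical helper: run an instant PromQL query and print the JSON reply.
# PROM_URL (assumed name) defaults to a Prometheus on this host.
prom_query() {
  # -G sends a GET request with the --data-urlencode pair as the query string
  curl -sG "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}

# Usage:
#   PROM_URL=http://yourserverip:9090 prom_query 'node_load1'
#   PROM_URL=http://yourserverip:9090 prom_query 'up'   # 1 per healthy target, 0 if its last scrape failed
```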

Installing Grafana

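Prometheus's built-in web UI is rather bare-bones. First, the data has to be refreshed manually. Second, its visualization is limited and does not support customized displays. Third, typing in queries by hand every time is tedious.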

So we use Grafana as the visualization tool instead.

Installation (Grafana can be installed on any host):

$ sudo yum install https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-4.6.2-1.x86_64.rpm
# Open the firewall port for Grafana (it listens on 3000 by default)
$ sudo firewall-cmd --zone=public --add-port=3000/tcp --permanent ; sudo firewall-cmd --reload
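The same firewall-cmd pair is used for node_exporter's, Prometheus's and Grafana's ports in this post. A hypothetical wrapper, `open_tcp_port`, is sketched below. It assumes firewalld with the `public` zone, as above, and it chains the two commands with `&&` rather than `;` so the reload runs only if the rule was actually added.

```shell
# Hypothetical helper: permanently open a TCP port in firewalld's public zone,
# then reload so the permanent rule takes effect right away.
open_tcp_port() {
  sudo firewall-cmd --zone=public --add-port="$1/tcp" --permanent &&
    sudo firewall-cmd --reload
}

# Usage:
#   open_tcp_port 9100   # node_exporter
#   open_tcp_port 9090   # Prometheus
#   open_tcp_port 3000   # Grafana
```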

Start the service:

$ sudo service grafana-server start

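Once it is running, Grafana is available at http://yourip:3000. Log in (the initial account is admin/admin) and set your username and password. Then add a data source, i.e. where Grafana reads its data from: here, a data source of type Prometheus whose URL is http://yourserverip:9090.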
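The data source can also be created through Grafana's HTTP API (`POST /api/datasources`) instead of the web form. In the sketch below, the hypothetical helper `grafana_ds_json` only builds the JSON body (name, type `prometheus`, URL and `proxy` access). It performs no JSON escaping, so it assumes the name and URL contain no double quotes. Replace `admin:admin` in the usage line with the credentials you configured.

```shell
# Hypothetical helper: print the JSON body for Grafana's POST /api/datasources.
# $1 = Prometheus URL, $2 = data source name (defaults to "Prometheus").
# No JSON escaping is done; both arguments must be free of double quotes.
grafana_ds_json() {
  printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' \
    "${2:-Prometheus}" "$1"
}

# Usage:
#   grafana_ds_json http://yourserverip:9090 |
#     curl -s -u admin:admin -H 'Content-Type: application/json' \
#       -X POST --data @- http://yourip:3000/api/datasources
```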

References:

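The official documentation covers the visualization settings in detail. The Node Exporter Server Metrics dashboard is a good template to start from: visit https://grafana.com/dashboards/405 and import it from Grafana's Dashboards menu.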


http://flintx.me/2017/12/12/Prometheus%20+%20Grafana%20%E5%AE%9E%E7%8E%B0%E6%9C%8D%E5%8A%A1%E5%99%A8%E8%BF%90%E8%A1%8C%E7%8A%B6%E6%80%81%E7%9B%91%E6%8E%A7/
