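(Note: this article covers Nagios installation and configuration, and using Nagios to monitor Redis, MySQL, and ZooKeeper.)
Nagios can monitor basic system metrics such as CPU, disk, and network, as well as common services such as SMTP, POP3, HTTP, and NNTP. By installing plugins and writing custom check scripts, users can also monitor their own applications, and can deploy a hierarchical monitoring architecture across a large number of hosts and objects.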
I. Installing Nagios
The Nagios master node needs:
- nagios
- nagios-plugins
- nrpe
- php
- apache
Each Nagios slave node (monitored host) needs:
- nagios-plugins
- nrpe
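About NRPE:
- The NRPE addon monitors remote hosts by running plugins on remote Linux/Unix machines. It is most useful when you want to monitor a remote host's disk usage, CPU load, and memory usage in the same way as on the local host.
- A word on the term "addon": Nagios has many addon packages available. Addons extend Nagios and integrate it with other software. They provide features such as managing configuration files through a web interface, monitoring remote hosts (*NIX, Windows, etc.), submitting passive checks from remote hosts, and simplifying or extending the notification logic.
- NRPE gives the same end result as the check_by_ssh plugin for local resources such as disk usage, CPU load, and memory usage, but it does not add as much CPU load on the monitoring host. This matters when you monitor a large number of hosts (see figure pic35.png).
- As the figure shows, NRPE must be deployed on each monitored host, where it runs as a daemon and listens for requests. The monitoring host uses check_nrpe to connect to this daemon over SSL. The daemon then runs local plugins such as check_disk and check_load on the monitored host and returns the results to the monitoring host. These plugins can also check other hosts reachable from the monitored host.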
Checking the build prerequisites (all nodes)
# rpm -q gcc glibc glibc-common gd gd-devel xinetd openssl-devel
gcc-4.4.7-3.el6.x86_64
glibc-2.14.1-6.x86_64
glibc-common-2.14.1-6.x86_64
gd-2.0.35-11.el6.x86_64
package gd-devel is not installed
package xinetd is not installed
openssl-devel-1.0.0-27.el6.x86_64
If any package is missing, install it first. The packages can be downloaded from these mirror sites:
- http://rpm.pbone.net/
- http://mirrors.163.com/centos/6.4/os/x86_64/Packages/
- http://mirrors.sohu.com/centos/6.4/os/x86_64/Packages/
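The rpm -q output above can be filtered mechanically to list only what still has to be installed. This is a minimal sketch that assumes the "package NAME is not installed" wording shown in that output; the function name missing_packages is made up for this example.

```shell
# Hypothetical helper: read `rpm -q` output on stdin and print the names
# of packages reported as "not installed", one per line.
missing_packages() {
    sed -n 's/^package \(.*\) is not installed$/\1/p'
}

# Example (commented out so the snippet has no side effects):
# rpm -q gcc glibc glibc-common gd gd-devel xinetd openssl-devel | missing_packages
```

On a host with a working yum repository, the printed names could then be passed to yum install instead of downloading each RPM by hand.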
Create the nagios user
useradd nagios -d /usr/local/nagios
passwd nagios (choose your own password)
Master node installation
1. Nagios (download: http://jaist.dl.sourceforge.net/project/nagios/nagios-4.x/nagios-4.0.2/nagios-4.0.2.tar.gz)
1) Install
2) Register Nagios as a service
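The two steps above are listed without commands. As a rough sketch, a Nagios Core 4.x source build usually follows the same configure/make pattern as the other packages in this article. The make targets and chkconfig calls below come from the standard Nagios Core source build on RHEL/CentOS 6 and are an assumption here, not the author's recorded steps. The run wrapper, the DRY_RUN variable, and the name of the extracted directory are also assumptions; by default the function only prints the commands.

```shell
# Hypothetical wrapper: print each command when DRY_RUN=1 (the default),
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

install_nagios_core() {
    run tar -zxf nagios-4.0.2.tar.gz &&
    run cd nagios-4.0.2 &&              # assumed name of the extracted directory
    run ./configure --prefix=/usr/local/nagios &&
    run make all &&
    run make install &&                 # binaries, CGIs, HTML files
    run make install-init &&            # init script
    run make install-commandmode &&     # permissions on the external command directory
    run make install-config &&          # sample configuration files
    # Register the init script as a service (RHEL/CentOS 6 style).
    run chkconfig --add nagios &&
    run chkconfig nagios on
}

# Print the plan without changing anything:
# DRY_RUN=1 install_nagios_core
```

Printing the plan first makes it easy to compare against the README shipped in the tarball before running anything as root.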
2. Nagios plugins (download: https://www.nagios-plugins.org/download/nagios-plugins-1.5.tar.gz)
tar -zxf nagios-plugins-1.5.tar.gz
cd nagios-plugins-1.5
./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios
make && make install
If the build fails with MySQL-related errors, MySQL is installed in a non-default location. Point --with-mysql at the actual install path and run make again:
./configure --prefix=/usr/local/nagios --with-mysql=/usr/local/mysql
make && make install
3. NRPE (download: http://jaist.dl.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.15/nrpe-2.15.tar.gz)
tar -zxf nrpe-2.15.tar.gz
cd nrpe-2.15
./configure --enable-command-args
make all
make install-plugin
Monitored nodes also need to run:
make install-daemon && make install-daemon-config && make install-xinetd
4. Apache (download: http://archive.apache.org/dist/httpd/httpd-2.2.23.tar.gz)
tar -zxf httpd-2.2.23.tar.gz
cd httpd-2.2.23
./configure --prefix=/usr/local/apache2
make && make install
5. PHP (download: http://cn2.php.net/distributions/php-5.4.10.tar.gz)
cd /export/home/tools/soft/php
tar -zxf php-5.4.10.tar.gz
cd php-5.4.10
./configure --prefix=/usr/local/php --with-apxs2=/usr/local/apache2/bin/apxs
make && make install
Slave node installation
Slave nodes only need steps 2 and 3 above (Nagios plugins and NRPE).
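II. Configuring Nagios
1. Monitored node configuration (master-slave connectivity):
1) Edit /etc/xinetd.d/nrpe to allow the Nagios master node to connect
vi /etc/xinetd.d/nrpe
only_from = 127.0.0.1 <master node IP>
2) Append to the end of /etc/services:
nrpe 5666/tcp # NRPE
3) Enable support for command arguments
vi /usr/local/nagios/etc/nrpe.cfg
dont_blame_nrpe=1
4) Restart xinetd
service xinetd restart
5) Verify that nrpe is listening
netstat -at | grep nrpe
6) Test that nrpe responds locally
/usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.15
7) Test from the master node
/usr/local/nagios/libexec/check_nrpe -H <slave node IP>
If the version string is returned, the connection works.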
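Steps 2 and 3 above are manual edits that are easy to apply twice by mistake when many nodes are set up. The sketch below shows one way to make them repeatable; ensure_line and set_option are hypothetical helpers written for this example, not part of NRPE or xinetd.

```shell
# Hypothetical helper: append LINE to FILE unless an identical line is already there.
ensure_line() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Hypothetical helper: set KEY=VALUE in FILE, replacing an existing KEY= line
# or appending one. VALUE must not contain "|", "&" or "\" (special to sed).
set_option() (
    file=$1 key=$2 value=$3
    if grep -q "^$key=" "$file" 2>/dev/null; then
        tmp=$(mktemp) || exit 1
        sed "s|^$key=.*|$key=$value|" "$file" > "$tmp" && cat "$tmp" > "$file"
        status=$?
        rm -f "$tmp"
        exit "$status"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
)

# Intended use on a monitored node (run as root; commented out here):
# ensure_line /etc/services 'nrpe 5666/tcp # NRPE'
# set_option /usr/local/nagios/etc/nrpe.cfg dont_blame_nrpe 1
```

Both helpers leave the file unchanged when the desired line is already present, so they can be run again after an upgrade without creating duplicates.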
2. Monitored node command configuration:
1) Edit the configuration file
# su - nagios
$ vi /usr/local/nagios/etc/nrpe.cfg
Change the command definitions to:
command[check_users]=/usr/local/nagios/libexec/check_users -w $ARG1$ -c $ARG2$
command[check_load]=/usr/local/nagios/libexec/check_load -w $ARG1$ -c $ARG2$
command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
command[check_procs]=/usr/local/nagios/libexec/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
command[check_procs_args]=/usr/local/nagios/libexec/check_procs $ARG1$
command[check_swap]=/usr/local/nagios/libexec/check_swap -w $ARG1$ -c $ARG2$
- check_users: number of logged-in users
- check_load: CPU load
- check_disk: disk usage
- check_procs: number of processes; the state filter used here is RSZDT
- check_swap: swap usage
2) Verify the command configuration
service xinetd restart
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_users -a 5 10
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_load -a 15,10,5 30,25,20
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_disk -a 20% 10% /
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_procs -a 200 400 RSZDT
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_swap -a 20% 10%
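With dont_blame_nrpe=1, the values given after -a are substituted for $ARG1$, $ARG2$, and so on in the matching command[...] line of nrpe.cfg. The sketch below only imitates that substitution, to show which command line ends up being run on the monitored node; expand_nrpe_command is a made-up function and not part of NRPE.

```shell
# Hypothetical illustration of NRPE argument substitution.
# usage: expand_nrpe_command TEMPLATE ARG1 [ARG2 ...]
expand_nrpe_command() (
    line=$1
    shift
    n=1
    for arg in "$@"; do
        macro="\$ARG$n\$"          # literal text such as $ARG1$
        out=
        # Replace every occurrence of the macro using POSIX pattern stripping.
        while :; do
            case $line in
                *"$macro"*)
                    out=$out${line%%"$macro"*}$arg
                    line=${line#*"$macro"}
                    ;;
                *) break ;;
            esac
        done
        line=$out$line
        n=$((n + 1))
    done
    printf '%s\n' "$line"
)

expand_nrpe_command \
    '/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$' \
    '20%' '10%' /
# prints: /usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
```

Reading the expanded line makes it easier to spot a wrong argument order, for example warning and critical thresholds swapped.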
3. Master node configuration (master-slave connectivity):
1) Define permissions
(as the nagios user)
vi /usr/local/nagios/etc/cgi.cfg
Change the following settings to grant permissions to the admin user:
default_user_name=admin
authorized_for_system_information=nagiosadmin,admin
authorized_for_configuration_information=nagiosadmin,admin
authorized_for_system_commands=nagiosadmin,admin
authorized_for_all_services=nagiosadmin,admin
authorized_for_all_hosts=nagiosadmin,admin
authorized_for_all_service_commands=nagiosadmin,admin
authorized_for_all_host_commands=nagiosadmin,admin
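The authorized_for_* settings above all receive the same extra user, so the edit can be scripted. This is a sketch only: grant_cgi_user is a hypothetical filter written for this example, and it does not touch default_user_name, which still has to be set by hand.

```shell
# Hypothetical helper: read cgi.cfg on stdin and write it to stdout with USER
# appended to every authorized_for_* line that does not already list it.
grant_cgi_user() {
    awk -v user="$1" '
        /^authorized_for_/ {
            eq = index($0, "=")
            n = split(substr($0, eq + 1), names, ",")
            found = 0
            for (i = 1; i <= n; i++)
                if (names[i] == user) found = 1
            if (!found)
                $0 = (n == 0) ? ($0 user) : ($0 "," user)
        }
        { print }
    '
}

# Example (commented out): write the result to a new file and review it first.
# grant_cgi_user admin < /usr/local/nagios/etc/cgi.cfg > /tmp/cgi.cfg.new
```

Writing to a separate file keeps the original cgi.cfg intact until the diff has been checked.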
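2) nagios.cfg
vi /usr/local/nagios/etc/nagios.cfg
Comment out the localhost.cfg line and add a cfg_dir line:
#cfg_file=/export/home/nagios/etc/objects/localhost.cfg
cfg_dir=/export/home/nagios/etc/servers
The main configuration file now declares ./servers as the directory that holds the monitoring definitions. This directory does not exist by default and must be created by hand.
Nagios reads every file with the .cfg suffix under the servers directory as a configuration file.
cd /usr/local/nagios/etc
mkdir servers
cd servers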
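3) Define the host group
Declare a host group and add all three hosts in this environment to it.
vi /export/home/nagios/etc/servers/group.cfg
This is a new file with the following content:
define hostgroup{
hostgroup_name name
alias name
members name1,name2,name3
}
The settings are:
- hostgroup_name: name of the host group; any value
- alias: alias of the host group; any value
- members: hosts in the group, separated by commas. Each name must match the host_name of a define host block.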
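Because every entry in members must match a host_name in some define host block, a quick cross-check before restarting Nagios can catch typos. The sketch below is a simplified parser written for this example (undefined_members is a made-up name); it assumes one directive per line and a members list without spaces, as in the file above.

```shell
# Hypothetical helper: print hostgroup members that have no matching
# "define host" block in the given Nagios object files.
undefined_members() {
    awk '
        /^[ \t]*define[ \t]+host[ \t]*[{]/ { in_host = 1 }
        /[}]/                              { in_host = 0 }
        in_host && $1 == "host_name"       { known[$2] = 1 }
        $1 == "members" {
            n = split($2, m, ",")
            for (i = 1; i <= n; i++) wanted[++count] = m[i]
        }
        END {
            for (i = 1; i <= count; i++)
                if (!(wanted[i] in known)) print wanted[i]
        }
    ' "$@"
}

# Example (commented out):
# undefined_members /export/home/nagios/etc/servers/*.cfg
```

An empty result means every member has a host definition. Nagios performs its own, more complete validation with nagios -v on the main configuration file.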
4) Define the monitored hosts
First define the local host, 主机-1 ("host 1")
vi /export/home/nagios/etc/servers/主机-1.cfg
define host{
use linux-server
host_name 主机-1
alias 主机-1
address 192.168.56.10
}
define service{
use local-service
host_name 主机-1
service_description Host Alive
check_command check-host-alive
}
define service{
use local-service
host_name 主机-1
service_description Users
check_command check_local_users!20!50
}
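Because this host also runs the Nagios master, it can be monitored with the check_local_* commands.
This file already includes the commonly used checks.
Next define the remote hosts 主机-2 and 主机-3.
Before defining checks for remote hosts, define the check_nrpe command:
vi /usr/local/nagios/etc/objects/commands.cfg
Append the following to the end of the file: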
# 'check_nrpe' command definition
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$
}
define command{
command_name check_nrpe_args
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$ -a $ARG2$
}
The configuration files for the remote hosts are defined in the same way as the one above.
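The remote host files themselves are not shown. As an illustration, the sketch below generates one such file from a small shell function. It assumes the linux-server and generic-service templates from the Nagios sample configuration and the check_nrpe_args command defined above; the host name, address, and thresholds are placeholders, and remote_host_cfg is a made-up helper.

```shell
# Hypothetical generator for a remote host's object file.
# usage: remote_host_cfg HOST_NAME ADDRESS
remote_host_cfg() {
    cat <<EOF
define host{
use linux-server
host_name $1
alias $1
address $2
}
define service{
use generic-service
host_name $1
service_description Load
check_command check_nrpe_args!check_load!15,10,5 30,25,20
}
define service{
use generic-service
host_name $1
service_description Root Disk
check_command check_nrpe_args!check_disk!20% 10% /
}
EOF
}

# Example (commented out; placeholder name and address):
# remote_host_cfg web2 192.168.56.11 > /export/home/nagios/etc/servers/web2.cfg
```

In each check_command, the text after the first ! becomes $ARG1$ (the NRPE command name) and the text after the second ! becomes $ARG2$ (the values passed with -a).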
5) Define the notification recipient
Set the e-mail address of the monitoring contact:
vi /usr/local/nagios/etc/objects/contacts.cfg
define contact{
contact_name nagiosadmin ; Short name of user
use generic-contact ; Inherit default values from generic-contact template (defined above)
alias Nagios Admin ; Full name of user
email yourname@domain.com ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
}
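Besides setting the notification recipient, also make sure that:
- this host can reach the mail server
- Sendmail on this host can send mail through an external SMTP service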
III. Monitoring Redis
First install the Perl prerequisites:
yum info perl5
yum install perl-Time-HiRes
1) Download the check_redis.pl plugin and place it in libexec
2) Add to etc/objects/commands.cfg:
# check redis
define command {
command_name check_redis
command_line $USER1$/check_redis.pl -H $HOSTADDRESS$ -p $ARG1$ -a $ARG2$ -w $ARG3$ -c $ARG4$ -f
}
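3) Add to the monitoring configuration file:
define service {
use local-service
service_description <description>
check_command <command, see below>
host_name <host name or IP>
}
The check_command takes this form:
check_redis!<port>!'<variables to check, comma-separated>'!<warning thresholds>!<critical thresholds>
The variables that can be checked are: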
--total_connections_received=WARN:threshold,CRIT:threshold,<other specifiers>
Total Connections Received
--total_connections_received_rate=WARN:threshold,CRIT:threshold,<other specifiers>
Rate of Change of Total Connections Received
--total_expires=WARN:threshold,CRIT:threshold,<other specifiers>
Number of Expired Keys for All DBs
--used_memory_rss=WARN:threshold,CRIT:threshold,<other specifiers>
Resident Set Size, Used Memory in Bytes
--used_cpu_sys=WARN:threshold,CRIT:threshold,<other specifiers>
Main Process Used System CPU
--redis_git_dirty=WARN:threshold,CRIT:threshold,<other specifiers>
Git Dirty Set Bit
--connected_clients=WARN:threshold,CRIT:threshold,<other specifiers>
Total Number of Connected Clients
--uptime_in_days=WARN:threshold,CRIT:threshold,<other specifiers>
Total Uptime in Days
--uptime_in_days_rate=WARN:threshold,CRIT:threshold,<other specifiers>
Rate of Change of Total Uptime in Days
--keyspace_hits=WARN:threshold,CRIT:threshold,<other specifiers>
Total Keyspace Hits
--keyspace_hits_rate=WARN:threshold,CRIT:threshold,<other specifiers>
Rate of Change of Total Keyspace Hits
--pubsub_channels=WARN:threshold,CRIT:threshold,<other specifiers>
Number of Pubsub Channels
--used_cpu_user_children=WARN:threshold,CRIT:threshold,<other specifiers>
Child Processes Used User CPU
--keyspace_misses=WARN:threshold,CRIT:threshold,<other specifiers>
Keyspace Misses
--keyspace_misses_rate=WARN:threshold,CRIT:threshold,<other specifiers>
Rate of Change of Keyspace Misses
--used_cpu_user=WARN:threshold,CRIT:threshold,<other specifiers>
Main Process Used User CPU
--total_commands_processed=WARN:threshold,CRIT:threshold,<other specifiers>
Total Number of Commands Processed from Start
--total_commands_processed_rate=WARN:threshold,CRIT:threshold,<other specifiers>
Rate of Change of Total Number of Commands Processed from Start
--mem_fragmentation_ratio=WARN:threshold,CRIT:threshold,<other specifiers>
Memory Fragmentation Ratio
--blocked_clients=WARN:threshold,CRIT:threshold,<other specifiers>
Number of Currently Blocked Clients
--evicted_keys=WARN:threshold,CRIT:threshold,<other specifiers>
Total Number of Evicted Keys
--evicted_keys_rate=WARN:threshold,CRIT:threshold,<other specifiers>
Rate of Change of Total Number of Evicted Keys
--total_keys=WARN:threshold,CRIT:threshold,<other specifiers>
Total Number of Keys on the Server
--expired_keys=WARN:threshold,CRIT:threshold,<other specifiers>
Total Number of Expired Keys
--expired_keys_rate=WARN:threshold,CRIT:threshold,<other specifiers>
Rate of Change of Total Number of Expired Keys
--connected_slaves=WARN:threshold,CRIT:threshold,<other specifiers>
Number of Connected Slaves
--used_cpu_sys_children=WARN:threshold,CRIT:threshold,<other specifiers>
Child Processes Used System CPU
IV. Monitoring MySQL
Three plugins are available: check_mysql, check_mysqld.pl, and check_mysql_health. check_mysql_health is the most complete, so it is the one used here.
Using check_mysql_health:
Download: https://labs.consol.de/nagios/check_mysql_health/
Prerequisite: yum -y install perl-DBD-MySQL
1) Download check_mysql_health-2.1.tar.gz
2) Extract it: tar -zxvf check_mysql_health-2.1.tar.gz
3) Install
# ./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios --with-perl=/usr/bin/perl
# make && make install
4) Test the command:
./check_mysql_health --hostname 192.168.0.1 --port 3306 --username myname --password mypassword --mode threads-connected --warning 700 --critical 1000
5) Add to etc/objects/commands.cfg:
# check mysql health
define command {
command_name check_mysql_health
command_line $USER1$/check_mysql_health --hostname $ARG1$ --port $ARG2$ --username $ARG3$ --password $ARG4$ --mode $ARG5$ --warning $ARG6$ --critical $ARG7$
}
6) Configure the monitoring configuration file (same as above)
Available modes:
connection-time (Time to connect to the server)
uptime (Time the server is running)
threads-connected (Number of currently open connections)
threadcache-hitrate (Hit rate of the thread-cache)
slave-lag (Seconds behind master)
slave-io-running (Slave io running: Yes)
slave-sql-running (Slave sql running: Yes)
qcache-hitrate (Query cache hitrate)
qcache-lowmem-prunes (Query cache entries pruned because of low memory)
keycache-hitrate (MyISAM key cache hitrate)
bufferpool-hitrate (InnoDB buffer pool hitrate)
bufferpool-wait-free (InnoDB buffer pool waits for clean page available)
log-waits (InnoDB log waits because of a too small log buffer)
tablecache-hitrate (Table cache hitrate)
table-lock-contention (Table lock contention)
index-usage (Usage of indices)
tmp-disk-tables (Percent of temp tables created on disk)
slow-queries (Slow queries)
long-running-procs (long running processes)
cluster-ndbd-running (ndbd nodes are up and running)
sql (any sql command returning a single number)
7) Restart Nagios with /etc/init.d/nagios restart. If it reports that the process is locked, delete /var/lock/subsys/nagios.
V. Monitoring ZooKeeper
1. Install the plugin
git clone https://github.com/harisekhon/nagios-plugins
cd nagios-plugins
make
2. Configure the plugin
1) Add to etc/objects/commands.cfg:
# check zk
define command {
command_name check_zk
command_line /export/home/nagios/nagios_plugins/check_zookeeper.pl -H $ARG1$
}
2) Configure the check in a service definition
Note: if a permission error occurs, make the plugin file executable.
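The permission fix mentioned in the note can be written so that it is safe to run repeatedly. This is a small sketch; ensure_executable is a made-up helper, and the path in the example comment is a placeholder for wherever check_zookeeper.pl was installed.

```shell
# Hypothetical helper: add the execute bit for all users only when it is missing.
ensure_executable() {
    [ -x "$1" ] || chmod a+x "$1"
}

# Example (commented out; substitute the plugin's real location):
# ensure_executable /path/to/check_zookeeper.pl
```

The a+x mode is used so that the nagios user, which runs the checks, can execute the plugin regardless of who owns the file.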