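Nagios was originally called NetSaint and was developed by Ethan Galstad, who still maintains it. The name is a recursive acronym: "Nagios Ain't Gonna Insist On Sainthood". "Sainthood" is a nod to the old name NetSaint, and "Agios" is the Greek word for "saint". Nagios was developed to run on Linux, but it also works very well on other Unix variants. Its main features are: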
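- Monitoring of host resources (CPU load, disk usage, system logs), including Windows hosts via the NSClient++ agent
- Custom plugins can collect data over the network to monitor almost anything (temperature, alarms, ...)
- Plugins and scripts can be executed on remote hosts by configuring remote plugin execution in Nagios (very handy: it saves you from SSH-ing into each remote server just to run a command)
- Remote monitoring can be carried over SSH or SSL-encrypted channels
- The simple plugin design lets users easily develop the checks they need in many languages (shell scripts, C++, Perl, Ruby, Python, PHP, C#, etc.)
- Many graphing add-ons are available (Nagiosgraph, Nagiosgrapher, PNP4Nagios, etc.)
- Service checks can run in parallel
- A hierarchy of network hosts can be defined, so checks proceed level by level, starting from the parent hosts and working downward
- Notifications are sent when a service or host has a problem, via email, pager, SMS, or any user-defined plugin (I have not verified whether carrier SMS packages are supported)
- Event handlers can be defined to automatically recover (e.g. restart) a failed service or host
- Automatic log rotation
- Support for redundant monitoring
- A web interface for viewing the current network status, notifications, problem history, log files, etc.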
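The event-handler feature in the list above can be sketched as a small shell function. It is modelled on the common Nagios pattern in which the handler receives the $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ macros as arguments. The function name and the RESTART_CMD variable are hypothetical, and the restart policy (act on the third SOFT attempt or on any HARD CRITICAL) is one reasonable choice, not the only one.

```sh
# Hypothetical event handler: decide whether to restart a failed service.
# Arguments (as passed from Nagios macros in a command definition):
#   $1 = $SERVICESTATE$      e.g. OK, WARNING, CRITICAL, UNKNOWN
#   $2 = $SERVICESTATETYPE$  SOFT or HARD
#   $3 = $SERVICEATTEMPT$    current check attempt number
# RESTART_CMD is an assumption; it defaults to the no-op "true" so the
# sketch is safe to run. Set it to your real restart command.
handle_service_event() {
    state=$1 state_type=$2 attempt=$3
    case "$state" in
        CRITICAL)
            case "$state_type:$attempt" in
                HARD:*|SOFT:3)
                    # Third soft failure, or already hard: try a restart.
                    echo "restarting service"
                    ${RESTART_CMD:-true}
                    ;;
                *)
                    # Early soft failure: wait for the next re-check.
                    echo "waiting (soft attempt $attempt)"
                    ;;
            esac
            ;;
        *)
            echo "no action ($state)"
            ;;
    esac
}
```

For example, handle_service_event CRITICAL SOFT 3 prints "restarting service" and then runs RESTART_CMD, while an OK or WARNING state is left alone.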
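Nagios comes with many plugins that make it easy to monitor the state of many services. After installation, all available plugins are in the libexec directory under the Nagios home directory (/usr/local/nagios/libexec in this setup). For example, check_disk checks disk space and check_load checks the system (CPU) load. Running ./check_xxx -h shows how a plugin is used and what it does; for example, this is the help output of check_disk: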

    enadmin@ubuntu-server:/usr/local/nagios/libexec$ ./check_disk -h
    check_disk v2.0 (nagios-plugins 2.0)
    Copyright (c) 1999 Ethan Galstad <nagios@nagios.org>
    Copyright (c) 1999-2014 Nagios Plugin Development Team
        <devel@nagios-plugins.org>

    This plugin checks the amount of used disk space on a mounted file system
    and generates an alert if free space is less than one of the threshold values

    Usage:
     check_disk -w limit -c limit [-W limit] [-K limit] {-p path | -x device}
    [-C] [-E] [-e] [-f] [-g group ] [-k] [-l] [-M] [-m] [-R path ] [-r path ]
    [-t timeout] [-u unit] [-v] [-X type] [-N type] [-n]

    Options:
     -h, --help
        Print detailed help screen
     -V, --version
        Print version information
     --extra-opts=[section][@file]
        Read options from an ini file. See
        https://www.nagios-plugins.org/doc/extra-opts.html
        for usage and examples.
     -w, --warning=INTEGER
        Exit with WARNING status if less than INTEGER units of disk are free
     -w, --warning=PERCENT%
        Exit with WARNING status if less than PERCENT of disk space is free
     -c, --critical=INTEGER
        Exit with CRITICAL status if less than INTEGER units of disk are free
     -c, --critical=PERCENT%
        Exit with CRITICAL status if less than PERCENT of disk space is free
     -W, --iwarning=PERCENT%
        Exit with WARNING status if less than PERCENT of inode space is free
     -K, --icritical=PERCENT%
        Exit with CRITICAL status if less than PERCENT of inode space is free
     -p, --path=PATH, --partition=PARTITION
        Mount point or block device as emitted by the mount(8) command (may be repeated)
     -x, --exclude_device=PATH <STRING>
        Ignore device (only works if -p unspecified)
     -C, --clear
        Clear thresholds
     -E, --exact-match
        For paths or partitions specified with -p, only check for exact paths
     -e, --errors-only
        Display only devices/mountpoints with errors
     -f, --freespace-ignore-reserved
        Don't account root-reserved blocks into freespace in perfdata
     -g, --group=NAME
        Group paths. Thresholds apply to (free-)space of all partitions together
     -k, --kilobytes
        Same as '--units kB'
     -l, --local
        Only check local filesystems
     -L, --stat-remote-fs
        Only check local filesystems against thresholds. Yet call stat on remote filesystems
        to test if they are accessible (e.g. to detect Stale NFS Handles)
     -M, --mountpoint
        Display the mountpoint instead of the partition
     -m, --megabytes
        Same as '--units MB'
     -A, --all
        Explicitly select all paths. This is equivalent to -R '.*'
     -R, --eregi-path=PATH, --eregi-partition=PARTITION
        Case insensitive regular expression for path/partition (may be repeated)
     -r, --ereg-path=PATH, --ereg-partition=PARTITION
        Regular expression for path or partition (may be repeated)
     -I, --ignore-eregi-path=PATH, --ignore-eregi-partition=PARTITION
        Regular expression to ignore selected path/partition (case insensitive) (may be repeated)
     -i, --ignore-ereg-path=PATH, --ignore-ereg-partition=PARTITION
        Regular expression to ignore selected path or partition (may be repeated)
     -t, --timeout=INTEGER
        Seconds before plugin times out (default: 10)
     -u, --units=STRING
        Choose bytes, kB, MB, GB, TB (default: MB)
     -v, --verbose
        Show details for command-line debugging (Nagios may truncate output)
     -X, --exclude-type=TYPE
        Ignore all filesystems of indicated type (may be repeated)
     -N, --include-type=TYPE
        Check only filesystems of indicated type (may be repeated)
     -n, --newlines
        Multi-line output of each disk's status information on a new line

    Examples:
     check_disk -w 10% -c 5% -p /tmp -p /var -C -w 100000 -c 50000 -p /
        Checks /tmp and /var at 10% and 5%, and / at 100MB and 50MB
     check_disk -w 100 -c 50 -C -w 1000 -c 500 -g sidDATA -r '^/oracle/SID/data.*$'
        Checks all filesystems not matching -r at 100M and 50M. The fs matching the -r regex
        are grouped which means the freespace thresholds are applied to all disks together
     check_disk -w 100 -c 50 -C -w 1000 -c 500 -p /foo -C -w 5% -c 3% -p /bar
        Checks /foo for 1000M/500M and /bar for 5/3%. All remaining volumes use 100M/50M

    Send email to help@nagios-plugins.org if you have questions regarding use
    of this software. To submit patches or suggest improvements, send email to
    devel@nagios-plugins.org
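To see how a plugin's output and exit status come back to the caller, the shell sketch below runs any plugin command, captures its output, and labels it with the matching state name. The run_plugin function is hypothetical; the check_disk path and thresholds in the usage comment assume the /usr/local/nagios prefix used in this article. Exit codes other than 0, 1 and 2 are simply reported as UNKNOWN in this sketch.

```sh
# Hypothetical helper: run a Nagios plugin, capture its output, and report
# the state name that corresponds to its exit status.
run_plugin() {
    rc=0
    output=$("$@" 2>&1) || rc=$?
    case "$rc" in
        0) state=OK ;;
        1) state=WARNING ;;
        2) state=CRITICAL ;;
        *) state=UNKNOWN ;;   # 3, or anything unexpected, in this sketch
    esac
    echo "$state (rc=$rc): $output"
    return "$rc"
}

# Usage (assumes plugins were installed under /usr/local/nagios/libexec):
#   run_plugin /usr/local/nagios/libexec/check_disk -w 10% -c 5% -p /
```

The function passes the plugin's exit status through unchanged, so it can still be used in scripts that branch on the return code.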
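Nagios recognizes four plugin return states: 0 (OK) means the status is normal and is shown in green; 1 (WARNING) means a warning, shown in yellow; 2 (CRITICAL) means a serious error, shown in red; and 3 (UNKNOWN) means an unknown error, shown in dark yellow. Nagios determines the state of each monitored object from the value its plugin returns and displays it in the web interface, so administrators can spot failures promptly.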
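The return-code convention can be illustrated with a minimal plugin-style function. This is a hypothetical sketch, not the real check_disk: it takes an already-measured free-space percentage plus warning and critical thresholds (plain integers, no % sign), prints one status line, and returns 0 to 3 the way a plugin's exit status would. Like check_disk's -w/-c options shown above, it alerts when free space falls below a threshold.

```sh
# Hypothetical plugin logic: compare a free-space percentage against
# warning/critical thresholds, print one status line, and return the
# Nagios code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
check_free_pct() {
    free=$1 warn=$2 crit=$3
    # Non-integer input means the state cannot be judged: UNKNOWN.
    for v in "$free" "$warn" "$crit"; do
        case "$v" in
            ''|*[!0-9]*) echo "DISK UNKNOWN - invalid argument '$v'"; return 3 ;;
        esac
    done
    if [ "$free" -lt "$crit" ]; then
        echo "DISK CRITICAL - free space: ${free}%"
        return 2
    elif [ "$free" -lt "$warn" ]; then
        echo "DISK WARNING - free space: ${free}%"
        return 1
    fi
    echo "DISK OK - free space: ${free}%"
    return 0
}
```

For example, check_free_pct 7 10 5 prints "DISK WARNING - free space: 7%" and returns 1. In a standalone plugin script you would end with check_free_pct "$1" "$2" "$3"; exit $? so that Nagios sees the code as the exit status.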
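Next comes alerting. A monitoring system that detects problems but cannot raise alerts is pointless, so alerting is one of Nagios's most important functions. As with the checks themselves, however, Nagios contains no code for actually delivering alerts, not even a plugin: Nagios decides when to notify, and the delivery is left to users or other open-source projects. (I used the msmtp + mutt approach on Ubuntu.)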
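Since delivery is left to external commands, a notification command can be a small wrapper around mutt, the mailer used here. The sketch below is hypothetical: the function name is made up, its arguments are meant to be filled from Nagios macros such as $HOSTNAME$, $SERVICEDESC$, $SERVICESTATE$, $SERVICEOUTPUT$ and $CONTACTEMAIL$ in a command definition, and the MAILER variable exists only so the function can be tried out without sending real mail.

```sh
# Hypothetical notification helper built on mutt (assumes mutt + msmtp are
# already configured to send mail, as in this setup).
#   $1 host  $2 service  $3 state  $4 plugin output  $5 recipient address
notify_by_mutt() {
    host=$1 service=$2 state=$3 output=$4 to=$5
    subject="** $state: $host/$service **"
    # MAILER defaults to mutt; override it (e.g. in tests) to avoid sending mail.
    printf '%s\n' "Host: $host" "Service: $service" "State: $state" "Info: $output" |
        ${MAILER:-mutt} -s "$subject" "$to"
}
```

The call mirrors the manual test used later in this article (echo "test" | mutt -s "my_first_test" ...), just with a generated subject and body.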
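Installing Nagios here means installing the base platform, i.e. the Nagios package itself. It is the framework of the whole monitoring system and the foundation on which all monitoring is built.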
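Now that we know how Nagios manages server objects through plugins, let's look at how it manages remote servers. For this, Nagios provides an add-on called NRPE (Nagios Remote Plugin Executor): Nagios periodically runs the check_nrpe plugin, which queries the NRPE daemon on each remote server to obtain its status information. The relationship between them is shown in the figure below:
Figure: Nagios managing remote services through NRPE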
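1. Nagios runs the check_nrpe plugin installed on the Nagios server and tells it which service to check.
2. check_nrpe connects over SSL to the NRPE daemon on the remote machine.
3. NRPE runs the appropriate local plugin (check_disk, etc.) to check the local service or status.
4. NRPE sends the result back to check_nrpe on the Nagios server, which passes it into the Nagios status queue.
5. Nagios reads the results from the queue in turn and displays them.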
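Step 3 is essentially a lookup: check_nrpe sends only a command name, and the NRPE daemon runs the command line that its nrpe.cfg maps to that name through lines of the form command[name]=.... The sketch below imitates that lookup in shell so the flow is easier to follow. It is not NRPE's actual code, the function name is hypothetical, and it skips everything NRPE really does around it (SSL, allowed_hosts, argument handling).

```sh
# Hypothetical imitation of NRPE's daemon-side lookup (not NRPE's real code):
# find "command[NAME]=COMMAND LINE" in an nrpe.cfg-style file and run it.
nrpe_lookup() {
    cfg=$1 name=$2
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            "command[$name]="*)
                # Everything after the first '=' is the plugin command line.
                cmdline=${line#*=}
                # Run it; its output and exit status become the check result.
                sh -c "$cmdline" </dev/null
                return
                ;;
        esac
    done < "$cfg"
    echo "UNKNOWN: command '$name' not defined"
    return 3   # reported as UNKNOWN in this sketch
}
```

On a real setup, the Nagios side of step 1 is a call such as /usr/local/nagios/libexec/check_nrpe -H <remote-host> -c check_disk, and the remote nrpe.cfg contains a matching command[check_disk]=... line.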
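Create the nagios user (without a login shell) and set its password; here the password is simply nagios:

    # useradd -s /sbin/nologin nagios
    # passwd nagios

Create the installation and home directories, hand them over to the nagios user, then switch to that user and test sending mail with mutt:

    # mkdir /usr/local/nagios
    # ls -al                      # check the directory permissions
    # chown -R nagios:nagios /usr/local/nagios
    # ls -al                      # check the permissions again
    # mkdir /home/nagios
    # chown -R nagios:nagios /home/nagios
    # su -s /bin/bash nagios      # the user's shell is nologin, so give su an explicit shell
    $ echo "test" | mutt -s "my_first_test" aaa@126.com    # test sending mail with mutt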
Unpack the Nagios source and run configure with the installation prefix:

    tar -zxf nagios-4.0.4.tar.gz
    cd nagios-4.0.4/
    enadmin@cgnmon:~/software/nagios-4.0.4$ ./configure --prefix=/usr/local/nagios
What ./configure --prefix does
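Without --prefix, executables go to /usr/local/bin, libraries to /usr/local/lib, configuration files to /usr/local/etc and other resource files to /usr/local/share by default. To uninstall such a program you must either run make uninstall once in the original build directory (provided the Makefile defines an uninstall target) or delete the related files from those directories by hand, one by one. With --prefix=/usr/local/nagios, everything is installed under a single directory, which is easy to find and remove.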
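The effect of --prefix can be summarised with a tiny shell function that prints the default locations for a given prefix. It assumes the standard autoconf layout (bin, lib, etc and share directly under the prefix); individual configure scripts, including Nagios's, may add directories of their own or let you override these with options such as --sysconfdir. The function name is hypothetical.

```sh
# Print default install locations under a given --prefix, assuming the
# standard autoconf layout. With no argument, autoconf's default prefix
# /usr/local is used.
default_install_dirs() {
    prefix=${1:-/usr/local}
    echo "executables:   $prefix/bin"
    echo "libraries:     $prefix/lib"
    echo "configuration: $prefix/etc"
    echo "other data:    $prefix/share"
}
```

Running default_install_dirs shows the scattered /usr/local paths described above, while default_install_dirs /usr/local/nagios keeps everything under one directory.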
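Then compile and install:

    make all
    make install && make install-init && make install-commandmode && make install-config

Here make install installs the main program, CGIs and HTML files; make install-init installs the init script; make install-commandmode sets the permissions needed for the external command file; and make install-config installs the sample configuration files.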
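After these targets have run, a quick sanity check is to confirm that the main binary and the sample main configuration file exist under the prefix. The function below is a hypothetical helper; the two paths it checks (bin/nagios and etc/nagios.cfg) assume the --prefix=/usr/local/nagios layout used above.

```sh
# Hypothetical post-install check: report any expected file that is missing.
# Returns the number of missing files (0 means the layout looks complete).
verify_nagios_install() {
    prefix=${1:-/usr/local/nagios}
    missing=0
    for f in bin/nagios etc/nagios.cfg; do
        if [ ! -e "$prefix/$f" ]; then
            echo "missing: $prefix/$f"
            missing=$((missing + 1))
        fi
    done
    if [ "$missing" -eq 0 ]; then
        echo "install layout OK: $prefix"
    fi
    return "$missing"
}
```

Once the files are in place, Nagios's own configuration check, /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg, is the more thorough test.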
4. Install the Nagios plugins

    # wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.16.tar.gz
    # tar zxvf nagios-plugins-1.4.16.tar.gz
    # cd nagios-plugins-1.4.16
    # ./configure --prefix=/usr/local/nagios
    # make && make install
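To confirm the plugins landed where Nagios expects them, you can list the executable check_* files in libexec. The helper name is hypothetical, and its default directory assumes the --prefix=/usr/local/nagios used above.

```sh
# Hypothetical helper: list installed plugins (executable check_* files).
list_plugins() {
    dir=${1:-/usr/local/nagios/libexec}
    count=0
    for p in "$dir"/check_*; do
        # The glob stays literal when nothing matches, so test each entry.
        if [ -f "$p" ] && [ -x "$p" ]; then
            echo "${p##*/}"
            count=$((count + 1))
        fi
    done
    echo "$count plugin(s) found in $dir"
}
```

Any plugin it lists can then be explored with ./check_xxx -h, as shown for check_disk earlier.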
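Create a symlink so that insserv is available as /sbin/insserv (on this Ubuntu system it is installed under /usr/lib/insserv), then check it:

    sudo ln -s /usr/lib/insserv/insserv /sbin/insserv
    root@ubuntu-server:/sbin# ll /sbin/insserv
    lrwxrwxrwx 1 root root 24 Mar 28 09:21 /sbin/insserv -> /usr/lib/insserv/insserv*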
7.1 Warnings when starting Nagios:

    Starting nagios:No directory, logging in with HOME=/
    * Restarting web server apache2
    apache2: Could not reliably determine the server's fully qualified domain name, using for ServerName
    ... waiting apache2: Could not reliably determine the server's fully qualified domain name, using for ServerName
To fix the apache2 warning, edit httpd.conf:

    sudo vim /etc/apache2/httpd.conf

and add the line:

    ServerName localhost
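If you apply this fix on several machines, an idempotent helper avoids adding the line twice. The sketch below is hypothetical: it appends ServerName only when no ServerName directive is present yet. The file location is also an assumption: /etc/apache2/httpd.conf exists on older Ubuntu releases with Apache 2.2, while newer Apache 2.4 packages may not ship it (their main file is /etc/apache2/apache2.conf).

```sh
# Hypothetical helper: add "ServerName NAME" to an Apache config file
# unless a ServerName directive is already there (safe to run repeatedly).
ensure_servername() {
    conf=$1 name=${2:-localhost}
    if grep -q '^[[:space:]]*ServerName[[:space:]]' "$conf" 2>/dev/null; then
        echo "ServerName already set in $conf"
    else
        echo "ServerName $name" >> "$conf"
        echo "added: ServerName $name"
    fi
}

# Usage (needs root), then restart Apache so the change takes effect:
#   ensure_servername /etc/apache2/httpd.conf localhost
#   service apache2 restart
```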