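1. Installing Nagios
1) Install the packages required for compilation
[root@nagios ~]# yum -y install httpd php-* gd-* mysql-devel
[root@nagios ~]# setenforce 0    # put SELinux into permissive mode until the next reboot
[root@nagios ~]# sed -i 's/=enforcing/=permissive/' /etc/selinux/config    # make the change persistent (/etc/sysconfig/selinux is a symlink to this file)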
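The persistent SELinux change can also be scripted without relying on sed -i. The following is only a sketch: set_selinux_permissive is a hypothetical helper name, and it assumes the file contains a plain SELINUX=enforcing line.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): switch SELINUX=enforcing
# to SELINUX=permissive in the given config file.
set_selinux_permissive() {
    cfg="$1"
    tmp="$(mktemp)" || return 1
    # Only the SELINUX= line is rewritten; SELINUXTYPE= and comments stay untouched.
    # Writing back with cat keeps the original file (or symlink target) in place.
    sed 's/^SELINUX=enforcing$/SELINUX=permissive/' "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
    rm -f "$tmp"
}
```

On the Nagios host it would be called as set_selinux_permissive /etc/selinux/config.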
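2) Create the user that runs the Nagios service
[root@nagios ~]# useradd nagios    # create the user that runs the Nagios service
[root@nagios ~]# usermod -a -G nagios apache    # add apache to the nagios group so the web server can write to the Nagios directories; otherwise actions from the web pages fail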
3) Install the Nagios software
[root@nagios ~]# tar zxf nagios-cn-3.2.3.tar.gz    # unpack the Nagios source archive
[root@nagios ~]# cd nagios-cn-3.2.3
Note: when installing nagios-cn-3.2.3.tar.bz2 on a 32-bit RHEL 6.x system, run make clean first and only then ./configure, make all and the remaining steps; otherwise make all fails with an error.
[root@nagios nagios-cn-3.2.3]# ./configure --enable-embedded-perl    # configure the build
[root@nagios nagios-cn-3.2.3]# make all    # compile Nagios
[root@nagios nagios-cn-3.2.3]# make install    # install the main program, CGIs and HTML files
[root@nagios nagios-cn-3.2.3]# make install-init    # install the init script in /etc/rc.d/init.d
[root@nagios nagios-cn-3.2.3]# make install-commandmode    # set the directory permissions
[root@nagios nagios-cn-3.2.3]# make install-config    # install the sample configuration files
[root@nagios nagios-cn-3.2.3]# make install-webconf    # install the Nagios web interface; creates nagios.conf in /etc/httpd/conf.d
4) Install the Nagios plugins
[root@nagios ~]# tar zxf nagios-plugins-1.4.15.tar.gz
[root@nagios ~]# cd nagios-plugins-1.4.15
[root@nagios nagios-plugins-1.4.15]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-extra-opts --enable-libtap --enable-perl-modules
[root@nagios nagios-plugins-1.4.15]# make && make install
(Note: this adds files under "/usr/local/nagios/libexec", the directory that holds all Nagios plugins.)
5) Edit the main Nagios configuration file, nagios.cfg
[root@nagios ~]# vim /usr/local/nagios/etc/nagios.cfg
Add:
# host and host group definitions
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
# service and service group definitions
cfg_file=/usr/local/nagios/etc/objects/services.cfg
Change (around line 36; prefix the line with "#" to comment it out):
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
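The same nagios.cfg edit can be applied non-interactively. This is a minimal sketch under the paths used in this step; enable_object_files is a hypothetical helper name, and running it twice should not duplicate entries.

```shell
#!/bin/sh
# Hypothetical helper: comment out localhost.cfg and register hosts.cfg
# and services.cfg in the given nagios.cfg.
enable_object_files() {
    cfg="$1"
    objdir="/usr/local/nagios/etc/objects"
    tmp="$(mktemp)" || return 1
    # Comment out the localhost.cfg entry if it is still active.
    sed "s|^cfg_file=$objdir/localhost.cfg|#&|" "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
    rm -f "$tmp"
    # Append each new entry only if that exact line is not present yet.
    for name in hosts services; do
        line="cfg_file=$objdir/$name.cfg"
        grep -qxF "$line" "$cfg" || printf '%s\n' "$line" >> "$cfg"
    done
}
```

It would be called as enable_object_files /usr/local/nagios/etc/nagios.cfg, followed by the nagios -v check used later in this guide.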
6) Create the hosts.cfg file
[root@nagios ~]# vim /usr/local/nagios/etc/objects/hosts.cfg
define host{
    use                     linux-server        ; template to inherit from
    host_name               nagios              ; name of the monitored host
    alias                   nagios              ; alias
    address                 127.0.0.1           ; IP address of the monitored host
    icon_image              web.gif
    statusmap_image         web.gd2
    2d_coords               100,300
    3d_coords               100,300,100
    check_command           check-host-alive    ; check command, defined in commands.cfg
    max_check_attempts      5                   ; number of retries after a failed check
    check_period            24x7                ; check time period, defined in timeperiods.cfg
    contact_groups          admins              ; contact group, defined in contacts.cfg in the sample configuration
    notification_interval   10                  ; notification interval: re-notify every 10 minutes
    notification_period     24x7                ; notification time period, defined in timeperiods.cfg
    notification_options    d,u,r               ; notify on DOWN, UNREACHABLE and RECOVERY
}
define hostgroup{
    hostgroup_name          linux-servers
    alias                   linux server
    members                 *
}
7) Create the services.cfg file
[root@nagios ~]# vim /usr/local/nagios/etc/objects/services.cfg
define service {
    use                     local-service
    host_name               nagios
    service_groups          systemcheck
    service_description     Host Alive
    check_command           check-host-alive
}
# Number of users logged in on the local host: WARNING above 20 users, CRITICAL above 50
define service {
    use                     local-service
    host_name               nagios
    service_groups          systemcheck
    service_description     Logged-in Users
    check_command           check_local_users!20!50
}
# Root partition: WARNING when free space drops below 20%, CRITICAL below 10%
define service {
    use                     local-service
    host_name               nagios
    service_groups          systemcheck
    service_description     Root Partition Usage
    check_command           check_local_disk!20%!10%!/
}
# Total number of processes on the local host: WARNING above 250, CRITICAL above 400
# States counted: S (sleeping), R (running), Z (zombie), D (uninterruptible), T (stopped)
define service {
    use                     local-service
    host_name               nagios
    service_groups          systemcheck
    service_description     Total Processes
    check_command           check_local_procs!250!400!RSZDT
}
# WARNING when more than 5 processes are waiting on average over 1 minute, more than 4 over 5 minutes, or more than 3 over 15 minutes
# CRITICAL when more than 10 processes are waiting on average over 1 minute, more than 6 over 5 minutes, or more than 4 over 15 minutes
define service {
    use                     local-service
    host_name               nagios
    service_groups          systemcheck
    service_description     CPU Load
    check_command           check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}
# Swap: WARNING when free swap space drops below 20%, CRITICAL below 10%
define service {
    use                     local-service
    host_name               nagios
    service_groups          systemcheck
    service_description     Swap Usage
    check_command           check_local_swap!20%!10%
}
define servicegroup {
    servicegroup_name       systemcheck
    alias                   systemcheck
}
[root@nagios ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg    # verify that the Nagios configuration is valid
[root@nagios ~]# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin    # add a user authorized to access the Nagios web pages
The default user is nagiosadmin. To use a different user, edit /usr/local/nagios/etc/cgi.cfg in one of two ways:
Option 1: set use_authentication=0 (around line 78).
Option 2: replace nagiosadmin in the following directives with the new user name (in vim, :%s/nagiosadmin/NEW_USER/g replaces every occurrence of nagiosadmin):
authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_system_commands=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin
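The warning and critical arguments in the service definitions above (for example 20 and 50 for logged-in users) follow a simple two-threshold rule. The sketch below only illustrates that rule as the comments describe it ("greater than"); classify_count is a hypothetical name, and the real plugins may treat boundary values differently.

```shell
#!/bin/sh
# Illustration only: map a measured count to a state using a warning
# and a critical threshold, assuming "greater than" semantics.
classify_count() {
    value="$1"; warn="$2"; crit="$3"
    if [ "$value" -gt "$crit" ]; then
        echo CRITICAL
    elif [ "$value" -gt "$warn" ]; then
        echo WARNING
    else
        echo OK
    fi
}
```

With the thresholds from check_local_users!20!50, classify_count 25 20 50 prints WARNING and classify_count 60 20 50 prints CRITICAL.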
8) Start the httpd and nagios services and enable them at boot
[root@nagios ~]# service iptables stop
[root@nagios ~]# service nagios start
[root@nagios ~]# service httpd start
[root@nagios ~]# chkconfig httpd on
[root@nagios ~]# chkconfig nagios on
[root@nagios ~]# chkconfig iptables off
(Note: if SELinux is enabled, the following two commands are also required:
chcon -R -t httpd_sys_content_t /usr/local/nagios/sbin/
chcon -R -t httpd_sys_content_t /usr/local/nagios/share/
)
2. Setting up the monitored host
1) Install and start the MySQL service
[root@mysql ~]# yum -y install mysql-server
[root@mysql ~]# service mysqld start
[root@mysql ~]# service iptables stop
[root@mysql ~]# chkconfig mysqld on
[root@mysql ~]# chkconfig iptables off
2) Create a monitoring account on the MySQL server
[root@mysql ~]# mysql
mysql> create database nagdb;
mysql> grant select on nagdb.* to nagdb@'MONITORING_HOST_IP';
mysql> flush privileges;
mysql> exit
3) On the Nagios host, check that it can connect to the MySQL service on the MySQL host
[root@nagios ~]# /usr/local/nagios/libexec/check_mysql -H MONITORED_HOST_IP -u nagdb -d nagdb
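When a plugin such as check_mysql is run by hand, its exit status carries the result: by the standard Nagios plugin convention, 0 means OK, 1 WARNING, 2 CRITICAL and 3 UNKNOWN. A small sketch for translating the status into a name follows; plugin_state is a hypothetical helper, not something shipped with Nagios.

```shell
#!/bin/sh
# Hypothetical helper: translate a Nagios plugin exit status into its state name.
plugin_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;   # 3 is UNKNOWN; anything else is treated the same way here
    esac
}

# Example use on the Nagios host (placeholder address, not executed here):
#   /usr/local/nagios/libexec/check_mysql -H MONITORED_HOST_IP -u nagdb -d nagdb
#   plugin_state $?
```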
4) On the Nagios host, add the definitions for monitoring the MySQL service
[root@nagios ~]# vim /usr/local/nagios/etc/objects/hosts.cfg
define host{
    use                     linux-server
    host_name               mysqlhost
    alias                   mysqlserver
    address                 MONITORED_HOST_IP
    icon_image              server.gif
    statusmap_image         server.gd2
    2d_coords               100,300
    3d_coords               100,300,100
    check_command           check-host-alive
    max_check_attempts      5
    check_period            24x7
    contact_groups          admins
    notification_interval   10
    notification_period     24x7
    notification_options    d,u,r
}
[root@nagios ~]# vim /usr/local/nagios/etc/objects/services.cfg
define service {
    use                     local-service
    host_name               mysqlhost
    service_groups          mysqlgroup
    service_description     mysqlservice
    check_command           check_mysql
    contact_groups          admins
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,r,c
}
define servicegroup {
    servicegroup_name       mysqlgroup
    alias                   mysqlservices
}
[root@nagios ~]# vim /usr/local/nagios/etc/objects/commands.cfg
define command{
    command_name            check_mysql
    command_line            $USER1$/check_mysql -H $HOSTADDRESS$ -u nagdb -d nagdb
}
[root@nagios ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg    # once the check passes, reload the Nagios service
[root@nagios ~]# service nagios reload
3. Monitoring remote host system status with NRPE (using the MySQL host as an example)
1) Install nagios-plugins and NRPE on the monitored host
[root@mysql ~]# useradd nagios
[root@mysql ~]# tar zxf nagios-plugins-1.4.15.tar.gz
[root@mysql ~]# cd nagios-plugins-1.4.15
[root@mysql nagios-plugins-1.4.15]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
[root@mysql nagios-plugins-1.4.15]# make && make install
[root@mysql nagios-plugins-1.4.15]# cd
[root@mysql ~]# yum -y install xinetd
[root@mysql ~]# tar zxf nrpe-2.12.tar.gz
[root@mysql ~]# cd nrpe-2.12
[root@mysql nrpe-2.12]# ./configure
[root@mysql nrpe-2.12]# make all
[root@mysql nrpe-2.12]# make install-plugin
[root@mysql nrpe-2.12]# make install-daemon    # install the daemon
[root@mysql nrpe-2.12]# make install-daemon-config    # install the configuration file
[root@mysql nrpe-2.12]# make install-xinetd    # install the xinetd script
2) Configure NRPE and register the nrpe service
[root@mysql ~]# vim /etc/xinetd.d/nrpe
Change (append the address of the monitoring host, i.e. the Nagios server, separated by a space):
only_from = 127.0.0.1 MONITORING_HOST_IP
[root@mysql ~]# vim /etc/services
Add:
nrpe 5666/tcp # NRPE service listening port
[root@mysql ~]# vim /usr/local/nagios/etc/nrpe.cfg
Change (around line 234; remove the leading "#" and edit the line; "/" means the root partition is checked):
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
[root@mysql ~]# service xinetd restart    # restart xinetd, then check that NRPE has started
[root@mysql ~]# netstat -at | grep nrpe
[root@mysql ~]# netstat -an | grep 5666
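Only commands that are defined and uncommented in nrpe.cfg can be called through check_nrpe, so it helps to list what is currently enabled. A minimal sketch, assuming the command[name]=... syntax shown above; list_nrpe_commands is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every uncommented command[...]
# definition found in an nrpe.cfg-style file.
list_nrpe_commands() {
    # Lines starting with "#" do not match the anchored pattern and are skipped.
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}
```

On the monitored host it would be called as list_nrpe_commands /usr/local/nagios/etc/nrpe.cfg.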
3) Set up the monitoring host
[root@nagios ~]# tar zxf nrpe-2.12.tar.gz
[root@nagios ~]# cd nrpe-2.12
[root@nagios nrpe-2.12]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
[root@nagios nrpe-2.12]# make all && make install-plugin
[root@nagios ~]# /usr/local/nagios/libexec/check_nrpe -H MONITORED_HOST_IP    # output such as "NRPE v2.12" means the connection works
[root@nagios ~]# vim /usr/local/nagios/etc/objects/commands.cfg
# The command is named check_nrpe; services.cfg refers to it by this name.
# $USER1$ stands for /usr/local/nagios/libexec; $ARG1$ is the check command passed to the NRPE daemon for execution.
define command{
    command_name            check_nrpe
    command_line            $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
[root@mysql ~]# vim /usr/local/nagios/etc/nrpe.cfg
# monitor the swap space of the MySQL host
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
[root@mysql ~]# service xinetd reload
[root@nagios ~]# cd /usr/local/nagios/libexec
[root@nagios libexec]# ./check_nrpe -H MONITORED_HOST_IP -c check_swap
[root@nagios ~]# vim /usr/local/nagios/etc/objects/services.cfg
define service {
    use                     local-service
    host_name               mysqlhost
    service_groups          mysqlgroup
    service_description     Swap Space
    check_command           check_nrpe!check_swap
    contact_groups          admins
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,r,c
}
define service {
    use                     local-service
    host_name               mysqlhost
    service_groups          mysqlgroup
    service_description     CPU Load
    check_command           check_nrpe!check_load
    contact_groups          admins
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,r,c
}
define service {
    use                     local-service
    host_name               mysqlhost
    service_groups          mysqlgroup
    service_description     Logged-in Users
    check_command           check_nrpe!check_users
    contact_groups          admins
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,r,c
}
define service {
    use                     local-service
    host_name               mysqlhost
    service_groups          mysqlgroup
    service_description     Free Disk Space
    check_command           check_nrpe!check_disk
    contact_groups          admins
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,r,c
}
define service {
    use                     local-service
    host_name               mysqlhost
    service_groups          mysqlgroup
    service_description     Total Processes
    check_command           check_nrpe!check_total_procs
    contact_groups          admins
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,r,c
}
define service {
    use                     local-service
    host_name               mysqlhost
    service_groups          mysqlgroup
    service_description     Zombie Processes
    check_command           check_nrpe!check_zombie_procs
    contact_groups          admins
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,r,c
}
define service{
    use                     generic-service
    host_name               mysqlhost
    service_description     SWAP
    check_command           check_nrpe!check_swap
}
[root@nagios ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
[root@nagios ~]# service nagios reload
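The NRPE service definitions above differ only in their description and in the command passed to check_nrpe, so they can be generated instead of typed. The following is a sketch under that assumption; gen_nrpe_service is a hypothetical helper, and the fixed fields simply mirror the definitions in this step.

```shell
#!/bin/sh
# Hypothetical helper: print one NRPE-based service definition.
# Arguments: host name, service description, NRPE command name.
gen_nrpe_service() {
    host="$1"; desc="$2"; nrpe_cmd="$3"
    cat <<EOF
define service {
    use                     local-service
    host_name               $host
    service_groups          mysqlgroup
    service_description     $desc
    check_command           check_nrpe!$nrpe_cmd
    contact_groups          admins
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,r,c
}
EOF
}

# Example (not executed here): append a definition to services.cfg
#   gen_nrpe_service mysqlhost "CPU Load" check_load >> /usr/local/nagios/etc/objects/services.cfg
```

After generating definitions this way, the nagios -v check should still be run before reloading the service.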
4. Email alerts
1) Configure Nagios email notifications
[root@nagios ~]# vim /usr/local/nagios/etc/objects/contacts.cfg
define contact{
    contact_name                    nagiosadmin
    alias                           Nagios Admin
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    w,u,c,r
    host_notification_options       d,u,r
    service_notification_commands   notify-service-by-email
    host_notification_commands      notify-host-by-email
    email                           1009864@qq.com    ; separate multiple administrator addresses with spaces or commas
}
2) Configure the mail server (Postfix is used here)
[root@nagios ~]# yum -y install postfix* httpd* dovecot*
[root@nagios ~]# hostname mail.hello.com
[root@nagios ~]# vim /etc/postfix/main.cf
Change:
# line 75
myhostname = mail.hello.com
# line 83
mydomain = hello.com
# lines 98 and 99
myorigin = $myhostname
myorigin = $mydomain
# line 113
inet_interfaces = all
# line 164
mydestination = $myhostname, $mydomain
[root@nagios ~]# service sendmail stop
[root@nagios ~]# service postfix start
[root@nagios ~]# netstat -an | grep 25
[root@nagios ~]# service dovecot restart
[root@nagios ~]# postmap /etc/postfix/virtual
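The main.cf changes can also be applied from a script rather than in an editor. The sketch below is one possible approach, not the method used in this guide: set_main_cf is a hypothetical helper that removes any active assignment of a parameter and appends the new value, so only one assignment per parameter remains.

```shell
#!/bin/sh
# Hypothetical helper: set "key = value" in a main.cf-style file,
# replacing any existing uncommented assignment of the same key.
set_main_cf() {
    cfg="$1"; key="$2"; value="$3"
    tmp="$(mktemp)" || return 1
    # Keep every line except active assignments of this key.
    grep -v "^$key[[:space:]]*=" "$cfg" > "$tmp" || true
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$cfg"
    rm -f "$tmp"
}
```

For example, set_main_cf /etc/postfix/main.cf myorigin '$mydomain' (single quotes keep the shell from expanding $mydomain). Postfix also ships the postconf -e command for the same purpose.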
See also: automated monitoring of table locking http://blog.chinaunix.net/uid-9370128-id-382660.html