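  • Monitoring Elasticsearch cluster status in Zabbix with a simple command

    Monitoring Elasticsearch cluster status with a simple command

    How it works:
    Any ES node can report the health of the whole cluster, so curl is used to query a single node. The cluster status should be green.
    curl -sXGET http://serverip:9200/_cluster/health/?pretty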

    {
      "cluster_name" : "yunva-es",
      "status" : "green",
      "timed_out" : false,
      "number_of_nodes" : 7,
      "number_of_data_nodes" : 6,
      "active_primary_shards" : 66,
      "active_shards" : 132,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0,
      "delayed_unassigned_shards" : 0,
      "number_of_pending_tasks" : 0,
      "number_of_in_flight_fetch" : 0,
      "task_max_waiting_in_queue_millis" : 0,
      "active_shards_percent_as_number" : 100.0
    }

    nginx authentication sits in front of ES, so the request has to log in.
    curl command format with user credentials:
    curl -u username:password -sXGET http://serverip:9200/_cluster/health/?pretty | grep "status"|awk -F '[ "]+' '{print $4}'
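    The grep/awk part of that pipeline can be tried offline. The sketch below wraps it in a hypothetical helper, es_status_from_json (a name made up for this example), and feeds it a trimmed copy of the response shown earlier instead of a live curl call.

```shell
#!/bin/bash
# Extract the "status" value from pretty-printed _cluster/health output.
# Reads the JSON on stdin and prints the status word (green, yellow or red).
es_status_from_json() {
    # With the field separator '[ "]+', the line
    #   "status" : "green",
    # splits into: "" | status | : | green | ,   so $4 is the value.
    grep '"status"' | awk -F '[ "]+' '{print $4}'
}

# Trimmed copy of the sample response from the article.
sample='{
  "cluster_name" : "yunva-es",
  "status" : "green",
  "timed_out" : false
}'

printf '%s\n' "$sample" | es_status_from_json   # prints: green
```

    Against a real node, the same helper would sit at the end of the curl command: curl -s ... | es_status_from_json.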

    1. Edit the Zabbix agent configuration on the client:
    vim /etc/zabbix/zabbix_agentd.conf

    UserParameter=es_status,curl -u elkadmin:hSeC7ENeirAAPzv047m4 -sXGET http://serverip/_cluster/health/?pretty | grep "status"|awk -F '[ "]+' '{print $4}'|grep -c 'green'

    Restart zabbix-agent to apply the configuration:
    service zabbix-agent restart

    Test from the zabbix-server side:
    zabbix_get -s ip -p 10050 -k es_status
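    Because of the trailing grep -c 'green', the es_status key returns 1 when the cluster is green and 0 otherwise. A minimal sketch of that mapping, using a hypothetical helper es_green_flag and canned input rather than a live cluster:

```shell
#!/bin/bash
# Print 1 when the health JSON on stdin reports status green, else 0.
# This imitates the value the es_status item is expected to return;
# it is a stand-in for the real curl | grep | awk | grep -c pipeline.
es_green_flag() {
    local status
    status=$(awk -F '[ "]+' '/"status"/ {print $4}')
    if [ "$status" = "green" ]; then
        echo 1
    else
        echo 0
    fi
}

printf '  "status" : "green",\n'  | es_green_flag   # prints: 1
printf '  "status" : "yellow",\n' | es_green_flag   # prints: 0
```

    A numeric 1/0 value is convenient on the Zabbix side because the trigger only has to compare the item against 1.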

    2. Add the corresponding monitoring in the Zabbix web UI:

    Add an item:

    Configuration --> Hosts --> find the host, open Items --> Create item



    Create a trigger:
    Name
    es_status_check
    es_cluster_status is not green

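    3. Monitor the Elasticsearch process on every node in the cluster and restart it automatically if it dies

    Configure the process-monitoring item


    Configure the trigger


    Configure the action; for reference see

    Zabbix series (9): automatically triggering shell script tasks on the zabbix-agent side with Zabbix 3.0

    http://blog.csdn.net/reblue520/article/details/52315154


    Script run by the action: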

    /usr/local/zabbix-agent/scripts/start_es.sh
    
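    #!/bin/bash
    # Kill any running Elasticsearch process, then start a fresh one.
    # Load the login environment so that java is on PATH.
    source /etc/profile
    count_es=$(ps -ef | grep elasticsearch | grep -v grep | wc -l)
    if [ "$count_es" -ge 1 ]; then
        # Column 2 of `ps -ef` is the PID.
        ps -ef | grep elasticsearch | grep -v grep | awk '{print $2}' | xargs -r /bin/kill
    fi

    # Start it as the non-root user yunva.
    su yunva -c "cd /data/elasticsearch-5.0.1/bin && /bin/bash elasticsearch &"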
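    The PID selection in that script can be checked without touching a real process. The sketch below runs the same grep/awk filter over a made-up `ps -ef` listing; pids_matching and the sample rows are assumptions for illustration only.

```shell
#!/bin/bash
# Print the PIDs (column 2 of `ps -ef` output on stdin) of every line that
# mentions the given name, skipping the grep process itself.
pids_matching() {
    grep -- "$1" | grep -v grep | awk '{print $2}'
}

# Made-up process listing in `ps -ef` layout.
sample_ps='yunva     2311     1  5 10:02 ?      00:01:12 /usr/bin/java -Xms2g org.elasticsearch.bootstrap.Elasticsearch
root      2408  2390  0 10:05 pts/0  00:00:00 grep elasticsearch
root      2500     1  0 10:06 ?      00:00:00 /usr/sbin/sshd'

printf '%s\n' "$sample_ps" | pids_matching elasticsearch   # prints: 2311
```

    On a live host the input would come from `ps -ef`, and the resulting PIDs would be passed to kill.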



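    Run:
    sudo /bin/bash /usr/local/zabbix-agent/scripts/start_es.sh
    Error:
    which: no java in (/sbin:/bin:/usr/sbin:/usr/bin)
    Could not find any executable java binary. Please install java in your PATH or set JAVA_HOME

    Fix:
    The script runs with a minimal PATH (/sbin:/bin:/usr/sbin:/usr/bin) that does not include java. Load the login environment by adding this line to the script:
    source /etc/profile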
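    A small guard can make this failure easier to spot before Elasticsearch is launched. The sketch below is only an illustration; have_cmd is a hypothetical helper built on the shell's standard `command -v`.

```shell
#!/bin/bash
# Return 0 when the named command can be found on PATH, non-zero otherwise.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Elasticsearch needs either java on PATH or JAVA_HOME to be set.
if have_cmd java || [ -n "${JAVA_HOME:-}" ]; then
    echo "java is available"
else
    echo "java not found: source /etc/profile or set JAVA_HOME" >&2
fi
```

    Placed right after `source /etc/profile` in the start script, a check like this would show whether the environment was actually loaded.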

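    Running Elasticsearch as root

    Error:
    can not run elasticsearch as root

    The fixes found online do not work for Elasticsearch 5.1:
    Fix 1:
    Add the parameter -Des.insecure.allow.root=true when starting Elasticsearch. The full command is:

    ./elasticsearch -Des.insecure.allow.root=true  
    Fix 2:
    Open the elasticsearch launcher script with vi and add the following line before the ES_JAVA_OPTS variable is used:

    ES_JAVA_OPTS="-Des.insecure.allow.root=true"  

    Fix that works:
    Start Elasticsearch as a non-root user instead:
    su yunva -c "cd /data/elasticsearch-5.0.1/bin && /bin/bash elasticsearch &"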
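    Whether the su wrapper is needed depends on who runs the script. As a rough sketch, a hypothetical helper needs_su can decide this from the numeric uid; the messages are placeholders, not the real start command.

```shell
#!/bin/bash
# Return 0 (true) when the given numeric uid is root, meaning the start
# command has to be wrapped in `su <user> -c "..."`.
needs_su() {
    [ "$1" -eq 0 ]
}

if needs_su "$(id -u)"; then
    echo "running as root: start elasticsearch through su"
else
    echo "not root: elasticsearch can be started directly"
fi
```

    Since the script in this article is invoked with sudo, the root branch is the one that would normally apply.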

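    Script that restarts the Kibana service automatically:
    cat /usr/local/zabbix/scripts/restart_kibana.sh
    #!/bin/bash
    # if kibana exists kill it
    # restart_kibana is filtered out so the script does not count or kill itself

    count_kibana=$(ps -ef | grep kibana | grep -v grep | grep -v restart_kibana | wc -l)
    if [ "$count_kibana" -ge 1 ]; then
        ps -ef | grep kibana | grep -v grep | grep -v restart_kibana | awk '{print $2}' | xargs -r /bin/kill
    fi
    # start it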

    # A brief summary

    1. Edit zabbix_agentd.conf to enable remote shell commands
    # egrep -v '^#|^$' /usr/local/zabbix_agents_3.2.0/scripts/conf/zabbix_agentd.conf 
    
    EnableRemoteCommands=1
    UnsafeUserParameters=1
    
    # Allow the zabbix user to run commands through sudo
    
    echo "Defaults:zabbix !requiretty" >> /etc/sudoers
    echo "zabbix ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
    
    
    2. Monitoring commands
    
    UserParameter=es_status,curl -sXGET http://172.16.0.230:9200/_cluster/health/?pretty | grep "status"|awk -F '[ "]+' '{print $4}'|grep -c 'green'
    UserParameter=es_debug,sudo /bin/find /opt/elasticsearch-5.6.15 -name 'hs_err_pid*.log' -o -name 'java_pid*.hprof'|wc -l
    # The built-in port check may be unreliable; replace it with the following command
    UserParameter=net.tcp.listen.grep[*],grep -q $(printf '%04X.00000000:0000.0A' $1) /proc/net/tcp ;if [ $? -eq 0 ];then echo 1;else grep -q "$(printf '%04X 00000000000000000000000000000000:0000 0A' $1)" /proc/net/tcp6;if [ $? -eq 0 ];then echo 1;else echo 0;fi;fi

    3. Script that restarts Elasticsearch when a Java out-of-memory error occurs
    # vim /usr/local/zabbix_agents_3.2.0/scripts/start_es.sh
    #!/bin/bash
    # if elasticsearch exists kill it
    source /etc/profile

    # Delete the files produced by the Java crash
    /usr/bin/sudo /bin/find /opt/elasticsearch-5.6.15 -name 'hs_err_pid*.log' -o -name 'java_pid*.hprof' | xargs rm -f

    # Kill and restart elasticsearch
    count_es=$(ps -ef | grep elasticsearch | grep -v grep | wc -l)
    if [ "$count_es" -ge 1 ]; then
        ps -ef | grep elasticsearch | grep -v grep | awk '{print $2}' | xargs -r /bin/kill -9
    fi
    # start it
    su elasticsearch -c "cd /opt/elasticsearch-5.6.15/bin && /bin/bash elasticsearch -d"

    4. Monitoring template
    <?xml version="1.0" encoding="UTF-8"?>
    <zabbix_export>
      <version>3.2</version>
      <date>2019-08-09T06:14:18Z</date>
      <groups>
        <group>
          <name>Templates</name>
        </group>
      </groups>
      <templates>
        <template>
          <template>es_cluster_monitor</template>
          <name>es_cluster_monitor</name>
          <description/>
          <groups>
            <group>
              <name>Templates</name>
            </group>
          </groups>
          <applications/>
          <items>
            <item>
              <name>es_debug</name>
              <type>0</type>
              <snmp_community/>
              <multiplier>0</multiplier>
              <snmp_oid/>
              <key>es_debug</key>
              <delay>30</delay>
              <history>90</history>
              <trends>365</trends>
              <status>0</status>
              <value_type>3</value_type>
              <allowed_hosts/>
              <units/>
              <delta>0</delta>
              <snmpv3_contextname/>
              <snmpv3_securityname/>
              <snmpv3_securitylevel>0</snmpv3_securitylevel>
              <snmpv3_authprotocol>0</snmpv3_authprotocol>
              <snmpv3_authpassphrase/>
              <snmpv3_privprotocol>0</snmpv3_privprotocol>
              <snmpv3_privpassphrase/>
              <formula>1</formula>
              <delay_flex/>
              <params/>
              <ipmi_sensor/>
              <data_type>0</data_type>
              <authtype>0</authtype>
              <username/>
              <password/>
              <publickey/>
              <privatekey/>
              <port/>
              <description/>
              <inventory_link>0</inventory_link>
              <applications/>
              <valuemap/>
              <logtimefmt/>
            </item>
            <item>
              <name>es_status</name>
              <type>0</type>
              <snmp_community/>
              <multiplier>0</multiplier>
              <snmp_oid/>
              <key>es_status</key>
              <delay>30</delay>
              <history>90</history>
              <trends>365</trends>
              <status>0</status>
              <value_type>3</value_type>
              <allowed_hosts/>
              <units/>
              <delta>0</delta>
              <snmpv3_contextname/>
              <snmpv3_securityname/>
              <snmpv3_securitylevel>0</snmpv3_securitylevel>
              <snmpv3_authprotocol>0</snmpv3_authprotocol>
              <snmpv3_authpassphrase/>
              <snmpv3_privprotocol>0</snmpv3_privprotocol>
              <snmpv3_privpassphrase/>
              <formula>1</formula>
              <delay_flex/>
              <params/>
              <ipmi_sensor/>
              <data_type>0</data_type>
              <authtype>0</authtype>
              <username/>
              <password/>
              <publickey/>
              <privatekey/>
              <port/>
              <description/>
              <inventory_link>0</inventory_link>
              <applications/>
              <valuemap/>
              <logtimefmt/>
            </item>
            <item>
              <name>es_9200</name>
              <type>0</type>
              <snmp_community/>
              <multiplier>0</multiplier>
              <snmp_oid/>
              <key>net.tcp.listen.grep[9200]</key>
              <delay>30</delay>
              <history>90</history>
              <trends>365</trends>
              <status>0</status>
              <value_type>3</value_type>
              <allowed_hosts/>
              <units/>
              <delta>0</delta>
              <snmpv3_contextname/>
              <snmpv3_securityname/>
              <snmpv3_securitylevel>0</snmpv3_securitylevel>
              <snmpv3_authprotocol>0</snmpv3_authprotocol>
              <snmpv3_authpassphrase/>
              <snmpv3_privprotocol>0</snmpv3_privprotocol>
              <snmpv3_privpassphrase/>
              <formula>1</formula>
              <delay_flex/>
              <params/>
              <ipmi_sensor/>
              <data_type>0</data_type>
              <authtype>0</authtype>
              <username/>
              <password/>
              <publickey/>
              <privatekey/>
              <port/>
              <description/>
              <inventory_link>0</inventory_link>
              <applications/>
              <valuemap/>
              <logtimefmt/>
            </item>
            <item>
              <name>es_9300</name>
              <type>0</type>
              <snmp_community/>
              <multiplier>0</multiplier>
              <snmp_oid/>
              <key>net.tcp.listen.grep[9300]</key>
              <delay>30</delay>
              <history>90</history>
              <trends>365</trends>
              <status>0</status>
              <value_type>3</value_type>
              <allowed_hosts/>
              <units/>
              <delta>0</delta>
              <snmpv3_contextname/>
              <snmpv3_securityname/>
              <snmpv3_securitylevel>0</snmpv3_securitylevel>
              <snmpv3_authprotocol>0</snmpv3_authprotocol>
              <snmpv3_authpassphrase/>
              <snmpv3_privprotocol>0</snmpv3_privprotocol>
              <snmpv3_privpassphrase/>
              <formula>1</formula>
              <delay_flex/>
              <params/>
              <ipmi_sensor/>
              <data_type>0</data_type>
              <authtype>0</authtype>
              <username/>
              <password/>
              <publickey/>
              <privatekey/>
              <port/>
              <description/>
              <inventory_link>0</inventory_link>
              <applications/>
              <valuemap/>
              <logtimefmt/>
            </item>
            <item>
              <name>es_process</name>
              <type>0</type>
              <snmp_community/>
              <multiplier>0</multiplier>
              <snmp_oid/>
              <key>proc.num[,,all,elasticsearch]</key>
              <delay>30</delay>
              <history>90</history>
              <trends>365</trends>
              <status>0</status>
              <value_type>3</value_type>
              <allowed_hosts/>
              <units/>
              <delta>0</delta>
              <snmpv3_contextname/>
              <snmpv3_securityname/>
              <snmpv3_securitylevel>0</snmpv3_securitylevel>
              <snmpv3_authprotocol>0</snmpv3_authprotocol>
              <snmpv3_authpassphrase/>
              <snmpv3_privprotocol>0</snmpv3_privprotocol>
              <snmpv3_privpassphrase/>
              <formula>1</formula>
              <delay_flex/>
              <params/>
              <ipmi_sensor/>
              <data_type>0</data_type>
              <authtype>0</authtype>
              <username/>
              <password/>
              <publickey/>
              <privatekey/>
              <port/>
              <description/>
              <inventory_link>0</inventory_link>
              <applications/>
              <valuemap/>
              <logtimefmt/>
            </item>
          </items>
          <discovery_rules/>
          <httptests/>
          <macros/>
          <templates/>
          <screens/>
        </template>
      </templates>
      <triggers>
        <trigger>
          <expression>{cms_uts_es:es_status.last(0)}&lt;&gt;1 and {cms_uts_es:es_status.last(1)}&lt;&gt;1 and {cms_uts_es:es_status.last(2)}&lt;&gt;1</expression>
          <recovery_mode>0</recovery_mode>
          <recovery_expression/>
          <name>es cluster is not green</name>
          <correlation_mode>0</correlation_mode>
          <correlation_tag/>
          <url/>
          <status>0</status>
          <priority>0</priority>
          <description/>
          <type>0</type>
          <manual_close>0</manual_close>
          <dependencies/>
          <tags/>
        </trigger>
        <trigger>
          <expression>{cms_uts_es:proc.num[,,all,elasticsearch].last(0)}&lt;1 and {cms_uts_es:proc.num[,,all,elasticsearch].last(1)}&lt;1 and {cms_uts_es:proc.num[,,all,elasticsearch].last(2)}&lt;1</expression>
          <recovery_mode>0</recovery_mode>
          <recovery_expression/>
          <name>es process was down</name>
          <correlation_mode>0</correlation_mode>
          <correlation_tag/>
          <url/>
          <status>0</status>
          <priority>0</priority>
          <description/>
          <type>0</type>
          <manual_close>0</manual_close>
          <dependencies/>
          <tags/>
        </trigger>
        <trigger>
          <expression>{cms_uts_es:net.tcp.listen.grep[9200].max(#2)}=0</expression>
          <recovery_mode>0</recovery_mode>
          <recovery_expression/>
          <name>es_9200 port down</name>
          <correlation_mode>0</correlation_mode>
          <correlation_tag/>
          <url/>
          <status>0</status>
          <priority>0</priority>
          <description/>
          <type>0</type>
          <manual_close>0</manual_close>
          <dependencies/>
          <tags/>
        </trigger>
        <trigger>
          <expression>{cms_uts_es:net.tcp.listen.grep[9300].max(#2)}=0</expression>
          <recovery_mode>0</recovery_mode>
          <recovery_expression/>
          <name>es_9300 port down</name>
          <correlation_mode>0</correlation_mode>
          <correlation_tag/>
          <url/>
          <status>0</status>
          <priority>0</priority>
          <description/>
          <type>0</type>
          <manual_close>0</manual_close>
          <dependencies/>
          <tags/>
        </trigger>
        <trigger>
          <expression>{cms_uts_es:es_debug.last()}&lt;&gt;0</expression>
          <recovery_mode>0</recovery_mode>
          <recovery_expression/>
          <name>es_debug error</name>
          <correlation_mode>0</correlation_mode>
          <correlation_tag/>
          <url/>
          <status>0</status>
          <priority>0</priority>
          <description/>
          <type>0</type>
          <manual_close>0</manual_close>
          <dependencies/>
          <tags/>
        </trigger>
      </triggers>
    </zabbix_export>
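    The net.tcp.listen.grep key from step 2 works by looking for the port, written as four uppercase hex digits, in a /proc/net/tcp row whose state column is 0A (LISTEN). The sketch below shows the same idea on made-up sample rows; port_listening is a hypothetical helper, not part of Zabbix, and it only covers the IPv4 table.

```shell
#!/bin/bash
# Print 1 if the table on stdin (in /proc/net/tcp format) contains a socket
# in LISTEN state (0A) on the given port, otherwise print 0.
port_listening() {
    local port_hex
    port_hex=$(printf '%04X' "$1")   # e.g. 9200 -> 23F0
    # local_address ends in :PORT, followed by rem_address 00000000:0000
    # and the state column 0A.
    if grep -q ":${port_hex} 00000000:0000 0A"; then
        echo 1
    else
        echo 0
    fi
}

# Made-up sample rows, trimmed to the first columns of /proc/net/tcp.
sample='  sl  local_address rem_address   st
   0: 00000000:23F0 00000000:0000 0A
   1: 0100007F:0019 00000000:0000 0A'

printf '%s\n' "$sample" | port_listening 9200   # prints: 1
printf '%s\n' "$sample" | port_listening 9300   # prints: 0
```

    On a Linux host the function could read the live table with port_listening 9200 < /proc/net/tcp.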
  • Original article: https://www.cnblogs.com/reblue520/p/6284394.html