  • Monitoring Elasticsearch cluster status in Zabbix with simple commands

    Monitoring the Elasticsearch cluster status with simple commands

    How it works:
    Use curl to query any one ES node; every node can report the state of the whole cluster, and that state should be green.
    curl -sXGET http://serverip:9200/_cluster/health/?pretty

    {
      "cluster_name" : "yunva-es",
      "status" : "green",
      "timed_out" : false,
      "number_of_nodes" : 7,
      "number_of_data_nodes" : 6,
      "active_primary_shards" : 66,
      "active_shards" : 132,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0,
      "delayed_unassigned_shards" : 0,
      "number_of_pending_tasks" : 0,
      "number_of_in_flight_fetch" : 0,
      "task_max_waiting_in_queue_millis" : 0,
      "active_shards_percent_as_number" : 100.0
    }

    The cluster sits behind nginx, which enforces authentication, so curl has to log in as a user.
    curl command format with user login:
    curl -u username:password -sXGET http://serverip:9200/_cluster/health/?pretty | grep "status"|awk -F '[ "]+' '{print $4}'

    1. Modify the Zabbix agent configuration on the client:
    vim /etc/zabbix/zabbix_agentd.conf

    UserParameter=es_status,curl -u elkadmin:hSeC7ENeirAAPzv047m4 -sXGET http://serverip/_cluster/health/?pretty | grep "status"|awk -F '[ "]+' '{print $4}'|grep -c 'green'
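    The UserParameter above pipes curl straight into grep/awk, so a node or nginx proxy that hangs keeps the agent waiting until its own Timeout (3 seconds by default) expires, and the awk field split depends on the ?pretty layout. A minimal sketch of a standalone check script with explicit curl timeouts and layout-independent matching follows; the script path, the ES_URL/ES_AUTH variables and the function names are hypothetical, and keeping --max-time below the agent's Timeout is an assumption about how the agent is configured.

```shell
#!/bin/bash
# Hypothetical check script, e.g. /usr/local/zabbix-agent/scripts/es_status.sh
# Prints 1 when the cluster status is green and 0 otherwise, including when
# the node or the nginx proxy cannot be reached or returns an HTTP error.
ES_URL="${ES_URL:-http://127.0.0.1:9200/_cluster/health}"  # assumption: adjust host/port
ES_AUTH="${ES_AUTH:-}"                                      # optional "username:password" for nginx

# es_green: read a _cluster/health JSON body on stdin (pretty or compact)
# and print 1 if "status" is "green", else 0.
es_green() {
    if grep -q '"status"[[:space:]]*:[[:space:]]*"green"'; then
        echo 1
    else
        echo 0
    fi
}

# es_status: fetch the health document with short timeouts so that a hung
# node or proxy cannot block the agent; -f turns HTTP errors (e.g. 401) into failures.
es_status() {
    local body
    local -a auth=()
    if [ -n "$ES_AUTH" ]; then
        auth=(-u "$ES_AUTH")
    fi
    if ! body=$(curl -sf --connect-timeout 1 --max-time 2 "${auth[@]}" "$ES_URL"); then
        echo 0
        return
    fi
    printf '%s\n' "$body" | es_green
}

# Run the check only when the file is executed, not when it is sourced.
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
    es_status
fi
```

    It could then be wired in as, for example, UserParameter=es_status,ES_URL=http://serverip/_cluster/health ES_AUTH=username:password /usr/local/zabbix-agent/scripts/es_status.sh (path and credentials are placeholders).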

    Restart zabbix-agent to apply the configuration
    service zabbix-agent restart

    Test from the zabbix-server side (the key returns 1 while the cluster is green, 0 otherwise):
    zabbix_get -s ip -p 10050 -k es_status

    2. Add the corresponding monitoring in the Zabbix web UI:

    Add an item

    Configuration --> Hosts --> find the host, open Items --> Create item



    Create a trigger:
    Name
    es_status_check
    es_cluster_status is not green

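    3. Monitor the Elasticsearch process on every node of the cluster and restart it automatically if it goes down

    Configure the process-monitoring item

    Configure the trigger

    Configure the action; for reference see:

    Zabbix series (9): automatically triggering shell script tasks on the zabbix-agent side with Zabbix 3.0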

    http://blog.csdn.net/reblue520/article/details/52315154


    Script run by the action:

    /usr/local/zabbix-agent/scripts/start_es.sh

    #!/bin/bash
    # Load /etc/profile so that java (JAVA_HOME/PATH) can be found
    source /etc/profile
    # If elasticsearch is already running, kill it
    count_es=$(ps -ef | grep elasticsearch | grep -v grep | wc -l)
    if [ "$count_es" -ge 1 ]; then
        ps -ef | grep elasticsearch | grep -v grep | awk '{print $2}' | xargs -r kill
    fi
    # Start it as the non-root user yunva; -d runs elasticsearch as a daemon
    su yunva -c "cd /data/elasticsearch-5.0.1/bin && /bin/bash elasticsearch -d"
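    Sending kill and starting a new node straight away can race with the old node's shutdown, since the old JVM may still hold ports 9200/9300 and its data directory lock. A sketch that waits for the process to exit before falling back to SIGKILL, using pgrep/pkill from procps, is below. The function name stop_by_pattern is hypothetical, and matching on org.elasticsearch.bootstrap.Elasticsearch (the main class on the Elasticsearch java command line) is an assumption to confirm with ps -ef on your nodes; unlike "grep elasticsearch" it does not match unrelated commands that merely mention an elasticsearch path.

```shell
#!/bin/bash
# Assumption: the ES java command line contains this main class name.
ES_PATTERN="${ES_PATTERN:-org.elasticsearch.bootstrap.Elasticsearch}"

# stop_by_pattern PATTERN [WAIT_SECS]: send SIGTERM to processes whose full
# command line matches PATTERN, wait up to WAIT_SECS for them to exit, then
# SIGKILL any survivors. Returns 0 once nothing matches any more.
stop_by_pattern() {
    local pattern="$1" wait_secs="${2:-30}" i
    pgrep -f "$pattern" >/dev/null || return 0   # nothing to stop
    pkill -TERM -f "$pattern"
    for ((i = 0; i < wait_secs; i++)); do
        pgrep -f "$pattern" >/dev/null || return 0
        sleep 1
    done
    pkill -KILL -f "$pattern"   # still alive after wait_secs: force it
    sleep 1
    ! pgrep -f "$pattern" >/dev/null
}

# Usage inside start_es.sh (run as root by the Zabbix action):
# source /etc/profile
# stop_by_pattern "$ES_PATTERN" 30 && \
#     su yunva -c "cd /data/elasticsearch-5.0.1/bin && /bin/bash elasticsearch -d"
```

    The 30-second grace period is an arbitrary choice; a node with many shards may need longer to shut down cleanly.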



    Run:
    sudo /bin/bash /usr/local/zabbix-agent/scripts/start_es.sh
    Error:
    which: no java in (/sbin:/bin:/usr/sbin:/usr/bin)
    Could not find any executable java binary. Please install java in your PATH or set JAVA_HOME

    Solution:
    add the following line to the script
    source /etc/profile
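    Sourcing /etc/profile fixes the error only if that file really sets JAVA_HOME or PATH. A small pre-check, sketched below, makes the restart script fail with a clear message instead; it looks in JAVA_HOME first and then on PATH, the two places the error message mentions. The function name find_java is hypothetical.

```shell
#!/bin/bash
# find_java: print the java binary that would be used, checking JAVA_HOME
# first and then PATH; return 1 if neither provides an executable java.
find_java() {
    if [ -n "$JAVA_HOME" ] && [ -x "$JAVA_HOME/bin/java" ]; then
        echo "$JAVA_HOME/bin/java"
    elif command -v java >/dev/null 2>&1; then
        command -v java
    else
        return 1
    fi
}

# Usage at the top of start_es.sh (assumption: /etc/profile sets JAVA_HOME/PATH):
# source /etc/profile
# if ! java_bin=$(find_java); then
#     echo "java not found: check JAVA_HOME/PATH in /etc/profile" >&2
#     exit 1
# fi
```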

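    Running elasticsearch as root

    Error:
    can not run elasticsearch as root

    The following fixes found online do not work with elasticsearch 5.1:
    Fix 1:
    Add the parameter -Des.insecure.allow.root=true when starting elasticsearch; the full command is:

    ./elasticsearch -Des.insecure.allow.root=true  
    Fix 2:
    Open the elasticsearch launcher script with vi and add the following line before the variable ES_JAVA_OPTS is used:

    ES_JAVA_OPTS="-Des.insecure.allow.root=true"  

    Working solution: start elasticsearch as a regular user (yunva here) instead of root:
    su yunva -c "cd /data/elasticsearch-5.0.1/bin && /bin/bash elasticsearch -d"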
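    The same restart script may be run either as root (via sudo from the Zabbix action) or by the service account itself. A sketch that picks the right way to start elasticsearch from the current user id follows; ES_USER and ES_HOME are hypothetical variables whose defaults are the user and path used above.

```shell
#!/bin/bash
# Hypothetical settings; defaults follow the user and path used in this post.
ES_USER="${ES_USER:-yunva}"
ES_HOME="${ES_HOME:-/data/elasticsearch-5.0.1}"

# start_es_as_user: start elasticsearch in daemon mode (-d) as ES_USER.
start_es_as_user() {
    if [ "$(id -u)" -eq 0 ]; then
        # Running as root (e.g. via sudo from a Zabbix action): switch user first,
        # because elasticsearch refuses to run as root
        su "$ES_USER" -c "cd '$ES_HOME/bin' && /bin/bash elasticsearch -d"
    elif [ "$(id -un)" = "$ES_USER" ]; then
        # Already the service account: start directly
        (cd "$ES_HOME/bin" && /bin/bash elasticsearch -d)
    else
        echo "run as root or as $ES_USER" >&2
        return 1
    fi
}
```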

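    Script to restart the kibana service automatically:
    cat /usr/local/zabbix/scripts/restart_kibana.sh
    #!/bin/bash
    # If kibana is running, kill it. "grep -v restart_kibana" keeps this script
    # (whose own name contains "kibana") from matching and killing itself.
    count_kibana=$(ps -ef | grep kibana | grep -v grep | grep -v restart_kibana | wc -l)
    if [ "$count_kibana" -ge 1 ]; then
        ps -ef | grep kibana | grep -v grep | grep -v restart_kibana | awk '{print $2}' | xargs -r kill
    fi
    # start it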

    # A quick summary

    1. Edit zabbix_agentd.conf to allow remote shell commands
    # egrep -v '^#|^$' /usr/local/zabbix_agents_3.2.0/scripts/conf/zabbix_agentd.conf 
    
    EnableRemoteCommands=1
    UnsafeUserParameters=1
    
    # Let the zabbix user run commands through sudo (single quotes stop "!" from triggering bash history expansion)
    
    echo 'Defaults:zabbix !requiretty' >> /etc/sudoers
    echo 'zabbix ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers
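    Appending NOPASSWD: ALL gives the zabbix user full root access. A narrower alternative is a drop-in file under /etc/sudoers.d (read only if /etc/sudoers contains the #includedir /etc/sudoers.d line, as it does by default on many distributions). The sketch below assumes the agent only needs the find check and the restart script from this post; the function name write_zabbix_sudoers and the allowed command paths are assumptions to adjust, and the file should always be checked with visudo -cf before it is installed.

```shell
#!/bin/bash
# Hypothetical helper: write a restricted sudoers drop-in for the zabbix user.
# The allowed commands are assumptions taken from this post; adjust the paths.
write_zabbix_sudoers() {
    local out="$1"
    cat > "$out" <<'EOF'
Defaults:zabbix !requiretty
zabbix ALL=(root) NOPASSWD: /bin/find, /bin/bash /usr/local/zabbix_agents_3.2.0/scripts/start_es.sh
EOF
    # 0440 is the conventional mode for sudoers files
    chmod 0440 "$out"
}

# Usage (as root): validate before installing so a typo cannot lock you out of sudo.
# tmp=$(mktemp) && write_zabbix_sudoers "$tmp" \
#     && visudo -cf "$tmp" && mv "$tmp" /etc/sudoers.d/zabbix
```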
    
    
    2. Monitoring commands
    
    UserParameter=es_status,curl -sXGET http://172.16.0.230:9200/_cluster/health/?pretty | grep "status"|awk -F '[ "]+' '{print $4}'|grep -c 'green'
    UserParameter=es_debug,sudo /bin/find /opt/elasticsearch-5.6.15 -name 'hs_err_pid*.log' -o -name 'java_pid*.hprof'|wc -l
    # The built-in port check may not work reliably; replace it with the following UserParameter
    UserParameter=net.tcp.listen.grep[*],grep -q $(printf '%04X.00000000:0000.0A' $1) /proc/net/tcp ;if [ $? -eq 0 ];then echo 1;else grep -q "$(printf '%04X 00000000000000000000000000000000:0000 0A' $1)" /proc/net/tcp6;if [ $? -eq 0 ];then echo 1;else echo 0;fi;fi

    3. Script that restarts elasticsearch when Java runs out of memory
    # vim /usr/local/zabbix_agents_3.2.0/scripts/start_es.sh

    #!/bin/bash
    # Load /etc/profile so that java can be found
    source /etc/profile
    # Delete the files produced by the Java crash (fatal error logs and heap dumps)
    /usr/bin/sudo /bin/find /opt/elasticsearch-5.6.15 -name 'hs_err_pid*.log' -o -name 'java_pid*.hprof' | xargs rm -f
    # Kill elasticsearch and start it again
    count_es=$(ps -ef | grep elasticsearch | grep -v grep | wc -l)
    if [ "$count_es" -ge 1 ]; then
        ps -ef | grep elasticsearch | grep -v grep | awk '{print $2}' | xargs -r kill -9
    fi
    # start it
    su elasticsearch -c "cd /opt/elasticsearch-5.6.15/bin && /bin/bash elasticsearch -d"

    4. Monitoring template

    <?xml version="1.0" encoding="UTF-8"?>
    <zabbix_export>
      <version>3.2</version>
      <date>2019-08-09T06:14:18Z</date>
      <groups>
        <group>
          <name>Templates</name>
        </group>
      </groups>
      <templates>
        <template>
          <template>es_cluster_monitor</template>
          <name>es_cluster_monitor</name>
          <description/>
          <groups>
            <group>
              <name>Templates</name>
            </group>
          </groups>
          <applications/>
          <items>
            <item>
              <name>es_debug</name>
              <type>0</type>
              <snmp_community/>
              <multiplier>0</multiplier>
              <snmp_oid/>
              <key>es_debug</key>
              <delay>30</delay>
              <history>90</history>
              <trends>365</trends>
              <status>0</status>
              <value_type>3</value_type>
              <allowed_hosts/>
              <units/>
              <delta>0</delta>
              <snmpv3_contextname/>
              <snmpv3_securityname/>
              <snmpv3_securitylevel>0</snmpv3_securitylevel>
              <snmpv3_authprotocol>0</snmpv3_authprotocol>
              <snmpv3_authpassphrase/>
              <snmpv3_privprotocol>0</snmpv3_privprotocol>
              <snmpv3_privpassphrase/>
              <formula>1</formula>
              <delay_flex/>
              <params/>
              <ipmi_sensor/>
              <data_type>0</data_type>
              <authtype>0</authtype>
              <username/>
              <password/>
              <publickey/>
              <privatekey/>
              <port/>
              <description/>
              <inventory_link>0</inventory_link>
              <applications/>
              <valuemap/>
              <logtimefmt/>
            </item>
            <item>
              <name>es_status</name>
              <type>0</type>
              <snmp_community/>
              <multiplier>0</multiplier>
              <snmp_oid/>
              <key>es_status</key>
              <delay>30</delay>
              <history>90</history>
              <trends>365</trends>
              <status>0</status>
              <value_type>3</value_type>
              <allowed_hosts/>
              <units/>
              <delta>0</delta>
              <snmpv3_contextname/>
              <snmpv3_securityname/>
              <snmpv3_securitylevel>0</snmpv3_securitylevel>
              <snmpv3_authprotocol>0</snmpv3_authprotocol>
              <snmpv3_authpassphrase/>
              <snmpv3_privprotocol>0</snmpv3_privprotocol>
              <snmpv3_privpassphrase/>
              <formula>1</formula>
              <delay_flex/>
              <params/>
              <ipmi_sensor/>
              <data_type>0</data_type>
              <authtype>0</authtype>
              <username/>
              <password/>
              <publickey/>
              <privatekey/>
              <port/>
              <description/>
              <inventory_link>0</inventory_link>
              <applications/>
              <valuemap/>
              <logtimefmt/>
            </item>
            <item>
              <name>es_9200</name>
              <type>0</type>
              <snmp_community/>
              <multiplier>0</multiplier>
              <snmp_oid/>
              <key>net.tcp.listen.grep[9200]</key>
              <delay>30</delay>
              <history>90</history>
              <trends>365</trends>
              <status>0</status>
              <value_type>3</value_type>
              <allowed_hosts/>
              <units/>
              <delta>0</delta>
              <snmpv3_contextname/>
              <snmpv3_securityname/>
              <snmpv3_securitylevel>0</snmpv3_securitylevel>
              <snmpv3_authprotocol>0</snmpv3_authprotocol>
              <snmpv3_authpassphrase/>
              <snmpv3_privprotocol>0</snmpv3_privprotocol>
              <snmpv3_privpassphrase/>
              <formula>1</formula>
              <delay_flex/>
              <params/>
              <ipmi_sensor/>
              <data_type>0</data_type>
              <authtype>0</authtype>
              <username/>
              <password/>
              <publickey/>
              <privatekey/>
              <port/>
              <description/>
              <inventory_link>0</inventory_link>
              <applications/>
              <valuemap/>
              <logtimefmt/>
            </item>
            <item>
              <name>es_9300</name>
              <type>0</type>
              <snmp_community/>
              <multiplier>0</multiplier>
              <snmp_oid/>
              <key>net.tcp.listen.grep[9300]</key>
              <delay>30</delay>
              <history>90</history>
              <trends>365</trends>
              <status>0</status>
              <value_type>3</value_type>
              <allowed_hosts/>
              <units/>
              <delta>0</delta>
              <snmpv3_contextname/>
              <snmpv3_securityname/>
              <snmpv3_securitylevel>0</snmpv3_securitylevel>
              <snmpv3_authprotocol>0</snmpv3_authprotocol>
              <snmpv3_authpassphrase/>
              <snmpv3_privprotocol>0</snmpv3_privprotocol>
              <snmpv3_privpassphrase/>
              <formula>1</formula>
              <delay_flex/>
              <params/>
              <ipmi_sensor/>
              <data_type>0</data_type>
              <authtype>0</authtype>
              <username/>
              <password/>
              <publickey/>
              <privatekey/>
              <port/>
              <description/>
              <inventory_link>0</inventory_link>
              <applications/>
              <valuemap/>
              <logtimefmt/>
            </item>
            <item>
              <name>es_process</name>
              <type>0</type>
              <snmp_community/>
              <multiplier>0</multiplier>
              <snmp_oid/>
              <key>proc.num[,,all,elasticsearch]</key>
              <delay>30</delay>
              <history>90</history>
              <trends>365</trends>
              <status>0</status>
              <value_type>3</value_type>
              <allowed_hosts/>
              <units/>
              <delta>0</delta>
              <snmpv3_contextname/>
              <snmpv3_securityname/>
              <snmpv3_securitylevel>0</snmpv3_securitylevel>
              <snmpv3_authprotocol>0</snmpv3_authprotocol>
              <snmpv3_authpassphrase/>
              <snmpv3_privprotocol>0</snmpv3_privprotocol>
              <snmpv3_privpassphrase/>
              <formula>1</formula>
              <delay_flex/>
              <params/>
              <ipmi_sensor/>
              <data_type>0</data_type>
              <authtype>0</authtype>
              <username/>
              <password/>
              <publickey/>
              <privatekey/>
              <port/>
              <description/>
              <inventory_link>0</inventory_link>
              <applications/>
              <valuemap/>
              <logtimefmt/>
            </item>
          </items>
          <discovery_rules/>
          <httptests/>
          <macros/>
          <templates/>
          <screens/>
        </template>
      </templates>
      <triggers>
        <trigger>
          <expression>{es_cluster_monitor:es_status.last(#1)}&lt;&gt;1 and {es_cluster_monitor:es_status.last(#2)}&lt;&gt;1 and {es_cluster_monitor:es_status.last(#3)}&lt;&gt;1</expression>
          <recovery_mode>0</recovery_mode>
          <recovery_expression/>
          <name>es cluster is not green</name>
          <correlation_mode>0</correlation_mode>
          <correlation_tag/>
          <url/>
          <status>0</status>
          <priority>0</priority>
          <description/>
          <type>0</type>
          <manual_close>0</manual_close>
          <dependencies/>
          <tags/>
        </trigger>
        <trigger>
          <expression>{es_cluster_monitor:proc.num[,,all,elasticsearch].last(#1)}&lt;1 and {es_cluster_monitor:proc.num[,,all,elasticsearch].last(#2)}&lt;1 and {es_cluster_monitor:proc.num[,,all,elasticsearch].last(#3)}&lt;1</expression>
          <recovery_mode>0</recovery_mode>
          <recovery_expression/>
          <name>es process was down</name>
          <correlation_mode>0</correlation_mode>
          <correlation_tag/>
          <url/>
          <status>0</status>
          <priority>0</priority>
          <description/>
          <type>0</type>
          <manual_close>0</manual_close>
          <dependencies/>
          <tags/>
        </trigger>
        <trigger>
          <expression>{es_cluster_monitor:net.tcp.listen.grep[9200].max(#2)}=0</expression>
          <recovery_mode>0</recovery_mode>
          <recovery_expression/>
          <name>es_9200 port down</name>
          <correlation_mode>0</correlation_mode>
          <correlation_tag/>
          <url/>
          <status>0</status>
          <priority>0</priority>
          <description/>
          <type>0</type>
          <manual_close>0</manual_close>
          <dependencies/>
          <tags/>
        </trigger>
        <trigger>
          <expression>{es_cluster_monitor:net.tcp.listen.grep[9300].max(#2)}=0</expression>
          <recovery_mode>0</recovery_mode>
          <recovery_expression/>
          <name>es_9300 port down</name>
          <correlation_mode>0</correlation_mode>
          <correlation_tag/>
          <url/>
          <status>0</status>
          <priority>0</priority>
          <description/>
          <type>0</type>
          <manual_close>0</manual_close>
          <dependencies/>
          <tags/>
        </trigger>
        <trigger>
          <expression>{es_cluster_monitor:es_debug.last()}&lt;&gt;0</expression>
          <recovery_mode>0</recovery_mode>
          <recovery_expression/>
          <name>es_debug error</name>
          <correlation_mode>0</correlation_mode>
          <correlation_tag/>
          <url/>
          <status>0</status>
          <priority>0</priority>
          <description/>
          <type>0</type>
          <manual_close>0</manual_close>
          <dependencies/>
          <tags/>
        </trigger>
      </triggers>
    </zabbix_export>
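    The net.tcp.listen.grep[*] UserParameter packs the whole port check into one line of regex matching. The same idea (read /proc/net/tcp and /proc/net/tcp6 and look for a socket in state 0A, i.e. LISTEN, whose local port matches) can be written more readably with awk. A sketch follows; port_listening is a hypothetical name, and the optional file arguments exist only so the function can be tested against sample data.

```shell
#!/bin/bash
# port_listening PORT [FILE...]: print 1 if a socket in LISTEN state
# (st == 0A) has local port PORT in the given /proc/net/tcp-format files,
# otherwise print 0. Defaults to /proc/net/tcp and /proc/net/tcp6.
port_listening() {
    local hex f
    local -a files=()
    hex=$(printf '%04X' "$1")   # /proc shows ports as 4 upper-case hex digits
    shift
    [ "$#" -eq 0 ] && set -- /proc/net/tcp /proc/net/tcp6
    for f in "$@"; do
        [ -r "$f" ] && files+=("$f")   # skip e.g. tcp6 when IPv6 is disabled
    done
    if [ "${#files[@]}" -eq 0 ]; then
        echo 0
        return
    fi
    awk -v p="$hex" '
        FNR > 1 {                      # skip the header line of each file
            n = split($2, a, ":")      # $2 = local_address, e.g. 00000000:23F0
            if (a[n] == p && $4 == "0A") found = 1
        }
        END { print (found ? 1 : 0) }
    ' "${files[@]}"
}
```

    Saved as a script that ends with port_listening "$1" (for example a hypothetical port_listening.sh), it could be referenced as UserParameter=net.tcp.listen.grep[*],/path/to/port_listening.sh $1.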
  • Original article: https://www.cnblogs.com/reblue520/p/6284394.html