References: https://www.bbsmax.com/A/gGdXbgXmJ4/
https://www.deathearth.com/333.html
https://www.cnblogs.com/amyzhu/p/10193557.html
Once ELK is up and running, how do you raise alerts from the data it collects? One option is the sentinl plugin for Kibana.
I. Installation environment
1. System environment
2. Software versions
java 1.8.0_171
elasticsearch 6.2.4
kibana 6.2.4
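II. Installation
1. Install ELK
Omitted.
2. Install the sentinl plugin
Download the plugin release that matches your ELK version. This walkthrough uses 6.2.4.
https://github.com/sirensolutions/sentinl/releases/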
/usr/share/kibana/bin/kibana-plugin install file:///nas/nas/softs/elk/6.2.4/sentinl-v6.2.4-1.zip
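After installation, verify that the plugin is installed.
To set up email, edit the Kibana configuration file /etc/kibana/kibana.yml and append the following: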
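sentinl:
  settings:
    email:
      active: true
      user: xxx@xxx.com          # email address
      password: xxxx             # mailbox password or authorization code
      host: smtp.exmail.qq.com   # outgoing mail (SMTP) server
      # Set ssl according to your environment. With ssl: false, change port to 25.
      # Alibaba Cloud blocks port 25, so SSL is required there.
      ssl: true
      port: 465
    report:
      active: true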
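Before restarting Kibana it can help to confirm that the mailbox credentials work on their own. The following is a minimal sketch, not part of sentinl: it assumes Python 3 is available on the host, reuses the host and port from the settings above, and the function names build_test_message and send_test_message are made up for this example.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Build a small plain-text message used only to verify SMTP access."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "sentinl SMTP test"
    msg.set_content("If you can read this, the SMTP settings work.")
    return msg


def send_test_message(host, port, user, password, recipient):
    """Log in over implicit SSL (the ssl: true / port 465 case) and send one message."""
    msg = build_test_message(user, recipient)
    context = ssl.create_default_context()
    with smtplib.SMTP_SSL(host, port, context=context, timeout=10) as server:
        server.login(user, password)
        server.send_message(msg)


# Example call (placeholders as in kibana.yml, replace before running):
# send_test_message("smtp.exmail.qq.com", 465, "xxx@xxx.com", "xxxx", "xxx@xxx.com")
```

If the call raises an authentication or connection error, the same values are unlikely to work inside kibana.yml either.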
Restart Kibana:
systemctl restart kibana
Open elasticsearch-head and you should see a newly created index named watcher_alarms.
The Kibana menu now also shows a Sentinl entry.
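If elasticsearch-head is not installed, the same check can be done against the Elasticsearch `_cat/indices` API. This is a rough sketch under assumptions: Elasticsearch is reachable at http://localhost:9200 without authentication, the sentinl indices start with the prefix "watcher", and both helper names are hypothetical.

```python
import json
import urllib.request


def fetch_index_rows(base_url="http://localhost:9200"):
    """Return the rows of GET _cat/indices?format=json as a list of dicts."""
    url = f"{base_url}/_cat/indices?format=json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def indices_with_prefix(rows, prefix="watcher"):
    """Pick the index names that start with the given prefix (assumed sentinl prefix)."""
    return sorted(row["index"] for row in rows if row["index"].startswith(prefix))


# Example call against a local node:
# print(indices_with_prefix(fetch_index_rows()))
```

An empty list after Kibana has restarted suggests the plugin has not created its indices yet.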
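Create a new watcher.
Once it is saved, you can edit it or test it.
Click run to test it.
View the alarm details.
Set the query and the alert condition in the Advanced tab. A fairly complete configuration looks like this: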
{
  "actions": {
    "Email_alarm_773206d5-2977-465e-882d-762a7d69fe68": {
      "name": "Email alarm",
      "throttle_period": "15m",
      "email": {
        "priority": "low",
        "stateless": false,
        "body": "Find error log {{payload.hits.total}}", # email body; reports how many entries matched the keyword
        "to": "xxx@xxx.com", # recipient of your choice
        "from": "xxx@xxx.com" # sender; the mailbox configured in kibana.yml
      }
    }
  },
  "input": {
    "search": {
      "request": {
        "index": [
          "system-log-*" # index pattern
        ],
        "body": {
          "query": {
            "bool": {
              "must": [
                {
                  "range": {
                    "@timestamp": { # time window to match
                      "gte": "now-5m/m", # greater than or equal to now minus five minutes
                      "lte": "now/m", # less than or equal to now
                      "format": "epoch_millis"
                    }
                  }
                }
              ],
              "filter": [
                {
                  "multi_match": {
                    "type": "best_fields",
                    "query": "error", # match log entries containing the keyword error
                    "lenient": true
                  }
                }
              ]
            }
          },
          "size": 0,
          "aggs": {
            "dateAgg": {
              "date_histogram": {
                "field": "@timestamp",
                "time_zone": "Asia/Shanghai",
                "interval": "1m",
                "min_doc_count": 1
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "script": {
      "script": "payload.hits.total>1" # trigger the action when more than one entry matched
    }
  },
  "trigger": {
    "schedule": {
      "later": "every 5 minutes" # run every five minutes
    }
  },
  "disable": false,
  "report": false,
  "title": "system-log error log alert",
  "wizard": {},
  "save_payload": false,
  "spy": false,
  "impersonate": false
}
PS: the comments are there only to aid understanding. The actual configuration must not contain comments.
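Since the real watcher must be plain JSON, the annotated version has to be cleaned before it is pasted into Kibana. One way to do that is sketched below, assuming the annotated text is laid out one field per line with each `#` comment at the end of its line. The helper strip_hash_comments is a made-up name for this example and is not part of sentinl.

```python
import json


def strip_hash_comments(annotated):
    """Remove '#' comments that sit outside JSON string literals, line by line."""
    cleaned_lines = []
    for line in annotated.splitlines():
        in_string = False   # are we currently inside a "..." literal?
        escaped = False     # was the previous character a backslash inside a string?
        cut = len(line)
        for i, ch in enumerate(line):
            if escaped:
                escaped = False
            elif ch == "\\" and in_string:
                escaped = True
            elif ch == '"':
                in_string = not in_string
            elif ch == "#" and not in_string:
                cut = i      # comment starts here, drop the rest of the line
                break
        cleaned_lines.append(line[:cut].rstrip())
    return "\n".join(cleaned_lines)


annotated = '{\n  "query": "error", # keyword to match\n  "size": 0\n}'
watcher = json.loads(strip_hash_comments(annotated))  # raises if the result is not valid JSON
```

Passing the result through json.loads doubles as a syntax check before the watcher is saved.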
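This watcher checks the matching logs for the keyword error over the last five minutes. If the keyword appears more than once, the email alert fires.
To trigger the alert, redirect a few lines containing error into the corresponding log.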
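Generating the test entries can be scripted. This is a small sketch under assumptions: the log path passed in is whatever file your shipper forwards into the system-log-* index (the path in the commented example is purely hypothetical), and append_test_errors is a name invented here.

```python
from datetime import datetime
from pathlib import Path


def append_test_errors(log_path, count=3):
    """Append `count` lines containing the keyword 'error' to the given log file."""
    path = Path(log_path)
    with path.open("a", encoding="utf-8") as handle:
        for i in range(count):
            stamp = datetime.now().isoformat(timespec="seconds")
            handle.write(f"{stamp} error sentinl-test line {i + 1}\n")
    return count


# Example call (hypothetical path, replace with the file your shipper reads):
# append_test_errors("/var/log/system-test.log", count=3)
```

Writing at least two lines matters, because the condition only fires when more than one entry matched within the five-minute window.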
The alert email carries the body configured in the action, with the match count filled in.
Next, a watcher configuration that alerts on CPU usage:
{
  "actions": {
    "HTML_email_alarm_5fbf1925-81fc-4d73-a37e-b6ac8b9bfc06": {
      "name": "HTML email alarm",
      "throttle_period": "1m",
      "email_html": {
        "html": "CPU usage exceeded 10% {{ payload.hits.total }} times in the last five minutes",
        "priority": "low",
        "stateless": false,
        "to": "xxx@xxx.com",
        "from": "xxx@xxx.com"
      }
    }
  },
  "input": {
    "search": {
      "request": {
        "index": [
          "metricbeat-*"
        ],
        "body": {
          "query": {
            "bool": {
              "filter": [
                {
                  "range": {
                    "system.cpu.total.pct": {
                      "gt": 0.1
                    }
                  }
                }
              ],
              "must": [
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-5m/m",
                      "lte": "now/m",
                      "format": "epoch_millis"
                    }
                  }
                }
              ]
            }
          },
          "size": 0,
          "aggs": {
            "dateAgg": {
              "date_histogram": {
                "field": "@timestamp",
                "time_zone": "Europe/Amsterdam",
                "interval": "1m",
                "min_doc_count": 1
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "script": {
      "script": "payload.hits.total >=1"
    }
  },
  "trigger": {
    "schedule": {
      "later": "every 5 minutes"
    }
  },
  "disable": false,
  "report": false,
  "title": "metricbeat",
  "wizard": {},
  "save_payload": true,
  "spy": false,
  "impersonate": false
}
This watcher alerts whenever CPU usage exceeds 10%. system.cpu.total.pct is a floating-point value, so comparing it against 0.1 means "greater than 10%".
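The percent-to-fraction conversion and the strict "gt" comparison can be checked with a few sample values. The sketch below only mirrors the arithmetic of the range filter. It does not query Elasticsearch, and the sample readings and function names are invented for illustration.

```python
def percent_to_fraction(percent):
    """Convert a percentage such as 10 into the fraction stored in the pct field (0.1)."""
    return percent / 100.0


def count_over_threshold(samples, percent):
    """Count samples strictly greater than the threshold, like a range filter with "gt"."""
    threshold = percent_to_fraction(percent)
    return sum(1 for value in samples if value > threshold)


# Invented readings of system.cpu.total.pct over one five-minute window
readings = [0.05, 0.1, 0.12, 0.4]
hits = count_over_threshold(readings, 10)
should_alert = hits >= 1  # same shape as the condition "payload.hits.total >=1"
```

A reading of exactly 0.1 is not counted, because "gt" is a strict comparison. Use "gte" in the range filter if 10% itself should also trigger.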