Logstash: essential knowledge for log collection
1. Logstash architecture
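The basic Logstash architecture is a pipeline, as shown in the figure below:
●Input: data collection (common plugins: stdin, file, kafka, beats, http)
●Filter: data parsing/transformation (common plugins: grok, date, geoip, mutate, useragent)
●Output: data output (common plugins: Elasticsearch)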
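As a rough analogy only (this is not how Logstash works internally), the three stages can be modeled in a few lines of Python. Events are plain dicts, an input yields them, filters transform them in declaration order, and an output receives each result. The function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Event = Dict[str, str]

def line_input(lines: Iterable[str]) -> Iterable[Event]:
    # Input stage: wrap each raw line in an event, the way stdin/file put it in "message"
    for line in lines:
        yield {"message": line, "@version": "1"}

def strip_filter(event: Event) -> Event:
    # Filter stage: a hypothetical stand-in for grok/date/mutate (trims whitespace)
    out = dict(event)
    out["message"] = out["message"].strip()
    return out

def run_pipeline(lines: Iterable[str],
                 filters: List[Callable[[Event], Event]],
                 output: Callable[[Event], None]) -> None:
    for event in line_input(lines):
        for f in filters:      # filters are applied in the order they are declared
            event = f(event)
        output(event)          # Output stage: stands in for stdout or elasticsearch

run_pipeline(["  hello  ", "world"], [strip_filter], print)
```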
2. Logstash Input plugins
Input plugins specify the data source. One pipeline can have several input plugins. We mainly cover the following inputs:
●stdin
●file
●beats
●kafka
- Lab 1: read data from standard input (a file input is also shown) and print it to standard output:
#Install (requires a Java runtime)
[root@logstash-node1 ~]# yum install java -y
[root@logstash-node1 ~]# rpm -ivh logstash-7.4.0.rpm
[root@logstash-node1 ~]# cd /etc/logstash/
[root@logstash-node1 logstash]# vim jvm.options
# Xmx represents the maximum size of total heap space
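# Adjust the heap size here (in production, usually more than half of the machine's memory).
# Keep comments on their own lines; option lines must contain only the option.
-Xms512m
-Xmx512m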
#Test the setup
[root@logstash-node1 logstash]# cd /etc/logstash/conf.d/
[root@logstash-node1 conf.d]# vim input_file_output_console.conf
[root@logstash-node1 conf.d]# cat input_file_output_console.conf
input {
file {
path => "/var/log/oldxu.log"
type => syslog
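exclude => "*.gz" #file name patterns to skip (glob syntax)
start_position => "beginning" #read a file from the beginning the first time it is seen (beginning or end)
stat_interval => "3" #how often to check files for updates; default 1s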
}
}
output {
stdout {
codec => rubydebug
}
}
[root@logstash-node1 conf.d]# /usr/share/logstash/bin/logstash -f input_file_output_console.conf
[root@logstash-node1 conf.d]# vim input_stdin_output_console.conf
[root@logstash-node1 conf.d]# cat input_stdin_output_console.conf
input {
stdin {
type => stdin
tags => "tags_stdin"
}
}
output {
stdout {
codec => "rubydebug"
}
}
[root@logstash-node1 ~]# echo "qwwe" >/var/log/oldxu.log
[root@logstash-node1 conf.d]# /usr/share/logstash/bin/logstash -f input_file_output_console.conf
......
{
"message" => "qwwe",
"path" => "/var/log/oldxu.log",
"@timestamp" => 2020-01-15T01:37:08.418Z,
"host" => "logstash-node1",
"type" => "syslog",
"@version" => "1"
}
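3. Logstash Filter plugins
As data travels from source to store, Logstash filters parse each event, identify named fields to build structure, and convert them into a common format, so the data can be analyzed more easily and quickly to deliver business value.
●Use grok to derive structure from unstructured data
●Use geoip to derive geographic coordinates from IP addresses
●Use useragent to extract the operating system and device type from requests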
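3.1 Grok plugin
1. Why grok?
#We want to parse unstructured data like the following into structured JSON:
120.27.74.166 - - [30/Dec/2019:11:59:18 +0800] "GET / HTTP/1.1" 302 154 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) Chrome/79.0.3945.88 Safari/537.36"
#Doing this with plain regular expressions requires something very complex, for example: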
\[([^]]+)]\s\[(\w+)]\s([^:]+:\s\w+\s\w+\s[^:]+:\S+\s[^:]+:\S+\s\S+).*\[([^]]+)]\s\[(\w+)]\s([^:]+:\s\w+\s\w+\s[^:]+:\S+\s[^:]+:\S+\s\S+).*\[([^]]+)]\s\[(\w+)]\s([^:]+:\s\w+\s\w+\s[^:]+:\S+\s[^:]+:\S+\s\S+).*
2. How does grok solve this? Grok is essentially a collection of named regular expressions, and it ships with many built-in patterns that can be used directly.
Grok debugger: http://grokdebug.herokuapp.com/
%{IPORHOST:clientip} %{NGUSER:ident} %{NGUSER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} (?:%{NUMBER:bytes}|-) (?:"(?:%{URI:referrer}|-)"|%{QS:referrer}) %{QS:agent} %{QS:xforwardedfor} %{IPORHOST:host} %{BASE10NUM:request_duration}
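To show what "named regular expressions" means in practice, here is a Python sketch that approximates %{COMBINEDAPACHELOG} with hand-written named groups. The sub-patterns are simplified stand-ins for grok's IPORHOST, HTTPDATE, QS and so on, so they are looser than the real ones. The sample is the test line used later in this document.

```python
import re

# Simplified stand-ins for grok's IPORHOST, USER, HTTPDATE, WORD, NUMBER and QS patterns
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\w+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?:(?P<bytes>\d+)|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def grok_like(line: str) -> dict:
    """Return the named captures, or {} when the line does not match
    (real grok adds the _grokparsefailure tag instead)."""
    m = COMBINED.match(line)
    return m.groupdict() if m else {}

sample = ('120.27.74.166 - - [30/Dec/2018:11:59:18 +0800] "GET / HTTP/1.1" 302 154 "-" '
          '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) Chrome/79.0.3945.88 Safari/537.36"')
for name, value in grok_like(sample).items():
    print(f"{name:12} => {value}")
```

Each `(?P<name>...)` group plays the role of a `%{PATTERN:name}` reference. Grok's advantage is that the sub-patterns are named and reusable instead of being retyped every time.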
3. Grok syntax illustration
http://grokdebug.herokuapp.com/
4. grok example: use a grok pattern to turn Nginx logs into JSON (Nginx's default "combined" log format matches %{COMBINEDAPACHELOG})
filter {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
}
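3.3 Date plugin
The date plugin parses a date string into a date value and then uses it to replace @timestamp or another specified field.
●match (array): the field to parse followed by one or more date formats; several formats can be listed and are tried in order
●target (string): the field that receives the result; defaults to @timestamp
●timezone (string): the time zone to assume when the date string itself carries no offset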
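The three options can be sketched with Python's datetime module. The string is parsed with a format, the default offset is applied only when the string has none, and the result is rendered in UTC, which is how @timestamp is displayed. Two assumptions apply. First, the Joda format "dd/MMM/yyyy:HH:mm:ss Z" corresponds to strptime's "%d/%b/%Y:%H:%M:%S %z". Second, a fixed UTC+8 offset stands in for Asia/Shanghai, which is equivalent for current dates because China does not observe daylight saving time.

```python
from datetime import datetime, timedelta, timezone

def parse_date(value: str, fmt: str, default_offset_hours: int = 0) -> str:
    """Rough equivalent of date { match => [field, fmt] timezone => ... }:
    parse the string, apply the default offset only if the string has none,
    and render the result in UTC (milliseconds fixed at .000 here)."""
    dt = datetime.strptime(value, fmt)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone(timedelta(hours=default_offset_hours)))
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")

# The string carries its own +0800 offset, so the default offset is ignored
print(parse_date("30/Dec/2019:11:59:18 +0800", "%d/%b/%Y:%H:%M:%S %z"))
# No offset in the string: the default (timezone option) is used
print(parse_date("2020-01-15 18:09:02", "%Y-%m-%d %H:%M:%S", default_offset_hours=8))
```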
1. date example: parse the timestamp field of an nginx request
#Create the config file input_http_output_console.conf
[root@logstash-node1 conf.d]# vim input_http_output_console.conf
input {
http {
port => 7474
}
}
filter {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
geoip {
source => "clientip"
}
#30/Dec/2019:11:59:18 +0800
date {
match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
target => "@timestamp"
timezone => "Asia/Shanghai"
}
useragent {
source => "agent"
target => "agent"
}
}
output {
stdout {
codec => rubydebug
}
}
[root@logstash-node1 conf.d]# /usr/share/logstash/bin/logstash -f input_http_output_console.conf -r
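Download an HTTP client on your PC (Windows is used here):
https://insomnia.rest/download/#windows
① Pick the build for your operating system, then install and run it.
② Follow the steps in order; at ③ enter http://10.0.0.151:7474
③ Paste the test data below at position ④ in the figure, then click Send. An "ok" reply means the request succeeded. Then check the output on the server.
#Test data: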
120.27.74.166 - - [30/Dec/2018:11:59:18 +0800] "GET / HTTP/1.1" 302 154 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) Chrome/79.0.3945.88 Safari/537.36"
66.249.73.135 - - [20/May/2015:21:05:11 +0000] "GET /blog/tags/xsendevent HTTP/1.1" 200 10049 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
- Output like the following in the server terminal means it worked
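3.4 mutate plugin
mutate mainly operates on fields: type conversion, deletion, replacement, updating and so on.
●remove_field: delete fields
●split: split a string
●add_field: add fields
●convert: convert types
●gsub: replace text in a string
●rename: rename fields
mutate is another very important Logstash plugin. It provides rich processing of basic data types, including renaming, removing, replacing and modifying fields in log events. Commonly used mutate features include type conversion (convert), regex replacement in a field (gsub), splitting a string into an array by a separator (split), renaming fields (rename) and deleting fields (remove_field).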
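The following toy model in plain Python illustrates several of these operations on a flat event dict. It runs them in the same relative order the plugin documents: rename, then convert, then gsub, with remove_field last because it is a common option applied after the mutations. The sample event values are hypothetical, and convert only handles clean numeric strings here.

```python
import re

CASTS = {"integer": int, "float": float, "string": str}

def mutate(event: dict, rename=None, convert=None, gsub=None, remove_field=()) -> dict:
    """Toy model of some mutate operations; returns a new dict."""
    e = dict(event)
    for old, new in (rename or {}).items():             # rename
        if old in e:
            e[new] = e.pop(old)
    for field, kind in (convert or {}).items():         # convert
        if field in e:
            e[field] = CASTS[kind](e[field])
    for field, pattern, replacement in (gsub or []):    # gsub: [field, regex, replacement]
        if isinstance(e.get(field), str):
            e[field] = re.sub(pattern, replacement, e[field])
    for field in remove_field:                          # remove_field (applied last)
        e.pop(field, None)
    return e

# Hypothetical event
event = {"message": "line1\nline2", "bytes": "10049", "host": "web01", "headers": {}}
print(mutate(event,
             rename={"host": "hostname"},
             convert={"bytes": "integer"},
             gsub=[("message", r"\n", " ")],
             remove_field=["headers"]))
```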
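1. Delete unneeded fields such as headers, message and agent. remove_field is a common option available on every filter. Here it is set on grok, so message is removed only when grok matches: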
filter{
grok {
match => {
"message" => "%{IP:ip}"
}
remove_field => ["message"]
}
geoip {
source => "ip"
}
}
2. Split a string into an array by a separator (split)
split uses the given separator to turn the string in a field into an array.
filter{
mutate {
split => { "message" => "|" }
}
}
3. Add fields with add_field.
add_field is mostly used together with split, to copy chosen elements of the split array into their own named fields.
mutate {
add_field => {
"userID" => "%{[message][0]}"
}
remove_field => [ "message","headers","timestamp" ]
}
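4. Type conversion with convert in mutate. Supported target types include integer, float, string and boolean. Inside a single mutate, convert runs before add_field/remove_field are applied, so fields created by add_field are converted in a following mutate:
mutate {
add_field => {
"userID" => "%{[message][0]}"
"Action" => "%{[message][1]}"
"Date" => "%{[message][2]}"
}
remove_field => ["message","headers"]
}
mutate {
convert => {
"userID" => "integer"
"Action" => "string"
"Date" => "string"
}
}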
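The split → add_field → convert chain can be traced in Python. `sprintf` is a hypothetical, simplified version of the `%{[field][N]}` substitution that add_field performs. As in Logstash, a reference that cannot be resolved is left as literal text. The input message "1001|login|2020-01-15 10:05:10" is made up for illustration.

```python
import re

REF = re.compile(r"%\{\[(\w+)\](?:\[(\d+)\])?\}")

def sprintf(template: str, event: dict) -> str:
    """Simplified %{[field]} / %{[field][N]} substitution as used by add_field."""
    def resolve(m):
        value = event.get(m.group(1))
        if value is not None and m.group(2) is not None:
            idx = int(m.group(2))
            value = value[idx] if idx < len(value) else None
        # Unresolvable references stay as literal text
        return m.group(0) if value is None else str(value)
    return REF.sub(resolve, template)

def split_add_convert(event: dict) -> dict:
    e = dict(event)
    e["message"] = e["message"].split("|")           # mutate with split
    for name, tpl in (("userID", "%{[message][0]}"),
                      ("Action", "%{[message][1]}"),
                      ("Date", "%{[message][2]}")):
        e[name] = sprintf(tpl, e)                    # mutate with add_field
    for field in ("message", "headers"):
        e.pop(field, None)                           # ...and its remove_field
    e["userID"] = int(e["userID"])                   # following mutate with convert
    return e

print(split_add_convert({"message": "1001|login|2020-01-15 10:05:10", "headers": {}}))
```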
4. Logstash Output plugins
Output plugins emit Logstash events. Common plugins:
●stdout
●file
●elasticsearch
output {
stdout {
codec => rubydebug
}
elasticsearch {
hosts => ["10.0.0.161:9200","10.0.0.162:9200","10.0.0.163:9200"]
index => "app-%{+YYYY.MM.dd}" #index name
template_overwrite => true
}
}
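`%{+YYYY.MM.dd}` in the index name is formatted from the event's @timestamp, which Logstash keeps in UTC. The small Python check below makes one consequence visible: with a UTC+8 local clock (assumed here for illustration), events from 00:00 to 08:00 local time land in the previous day's index. `daily_index` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

def daily_index(prefix: str, timestamp: datetime) -> str:
    # %{+YYYY.MM.dd} is rendered from @timestamp, which is stored in UTC
    return prefix + timestamp.astimezone(timezone.utc).strftime("%Y.%m.%d")

beijing = timezone(timedelta(hours=8))
print(daily_index("app-", datetime(2020, 1, 15, 7, 30, tzinfo=beijing)))  # app-2020.01.14
print(daily_index("app-", datetime(2020, 1, 15, 9, 0, tzinfo=beijing)))   # app-2020.01.15
```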
The examples above combined into one config, and the result:
[root@logstash-node1 conf.d]# cat input_http_filter_grok_output_console.conf
input {
http {
port => 7474
}
}
filter {
# grok {
# match => { "message" => "%{COMBINEDAPACHELOG}" }
# }
#
# geoip {
# source => "clientip"
# }
#
#
# #30/Dec/2019:11:59:18 +0800
# date {
# match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
# target => "@timestamp"
# timezone => "Asia/Shanghai"
# }
#
# useragent {
# source => "agent"
# target => "agent"
# }
# mutate {
# remove_field => [ "message","headers","timestamp" ]
# }
mutate {
split => { "message" => "|" }
}
mutate {
add_field => {
"userID" => "%{[message][0]}"
"Action" => "%{[message][1]}"
"Date" => "%{[message][2]}"
}
remove_field => ["message","headers"]
}
mutate {
convert => {
"userID" => "integer"
"Action" => "string"
"Date" => "string"
}
}
}
output {
stdout {
codec => rubydebug
}
elasticsearch {
hosts => ["10.0.0.161:9200","10.0.0.162:9200","10.0.0.163:9200"]
index => "app-%{+YYYY.MM.dd}" #index name
template_overwrite => true
}
}
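Log collection overview
●1. Convert plain Nginx logs to JSON
●2. Parse and normalize the time format in Nginx logs
●3. Look up the geographic location of the client IP in Nginx logs
●4. Parse the user-agent field of Nginx logs
●5. Convert the bytes field of Nginx logs to an integer
●6. Remove unneeded fields such as message and headers
#Log format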
66.249.73.135 - - [20/May/2015:21:05:11 +0000] "GET /blog/tags/xsendevent HTTP/1.1" 200 10049 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
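1. First build and test the grok pattern for message in the grok debugger.
2. Write the filebeat config
#Prerequisite: filebeat is installed on web01
[root@web01 ~]# cd /etc/filebeat/
[root@web01 filebeat]# > /var/log/nginx/access.log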
[root@web01 filebeat]# vim filebeat.yml
[root@web01 filebeat]# cat filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/nginx/access.log
  tags: ["nginx-access"]
- type: log
  enabled: true
  paths:
    - /var/log/nginx/error.log
  tags: ["nginx-error"]
output.logstash:
  hosts: ["10.0.0.151:5044"]
#Write the sample log line into /var/log/nginx/access.log
[root@web01 filebeat]# cat /var/log/nginx/access.log
66.249.73.135 - - [20/May/2015:21:05:11 +0000] "GET /blog/tags/xsendevent HTTP/1.1" 200 10049 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
[root@web01 filebeat]# systemctl restart filebeat
Write the Logstash config file
[root@logstash-node1 conf.d]# vim input_filebeat_output_es.conf
[root@logstash-node1 conf.d]# cat input_filebeat_output_es.conf
input {
beats {
port => 5044
}
}
filter {
if "nginx-access" in [tags] {
grok {
match => { "message" => '%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:useragent}' }
}
date {
match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
target => "@timestamp"
timezone => "Asia/Shanghai"
}
geoip {
source => "clientip"
}
useragent {
source => "useragent"
target => "useragent"
}
mutate {
rename => { "[host][name]" => "hostname" }
convert => [ "bytes", "integer" ]
remove_field => [ "message", "agent" , "input","ecs" ]
add_field => { "target_index" => "logstash-nginx-access-%{+YYYY.MM.dd}" }
}
} else if "nginx-error" in [tags] {
mutate {
add_field => { "target_index" => "logstash-nginx-error-%{+YYYY.MM.dd}" }
}
}
}
output {
elasticsearch {
hosts => ["10.0.0.161:9200","10.0.0.162:9200","10.0.0.163:9200"]
index => "%{[target_index]}"
}
}
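The conditional routing in this config sends each event to an index chosen from the tag that filebeat attached. The Python sketch below mirrors that decision. `route` is a hypothetical helper, and the date suffix is computed in UTC just as `%{+YYYY.MM.dd}` is.

```python
from datetime import datetime, timezone
from typing import Optional

def route(event: dict) -> Optional[str]:
    """Mirror of the if / else if in input_filebeat_output_es.conf:
    choose target_index from the tag filebeat attached to the event."""
    tags = event.get("tags", [])
    day = event["@timestamp"].astimezone(timezone.utc).strftime("%Y.%m.%d")
    if "nginx-access" in tags:
        return "logstash-nginx-access-" + day
    if "nginx-error" in tags:
        return "logstash-nginx-error-" + day
    return None  # neither branch matched, so this config adds no target_index

ts = datetime(2020, 1, 15, 2, 0, tzinfo=timezone.utc)
print(route({"tags": ["nginx-access"], "@timestamp": ts}))
```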
[root@logstash-node1 conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/input_filebeat_output_es.conf -r
#In another terminal, check that port 5044 is listening
[root@logstash-node1 conf.d]# netstat -lntp
tcp6 0 0 :::5044 :::* LISTEN 10500/java
#Generate some error log entries (on web01)
[root@web01 filebeat]# curl 10.0.0.7/sdasfdsafadsfsdaf
Open the browser to view and analyze the data.
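1. Collecting MySQL slow query logs
1. What is the MySQL slow query log?
When an SQL statement runs longer than the configured threshold (long_query_time), it is written to a dedicated log file. That record is called the slow query log.
2. Why collect MySQL slow query logs?
While a database is running, some SQL statements may be too slow. How do we quickly locate and analyze which statements need optimizing, and which ones are hurting the business system?
Once the logs are collected and analyzed centrally, each statement's execution time and exact text are visible at a glance.
3. How to collect MySQL slow query logs?
1. Install MySQL (MariaDB is used here)
2. Enable MySQL slow query logging
3. Use filebeat to collect the local slow query log file
Environment: 10.0.0.7 2G 1G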
[root@web01 ~]# yum install mariadb mariadb-server -y
#Enable the slow query log in my.cnf, then restart mariadb
[root@db01 ~]# vim /etc/my.cnf
[mysqld]
...
slow_query_log=ON
slow_query_log_file=/var/log/mariadb/slow.log
long_query_time=3
...
[root@db01 ~]# systemctl restart mariadb
[root@web01 ~]# ls /var/log/mariadb
mariadb.log slow.log
[root@web01 ~]# mysql -uroot -poldxu.com
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 8
Server version: 5.5.64-MariaDB MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
#Simulate a slow query: sleep(1) on each of the 11 rows takes about 11 s, above long_query_time=3
MariaDB [(none)]> select sleep(1) user,host from mysql.user;
+------+-----------+
| user | host |
+------+-----------+
| 0 | % |
| 0 | % |
| 0 | % |
| 0 | % |
| 0 | 127.0.0.1 |
| 0 | ::1 |
| 0 | localhost |
| 0 | localhost |
| 0 | localhost |
| 0 | web01 |
| 0 | web01 |
+------+-----------+
11 rows in set (11.48 sec)
Log format conversion
#Write filebeat.yml
[root@web01 filebeat]# cat filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/mariadb/slow.log
  exclude_lines: ['^# Time']
  multiline.pattern: '^# User'
  multiline.negate: true
  multiline.match: after
  multiline.max_lines: 10000
  tags: ["mysql-slow"]
output.logstash:
  hosts: ["10.0.0.151:5044"]
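With `negate: true` and `match: after`, a line matching `^# User` starts a new event, and every following non-matching line is appended to it. The Filebeat documentation states that when multiline is configured, lines are combined before `exclude_lines` is applied. As a result, `^# Time` only drops events that begin with a "# Time" line. The sketch below models both steps in Python. The slow-log excerpt is hypothetical, written in the MariaDB format that the grok pattern further down expects.

```python
import re
from typing import Iterable, List

def multiline_after(lines: Iterable[str], pattern: str, max_lines: int = 10000) -> List[str]:
    """Sketch of multiline.negate: true + multiline.match: after.
    A line matching `pattern` starts a new event; other lines are appended
    to the current event; lines beyond max_lines are dropped."""
    start = re.compile(pattern)
    events: List[List[str]] = []
    for line in lines:
        if start.search(line) or not events:
            events.append([line])
        elif len(events[-1]) < max_lines:
            events[-1].append(line)
    return ["\n".join(e) for e in events]

def exclude_lines(events: List[str], patterns: List[str]) -> List[str]:
    # Applied to the already-combined event, so ^ anchors at the event's first line
    regexes = [re.compile(p) for p in patterns]
    return [e for e in events if not any(r.search(e) for r in regexes)]

# Hypothetical MariaDB slow-log excerpt, for illustration only
SLOW_LOG = [
    "# Time: 200115 10:20:31",
    "# User@Host: root[root] @ localhost []",
    "# Thread_id: 8  Schema:   QC_hit: No",
    "# Query_time: 11.004105  Lock_time: 0.000185  Rows_sent: 11  Rows_examined: 11",
    "SET timestamp=1579054831;",
    "select sleep(1) user,host from mysql.user;",
]

events = exclude_lines(multiline_after(SLOW_LOG, r"^# User"), [r"^# Time"])
print(len(events))
print(events[0])
```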
Write the Logstash config file
[root@logstash-node1 conf.d]# cat input_filebeat_mysql_output_es.conf
input {
beats {
port => 5044
}
}
filter {
mutate {
gsub => ["message", "\n", " "]
}
grok {
match => {
"message" => "(?m)^# User@Host: %{USER:User}\[%{USER-2:User}\] @ (?:(?<Clienthost>\S*) )?\[(?:%{IP:Client_IP})?\] # Thread_id: %{NUMBER:Thread_id:integer}\s+ Schema: (?:(?<DBname>\S*) )\s+QC_hit: (?:(?<QC_hit>\S*) )# Query_time: %{NUMBER:Query_Time}\s+ Lock_time: %{NUMBER:Lock_Time}\s+ Rows_sent: %{NUMBER:Rows_Sent:integer}\s+Rows_examined: %{NUMBER:Rows_Examined:integer} SET timestamp=%{NUMBER:timestamp}; \s*(?<Query>(?<Action>\w+)\s+.*)"
}
}
date {
match => ["timestamp","UNIX", "YYYY-MM-dd HH:mm:ss"]
target => "@timestamp"
timezone => "Asia/Shanghai"
}
mutate {
remove_field => ["message","input","timestamp","agent","ecs","log"]
convert => ["Lock_Time","float"]
convert => ["Query_Time","float"]
add_field => { "target_index" => "logstash-mysql-slow-%{+YYYY.MM.dd}" }
}
}
output {
elasticsearch {
hosts => ["10.0.0.161:9200"]
index => "%{[target_index]}"
}
stdout {
codec => "rubydebug"
}
}
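The core of this filter chain can be traced in Python: gsub flattens newlines, a regex extracts the numbers and the query, convert makes the times floats, and the UNIX date match turns the epoch seconds into a UTC @timestamp. The regex is a simplified version that starts at the "# Query_time" part and is not the full grok pattern. The entry is the same hypothetical sample as in the multiline sketch.

```python
import re
from datetime import datetime, timezone

SLOW = re.compile(
    r"# Query_time: (?P<Query_Time>[\d.]+)\s+Lock_time: (?P<Lock_Time>[\d.]+)\s+"
    r"Rows_sent: (?P<Rows_Sent>\d+)\s+Rows_examined: (?P<Rows_Examined>\d+)\s+"
    r"SET timestamp=(?P<timestamp>\d+);\s*(?P<Query>(?P<Action>\w+)\s+.*)"
)

def parse_slow(event_text: str) -> dict:
    flat = event_text.replace("\n", " ")          # mutate { gsub => ["message", "\n", " "] }
    m = SLOW.search(flat)
    if not m:
        return {}
    d = m.groupdict()
    d["Query_Time"] = float(d["Query_Time"])      # convert => float
    d["Lock_Time"] = float(d["Lock_Time"])
    d["Rows_Sent"] = int(d["Rows_Sent"])          # :integer suffix in the grok pattern
    d["Rows_Examined"] = int(d["Rows_Examined"])
    # date { match => ["timestamp", "UNIX"] }: epoch seconds -> UTC @timestamp
    d["@timestamp"] = datetime.fromtimestamp(int(d.pop("timestamp")), tz=timezone.utc)
    return d

# Hypothetical combined multiline event
ENTRY = "\n".join([
    "# User@Host: root[root] @ localhost []",
    "# Thread_id: 8  Schema:   QC_hit: No",
    "# Query_time: 11.004105  Lock_time: 0.000185  Rows_sent: 11  Rows_examined: 11",
    "SET timestamp=1579054831;",
    "select sleep(1) user,host from mysql.user;",
])
print(parse_slow(ENTRY))
```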
[root@logstash-node1 conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/input_filebeat_mysql_output_es.conf -r
#Generate new slow-log entries, then restart filebeat
[root@web01 filebeat]# mysql -uroot -poldxu.com
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 17
Server version: 5.5.64-MariaDB MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> select sleep(1) user,host from mysql.user;
+------+-----------+
| user | host |
+------+-----------+
| 0 | % |
| 0 | % |
| 0 | % |
| 0 | % |
| 0 | 127.0.0.1 |
| 0 | ::1 |
| 0 | localhost |
| 0 | localhost |
| 0 | localhost |
| 0 | web01 |
| 0 | web01 |
+------+-----------+
11 rows in set (11.01 sec)
MariaDB [(none)]> Bye
[root@web01 filebeat]# systemctl restart filebeat
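The server's output window is shown in the figure below
Slow log check
Create the index by following the steps in order; the result is as follows
Collecting app logs with Logstash
#Upload app-dashboard-1.0-SNAPSHOT.jar to web01 to simulate app logs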
[root@web01 log]# java -jar app-dashboard-1.0-SNAPSHOT.jar &>/var/log/app.log
[root@web01 ~]# tail -f /var/log/app.log
[INFO] 2020-01-15 22:21:03 [cn.oldxu.dashboard.Main] - DAU|2635|领取优惠券|2020-01-15 18:09:02
[INFO] 2020-01-15 22:21:08 [cn.oldxu.dashboard.Main] - DAU|3232|领取优惠券|2020-01-15 15:21:06
[INFO] 2020-01-15 22:21:11 [cn.oldxu.dashboard.Main] - DAU|8655|使用优惠券|2020-01-15 10:05:10
[INFO] 2020-01-15 22:21:15 [cn.oldxu.dashboard.Main] - DAU|498|评论商品|2020-01-15 18:15:04
[INFO] 2020-01-15 22:21:18 [cn.oldxu.dashboard.Main] - DAU|1603|加入购物车|2020-01-15 16:13:03
[INFO] 2020-01-15 22:21:18 [cn.oldxu.dashboard.Main] - DAU|7085|提交订单|2020-01-15 15:10:06
[INFO] 2020-01-15 22:21:21 [cn.oldxu.dashboard.Main] - DAU|5576|搜索|2020-01-15 09:06:06
[INFO] 2020-01-15 22:21:23 [cn.oldxu.dashboard.Main] - DAU|6309|搜索|2020-01-15 11:20:16
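Write the filebeat.yml config file
[root@web01 filebeat]# cat filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/app.log
output.logstash:
  hosts: ["10.0.0.151:5044"]
#Note: if two machines both produce these logs, the other machine also needs filebeat installed and configured with this filebeat.yml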
Write the Logstash config file
[root@logstash-node1 conf.d]# cat input_filebeat_app_output_es.conf
input {
beats {
port => 5044
}
}
filter {
mutate {
split => {"message" => "|"}
add_field => {
"UserID" => "%{[message][1]}"
"Action" => "%{[message][2]}"
"Date" => "%{[message][3]}"
}
}
#2020-01-15 17:04:15
date {
match => ["Date","yyyy-MM-dd HH:mm:ss"]
target => "@timestamp"
timezone => "Asia/Chongqing"
}
#convert runs before add_field inside one mutate, so the fields added above are converted here
mutate {
convert => {
"UserID" => "integer"
"Action" => "string"
"Date" => "string"
}
#remove_field => ["message","Date"]
add_field => { "target_index" => "logstash-app-%{+YYYY.MM.dd}" }
}
}
output {
elasticsearch {
hosts => ["10.0.0.161:9200"]
index => "%{[target_index]}"
template_overwrite => true
}
stdout {
codec => "rubydebug"
}
}
[root@logstash-node1 conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/input_filebeat_app_output_es.conf -r
[root@web01 filebeat]# systemctl restart filebeat
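To check the app filter offline, the Python sketch below applies the same steps to one of the sample lines shown earlier. Splitting on "|" puts the user ID, action and date at indexes 1 to 3. The date is read as UTC+8 local time, and the daily index suffix comes from the resulting UTC @timestamp. A fixed UTC+8 offset stands in for Asia/Chongqing, which has no daylight saving time for current dates. `parse_app_line` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+8 offset standing in for Asia/Chongqing
CHONGQING = timezone(timedelta(hours=8))

def parse_app_line(line: str) -> dict:
    parts = line.split("|")                                    # split => {"message" => "|"}
    event = {"UserID": parts[1], "Action": parts[2], "Date": parts[3]}   # add_field
    local = datetime.strptime(event["Date"], "%Y-%m-%d %H:%M:%S").replace(tzinfo=CHONGQING)
    event["@timestamp"] = local.astimezone(timezone.utc)       # date filter
    event["UserID"] = int(event["UserID"])                     # convert, in the later mutate
    event["target_index"] = "logstash-app-" + event["@timestamp"].strftime("%Y.%m.%d")
    return event

line = "[INFO] 2020-01-15 22:21:11 [cn.oldxu.dashboard.Main] - DAU|8655|使用优惠券|2020-01-15 10:05:10"
print(parse_app_line(line))
```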