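Notes:
The MariaDB audit log records activity on a MariaDB server, such as connections and queries.
The goal is to split each audit log entry into tab-separated fields.
Here is the fluentd config file: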
```
<system>
  log_level error
</system>

<source>
  @type tail
  path /data/mysql_audit/*
  limit_recently_modified 86400
  # reopen the file on each update instead of holding its descriptor open
  open_on_every_update true
  tag mysql_audit
  read_from_head true
  pos_file /tmp/fluentd.pos
  <parse>
    @type multiline
    # a new entry starts with an 8-digit date such as 20240101
    format_firstline /^\d{8}/
    format1 /^(?<Date>\d{8}) (?<Hour>\d{2}):(?<Min>\d{2}):(?<Sec>\d{2}),(?<host>[^,]+),(?<user>[^,]+),(?<ip>[^,]+),(?<connid>[^,]+),(?<queryid>[^,]+),(?<action>[^,]+),(?<db>[^,]+),(?<message>.*),(?<retcode>\d+)$/
  </parse>
</source>

<filter mysql_audit>
  @type grep
  # keep only QUERY events
  <regexp>
    key action
    pattern QUERY
  </regexp>
  <exclude>
    key user
    pattern lagou_status
  </exclude>
  <exclude>
    key db
    pattern information_schema
  </exclude>
</filter>

<filter mysql_audit>
  @type record_transformer
  enable_ruby
  <record>
    # collapse tabs, newlines and runs of spaces so the query stays in one column
    message ${record["message"].gsub(/\s+/, ' ')}
  </record>
</filter>

<match mysql_audit>
  #@type stdout
  @type webhdfs
  host oss-hadoop-namenode-bjc-002
  path /mysql_audit/${Date}/${host}_${Hour}
  append true
  compress gzip
  <format>
    @type csv
    fields Date,Hour,Min,Sec,host,user,ip,action,db,message,retcode
    delimiter '\t'
  </format>
  <buffer host,Date,Hour>
    @type memory
    flush_interval 20s
  </buffer>
</match>
```
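You can check what the pipeline produces for one entry without running td-agent. Below is a rough Python sketch of the same steps:

- the format1 regex, rewritten in Python's (?P<name>...) group syntax;
- the grep keep/exclude rules;
- the whitespace collapsing from record_transformer;
- tab-joining the csv fields.

It approximates the pipeline and is not fluentd's code:

- re.DOTALL is an assumption, standing in for how the multiline parser lets .* cover a query logged over several lines.
- The sample entry is made up.
- fluentd's csv formatter also wraps each field in double quotes by default (force_quotes), which the sketch leaves out.

```python
import re
from typing import Optional

# format1 from the config, rewritten with Python's (?P<name>...) group syntax.
# DOTALL (an assumption here) lets .* in <message> span a query logged over several lines.
AUDIT_RE = re.compile(
    r"^(?P<Date>\d{8}) (?P<Hour>\d{2}):(?P<Min>\d{2}):(?P<Sec>\d{2}),"
    r"(?P<host>[^,]+),(?P<user>[^,]+),(?P<ip>[^,]+),(?P<connid>[^,]+),"
    r"(?P<queryid>[^,]+),(?P<action>[^,]+),(?P<db>[^,]+),"
    r"(?P<message>.*),(?P<retcode>\d+)$",
    re.DOTALL,
)

# Column order from the <format> section (connid and queryid are not written out).
FIELDS = ["Date", "Hour", "Min", "Sec", "host", "user", "ip",
          "action", "db", "message", "retcode"]


def to_tsv(entry: str) -> Optional[str]:
    """Turn one audit entry into a tab-separated row, or None if it is dropped."""
    m = AUDIT_RE.match(entry)
    if m is None:
        return None  # does not match format1
    rec = m.groupdict()
    # <filter> @type grep: keep only QUERY events ...
    if not re.search("QUERY", rec["action"]):
        return None
    # ... and apply the two <exclude> rules.
    if re.search("lagou_status", rec["user"]) or re.search("information_schema", rec["db"]):
        return None
    # <filter> @type record_transformer: collapse tabs, newlines and space runs
    # to a single space so the message cannot break the tab-separated layout.
    rec["message"] = re.sub(r"\s+", " ", rec["message"])
    return "\t".join(rec[f] for f in FIELDS)


if __name__ == "__main__":
    # Hypothetical entry whose query was logged across two lines.
    sample = "20240101 12:34:56,db01,app,10.0.0.5,42,1001,QUERY,shop,'SELECT *\n  FROM\torders',0"
    print(to_tsv(sample))
```

to_tsv returns None in three cases, matching the records the two filters would keep out of the <match> block:

- the entry fails the regex;
- its action is not QUERY;
- an exclude rule hits.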
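fluentd uses far less memory than logstash.
Processing the same logs, logstash used 700 MB of memory, while fluentd used 35 MB.
CPU usage, however, is about the same: on machines with a heavy log volume, the CPU reaches 100%.
It seems that regex parsing and filtering of the logs is very CPU-intensive.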
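One rough way to see how much of the per-line cost comes from the regex is to time the format1 pattern against a plain string split on the same line. The sketch below does that in Python.

parse_split is a hypothetical alternative, not something fluentd offers. It relies on two assumptions: the eight fields before the message never contain commas, and retcode is always the last field. It is only approximately equivalent to the regex, since it does not check that Date, Hour, Min and Sec are digits. Python's re is a different engine from the one td-agent's Ruby uses, so the absolute timings say nothing definitive about td-agent. Only the relative difference on one interpreter is of interest.

```python
import re
import timeit
from typing import Dict, Optional

# Same pattern as format1 in the config, in Python's (?P<name>...) syntax.
AUDIT_RE = re.compile(
    r"^(?P<Date>\d{8}) (?P<Hour>\d{2}):(?P<Min>\d{2}):(?P<Sec>\d{2}),"
    r"(?P<host>[^,]+),(?P<user>[^,]+),(?P<ip>[^,]+),(?P<connid>[^,]+),"
    r"(?P<queryid>[^,]+),(?P<action>[^,]+),(?P<db>[^,]+),"
    r"(?P<message>.*),(?P<retcode>\d+)$",
    re.DOTALL,
)


def parse_regex(line: str) -> Optional[Dict[str, str]]:
    m = AUDIT_RE.match(line)
    return m.groupdict() if m else None


def parse_split(line: str) -> Optional[Dict[str, str]]:
    # retcode is the last comma-separated field; the message may itself contain commas
    head, _, retcode = line.rpartition(",")
    # assumption: the eight fields before the message never contain commas
    parts = head.split(",", 8)
    if len(parts) != 9 or not retcode.isdigit() or not all(parts[:8]):
        return None
    date, _, clock = parts[0].partition(" ")
    hms = clock.split(":")
    if len(hms) != 3:
        return None
    rec = {"Date": date, "Hour": hms[0], "Min": hms[1], "Sec": hms[2], "retcode": retcode}
    keys = ["host", "user", "ip", "connid", "queryid", "action", "db", "message"]
    rec.update(zip(keys, parts[1:]))
    return rec


# Hypothetical audit line; note the commas inside the query text.
SAMPLE = "20240101 12:34:56,db01,app,10.0.0.5,42,1001,QUERY,shop,'INSERT INTO t VALUES (1,2)',0"

if __name__ == "__main__":
    n = 50_000
    for fn in (parse_regex, parse_split):
        secs = timeit.timeit(lambda: fn(SAMPLE), number=n)
        print(f"{fn.__name__}: {secs / n * 1e6:.2f} us/line")
```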
Without open_on_every_update true, td-agent keeps open the file descriptor of every file it has opened.
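The behaviour that open_on_every_update enables can be sketched as a reader that does the following on each poll:

- reopens the file;
- seeks to a saved byte offset, the role pos_file plays;
- reads whatever complete lines were appended;
- closes the file again.

As a result, no descriptor is held between polls. This is a simplified Python illustration, not in_tail's implementation. Rotation detection by inode is left out, and read_new_lines is a hypothetical name.

```python
import os
from typing import List, Tuple


def read_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Reopen `path` and return the complete lines written after byte `offset`,
    plus the offset to resume from next time. No descriptor outlives the call."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size < offset:
            offset = 0  # file shrank (truncated): start again from the top
        f.seek(offset)
        data = f.read()
    # the with-block has closed the file here, so nothing stays open between polls
    end = data.rfind(b"\n") + 1  # stop after the last complete line
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

A caller would loop: call read_new_lines, sleep, and persist the returned offset, much as pos_file does. The trade-off is an extra open/close per update. In exchange, a rotated or deleted log file is not kept pinned by an open descriptor, so its disk space can actually be freed.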