  • Importing custom-format nginx logs into Elasticsearch with Heka

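    Resetting Heka's progress

    Heka keeps its progress files in the directory named by the base_dir option. Deleting everything inside that directory completely resets Heka's progress.

    By default base_dir points to '/var/cache/hekad', or 'c:\var\cache\hekad' on Windows.

    Reference: http://hekad.readthedocs.org/en/latest/getting_started.html#global-configuration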
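    The reset described above can be sketched in a few lines of Python. This is only an illustration: the function name reset_heka_progress is hypothetical, and stopping hekad before running it is an assumption on my part, not something the Heka documentation is quoted as requiring.

```python
import shutil
from pathlib import Path


def reset_heka_progress(base_dir: str) -> int:
    """Delete everything inside base_dir but keep the directory itself.

    Returns the number of top-level entries that were removed.
    """
    removed = 0
    for entry in Path(base_dir).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # sub-directory together with its contents
        else:
            entry.unlink()         # regular file or symlink
        removed += 1
    return removed
```

    Calling reset_heka_progress("/var/cache/hekad") would empty the default location on Linux; pass your own base_dir if you changed it.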

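    Deleting Elasticsearch data

    After changing the import strategy the data has to be recomputed, so the previously imported data must be cleared first. Several commonly used ES plugins offer a delete function and are simple to use.

    [image: screenshot of one such plugin]

    The tool in the screenshot is elasticsearch-head:

    https://mobz.github.io/elasticsearch-head/    By default it is served at: http://ip:9200/_plugin/head/

    Another recommended tool: http://www.elastichq.org/     Its git repository is at: https://github.com/royrusso/elasticsearch-HQ  By default it is served at: http://ip:9200/_plugin/hq/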
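    If no plugin is installed, an index can also be removed through Elasticsearch's REST interface with an HTTP DELETE on the index name. The sketch below only builds the request so it can be inspected safely. The helper name build_delete_index_request is hypothetical, and the index name in the comment is taken from the sample data later in this post.

```python
import urllib.request


def build_delete_index_request(base_url: str, index: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP DELETE request for one index."""
    url = f"{base_url.rstrip('/')}/{index}"
    return urllib.request.Request(url, method="DELETE")


# To actually delete the index, hand the request to urllib.request.urlopen():
#   req = build_delete_index_request("http://ip:9200", "nginx-2016.01.06")
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status, resp.read().decode())
```

    Keeping the sending step separate makes it easy to check the target URL before any data is destroyed.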

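    Parsing nginx logs

    Because our nginx logs use a custom format, we need the most flexible decoder, PayloadRegexDecoder, and define a regular expression to extract the data.

    Reference: http://hekad.readthedocs.org/en/latest/config/decoders/payload_regex.html

    Heka is written in Go, so its regular expressions follow the syntax of Go's regexp/syntax package. For simple patterns, an online Go regex tester such as https://regoio.herokuapp.com/ is enough.

    For complex patterns, RegexBuddy (http://www.regexbuddy.com/download.html) can be used.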
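    As a rough illustration of named capture groups, the Python sketch below matches one log line taken from the sample data later in this post. The pattern is my own guess at the log layout, not the author's actual match_regex. Which of the two "-" fields is hostname and which is remote_user is also a guess. The pattern uses only constructs that Python's re module and Go's regexp package share ((?P<name>...), \S, \d, character classes), but it should still be checked with a Go tester before going into a Heka config.

```python
import re

# Hypothetical pattern for the access-log layout seen in this post's sample data.
NGINX_LINE = re.compile(
    r'^(?P<remote_addr>\S+) (?P<hostname>\S+) (?P<remote_user>\S+) '
    r'\[(?P<Timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'(?P<http_x_forwarded_for>\S+) (?P<request_time>\S+)'
)


def parse_line(line: str) -> dict:
    """Return the named captures for one log line, or {} if it does not match."""
    match = NGINX_LINE.match(line)
    return match.groupdict() if match else {}


sample = ('10.159.191.213 - - [06/Jan/2016:09:31:51 +0800] '
          '"POST /simcard/uploadSimcardStatus HTTP/1.0" 200 61 "-" '
          '"Apache-HttpClient/4.5 (Java/1.7.0_67)" 122.97.213.5 0.166')
fields = parse_line(sample)
print(fields["Timestamp"], fields["status"], fields["request_time"])
```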

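    Timestamp

    By default Timestamp is the current time. To extract the time from the log line, the capture group in the regular expression must itself be named Timestamp.

    Two further parameters define how the time is extracted.

    timestamp_layout

    Describes the format of the time string to extract. Note that this uses Go's time layout notation.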

    A formatting string instructing hekad how to turn a time string into the actual time representation used internally. Example timestamp layouts can be seen in Go’s time documentation. In addition to the Go time formatting, special timestamp_layout values of “Epoch”, “EpochMilli”, “EpochMicro”, and “EpochNano” are supported for Unix style timestamps represented in seconds, milliseconds, microseconds, and nanoseconds since the Epoch, respectively.

    Some of the predefined layouts are:

            ANSIC       = "Mon Jan _2 15:04:05 2006"
            UnixDate    = "Mon Jan _2 15:04:05 MST 2006"
            RubyDate    = "Mon Jan 02 15:04:05 -0700 2006"
            RFC822      = "02 Jan 06 15:04 MST"
            RFC822Z     = "02 Jan 06 15:04 -0700" // RFC822 with numeric zone
            RFC850      = "Monday, 02-Jan-06 15:04:05 MST"
            RFC1123     = "Mon, 02 Jan 2006 15:04:05 MST"
            RFC1123Z    = "Mon, 02 Jan 2006 15:04:05 -0700" // RFC1123 with numeric zone
            RFC3339     = "2006-01-02T15:04:05Z07:00"
            RFC3339Nano = "2006-01-02T15:04:05.999999999Z07:00"
            Kitchen     = "3:04PM"
            // Handy time stamps.
            Stamp      = "Jan _2 15:04:05"
            StampMilli = "Jan _2 15:04:05.000"
            StampMicro = "Jan _2 15:04:05.000000"
            StampNano  = "Jan _2 15:04:05.000000000"
    Reference: https://golang.org/pkg/time/#pkg-constants
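    None of the predefined constants matches the nginx access-log time (for example 06/Jan/2016:09:31:51 +0800), so a custom layout is needed. In Go notation that would be "02/Jan/2006:15:04:05 -0700". The Python sketch below uses the equivalent strptime format only to show what the parse produces. It is an illustration, not Heka code, and the function name to_utc_string is hypothetical.

```python
from datetime import datetime, timezone

# Python counterpart of the Go layout "02/Jan/2006:15:04:05 -0700".
NGINX_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def to_utc_string(raw: str) -> str:
    """Parse an nginx-style timestamp and render it in UTC without a zone suffix."""
    parsed = datetime.strptime(raw, NGINX_TIME_FORMAT)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")


print(to_utc_string("06/Jan/2016:09:31:51 +0800"))  # 2016-01-06T01:31:51
```

    The result agrees with the sample data later in this post, where a payload time of 06/Jan/2016:09:31:51 +0800 appears with a Timestamp of 2016-01-06T01:31:51.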

    timestamp_location

    Time zone definition. This setting only takes effect when timestamp_layout carries no time zone information.

    Time zone in which the timestamps in the text are presumed to be in. Should be a location name corresponding to a file in the IANA Time Zone database (e.g. “America/Los_Angeles”), as parsed by Go’s time.LoadLocation() function (see http://golang.org/pkg/time/#LoadLocation). Defaults to “UTC”. Not required if valid time zone info is embedded in every parsed timestamp, since those can be parsed as specified in the timestamp_layout. This setting will have no impact if one of the supported “Epoch*” values is used as the timestamp_layout setting.
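    The rule quoted above (the location applies only when the text itself has no zone, and defaults to UTC) can be modelled like this. It is a simplified sketch: parse_with_location is a hypothetical helper, and a fixed +08:00 offset stands in for an IANA name such as "Asia/Shanghai" to keep the example free of time zone database dependencies.

```python
from datetime import datetime, timedelta, timezone


def parse_with_location(raw, fmt, location=timezone.utc):
    """Parse raw with fmt; apply location only when the text carries no zone."""
    parsed = datetime.strptime(raw, fmt)
    if parsed.tzinfo is None:
        # Zone-less text: assume it was written in the configured location.
        parsed = parsed.replace(tzinfo=location)
    return parsed.astimezone(timezone.utc)


beijing = timezone(timedelta(hours=8))  # fixed-offset stand-in for "Asia/Shanghai"

# No zone in the text, so the location decides the result.
print(parse_with_location("2016/01/06 09:31:51", "%Y/%m/%d %H:%M:%S", beijing))
# Zone embedded in the text, so the location is ignored.
print(parse_with_location("06/Jan/2016:09:31:51 +0800", "%d/%b/%Y:%H:%M:%S %z"))
```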

    An example configuration:

    [SphinxRequestDecoder]
    type = "PayloadRegexDecoder"
    match_regex = '.+ (?P<Hostname>\S+) sphinx: (?P<Timestamp>.+) \[(?P<Uuid>.+)\] REQUEST: path=(?P<Path>\S+) remoteaddr=(?P<Remoteaddr>\S+) (?P<Headers>.+)'
    timestamp_layout = "2006/01/02 15:04:05"

    Reference: https://github.com/mozilla-services/heka/wiki/How-to-convert-a-PayloadRegex-MultiDecoder-to-a-SandboxDecoder-using-an-LPeg-Grammar

     

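    Importing data into Elasticsearch

    To export data to Elasticsearch we use ElasticSearchOutput. This output only defines the properties of the Elasticsearch connection; how messages are mapped on export is defined by one of three encoders: ElasticSearch JSON Encoder, ElasticSearch Logstash V0 Encoder, or ElasticSearch Payload Encoder.

    Differences between the three encoders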

    The three compare as follows:

    ElasticSearch JSON Encoder
        Plugin Name: ESJsonEncoder
        This encoder serializes a Heka message into a clean JSON format, preceded by a separate JSON structure containing information required for ElasticSearch BulkAPI indexing.
        The JSON serialization is done by hand, without the use of Go’s stdlib JSON marshalling. This is so serialization can succeed even if the message contains invalid UTF-8 characters, which will be encoded as U+FFFD.

    ElasticSearch Logstash V0 Encoder
        Plugin Name: ESLogstashV0Encoder
        This encoder serializes a Heka message into a JSON format, preceded by a separate JSON structure containing information required for ElasticSearch BulkAPI indexing.
        The message JSON structure uses the original (i.e. “v0”) schema popularized by Logstash. Using this schema can aid integration with existing Logstash deployments. This schema also plays nicely with the default Logstash dashboard provided by Kibana.
        The JSON serialization is done by hand, without using Go’s stdlib JSON marshalling. This is so serialization can succeed even if the message contains invalid UTF-8 characters, which will be encoded as U+FFFD.
        Note: closely mimics Logstash.

    ElasticSearch Payload Encoder
        Plugin Name: SandboxEncoder
        File Name: lua_encoders/es_payload.lua
        Prepends ElasticSearch BulkAPI index JSON to a message payload.
        Note: a Lua plugin.

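    Taking ESJsonEncoder as an example: we want the timestamp we extracted ourselves to be used rather than the time at which the message is processed, so the following option needs to be set to true.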

    es_index_from_timestamp (bool):

    When generating the index name use the timestamp from the message instead of the current time. Defaults to false.
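    The effect of es_index_from_timestamp can be sketched as follows. The "nginx-" prefix and the year.month.day suffix are read off the _index values in the sample request later in this post. Using UTC for the date is also inferred from that sample, where a log line from 09/Jan/2016 06:20 +0800 ends up in nginx-2016.01.08. The function index_name is a hypothetical illustration, not Heka's code.

```python
from datetime import datetime, timezone


def index_name(prefix, message_time, from_timestamp, now=None):
    """Pick the daily index name from the message time or from the current time."""
    now = now or datetime.now(timezone.utc)
    chosen = message_time if from_timestamp else now
    return f"{prefix}-{chosen.astimezone(timezone.utc):%Y.%m.%d}"


message_time = datetime(2016, 1, 8, 22, 20, 25, tzinfo=timezone.utc)
import_time = datetime(2016, 1, 20, 3, 0, 0, tzinfo=timezone.utc)

print(index_name("nginx", message_time, True, import_time))   # nginx-2016.01.08
print(index_name("nginx", message_time, False, import_time))  # nginx-2016.01.20
```

    With the option left at false, every re-imported historical log line would land in the index for the day of the import, which is rarely what is wanted when backfilling.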

     

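    Note: I have not yet found where this encoder's timestamp setting is actually used. I had assumed it determined the time of the data imported into ES, but it does not.

    Some ElasticSearchOutput settings

    ElasticSearchOutput has the following two parameters, which determine how often requests are sent to the server.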

    flush_interval (int):
    Interval at which accumulated messages should be bulk indexed into ElasticSearch, in milliseconds. Defaults to 1000 (i.e. one second).

    flush_count (int):
    Number of messages that, if processed, will trigger them to be bulk indexed into ElasticSearch. Defaults to 10.

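    Both parameters are in effect at the same time: once flush_count messages have accumulated in the queue, or once more than flush_interval milliseconds have elapsed, any pending messages are sent to ElasticSearch.

    The request goes to http://10.30.0.32:9200/_bulk. A randomly sampled request looks like this: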
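    The interplay of the two thresholds can be modelled with a small buffer that flushes on whichever condition is met first. This is a toy model under my own assumptions (explicit timestamps instead of a real timer, a hypothetical BulkBuffer class), not Heka's implementation.

```python
class BulkBuffer:
    """Toy model of count-or-interval flushing; not Heka's implementation."""

    def __init__(self, send, flush_count=10, flush_interval_ms=1000):
        self.send = send                      # callable receiving a list of messages
        self.flush_count = flush_count
        self.flush_interval_ms = flush_interval_ms
        self.pending = []
        self.last_flush_ms = 0

    def add(self, message, now_ms):
        """Queue one message; flush at once when flush_count is reached."""
        self.pending.append(message)
        if len(self.pending) >= self.flush_count:
            self.flush(now_ms)

    def tick(self, now_ms):
        """Timer callback: flush when the interval has elapsed and work is pending."""
        if self.pending and now_ms - self.last_flush_ms >= self.flush_interval_ms:
            self.flush(now_ms)

    def flush(self, now_ms):
        batch, self.pending = self.pending, []
        self.last_flush_ms = now_ms
        self.send(batch)


sent = []
buf = BulkBuffer(sent.append)
for i in range(10):
    buf.add(f"msg-{i}", now_ms=100)    # the tenth message triggers a flush
buf.add("late", now_ms=200)
buf.tick(now_ms=900)                   # only 800 ms since the last flush: nothing sent
buf.tick(now_ms=1100)                  # 1000 ms elapsed: the pending message goes out
print([len(batch) for batch in sent])  # [10, 1]
```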

     

    POST http://10.30.0.32:9200/_bulk HTTP/1.1
    Host: 10.30.0.32:9200
    User-Agent: Go 1.1 package http
    Content-Length: 9374
    Accept: application/json
    Accept-Encoding: gzip

    {"index":{"_index":"nginx-2016.01.06","_type":"nginx"}}
    {"Uuid":"12b6e9b3-d593-4cf4-b473-761ae7e982b0","Timestamp":"2016-01-06T01:31:51","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.159.191.213 - - [06/Jan/2016:09:31:51 +0800] u0022POST /simcard/uploadSimcardStatus HTTP/1.0u0022 200 61 u0022-u0022 u0022Apache-HttpClient/4.5 (Java/1.7.0_67)u0022 122.97.213.5 0.166u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","responseCode":"<responseCode>","status":"200","http_referer":"-","request_time":"0.166","http_user_agent":"Apache-HttpClient/4.5 (Java/1.7.0_67)","upstream_response_time":"","remote_addr":"10.159.191.213","request":"POST /simcard/uploadSimcardStatus HTTP/1.0","hostname":"-","timestamp":"06/Jan/2016:09:31:51 +0800","http_x_forwarded_for":"122.97.213.5","remote_user":"-","body_bytes_sent":"61"}
    {"index":{"_index":"nginx-2016.01.05","_type":"nginx"}}
    {"Uuid":"6ff51dd8-ba9c-4440-b567-3de391cdac2b","Timestamp":"2016-01-05T07:36:45","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.159.191.90 - - [05/Jan/2016:15:36:45 +0800] u0022POST /soa/mfderchant/list HTTP/1.0u0022 200 926 u0022-u0022 u0022Java/1.7.0_71u0022 123.56.134.28 0.012u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","http_user_agent":"Java/1.7.0_71","timestamp":"05/Jan/2016:15:36:45 +0800","remote_addr":"10.159.191.90","request":"POST /soa/merttchant/list HTTP/1.0","upstream_response_time":"","remote_user":"-","body_bytes_sent":"926","responseCode":"<responseCode>","http_referer":"-","http_x_forwarded_for":"123.56.134.28","hostname":"-","status":"200","request_time":"0.012"}
    {"index":{"_index":"nginx-2015.12.17","_type":"nginx"}}
    {"Uuid":"58eb317c-2729-4037-a82e-d475e68324fd","Timestamp":"2015-12-17T14:03:26","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [17/Dec/2015:22:03:26 +0800] u0022GET /creepers/creepers/pubddlic/images/cardCoupon/cardCoupon1.png HTTP/1.0u0022 404 296 u0022http://ewr.wangpos.com/creepersplatfofrm/index.xhtmlu0022 u0022Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36u0022 61.51.252.82 0.004u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","request":"GET /creepders/crefepers/public/images/cardCoupon/cardCoupon1.png HTTP/1.0","responseCode":"<responseCode>","http_referer":"http://rre.wangpos.com/creepersplatform/index.xhtml","upstream_response_time":"","http_x_forwarded_for":"61.51.252.82","timestamp":"17/Dec/2015:22:03:26 +0800","body_bytes_sent":"296","remote_user":"-","http_user_agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36","status":"404","request_time":"0.004","hostname":"-","remote_addr":"10.171.20.136"}
    {"index":{"_index":"nginx-2015.12.14","_type":"nginx"}}
    {"Uuid":"969f2737-0a21-4c27-908a-29a22f1a1475","Timestamp":"2015-12-14T10:01:02","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [14/Dec/2015:18:01:02 +0800] u0022POST /wxcaddrddeal/cashAccess/sendCard HTTP/1.0u0022 200 48 u0022-u0022 u0022Java/1.7.0_71u0022 123.56.134.28 0.016u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","http_user_agent":"Java/1.7.0_71","hostname":"-","status":"200","body_bytes_sent":"48","http_x_forwarded_for":"123.56.134.28","upstream_response_time":"","request":"POST /wxcarddeal/cashAccess/sendCard HTTP/1.0","remote_addr":"10.171.20.136","remote_user":"-","http_referer":"-","responseCode":"<responseCode>","timestamp":"14/Dec/2015:18:01:02 +0800","request_time":"0.016"}
    {"index":{"_index":"nginx-2016.01.08","_type":"nginx"}}
    {"Uuid":"80ff4701-85ad-4ecc-816c-833dbaded8df","Timestamp":"2016-01-08T07:27:11","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [08/Jan/2016:15:27:11 +0800] u0022GET /uploadify/jquery.uploadify-3.1.min.js HTTP/1.0u0022 304 0 u0022http://www.wadngpos.com/batchCheck2Code?posMerId=1823cf1eba79411a9d32a3cb8dd3b821u0022 u0022Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36u0022 61.51.252.82 0.004u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","http_x_forwarded_for":"61.51.252.82","remote_user":"-","upstream_response_time":"","timestamp":"08/Jan/2016:15:27:11 +0800","status":"304","hostname":"-","responseCode":"<responseCode>","http_referer":"http://65.wangpos.com/batchCheckCode?posMerId=1823cf1eba79411a9d32a3cb8dd3b821","request":"GET /uplfoadify/jquery.uploadify-3.1.min.js HTTP/1.0","http_user_agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36","body_bytes_sent":"0","remote_addr":"10.171.20.136","request_time":"0.004"}
    {"index":{"_index":"nginx-2015.12.10","_type":"nginx"}}
    {"Uuid":"9c09fb0a-3fee-475c-bfad-04efd3a2f44e","Timestamp":"2015-12-10T11:32:26","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [10/Dec/2015:19:32:26 +0800] u0022POST /usfer/getSpuerUserByQulificationId HTTP/1.0u0022 200 182 u0022-u0022 u0022Java/1.7.0_71u0022 123.56.134.28 0.022u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","remote_addr":"10.171.20.136","timestamp":"10/Dec/2015:19:32:26 +0800","responseCode":"<responseCode>","http_referer":"-","upstream_response_time":"","request":"POST /user/getSpuerUserByQulificationId HTTP/1.0","http_user_agent":"Java/1.7.0_71","body_bytes_sent":"182","status":"200","hostname":"-","http_x_forwarded_for":"123.56.134.28","request_time":"0.022","remote_user":"-"}
    {"index":{"_index":"nginx-2015.12.17","_type":"nginx"}}
    {"Uuid":"d2c08886-cdd1-4dbb-b508-7bdec4d27460","Timestamp":"2015-12-17T07:20:29","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [17/Dec/2015:15:20:29 +0800] u0022GET /weipossoa/ HTTP/1.0u0022 200 3460 u0022-u0022 u0022curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2u0022 10.173.16.251 0.003u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","remote_addr":"10.171.20.136","responseCode":"<responseCode>","status":"200","remote_user":"-","timestamp":"17/Dec/2015:15:20:29 +0800","http_referer":"-","request_time":"0.003","http_x_forwarded_for":"10.173.16.251","hostname":"-","http_user_agent":"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2","upstream_response_time":"","request":"GET /weipossoa/ HTTP/1.0","body_bytes_sent":"3460"}
    {"index":{"_index":"nginx-2015.12.28","_type":"nginx"}}
    {"Uuid":"344bec04-268c-455d-94af-e44f72e50104","Timestamp":"2015-12-28T09:00:34","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.159.191.68 - - [28/Dec/2015:17:00:34 +0800] u0022GET /weipossoa/accessToken/check?providerAppCode=100028&accessToken=5680c5d301070742efba15ba HTTP/1.0u0022 200 60 u0022-u0022 u0022Java/1.8.0_65u0022 61.51.252.82 0.003u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","remote_user":"-","responseCode":"<responseCode>","hostname":"-","status":"200","http_referer":"-","timestamp":"28/Dec/2015:17:00:34 +0800","request_time":"0.003","http_user_agent":"Java/1.8.0_65","request":"GET /weipossoa/accessToken/check?providerAppCode=100028&accessToken=5680c5d301070742efba15ba HTTP/1.0","upstream_response_time":"","remote_addr":"10.159.191.68","body_bytes_sent":"60","http_x_forwarded_for":"61.51.252.82"}
    {"index":{"_index":"nginx-2016.01.08","_type":"nginx"}}
    {"Uuid":"0034bae1-6d16-486c-94fa-113d3cc15c42","Timestamp":"2016-01-08T22:20:25","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [09/Jan/2016:06:20:25 +0800] u0022GET /wxcard/jsp/common.jsp HTTP/1.0u0022 200 1407 u0022-u0022 u0022curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2u0022 123.57.53.143 0.005u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","request":"GET /wxcard/jsp/common.jsp HTTP/1.0","upstream_response_time":"","timestamp":"09/Jan/2016:06:20:25 +0800","responseCode":"<responseCode>","status":"200","http_referer":"-","request_time":"0.005","remote_addr":"10.171.20.136","http_x_forwarded_for":"123.57.53.143","http_user_agent":"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2","hostname":"-","remote_user":"-","body_bytes_sent":"1407"}
    {"index":{"_index":"nginx-2016.01.02","_type":"nginx"}}
    {"Uuid":"7775c7fd-d7bb-4a80-89fa-03fda682ca62","Timestamp":"2016-01-02T09:19:55","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.159.191.97 - - [02/Jan/2016:17:19:55 +0800] u0022POST /PosBusiness/pos/biz/service HTTP/1.0u0022 200 117 u0022-u0022 u0022Apache-HttpClient/4.1.3 (java 1.5)u0022 10.173.53.128 0.017u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","body_bytes_sent":"117","status":"200","request":"POST /PosBusiness/pos/biz/service HTTP/1.0","timestamp":"02/Jan/2016:17:19:55 +0800","http_referer":"-","remote_user":"-","responseCode":"<responseCode>","upstream_response_time":"","http_x_forwarded_for":"10.173.53.128","hostname":"-","http_user_agent":"Apache-HttpClient/4.1.3 (java 1.5)","remote_addr":"10.159.191.97","request_time":"0.017"}

    Here 10 messages had accumulated, so a single request was sent.
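    The body of such a request follows the Elasticsearch bulk format: one action line naming the target index, then one line with the document itself, each terminated by a newline. A minimal sketch of building that body is below. bulk_body is a hypothetical helper and the documents are cut down to two fields. The _type value mirrors the 2016-era sample above, and recent Elasticsearch versions no longer use mapping types.

```python
import json


def bulk_body(documents, doc_type="nginx"):
    """Serialize (index_name, source_dict) pairs into a _bulk request body."""
    lines = []
    for index, source in documents:
        # Action line: tells Elasticsearch where the next document goes.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"   # the bulk API expects a trailing newline


body = bulk_body([
    ("nginx-2016.01.06", {"Type": "nginx", "status": "200"}),
    ("nginx-2016.01.05", {"Type": "nginx", "status": "200"}),
])
print(body, end="")
```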

  • Original article: https://www.cnblogs.com/ghj1976/p/5145019.html