  • Getting Started with ELK and Common Commands

    Elasticsearch resources:

    https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started.html
    https://www.elastic.co/webinars/getting-started-kibana?baymax=rtp&elektra=docs&storm=top-video&iesrc=ctr
    https://www.elastic.co/webinars/getting-started-logstash?baymax=rtp&elektra=docs&storm=top-video&iesrc=ctr
    Elasticsearch listens on port 9200 by default; opening it in a browser shows basic node information:
    http://localhost:9200/

    Elasticsearch: The Definitive Guide (the second link is the master-branch edition of the guide):
    https://www.elastic.co/guide/en/elasticsearch/guide/index.html
    https://www.elastic.co/guide/en/elasticsearch/guide/master/index.html

    A shard means an index (on the primary node) is stored across N pieces: a single index can grow too large for one node to search efficiently, so it is partitioned into multiple shards, essentially a form of data splitting.
    A replica is a copy of a shard, used mainly for high availability, i.e. to avoid a single point of failure.
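
    As a sketch, both values can be set explicitly when creating an index (the index name my_index and the counts below are illustrative, not defaults):

    PUT /my_index?pretty
        {
          "settings": {
            "number_of_shards": 3,
            "number_of_replicas": 1
          }
        }

    The shard count is fixed at index creation time, while the replica count can be changed later.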

    Get index information (_cat here is not the animal; it is short for "compact and aligned text", the human-readable APIs)
    GET /_cat/indices?v
    Create an index
    PUT /customer?pretty
    GET /_cat/indices?v

    Create a document. PUT creates a document with the ID you specify, while POST creates one without a specified ID (Elasticsearch generates a random one). And what is ?pretty? It stands for pretty-print and asks Elasticsearch to return nicely formatted JSON.

    PUT /customer/_doc/1?pretty
        {
          "name": "John Doe"
        }
    GET /customer/_doc/1?pretty
    
    POST /customer/_doc?pretty
        {
          "name": "Jane Doe"
        }


    Update a document (internally the old version is deleted and a new one is indexed)

    POST /customer/_doc/1/_update?pretty
        {
          "doc": { "name": "Jane Doe" }
        }
    
    POST /customer/_doc/1/_update?pretty
        {
          "doc": { "name": "Jane Doe", "age": 20 }
        }
    
    POST /customer/_doc/1/_update?pretty
        {
          "script" : "ctx._source.age += 5"
        }


    Delete a document
     DELETE /customer/_doc/2?pretty 
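
    An entire index can also be dropped, which deletes all of its documents along with it:

    DELETE /customer?pretty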

    Bulk operations (bulk indexing and bulk updating)

    POST /customer/_doc/_bulk?pretty
        {"index":{"_id":"1"}}
        {"name": "John Doe" }
        {"index":{"_id":"2"}}
        {"name": "Jane Doe" }

    POST /customer/_doc/_bulk?pretty
        {"update":{"_id":"1"}}
        {"doc": { "name": "John Doe becomes Jane Doe" } }
        {"delete":{"_id":"2"}}


    Bulk-import data from a file

    curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"
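
    A quick sanity check after the import (assuming the accounts.json sample from the getting-started guide): the _count API returns the number of documents in the index:

    GET /bank/_count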


    Search. Note that the endpoint here is _search (just as it was _update for updates). q=* matches all documents; sort orders them by account_number ascending (asc); pretty was covered above. In the response, hits holds the matched documents and the total property gives the number of matches; note, however, that only 10 hits are returned by default, which can be changed with the size parameter.
    GET /bank/_search?q=*&sort=account_number:asc&pretty
    The equivalent query using a request body:

    GET /bank/_search
        {
          "query": { "match_all": {} },
          "sort": [
            { "account_number": "asc" }
          ]
        }


    To start the results from somewhere in the middle, set the from parameter, which means start at index n; if n = 5.98 the value is rounded down to n = 5. Also note that max_score was 0 in all the examples up to this point, but from this query on, once query conditions are involved, max_score begins to carry a value.

    GET /bank/_search
        {
          "query": { "match_all": {} },
          "from": 10,  # start at offset 10, i.e. the 11th result
          "size": 10
        }


    Return only selected fields (like SELECT col1, col2 ...)

    GET /bank/_search
        {
          "query": { "match_all": {} },
          "_source": ["account_number", "balance"]
        }


    Search on a specific field (like WHERE)

    GET /bank/_search
        {
          "query": { "match": { "account_number": 20 } }
        }


    Note the difference between the two queries below, i.e. between match and match_phrase. The former returns a document as soon as any one of the terms matches, ranking the results by score; the latter requires the whole phrase to match, i.e. the positions must line up exactly, with mill immediately before lane. In practice, even a document that fully matches "mill lane" scores only about 13.2, and full matching means every word matches: a document like "198 Mill2 Lane", off by just Mill2, drops to 8.3, the same score as documents that match only "Lane" ("Mill" cannot match at all).

    GET /bank/_search
        {
          "query": { "match": { "address": "198 Mill Lane" } }
        }

    GET /bank/_search
        {
          "query": { "match_phrase": { "address": "198 Mill Lane" } }
        }
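
    Between those two extremes, match also accepts an operator parameter: with "operator": "and", every term must match somewhere in the field, but unlike match_phrase their order and positions are not constrained (a sketch):

    GET /bank/_search
        {
          "query": {
            "match": {
              "address": {
                "query": "mill lane",
                "operator": "and"
              }
            }
          }
        }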


    A bool query with must is like AND in a WHERE clause

    GET /bank/_search
        {
          "query": {
            "bool": {
              "must": [
                { "match": { "address": "mill" } },
                { "match": { "address": "lane" } }
              ]
            }
          }
        }


    bool with should is like OR in a WHERE clause

    GET /bank/_search
        {
          "query": {
            "bool": {
              "should": [
                { "match": { "address": "mill" } },
                { "match": { "address": "lane" } }
              ]
            }
          }
        }


    And negation, i.e. a WHERE ... NOT / does-not-contain condition:

    GET /bank/_search
        {
          "query": {
            "bool": {
              "must_not": [
                { "match": { "address": "mill" } },
                { "match": { "address": "lane" } }
              ]
            }
          }
        }


    The clauses can also be combined:

    GET /bank/_search
        {
          "query": {
            "bool": {
              "must": [
                { "match": { "age": "40" } }
              ],
              "must_not": [
                { "match": { "state": "ID" } }
              ]
            }
          }
        }


    Filters
    A filter sits inside a bool query, but filter clauses do not contribute to document scoring; the score of 1 in this query comes from the bool query's match_all clause, not from the filter.

    GET /bank/_search
        {
          "query": {
            "bool": {
              "must": { "match_all": {} },
              "filter": {
                "range": {
                  "balance": {
                    "gte": 2000,
                    "lte": 3000
                  }
                }
              }
            }
          }
        }


    Aggregations
    An aggregation is like GROUP BY: the example below groups on the values of the state field and takes the count per group; a terms aggregation such as group_by_state computes the count() per bucket by default.
    size is set to 0 here because only the aggregation results are wanted, not the search hits; with size > 0 the hits would also be included in the response.

    GET /bank/_search
        {
          "size": 0,
          "aggs": {
            "group_by_state": {
              "terms": {
                "field": "state.keyword"
              }
            }
          }
        }


    A more complex one: besides the GROUP BY count, it also takes the average of the balance field; note that the average_balance sub-aggregation is nested inside group_by_state, and the buckets are then ordered by that average.

    GET /bank/_search
        {
          "size": 0,
          "aggs": {
            "group_by_state": {
              "terms": {
                "field": "state.keyword",
                "order": {
                  "average_balance": "desc"
                }
              },
              "aggs": {
                "average_balance": {
                  "avg": {
                    "field": "balance"
                  }
                }
              }
            }
          }
        }


    An even more complex one: it buckets documents by explicit age ranges and adds a second-level aggregation field (gender):

    GET /bank/_search
        {
          "size": 0,
          "aggs": {
            "group_by_age": {
              "range": {
                "field": "age",
                "ranges": [
                  {
                    "from": 20,
                    "to": 30
                  },
                  {
                    "from": 30,
                    "to": 40
                  },
                  {
                    "from": 40,
                    "to": 50
                  }
                ]
              },
              "aggs": {
                "group_by_gender": {
                  "terms": {
                    "field": "gender.keyword"
                  },
                  "aggs": {
                    "average_balance": {
                      "avg": {
                        "field": "balance"
                      }
                    }
                  }
                }
              }
            }
          }
        }


    A fragment of the response:

    "aggregations": {
        "group_by_age": {
          "buckets": [
            {
              "key": "20.0-30.0", # first-level bucket key
              "from": 20,
              "to": 30,
              "doc_count": 451,
              "group_by_gender": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [ # second-level buckets
                  {
                    "key": "M",
                    "doc_count": 232,
                    "average_balance": {
                      "value": 27374.05172413793
                    }
                  },
                  {
                    "key": "F",
                    "doc_count": 219,
                    "average_balance": {
                      "value": 25341.260273972603
                    }
                  }
                ]
              }
            },
    ... ...
  • Original post: https://www.cnblogs.com/xiashiwendao/p/9465279.html