    Elasticsearch Study Notes (Getting Started)

    1. Elasticsearch requests and responses

    Request structure

    curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
    
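    • VERB: the HTTP method: GET, POST, PUT, HEAD, or DELETE
    • PROTOCOL: http or https (https is available only when an HTTPS proxy sits in front of Elasticsearch)
    • HOST: the hostname of any node in the Elasticsearch cluster, or localhost for a node on the local machine
    • PORT: the port on which the Elasticsearch HTTP service listens, 9200 by default
    • PATH: the API endpoint (for example, _count returns the number of documents in the cluster); PATH may contain several components, such as _cluster/stats or _nodes/stats/jvm
    • QUERY_STRING: optional query parameters, for example ?pretty, which makes the request return more readable, pretty-printed JSON
    • BODY: a JSON-encoded request body (if the request needs one)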
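The way these components combine can be sketched in Python. This is only an illustration: `build_url` is a hypothetical helper name, not part of any Elasticsearch client, and it writes the flag as `pretty=true` rather than the bare `?pretty` used in the curl examples.

```python
from urllib.parse import urlencode, urlunsplit


def build_url(protocol, host, port, path, query=None):
    """Assemble <PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING> (hypothetical helper)."""
    query_string = urlencode(query or {})
    # urlunsplit takes (scheme, netloc, path, query, fragment)
    return urlunsplit((protocol, f"{host}:{port}", "/" + path.lstrip("/"), query_string, ""))


count_url = build_url("http", "localhost", 9200, "_count", {"pretty": "true"})
jvm_url = build_url("http", "localhost", 9200, "_nodes/stats/jvm")
print(count_url)
print(jvm_url)
```

The VERB and BODY are not part of the URL; they are passed separately to whatever HTTP client sends the request.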

    Creating a document with PUT (indexing)

    $ curl -XPUT 'http://localhost:9200/megacorp/employee/3?pretty' -d '
    {
    "first_name" :  "Douglas",
    "last_name" :   "Fir",
    "age" :         35,
    "about":        "I like to build cabinets",
    "interests":  [ "forestry" ]
    }
    '

    Response:

    {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "3",
    "_version" : 1,
    "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
    },
    "created" : true
    }

    ## GET requests (search)
    ### Retrieving a document
    

    $ curl -XGET 'http://localhost:9200/megacorp/employee/1?pretty'

    {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "1",
    "_version" : 1,
    "found" : true,
    "_source" : {
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests" : [ "sports", "music" ]
    }
    }
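A client usually checks the `found` flag before reading `_source`. The following is a minimal sketch that parses the response shown above with the standard `json` module; `source_or_none` is a hypothetical helper name introduced only for this example.

```python
import json

# The response body from the GET request above, copied verbatim.
response_text = """
{
  "_index" : "megacorp",
  "_type" : "employee",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests" : [ "sports", "music" ]
  }
}
"""


def source_or_none(response):
    """Return the stored document if it was found, otherwise None."""
    return response["_source"] if response.get("found") else None


employee = source_or_none(json.loads(response_text))
print(employee["first_name"], employee["last_name"])
```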

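    ### Simple search
    We still use the `megacorp` index and the `employee` type, but we put the `_search` endpoint at the end of the path in place of a document ID. The `hits` array in the response contains all three of our documents. By default, a search returns the top 10 results.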
    
    

    $ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty'

    {
    "took" : 2,
    "timed_out" : false,
    "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
    },
    "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [ {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "2",
    "_score" : 1.0,
    "_source" : {
    "first_name" : "Jane",
    "last_name" : "Smith",
    "age" : 32,
    "about" : "I like to collect rock albums",
    "interests" : [ "music" ]
    }
    }, {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "1",
    "_score" : 1.0,
    "_source" : {
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests" : [ "sports", "music" ]
    }
    }, {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "3",
    "_score" : 1.0,
    "_source" : {
    "first_name" : "Douglas",
    "last_name" : "Fir",
    "age" : 35,
    "about" : "I like to build cabinets",
    "interests" : [ "forestry" ]
    }
    } ]
    }
    }

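    Next, let's search for employees whose last name contains "Smith". We will use a lightweight search method on the command line. This method is often called query-string search, because we pass the query as a URL parameter: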
    

    $ curl -XGET 'http://localhost:9200/megacorp/employee/_search?q=last_name:Smith&pretty'

    
    

    {
    "took" : 4,
    "timed_out" : false,
    "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
    },
    "hits" : {
    "total" : 2,
    "max_score" : 0.30685282,
    "hits" : [ {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "2",
    "_score" : 0.30685282,
    "_source" : {
    "first_name" : "Jane",
    "last_name" : "Smith",
    "age" : 32,
    "about" : "I like to collect rock albums",
    "interests" : [ "music" ]
    }
    }, {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "1",
    "_score" : 0.30685282,
    "_source" : {
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests" : [ "sports", "music" ]
    }
    } ]
    }
    }
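When the query string is built programmatically, the `q` value should be URL-encoded. Here is a small sketch using only the standard library; `query_string_search_url` is a hypothetical helper, and it writes `pretty=true` instead of the bare `pretty` flag.

```python
from urllib.parse import urlencode


def query_string_search_url(base, field, value):
    """Build a _search URL that passes `field:value` in the q parameter."""
    params = urlencode({"q": f"{field}:{value}", "pretty": "true"})
    return f"{base}/_search?{params}"


url = query_string_search_url(
    "http://localhost:9200/megacorp/employee", "last_name", "Smith"
)
print(url)
```

`urlencode` percent-encodes the colon as `%3A` and turns spaces into `+`, which is easy to get wrong when assembling such URLs by hand.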

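    ### Searching with the Query DSL
    Query-string search is handy for ad hoc searches from the command line, but it has its limitations (see the simple search section). Elasticsearch provides a rich, flexible query language called the Query DSL, which lets you build more complex and robust queries.
    
    The DSL (domain-specific language) is specified as a JSON request body. We can express the earlier "Smith" query like this: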
    
    

    $ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
    {
    "query" : {
    "match" : {
    "last_name" : "Smith"
    }
    }
    }
    '
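Because the DSL is plain JSON, the same request can be assembled from a Python dict. The sketch below only builds the request object and does not send it. `build_search_request` is a hypothetical helper, and the `Content-Type` header is an addition not present in the curl command above: as far as I know, Elasticsearch 6.0 and later reject request bodies without it.

```python
import json
from urllib.request import Request


def build_search_request(url, query_body):
    """Wrap a Query DSL dict in an HTTP request object without sending it."""
    return Request(
        url,
        data=json.dumps(query_body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption: needed on newer versions
        method="GET",
    )


smith_query = {"query": {"match": {"last_name": "Smith"}}}
request = build_search_request(
    "http://localhost:9200/megacorp/employee/_search?pretty", smith_query
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing `request` to `urllib.request.urlopen` would send it to a running node.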

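    ### More complicated searches
    Let's make the search a little more complicated. We still want to find employees with the last name "Smith", but only those older than 30. The query adds a filter, which lets us run a structured search efficiently: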
    
    

    $ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
    {
    "query" : {
    "filtered" : {
    "filter" : {
    "range" : {
    "age" : { "gt" : 30 }
    }
    },
    "query" : {
    "match" : {
    "last_name" : "smith"
    }
    }
    }
    }
    }
    '

    
    * The `range` filter finds all documents whose `age` is greater than 30; `gt` stands for "greater than".
    * The `match` query is the same one we used before.
    
    

    {
    "took" : 2,
    "timed_out" : false,
    "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
    },
    "hits" : {
    "total" : 1,
    "max_score" : 0.30685282,
    "hits" : [ {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "2",
    "_score" : 0.30685282,
    "_source" : {
    "first_name" : "Jane",
    "last_name" : "Smith",
    "age" : 32,
    "about" : "I like to collect rock albums",
    "interests" : [ "music" ]
    }
    } ]
    }
    }
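The `filtered` query shown above was deprecated in Elasticsearch 2.0 and removed in 5.0. On those later versions, the usual equivalent is a `bool` query with a `must` clause for the scored part and a `filter` clause for the unscored part. The sketch below only builds the request body; `smith_older_than` is a hypothetical helper name.

```python
import json


def smith_older_than(min_age):
    """Build a bool query: match last_name "smith", filter on age > min_age."""
    return {
        "query": {
            "bool": {
                # scored full-text clause, same as the match query above
                "must": {"match": {"last_name": "smith"}},
                # unscored structured clause, same as the range filter above
                "filter": {"range": {"age": {"gt": min_age}}},
            }
        }
    }


body = smith_older_than(30)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with `-d` in place of the `filtered` body.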

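    ### Full-text search
    So far the searches have been simple: looking up a specific name, filtering by age. Let's try something more advanced, full-text search, a task that traditional databases struggle with.
    
    We are going to search for all employees who enjoy "rock climbing":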
    
    

    $ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
    {
    "query" : {
    "match" : {
    "about" : "rock climbing"
    }
    }
    }
    '

    As you can see, we use the same `match` query as before to search the `about` field for "**rock climbing**". We get two matching documents:
    
    

    {
    "took" : 3,
    "timed_out" : false,
    "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
    },
    "hits" : {
    "total" : 2,
    "max_score" : 0.16273327,
    "hits" : [ {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "1",
    "_score" : 0.16273327,<1>
    "_source" : {
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests" : [ "sports", "music" ]
    }
    }, {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "2",
    "_score" : 0.016878016,<2>
    "_source" : {
    "first_name" : "Jane",
    "last_name" : "Smith",
    "age" : 32,
    "about" : "I like to collect rock albums",
    "interests" : [ "music" ]
    }
    } ]
    }
    }

    
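    * <1><2> The relevance scores.
    
    By default, Elasticsearch sorts matching results by their relevance score, that is, by how well each document matches the query. Clearly, the top-ranked document, that of `John Smith`, explicitly mentions "**rock climbing**" in its `about` field.
    
    But why does `Jane Smith` also appear in the results? Because "**rock**" is mentioned in her `about` field. Since only "**rock**" is mentioned and "**climbing**" is not, her `_score` is lower than John's.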
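The reasoning above can be illustrated with a toy model. It only checks which query terms occur in each `about` field, using lowercase whitespace splitting as a stand-in for Elasticsearch's analyzer. It does not reproduce the real scoring formula, and `matched_terms` is a hypothetical helper.

```python
about_fields = {
    "John Smith": "I love to go rock climbing",
    "Jane Smith": "I like to collect rock albums",
    "Douglas Fir": "I like to build cabinets",
}


def matched_terms(query, text):
    """Return the query terms that also occur in the text (simplified analysis)."""
    text_terms = set(text.lower().split())
    return [term for term in query.lower().split() if term in text_terms]


matches = {name: matched_terms("rock climbing", about) for name, about in about_fields.items()}
for name, terms in matches.items():
    # A document is a hit for `match` as soon as at least one term matches.
    print(name, terms, "hit" if terms else "no hit")
```

John matches both terms and Jane only one, which agrees with the order of the two hits in the response; Douglas matches none and is not returned.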
    
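    ### Phrase search
    Finding individual words in a field is all well and good, but sometimes you want to match an exact sequence of words, a phrase. For instance, we may want only the employee records that contain both "rock" and "climbing", with the two words adjacent to each other.
    
    To do this, we simply change the `match` query to a `match_phrase` query: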
    
    

    $ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
    {
    "query" : {
    "match_phrase" : {
    "about" : "rock climbing"
    }
    }
    }
    '

    {
    "took" : 16,
    "timed_out" : false,
    "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
    },
    "hits" : {
    "total" : 1,
    "max_score" : 0.23013961,
    "hits" : [ {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "1",
    "_score" : 0.23013961,
    "_source" : {
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests" : [ "sports", "music" ]
    }
    } ]
    }
    }
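The difference from `match` is that the terms must appear next to each other and in order. Here is a toy illustration of that adjacency check, again using whitespace splitting instead of a real analyzer; `contains_phrase` is a hypothetical helper and is not how Elasticsearch implements phrase matching internally.

```python
def contains_phrase(phrase, text):
    """True if the words of `phrase` occur consecutively, in order, in `text`."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    # Slide a window of n words across the text and compare it to the phrase.
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


john_matches = contains_phrase("rock climbing", "I love to go rock climbing")
jane_matches = contains_phrase("rock climbing", "I like to collect rock albums")
print(john_matches, jane_matches)
```

Jane's document contains "rock" but not the full phrase, so it drops out of the phrase search results.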

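    ### Highlighting our searches
    Many applications like to **highlight** the matched keywords in each search result so that users can see why the document matched the query. Retrieving highlighted fragments is easy in Elasticsearch.
    
    Let's add a `highlight` parameter to the previous query: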
    
    

    $ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
    {
    "query" : {
    "match_phrase" : {
    "about" : "rock climbing"
    }
    },
    "highlight": {
    "fields" : {
    "about" : {}
    }
    }
    }
    '

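    When we run this query, the same hit is returned as before, but the response now has a new section called `highlight`. It contains a text fragment from the `about` field, with the matched words wrapped in `<em></em>` HTML tags.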
    
    

    {
    "took" : 33,
    "timed_out" : false,
    "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
    },
    "hits" : {
    "total" : 1,
    "max_score" : 0.23013961,
    "hits" : [ {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "1",
    "_score" : 0.23013961,
    "_source" : {
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests" : [ "sports", "music" ]
    },
    "highlight" : {
    "about" : [ "I love to go <em>rock</em> <em>climbing</em>" ]
    }
    } ]
    }
    }
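A client can pull the highlighted words back out of a fragment with a regular expression. This sketch assumes the default `<em>` tags described above and a fragment of the form the text describes; `highlighted_terms` is a hypothetical helper.

```python
import re

# Assumed fragment: the about field with each matched word wrapped in <em> tags.
fragment = "I love to go <em>rock</em> <em>climbing</em>"


def highlighted_terms(text):
    """Return the words wrapped in <em>...</em> tags, in order of appearance."""
    return re.findall(r"<em>(.*?)</em>", text)


terms = highlighted_terms(fragment)
print(terms)
```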

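    ## Aggregations
    ### Analytics
    Finally, we have one last requirement: letting managers run analytics over the employee directory. Elasticsearch has a feature called **aggregations**, which lets you generate sophisticated analytics over your data. It is similar to `GROUP BY` in SQL, but much more powerful. For example, let's find the most popular interests among our employees: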
    
    

    $ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
    {
    "aggs": {
    "all_interests": {
    "terms": { "field": "interests" }
    }
    }
    }
    '

    The result:
    
    

    {...
    "aggregations" : {
    "all_interests" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [ {
    "key" : "music",
    "doc_count" : 2
    }, {
    "key" : "forestry",
    "doc_count" : 1
    }, {
    "key" : "sports",
    "doc_count" : 1
    } ]
    }
    }
    }

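    These results are not precalculated; they are computed on the fly from the documents that match the current query.
    
    If we want to know the most popular interests among the people named "Smith", we only need to add the appropriate query: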
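What the `terms` aggregation computes can be mimicked locally with a `Counter` over the three sample documents. This is only a sketch of the grouping logic, not of how Elasticsearch executes it. `terms_buckets` is a hypothetical helper, and breaking ties alphabetically is an assumption made here to get a deterministic order.

```python
from collections import Counter

employees = [
    {"last_name": "Smith", "age": 25, "interests": ["sports", "music"]},
    {"last_name": "Smith", "age": 32, "interests": ["music"]},
    {"last_name": "Fir", "age": 35, "interests": ["forestry"]},
]


def terms_buckets(docs, field):
    """Count how many documents contain each distinct value of `field`."""
    counts = Counter(value for doc in docs for value in set(doc.get(field, [])))
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


buckets = terms_buckets(employees, "interests")
smith_buckets = terms_buckets(
    [doc for doc in employees if doc["last_name"] == "Smith"], "interests"
)
print(buckets)
print(smith_buckets)
```

Restricting the input to the Smiths before counting plays the role of adding a `query` to the search request: the aggregation only sees the matching documents.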
    
    

    $ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
    {
    "query": {
    "match": {
    "last_name": "smith"
    }
    },
    "aggs": {
    "all_interests": {
    "terms": {
    "field": "interests"
    }
    }
    }
    }
    '

    The `all_interests` aggregation now covers only the documents that match the query:
    
    

    ...
    "all_interests": {
    "buckets": [
    {
    "key": "music",
    "doc_count": 2
    },
    {
    "key": "sports",
    "doc_count": 1
    }
    ]
    }

    
    Aggregations also allow hierarchical rollups. For example, let's find the average age of the employees who share each interest:
    
    

    $ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
    {
    "aggs" : {
    "all_interests" : {
    "terms" : { "field" : "interests" },
    "aggs" : {
    "avg_age" : {
    "avg" : { "field" : "age" }
    }
    }
    }
    }
    }
    '

    
    The aggregation results we get back are a bit more complicated, but still easy to understand:
    
    

    ...
    "all_interests": {
    "buckets": [
    {
    "key": "music",
    "doc_count": 2,
    "avg_age": {
    "value": 28.5
    }
    },
    {
    "key": "forestry",
    "doc_count": 1,
    "avg_age": {
    "value": 35
    }
    },
    {
    "key": "sports",
    "doc_count": 1,
    "avg_age": {
    "value": 25
    }
    }
    ]
    }

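    This output is a richer version of the earlier aggregation result. We still get the list of interests with their counts (the number of employees who have each interest), but each interest now also has an `avg_age` field showing the average age of the employees who share it.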
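The nested `avg` aggregation amounts to grouping by interest and then averaging the ages within each group. Here is a local sketch over the same three sample documents; `avg_age_by_interest` is a hypothetical helper, and this is not how Elasticsearch computes the result internally.

```python
from collections import defaultdict

employees = [
    {"first_name": "John", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "age": 32, "interests": ["music"]},
    {"first_name": "Douglas", "age": 35, "interests": ["forestry"]},
]


def avg_age_by_interest(docs):
    """Group employee ages by interest, then average each group."""
    ages = defaultdict(list)
    for doc in docs:
        for interest in set(doc["interests"]):
            ages[interest].append(doc["age"])
    return {interest: sum(values) / len(values) for interest, values in ages.items()}


averages = avg_age_by_interest(employees)
print(averages)
```

The averages agree with the `avg_age` values in the response: 28.5 for music, 35 for forestry, and 25 for sports.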
  • Original article: https://www.cnblogs.com/mr-cc/p/5762261.html