  • Getting Started with Elasticsearch

    1. Terminology

    In Elasticsearch, the act of storing a document is called indexing. Compared with a traditional relational database, the rough analogy is:

    Relational DB -> Databases -> Tables -> Rows      -> Columns

    Elasticsearch -> Indices   -> Types  -> Documents -> Fields

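    In other words, Elasticsearch contains multiple indices (databases), each index can contain multiple types (tables), each type holds multiple documents (rows), and each document has multiple fields (columns). Note that this analogy reflects older releases: mapping types were deprecated in Elasticsearch 6.x and 7.x and removed in 8.0.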

    2. Writing and Retrieving Data

    Writing data

    Let's look at an example.

    We write one document to ES:

    curl -XPOST https://es_endpoint/corporation/employee/1 -d '

    {

        "first_name" : "John",

        "last_name" :  "Smith",

        "age" :        25,

        "about" :      "I love to go rock climbing",

        "interests": [ "sports", "music" ]

    }' -H 'Content-Type: application/json'

    Here es_endpoint is the Elasticsearch endpoint, corporation is the index, employee is the type, and 1 is the document id.

    Once the data is in ES, we can retrieve it with the GET method, for example:

    curl -XGET https://es_endpoint/corporation/employee/1

    {"_index":"corporation",

    "_type":"employee",

    "_id":"1",

    "_version":3,

    "_seq_no":2,

    "_primary_term":1,

    "found":true,

    "_source":

    {

        "first_name" : "John",

        "last_name" :  "Smith",

        "age" :        25,

        "about" :      "I love to go rock climbing",

        "interests": [ "sports", "music" ]

    }}

    Elasticsearch is operated through HTTP methods: GET retrieves a document, POST or PUT writes (or updates) a document, DELETE removes a document, and HEAD checks whether a document exists.
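
The mapping between HTTP methods and document operations can be sketched in Python. This is only an illustration under stated assumptions: doc_request is a hypothetical helper, es_endpoint is the article's placeholder host, and the requests are built but never sent.

```python
import json
from urllib.request import Request

# Placeholder endpoint taken from the article; replace it with a real host before sending.
BASE = "https://es_endpoint"


def doc_request(method, index, doc_type, doc_id, body=None):
    """Build (without sending) an HTTP request that targets one document."""
    url = f"{BASE}/{index}/{doc_type}/{doc_id}"
    data = None
    headers = {}
    if body is not None:
        # Elasticsearch expects a JSON body with a matching Content-Type header.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=method)


john = {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
}

index_req = doc_request("PUT", "corporation", "employee", 1, john)  # write or update
get_req = doc_request("GET", "corporation", "employee", 1)          # retrieve
exists_req = doc_request("HEAD", "corporation", "employee", 1)      # existence check
delete_req = doc_request("DELETE", "corporation", "employee", 1)    # delete

for req in (index_req, get_req, exists_req, delete_req):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to urllib.request.urlopen would actually send the request to the cluster.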

    Fetching data

    GET with an id returns exactly one document. To search across documents instead, replace the id with _search:

    curl -XGET https://es_endpoint/corporation/employee/_search

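    Searching data

    This matches every document of type employee (by default only the first 10 hits are returned). To search with a condition, use:

    curl -XGET 'https://es_endpoint/corporation/employee/_search?q=first_name:Jane'

    The result is: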

    {"took":5,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":0.2876821,"hits":[{"_index":"corporation","_type":"employee","_id":"2","_score":0.2876821,"_source":

    {

        "first_name" :  "Jane",

        "last_name" :   "Smith",

        "age" :         32,

        "about" :       "I like to collect rock albums",

        "interests":  [ "music" ]

    }}]}}
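
When the query string is assembled by a program rather than typed by hand, the q parameter should be URL-encoded. A minimal sketch, assuming a hypothetical helper named uri_search_url and the article's placeholder host:

```python
from urllib.parse import urlencode

# Placeholder endpoint taken from the article, not a real host.
BASE = "https://es_endpoint"


def uri_search_url(index, doc_type, field, value):
    """Return the URL for a simple query-string search on one field."""
    # urlencode escapes characters such as ':' and spaces so the URL stays valid.
    query = urlencode({"q": f"{field}:{value}"})
    return f"{BASE}/{index}/{doc_type}/_search?{query}"


url = uri_search_url("corporation", "employee", "first_name", "Jane")
print(url)
```

The colon between the field and the value is sent as %3A, which the server decodes back to first_name:Jane.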

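    DSL search

    The query-string form above only suits simple cases. Elasticsearch provides a richer and more flexible query language, the Query DSL (Domain Specific Language), in which the request is expressed as JSON. The simple query above can be rewritten as: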

    curl -XGET https://es_endpoint/corporation/employee/_search -d '

    {

        "query" : {

             "match" : {

                 "first_name" : "Jane"

             }

        }

    } ' -H 'Content-Type: application/json'

    The result is:

    {"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":0.2876821,"hits":[{"_index":"corporation","_type":"employee","_id":"2","_score":0.2876821,"_source":

    {

        "first_name" :  "Jane",

        "last_name" :   "Smith",

        "age" :         32,

        "about" :       "I like to collect rock albums",

        "interests":  [ "music" ]

    }}]}}
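
Because both the request and the response are JSON, they map naturally onto dictionaries in client code. The sketch below builds the match body and reads a trimmed copy of the response shown above; match_query is a hypothetical helper, and no request is sent.

```python
import json


def match_query(field, text):
    """Return a Query DSL body that runs a match query on one field."""
    return {"query": {"match": {field: text}}}


body = match_query("first_name", "Jane")
print(json.dumps(body, indent=4))

# Trimmed copy of the response shown in the article (only the fields used here).
raw_response = """
{"took":3,"timed_out":false,
 "hits":{"total":{"value":1,"relation":"eq"},"max_score":0.2876821,
         "hits":[{"_index":"corporation","_type":"employee","_id":"2",
                  "_score":0.2876821,
                  "_source":{"first_name":"Jane","last_name":"Smith","age":32}}]}}
"""

response = json.loads(raw_response)
total = response["hits"]["total"]["value"]
names = [hit["_source"]["first_name"] for hit in response["hits"]["hits"]]
print(total, names)
```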

    More complex searches

    Let's add a filter to the query so that only employees older than 30 are kept:

    curl -XGET https://es_endpoint/corporation/employee/_search -d '

    {

        "query" : {

            "bool" : {

                "filter" : {

                    "range" : {

                        "age" : { "gt" : 30 }

                    }

                },

                "must" : {

                    "match" : {

                        "last_name" : "smith"

                    }

                }

            }

        }

    } ' -H 'Content-Type: application/json'

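    Here we use a filter to keep only documents whose age is greater than 30, and then match those whose last_name is smith.

    Full-text search

    In a full-text search we can query the text of any field in the document, for example: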
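
To make the semantics of the bool query concrete, the sketch below evaluates the same filter and must clauses locally against the three employees that appear in this article. It is a simplified stand-in, not how Elasticsearch executes queries: bool_query and matches are hypothetical helpers, and the match clause is approximated by a case-insensitive word comparison.

```python
def bool_query(match_field, match_text, range_field, greater_than):
    """Return the bool query body used in the article."""
    return {
        "query": {
            "bool": {
                "filter": {"range": {range_field: {"gt": greater_than}}},
                "must": {"match": {match_field: match_text}},
            }
        }
    }


def matches(doc, query):
    """Simplified local check: range filter AND case-insensitive word match."""
    bool_clause = query["query"]["bool"]
    range_field, bounds = next(iter(bool_clause["filter"]["range"].items()))
    match_field, text = next(iter(bool_clause["must"]["match"].items()))
    passes_filter = doc[range_field] > bounds["gt"]
    passes_must = text.lower() in doc[match_field].lower().split()
    return passes_filter and passes_must


employees = [
    {"first_name": "John", "last_name": "Smith", "age": 25},
    {"first_name": "Jane", "last_name": "Smith", "age": 32},
    {"first_name": "Douglas", "last_name": "Fir", "age": 35},
]

query = bool_query("last_name", "smith", "age", 30)
selected = [e["first_name"] for e in employees if matches(e, query)]
print(selected)
```

Only Jane satisfies both clauses: John is a Smith but is not older than 30, and Douglas is older than 30 but is not a Smith.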

    curl https://es_endpoint/corporation/employee/_search -d '

    {

        "query" : {

            "match" : {

                "about" : "rock climbing"

            }

        }

    } ' -H 'Content-Type: application/json'

    The result is:

    {"took":9,

    "timed_out":false,

    "_shards":{"total":5,"successful":5,"skipped":0,"failed":0},

    "hits":{"total":{"value":2,"relation":"eq"},

    "max_score":0.5753642,

    "hits":[

    {"_index":"corporation",

     "_type":"employee",

     "_id":"1",

     "_score":0.5753642,

     "_source":

    {

        "first_name" : "John",

        "last_name" :  "Smith",

        "age" :        25,

        "about" :      "I love to go rock climbing",

        "interests": [ "sports", "music" ]

    }},

    {"_index":"corporation",

     "_type":"employee",

     "_id":"2",

     "_score":0.2876821,

     "_source":

    {

        "first_name" :  "Jane",

        "last_name" :   "Smith",

        "age" :         32,

        "about" :       "I like to collect rock albums",

        "interests":  [ "music" ]

    }}]}}

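    Both returned documents carry a _score field, which expresses how relevant the document is to the query; results are sorted by relevance in descending order. Although the query was rock climbing, the second document, which contains only rock, is also returned, but with a lower score than the first.

    Phrase search

    The search above matched rock climbing loosely: a document containing either word qualifies. To match the exact phrase, change match to match_phrase:

    curl -XGET https://es_endpoint/corporation/employee/_search -d '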
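
Why the second document is returned at all can be illustrated with a rough local model of the match query: the query text is split into terms, and a document qualifies if it contains at least one of them. Counting shared terms is only a stand-in for the real relevance score (Elasticsearch uses BM25 by default); tokenize and matched_terms are hypothetical helpers.

```python
def tokenize(text):
    """Very rough analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def matched_terms(query_text, field_text):
    """Return the query terms that also occur in the field."""
    return sorted(set(tokenize(query_text)) & set(tokenize(field_text)))


abouts = {
    "John": "I love to go rock climbing",
    "Jane": "I like to collect rock albums",
    "Douglas": "I like to build cabinets",
}

# Keep documents sharing at least one term, ordered by the number of shared terms.
ranked = sorted(
    ((name, matched_terms("rock climbing", text)) for name, text in abouts.items()),
    key=lambda item: len(item[1]),
    reverse=True,
)
ranked = [(name, terms) for name, terms in ranked if terms]
print(ranked)
```

John matches both terms and ranks first, Jane matches only rock, and Douglas matches nothing and is left out, which mirrors the ordering of the real result.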

    {

        "query" : {

            "match_phrase" : {

                "about" : "rock climbing"

            }

        }

    } ' -H 'Content-Type: application/json'
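
The difference from a plain match query is that the terms must appear next to each other and in order. A minimal local sketch of that idea, assuming a hypothetical helper named contains_phrase and a whitespace tokenizer rather than a real analyzer:

```python
def contains_phrase(text, phrase):
    """Return True if the words of phrase occur consecutively, in order, in text."""
    tokens = text.lower().split()
    wanted = phrase.lower().split()
    # Slide a window of the phrase length over the tokens and compare.
    return any(
        tokens[i:i + len(wanted)] == wanted
        for i in range(len(tokens) - len(wanted) + 1)
    )


john_about = "I love to go rock climbing"
jane_about = "I like to collect rock albums"

print(contains_phrase(john_about, "rock climbing"))
print(contains_phrase(jane_about, "rock climbing"))
```

Only John's text contains the two words side by side, so a phrase search for rock climbing returns his document alone.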

    Highlighted search

    Many applications need to highlight the keywords matched by a search so the matches are easy to spot. Elasticsearch supports this directly: add a highlight parameter to the request, for example:

    curl -XGET https://es_endpoint/corporation/employee/_search -d '

    {

        "query" : {

            "match_phrase" : {

                "about" : "rock climbing"

            }

        },

        "highlight": {

            "fields" : {

                "about" : {}

            }

        }

    }' -H 'Content-Type: application/json'

    The result is:

    {"took":44,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":0.5753642,"hits":[{"_index":"corporation","_type":"employee","_id":"1","_score":0.5753642,"_source":

    {

        "first_name" : "John",

        "last_name" :  "Smith",

        "age" :        25,

        "about" :      "I love to go rock climbing",

        "interests": [ "sports", "music" ]

    },"highlight":{"about":["I love to go <em>rock</em> <em>climbing</em>"]}}]}}

    The response now contains a new highlight field holding the matched text from about, with <em></em> tags marking the matched words.

    3. Aggregations

    In data-analysis scenarios we need statistics over documents. Elasticsearch offers a feature called aggregations, which lets us compute sophisticated statistics over the data. It is similar to GROUP BY in SQL, but more powerful.

    For example, to find the most common interests among all employees:

    curl -XGET https://es_endpoint/corporation/employee/_search -d '
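
A client usually has to pull the highlighted fragments out of the response before displaying them. The sketch below does that for a trimmed copy of the response above; the variable names are illustrative, and the regular expression assumes the default <em> tags.

```python
import json
import re

# Trimmed copy of the highlighted response shown in the article.
raw_response = """
{"hits":{"hits":[{"_index":"corporation","_type":"employee","_id":"1",
  "_score":0.5753642,
  "highlight":{"about":["I love to go <em>rock</em> <em>climbing</em>"]}}]}}
"""

response = json.loads(raw_response)

fragments = []
for hit in response["hits"]["hits"]:
    # "highlight" maps each requested field to a list of text fragments.
    fragments.extend(hit.get("highlight", {}).get("about", []))

# Extract just the words wrapped in <em>...</em>.
highlighted_words = [
    word for fragment in fragments for word in re.findall(r"<em>(.*?)</em>", fragment)
]
print(fragments)
print(highlighted_words)
```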

    {

      "aggs": {

        "all_interests": {

          "terms": { "field": "interests.keyword" }

        }

      }

    }' -H 'Content-Type: application/json'

    The response is:

    …the preceding part of the response is omitted; we only look at the aggregation results:

    "aggregations": {

                "all_interests": {

                      "doc_count_error_upper_bound": 0,

                      "sum_other_doc_count": 0,

                      "buckets": [{

                            "key": "music",

                            "doc_count": 2

                      }, {

                            "key": "forestry",

                            "doc_count": 1

                      }, {

                            "key": "sports",

                            "doc_count": 1

                      }]

                }

          }

    Two employees list music as an interest, while forestry and sports each have only one interested employee.
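
The terms aggregation behaves much like a group-by-and-count. As a rough local equivalent, the sketch below counts interests over the three employees that appear in this article; it is an illustration with plain Python data, not the way Elasticsearch computes buckets.

```python
from collections import Counter

employees = [
    {"first_name": "John", "interests": ["sports", "music"]},
    {"first_name": "Jane", "interests": ["music"]},
    {"first_name": "Douglas", "interests": ["forestry"]},
]

# Count each interest once per document, as doc_count counts documents, not occurrences.
counts = Counter(
    interest for employee in employees for interest in set(employee["interests"])
)

# Order buckets by descending count, breaking ties alphabetically by key.
buckets = [
    {"key": key, "doc_count": count}
    for key, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
]
print(buckets)
```

The resulting buckets match the aggregation output above: music with 2 documents, then forestry and sports with 1 each.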

    References:

    https://www.elastic.co/guide/en/elasticsearch/guide/master/index.html

  • Original article: https://www.cnblogs.com/zackstang/p/12021845.html