  • Elasticsearch Essentials, Part 1: CRUD on ES Index Documents

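    When querying massive data sets in a traditional relational DBMS, and especially for fuzzy matching, we typically write LIKE '%value%'. The leading wildcard prevents any index from being used, so the query degrades into a slow full table scan. Even exact-value lookups on indexed columns can be relatively slow at that scale. For this reason, internet companies and other large organizations commonly query massive data through a search engine, and the most mainstream search engine framework today is Elasticsearch. This post summarizes the first set of Elasticsearch essentials, CRUD on ES index documents; more posts will follow.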

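    1. CRUD on ES index documents (the major versions differ: 5.x and earlier allow multiple types per index, an index created in 6.x allows only one type, and 7.x deprecates types in favor of the single fixed type _doc; API usage differs slightly as well):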

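      1. Create: create a document [POST index/type/id, where the id is optional. If a document with the same index, type, and id already exists, it is re-indexed (the old document is deleted and a new one is created, the same as PUT index/type/id). To prevent this silent overwrite when specifying an id, use index/type/id?op_type=create or index/type/id/_create to mark the operation as create-only; the request then fails if the document already exists]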

        POST demo_users/_doc or demo_users/_doc/2vJKsm8BriJODA6s9GbQ/_create

        Request Body:

        {
        "userId":1,
        "username":"张三",
        "role":"administrator",
        "enabled":true,
        "createdDate":"2020-01-01T12:00:00"
        }
        

        Response Body:

        {
        "_index": "demo_users",
        "_type": "_doc",
        "_id": "2vJKsm8BriJODA6s9GbQ",
        "_version": 1,
        "result": "created",
        "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 1
        }
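
As a minimal sketch of the two create variants described above, the following Python helper builds (but does not send) the HTTP request using only the standard library. The base URL http://localhost:9200 and the helper name build_create_request are assumptions for illustration, not part of the original post.

```python
import json
import urllib.request
from typing import Optional


def build_create_request(base_url: str, index: str, doc: dict,
                         doc_id: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a document-create request.

    Without an id, ES generates one (POST index/_doc).
    With an id, op_type=create asks ES to reject the request if that id exists.
    """
    if doc_id is None:
        url = f"{base_url}/{index}/_doc"
    else:
        url = f"{base_url}/{index}/_doc/{doc_id}?op_type=create"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local cluster address; adjust to your environment.
req = build_create_request(
    "http://localhost:9200", "demo_users",
    {"userId": 1, "username": "张三", "role": "administrator"},
    doc_id="2vJKsm8BriJODA6s9GbQ",
)
print(req.get_method(), req.full_url)
# To actually send it against a running cluster: urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the sketch runnable without a cluster and makes the URL easy to inspect.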
        
      2. Get: retrieve a document [GET index/type/id]

        GET demo_users/_doc/123

        Response Body:

        {
        "_index": "demo_users",
        "_type": "_doc",
        "_id": "123",
        "_version": 1,
        "found": true,
        "_source": {
        "userId": 1,
        "username": "张三",
        "role": "administrator",
        "enabled": true,
        "createdDate": "2020-01-01T12:00:00"
        }
        }
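
A small sketch of how a client might read the GET response above: the found flag tells whether the document exists, and _source carries the stored fields. The function name extract_source is hypothetical.

```python
from typing import Optional


def extract_source(get_response: dict) -> Optional[dict]:
    """Return the stored document, or None when ES reports found=false."""
    if not get_response.get("found", False):
        return None
    return get_response.get("_source")


# Trimmed copies of the response shapes shown in this post.
hit = {"_index": "demo_users", "_id": "123", "_version": 1, "found": True,
       "_source": {"userId": 1, "username": "张三"}}
miss = {"_index": "demo_users", "_id": "123457", "found": False}

print(extract_source(hit))
print(extract_source(miss))
```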
        
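      3. Index (PUT): re-index a document [PUT index/type/id or index/type/id?op_type=index; the id is required. If no document with that id exists, one is created; otherwise the existing document is deleted and re-created, and its version increases by 1]

        PUT/POST demo_users/_doc/123 or demo_users/_doc/123?op_type=index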

        Request Body:

        {
        "userId":1,
        "username":"张三",
        "role":"administrator",
        "enabled":true,
        "createdDate":"2020-01-01T12:00:00",
        "remark":"仅演示"
        }
        

        Response Body:

        {
        "_index": "demo_users",
        "_type": "_doc",
        "_id": "123",
        "_version": 4,
        "result": "updated",
        "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
        },
        "_seq_no": 10,
        "_primary_term": 1
        }
        
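      4. Update: partially update a document [POST index/type/id/_update. The request body must be {"doc":{partial document JSON}}. Keys that already exist are updated and keys that do not exist are added; nested levels are supported. On repeated requests the version increases by 1 only if some field value actually changes; otherwise the response reports result "noop" and nothing is updated]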

        POST demo_users/_doc/123/_update

        Request Body:

        {
          "doc": {
            "userId": 1,
            "username": "张三",
            "role": "administrator",
            "enabled": true,
            "createdDate": "2020-01-01T12:00:00",
            "remark": "仅演示POST更新5",
            "updatedDate": "2020-01-17T15:30:00"
          }
        }
        

        Response Body:

        {
        "_index": "demo_users",
        "_type": "_doc",
        "_id": "123",
        "_version": 26,
        "result": "updated",
        "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
        },
        "_seq_no": 35,
        "_primary_term": 1
        }
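
The merge behavior described for the doc body can be modeled in a few lines of Python. This is only a toy model of the semantics (existing keys overwritten, new keys added, nested objects merged, no change means no-op), not how Elasticsearch implements it; the profile, city, and zip fields are made up for the example.

```python
def merge_partial(source: dict, partial: dict) -> bool:
    """Merge `partial` into `source` in place; return True if anything changed."""
    changed = False
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(source.get(key), dict):
            # Nested objects are merged key by key rather than replaced.
            if merge_partial(source[key], value):
                changed = True
        elif key not in source or source[key] != value:
            source[key] = value
            changed = True
    return changed


doc = {"userId": 1, "role": "administrator", "profile": {"city": "Shenzhen"}}
update = {"remark": "demo", "profile": {"zip": "518000"}}

print(merge_partial(doc, update))  # first call changes the document
print(merge_partial(doc, update))  # identical second call changes nothing (a no-op)
print(doc)
```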
        
      5. Delete: delete a document [DELETE index/type/id]

        DELETE demo_users/_doc/123

        Response Body:

        {
        "_index": "demo_users",
        "_type": "_doc",
        "_id": "123",
        "_version": 2,
        "result": "deleted",
        "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
        },
        "_seq_no": 39,
        "_primary_term": 1
        }
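
To make the difference between create, index, and delete concrete, here is a toy in-memory model of the version and result values seen in the responses above. It is a simplified sketch under stated assumptions: it ignores shards, sequence numbers, and the delete tombstones that a real cluster keeps for a while, and the class names are hypothetical.

```python
class VersionConflict(Exception):
    """Stands in for the HTTP 409 version_conflict_engine_exception."""


class ToyIndex:
    """In-memory model of create / index / delete semantics for one index."""

    def __init__(self):
        self._docs = {}  # id -> (version, source)

    def index(self, doc_id, source):
        # "index" always succeeds: it creates or fully replaces the document.
        if doc_id in self._docs:
            version, result = self._docs[doc_id][0] + 1, "updated"
        else:
            version, result = 1, "created"
        self._docs[doc_id] = (version, dict(source))
        return {"_id": doc_id, "_version": version, "result": result}

    def create(self, doc_id, source):
        # "create" refuses to overwrite an existing document.
        if doc_id in self._docs:
            raise VersionConflict(f"[{doc_id}]: document already exists")
        return self.index(doc_id, source)

    def delete(self, doc_id):
        if doc_id not in self._docs:
            return {"_id": doc_id, "result": "not_found"}
        version = self._docs.pop(doc_id)[0] + 1  # a delete also bumps the version
        return {"_id": doc_id, "_version": version, "result": "deleted"}


idx = ToyIndex()
print(idx.index("123", {"username": "张三"}))
print(idx.index("123", {"username": "张三", "remark": "demo"}))
print(idx.delete("123"))
print(idx.delete("123"))
```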
        
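      6. Bulk: batch operations on documents [POST _bulk, index/_bulk, or index/type/_bulk. A single request can carry several different CRUD operations across multiple indices and types; a failure in one operation does not affect the others]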

        POST _bulk

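        Request Body: (note that the body must end with one extra newline: ES uses newline characters to separate the commands, and a missing final newline causes an error. The body as a whole is not standard JSON; each line is its own JSON object, so at most it can be viewed as a newline-delimited sequence of JSON objects)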

        { "index" : { "_index" : "demo_users_test", "_type" : "_doc", "_id" : "1" } }
        { "bulk_field1" : "测试创建index" }
        { "delete" : { "_index" : "demo_users", "_type" : "_doc", "_id" : "123" } }
        { "create" : { "_index" : "demo_users", "_type" : "_doc", "_id" : "2" } }
        { "bulk_field2" : "测试创建index2" }
        { "update" : { "_index" : "demo_users_test","_type" : "_doc","_id" : "1" } }
        { "doc": {"bulk_field1" : "测试创建index1","bulk_field2" : "测试创建index2"} }
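
Because the trailing newline is easy to forget, a client usually generates the bulk body rather than writing it by hand. The sketch below serializes a list of operations into the newline-delimited format shown above; build_bulk_body and its tuple layout are assumptions for illustration. When sending the result, the Content-Type header is normally application/x-ndjson.

```python
import json


def build_bulk_body(operations) -> str:
    """Serialize (action, metadata, payload) tuples into a _bulk request body.

    `payload` must be None for "delete" and a dict for every other action.
    """
    lines = []
    for action, meta, payload in operations:
        lines.append(json.dumps({action: meta}, ensure_ascii=False))
        if action == "delete":
            continue  # delete has no second line
        if payload is None:
            raise ValueError(f"action {action!r} requires a payload line")
        lines.append(json.dumps(payload, ensure_ascii=False))
    # Every line, including the last one, must be terminated by a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", {"_index": "demo_users_test", "_id": "1"}, {"bulk_field1": "v1"}),
    ("delete", {"_index": "demo_users", "_id": "123"}, None),
    ("update", {"_index": "demo_users_test", "_id": "1"}, {"doc": {"bulk_field2": "v2"}}),
])
print(body, end="")
```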
        
        
        

        Response Body:

        {
            "took": 162,
            "errors": true,
            "items": [
                {
                    "index": {
                        "_index": "demo_users_test",
                        "_type": "_doc",
                        "_id": "1",
                        "_version": 8,
                        "result": "updated",
                        "_shards": {
                            "total": 2,
                            "successful": 2,
                            "failed": 0
                        },
                        "_seq_no": 7,
                        "_primary_term": 1,
                        "status": 200
                    }
                },
                {
                    "delete": {
                        "_index": "demo_users",
                        "_type": "_doc",
                        "_id": "123",
                        "_version": 2,
                        "result": "not_found",
                        "_shards": {
                            "total": 2,
                            "successful": 2,
                            "failed": 0
                        },
                        "_seq_no": 44,
                        "_primary_term": 1,
                        "status": 404
                    }
                },
                {
                    "create": {
                        "_index": "demo_users",
                        "_type": "_doc",
                        "_id": "2",
                        "status": 409,
                        "error": {
                            "type": "version_conflict_engine_exception",
                            "reason": "[_doc][2]: version conflict, document already exists (current version [1])",
                            "index_uuid": "u7WE286CQnGqhHeuwW7oyw",
                            "shard": "2",
                            "index": "demo_users"
                        }
                    }
                },
                {
                    "update": {
                        "_index": "demo_users_test",
                        "_type": "_doc",
                        "_id": "1",
                        "_version": 9,
                        "result": "updated",
                        "_shards": {
                            "total": 2,
                            "successful": 2,
                            "failed": 0
                        },
                        "_seq_no": 8,
                        "_primary_term": 1,
                        "status": 200
                    }
                }
            ]
        }
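
Since a bulk request can partially succeed, the caller has to inspect each entry of items instead of relying on the HTTP status alone. A minimal sketch, assuming the response has already been parsed into a dict; the function name failed_items is hypothetical:

```python
def failed_items(bulk_response: dict) -> list:
    """Return (position, action, id, status, error type) for each failed item."""
    failures = []
    for position, item in enumerate(bulk_response.get("items", [])):
        # Each item is a single-key dict: {"index" | "create" | "update" | "delete": {...}}
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append((position, action, result.get("_id"),
                             result.get("status"), result["error"].get("type")))
    return failures


# Trimmed version of the response shown above.
response = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "result": "updated", "status": 200}},
        {"delete": {"_id": "123", "result": "not_found", "status": 404}},
        {"create": {"_id": "2", "status": 409,
                    "error": {"type": "version_conflict_engine_exception"}}},
        {"update": {"_id": "1", "result": "updated", "status": 200}},
    ],
}
for failure in failed_items(response):
    print(failure)
```

The position in items matches the order of the actions in the request, which is what lets a caller retry only the operations that failed.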
        
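      7. mGet: fetch multiple documents [POST _mget, index/_mget, or index/type/_mget. If the index or type is given in the URL, the request body does not need to repeat it. _source can list the fields to include and the fields to exclude]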

        POST _mget

        Request Body:

        {
          "docs": [
            {
              "_index": "demo_users",
              "_type": "_doc",
              "_id": "12345"
            },
            {
              "_index": "demo_users",
              "_type": "_doc",
              "_id": "1234567",
              "_source": [
                "userId",
                "username",
                "role"
              ]
            },
            {
              "_index": "demo_users",
              "_type": "_doc",
              "_id": "1234",
              "_source": {
                "include": [
                  "userId",
                  "username"
                ],
                "exclude": [
                  "role"
                ]
              }
            }
          ]
        }
        

        Response Body:

        {
            "docs":[
                {
                    "_index":"demo_users",
                    "_type":"_doc",
                    "_id":"12345",
                    "_version":1,
                    "found":true,
                    "_source":{
                        "userId":1,
                        "username":"张三",
                        "role":"administrator",
                        "enabled":true,
                        "createdDate":"2020-01-01T12:00:00"
                    }
                },
                {
                    "_index":"demo_users",
                    "_type":"_doc",
                    "_id":"1234567",
                    "_version":7,
                    "found":true,
                    "_source":{
                        "role":"administrator",
                        "userId":1,
                        "username":"张三"
                    }
                },
                {
                    "_index":"demo_users",
                    "_type":"_doc",
                    "_id":"1234",
                    "_version":1,
                    "found":true,
                    "_source":{
                        "userId":1,
                        "username":"张三"
                    }
                }
            ]
        }
        

        POST demo_users/_doc/_mget

        Request Body:

        {
          "ids": [
            "1234",
            "12345",
            "123457"
          ]
        }
        

        Response Body:

        {
            "docs":[
                {
                    "_index":"demo_users",
                    "_type":"_doc",
                    "_id":"1234",
                    "_version":1,
                    "found":true,
                    "_source":{
                        "userId":1,
                        "username":"张三",
                        "role":"administrator",
                        "enabled":true,
                        "createdDate":"2020-01-01T12:00:00",
                        "remark":"仅演示"
                    }
                },
                {
                    "_index":"demo_users",
                    "_type":"_doc",
                    "_id":"12345",
                    "_version":1,
                    "found":true,
                    "_source":{
                        "userId":1,
                        "username":"张三",
                        "role":"administrator",
                        "enabled":true,
                        "createdDate":"2020-01-01T12:00:00"
                    }
                },
                {
                    "_index":"demo_users",
                    "_type":"_doc",
                    "_id":"123457",
                    "found":false
                }
            ]
        }
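
The two request shapes above (a plain ids list, or docs entries with per-document _source filtering) and the found flag in the response can be handled with two small helpers. This is a sketch that assumes the index is given in the URL, so entries carry only _id; both function names are hypothetical.

```python
def build_mget_body(ids, source_fields=None) -> dict:
    """Build an _mget body for a request that names the index in the URL."""
    if source_fields is None:
        return {"ids": list(ids)}
    # Per-document entries are needed to attach a _source filter.
    return {"docs": [{"_id": i, "_source": list(source_fields)} for i in ids]}


def split_mget_response(response: dict):
    """Return ({id: source} for found documents, [ids that were not found])."""
    found, missing = {}, []
    for doc in response.get("docs", []):
        if doc.get("found"):
            found[doc["_id"]] = doc.get("_source", {})
        else:
            missing.append(doc["_id"])
    return found, missing


print(build_mget_body(["1234", "12345", "123457"]))
print(build_mget_body(["1234"], source_fields=["userId", "username"]))

found, missing = split_mget_response({"docs": [
    {"_id": "1234", "found": True, "_source": {"userId": 1}},
    {"_id": "123457", "found": False},
]})
print(found, missing)
```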
        
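      8. _update_by_query: update specified fields of every document matching a query [POST index/_update_by_query. The request body holds the query and the fields to update; here the update uses a Painless script for flexibility]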

        POST demo_users/_update_by_query

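        Request Body: (this finds documents with role=administrator, then sets role to poweruser and appends the params.remark text to remark. The query uses role.keyword because the role field is of type text and is analyzed, so a term query cannot match its full value directly; the role.keyword sub-field holds the unanalyzed value. A brief explanation follows later for anything that is unclear.)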

        {
            "script":{ "source":"ctx._source.role=params.role;ctx._source.remark=ctx._source.remark+params.remark",
                "lang":"painless",
                "params":{
                    "role":"poweruser",
                    "remark":"采用_update_by_query更新"
                }
            },
            "query":{
                "term":{
                    "role.keyword":"administrator"
                }
            }
        }
        

        For details on writing Painless scripts, see: Painless syntax tutorial

        Response Body:

        {
        "took": 114,
        "timed_out": false,
        "total": 6,
        "updated": 6,
        "deleted": 0,
        "batches": 1,
        "version_conflicts": 0,
        "noops": 0,
        "retries": {
        "bulk": 0,
        "search": 0
        },
        "throttled_millis": 0,
        "requests_per_second": -1,
        "throttled_until_millis": 0,
        "failures": [ ]
        }
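
Passing values through params, as the request above does, keeps the script text constant while the data varies. The sketch below generates such a body for the simple case where every listed field is overwritten with a new value (unlike the example above, it does not append to an existing value); build_update_by_query is a hypothetical helper and handles only top-level field names.

```python
def build_update_by_query(term_field: str, term_value, assignments: dict) -> dict:
    """Build an _update_by_query body that sets each field in `assignments`.

    Values travel in script params; the script source only references them.
    """
    for name in assignments:
        if not name.isidentifier():
            raise ValueError(f"unsupported field name for this sketch: {name!r}")
    source = ";".join(f"ctx._source.{name}=params.{name}" for name in assignments)
    return {
        "script": {"lang": "painless", "source": source, "params": dict(assignments)},
        "query": {"term": {term_field: term_value}},
    }


body = build_update_by_query("role.keyword", "administrator", {"role": "poweruser"})
print(body)
```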
        
      9. _delete_by_query: delete every document matching a query [POST index/_delete_by_query. The request body holds the query]

        POST demo_users/_delete_by_query

        Request Body: (this matches documents with enabled=false)

        {
          "query": {
            "match": {
              "enabled": false
            }
          }
        }
        

        Response Body:

           {
                   "took":29,
                   "timed_out":false,
                   "total":3,
                   "deleted":3,
                   "batches":1,
                   "version_conflicts":0,
                   "noops":0,
                   "retries":{
                       "bulk":0,
                       "search":0
                   },
                   "throttled_millis":0,
                   "requests_per_second":-1,
                   "throttled_until_millis":0,
                   "failures":[
               
                   ]
              }
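
Both _update_by_query and _delete_by_query return the same summary shape, so one check can cover them. A minimal sketch, assuming the response is already parsed into a dict; the function name by_query_problems is hypothetical:

```python
def by_query_problems(response: dict) -> list:
    """List reasons why a *_by_query response should not be treated as clean."""
    problems = []
    if response.get("timed_out"):
        problems.append("timed_out")
    if response.get("version_conflicts", 0) > 0:
        problems.append(f"version_conflicts={response['version_conflicts']}")
    if response.get("failures"):
        problems.append(f"failures={len(response['failures'])}")
    return problems


# The _delete_by_query response shown above, trimmed.
response = {"took": 29, "timed_out": False, "total": 3, "deleted": 3,
            "version_conflicts": 0, "failures": []}
print(by_query_problems(response))
```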
        
      10. Search

        1. URL GET search (GET index/_search?q=<query_string syntax>; note that with the default analyzer, Chinese text is split so that each character becomes its own term)

          
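          A. Term Query [matches against analyzed terms; "contains" here means that one of the field's terms matches]
          GET /demo_users/_search?q=role:poweruser //search a specific field: the field contains the queried value
          
          GET /demo_users/_search?q=poweruser //unqualified search (no field given): searches all fields for poweruser; a document is returned if any one field matches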
          
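          B. Group Query and Phrase Query
          Operators: AND / OR / NOT, also written && / || / ! 
          + means must and - means must_not; for example, field:(+a -b) means field must contain a and must not contain b
          
          GET /demo_users/_search?q=remark:(POST test) 
          GET /demo_users/_search?q=remark:(POST OR test) 
          //group queries: both find documents whose remark contains POST or test (OR is the default operator)
          GET /demo_users/_search?q=remark:"POST test" 
          //phrase query: remark must contain the terms POST and test next to each other and in that order
          
          GET /demo_users/_search?q=remark:(test AND POST) //remark contains both test and POST
          GET /demo_users/_search?q=remark:(test NOT POST) //remark contains test but not POST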
          
          C. Range Query
          Interval notation: [] is inclusive, {} is exclusive
          e.g. year:[2019 TO 2020], {2019 TO 2020}, {2019 TO 2020], or [* TO 2020]
          Comparison operators:
          year:>2019, (>2012 && <=2020), or (+>=2012 +<=2020)
          
          GET /demo_users/_search?q=userId:>123 //documents whose userId is greater than 123
          
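          D. Wildcard Query
          ? matches exactly one character and * matches zero or more characters, e.g. role:power* , role:use?
          
          GET /demo_users/_search?q=role:power* //role starts with power, followed by zero or more arbitrary characters
          
          Regular expressions are supported when wrapped in forward slashes, e.g. username:/张三[0-9]+/
          
          Approximate matching with ~N loosens the match: on a single term, N is the maximum edit distance (fuzzy query); on a quoted phrase, N is the slop, the number of term positions the phrase may be off by
          GET /demo_users/_search?q=remark:tett~1 //intended to find documents whose remark contains test, but the term was mistyped as tett; ~1 allows one edit, so documents containing test still match
          
          GET /demo_users/_search?q=remark:"i like shenzhen"~2 //searches for i like shenzhen while the actual remark value is: i like hubei and shenzhen, which has the extra words hubei and; ~2 allows the terms to be 2 positions (here, two words) apart, so the document is still found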
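
The queries above can be pasted as-is into a console such as Kibana Dev Tools, but when they are sent from code the q value has to be URL-encoded, since it contains spaces, quotes, colons, and possibly Chinese characters. A minimal sketch using only the Python standard library; the base URL and the helper name build_search_url are assumptions for illustration.

```python
from urllib.parse import quote, urlencode


def build_search_url(base_url: str, index: str, query_string: str, **params) -> str:
    """Build a URI-search URL with the query_string expression percent-encoded."""
    query = {"q": query_string, **params}
    # quote_via=quote encodes spaces as %20 instead of "+".
    return f"{base_url}/{index}/_search?" + urlencode(query, quote_via=quote)


base = "http://localhost:9200"  # hypothetical local cluster
print(build_search_url(base, "demo_users", "role:poweruser"))
print(build_search_url(base, "demo_users", 'remark:"POST test"'))
print(build_search_url(base, "demo_users", "remark:(test AND POST)", size=10))
```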
          
          
          
        2. DSL POST search (POST index/_search)

          POST demo_users/_search

          Request Body:

          {
              "query":{
                  "bool":{
                      "must":[
                          {
                              "term":{
                                  "enabled":"true"  #enabled=true
                              }
                          },
                          {
                              "term":{
                                  "role.keyword":"poweruser" #and role=poweruser
                              }
                          },
                          {
                              "query_string":{
                                  "default_field":"username.keyword",
                                  "query":"张三" #and username matches 张三
                              }
                          }
                      ],
                      "must_not":[
          
                      ],
                      "should":[
          
                      ]
                  }
              },
              "from":0,
              "size":1000,
              "sort":[
                  {
                      "createdDate":"desc"  #sort by createdDate in descending order
                  }
              ],
              "_source":{ #fields to return: includes lists the fields to return, excludes lists the fields to omit
                  "includes":[
                      "role",
                      "username",
                      "userId",
                      "remark"
                  ],
                  "excludes":[
          
                  ]
              }
          }
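
Request bodies like the one above are usually assembled in code so that empty clauses can be omitted. The following is a minimal sketch of such a builder for the term-filter case only; build_bool_search is a hypothetical helper, and the 10000 limit reflects the default index.max_result_window setting, which a given index may override.

```python
MAX_RESULT_WINDOW = 10000  # default index.max_result_window; assumed unchanged


def build_bool_search(term_filters: dict, from_: int = 0, size: int = 10,
                      sort=None, includes=None) -> dict:
    """Build a bool/must search body from exact-match term filters."""
    if from_ + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the result window; use paging alternatives")
    body = {
        "query": {"bool": {"must": [{"term": {f: v}} for f, v in term_filters.items()]}},
        "from": from_,
        "size": size,
    }
    if sort:
        body["sort"] = [{field: order} for field, order in sort]
    if includes:
        body["_source"] = {"includes": list(includes)}
    return body


body = build_bool_search(
    {"enabled": True, "role.keyword": "poweruser"},
    size=1000,
    sort=[("createdDate", "desc")],
    includes=["role", "username", "userId", "remark"],
)
print(body)
```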
          
          

    For detailed usage, see:

    [Elasticsearch] Various uses of query_string

    The differences between match, match_phrase, query_string, and term in Elasticsearch

    Elasticsearch Query DSL: a summary

    Bool Query

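    Finally, here is a guide to the official ES API references:

    Indices APIs: creating (create), deleting (delete), retrieving (get), and checking the existence (exist) of an index, among other operations.

    Document APIs: creating (index), deleting (delete), and retrieving (get) index documents, among other operations.

    Search APIs: searching index documents. The Document APIs look up documents by doc_id, whereas the Search APIs query by conditions.

    Aggregations: aggregating index documents along various dimensions.

    cat APIs: querying assorted information about indices.

    Cluster APIs: querying assorted information about the cluster.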

  • Original article: https://www.cnblogs.com/firebet/p/14038305.html