  • (Reposted) Searching data in Elasticsearch through the HTTP RESTful API

    Sample dataset

    The sample data is a set of fictitious bank customer account documents in JSON format, with the following schema:

    {
      "account_number": 0,
      "balance": 16623,
      "firstname": "Bradshaw",
      "lastname": "Mckenzie",
      "age": 29,
      "gender": "F",
      "address": "244 Columbus Place",
      "employer": "Euron",
      "email": "bradshawmckenzie@euron.com",
      "city": "Hobucken",
      "state": "CO"
    }

    Documents like these can be generated with www.json-generator.com.


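    Loading the sample dataset

    Download the sample dataset from its download link, unzip it into a directory of your choice, and then load it into the Elasticsearch cluster.

    Using an absolute path: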
    curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary "@/home/cluster/apps/elasticsearch/elasticsearch-1.7.2/test/accounts.json"
    
    Using a relative path (relative to the current working directory):
    curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary "@test/accounts.json"
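The _bulk endpoint reads newline-delimited JSON: an action line, then the document source on the next line, with a newline after the last line. That is also why the commands above use --data-binary, which (unlike -d) keeps the newlines intact. Assuming accounts.json follows this layout, the Python sketch below shows how such a body could be produced from a list of documents. The helper to_bulk_ndjson and the choice of account_number as the _id are illustrative assumptions, not something the original data file is guaranteed to use.

```python
import json

def to_bulk_ndjson(docs, id_field="account_number"):
    """Hypothetical helper: serialize docs into the _bulk request body format.

    Each document becomes two lines: an "index" action carrying the _id,
    followed by the document source. The body must end with a newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_id": str(doc[id_field])}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

# Two trimmed-down documents based on the sample schema.
sample = [
    {"account_number": 0, "balance": 16623, "firstname": "Bradshaw"},
    {"account_number": 1, "balance": 39225, "firstname": "Amber"},
]
print(to_bulk_ndjson(sample), end="")
```

Because the index and type are already in the URL (bank/account/_bulk), the action lines only need to carry the _id.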

    curl 'localhost:9200/_cat/indices?v'
    Result:
    health status index              pri rep docs.count docs.deleted store.size pri.store.size 
    yellow open   bank                 5   1       1000            0    417.1kb        417.1kb 

    The result shows that we successfully bulk-indexed 1000 documents into the bank index.


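    The Search API

    There are two basic ways to run a search: send the search parameters in the REST request URI, or send them in the REST request body. The request body lets you express the search in a more expressive and readable JSON format.

    • Via the REST request URI: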
    curl 'localhost:9200/bank/_search?q=*&pretty'
    
    Result:
    {
      "took" : 63,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 1000,
        "max_score" : 1.0,
        "hits" : [ {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "1",
          "_score" : 1.0, "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
        }, {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "6",
          "_score" : 1.0, "_source" : {"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
        }, {
          "_index" : "bank",
          "_type" : "account",

    q=* tells Elasticsearch to match all documents in the bank index.
    pretty tells Elasticsearch to return pretty-printed JSON results.

    • Via the REST request body:
    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": { "match_all": {} }
    }'
    
    Result:
    {
      "took" : 26,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 1000,
        "max_score" : 1.0,
        "hits" : [ {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "1",
          "_score" : 1.0, "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
        }, {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "6",
          "_score" : 1.0, "_source" : {"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
        }, {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "13",

    Unlike the first approach, instead of passing q=* in the URI, we POST the search as a JSON query in the request body.
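For readers calling the search endpoint from code instead of curl, here is a minimal Python sketch using only the standard library. It builds (without sending) the same POST request as the curl example above; the helper name build_search_request is hypothetical. It sets a Content-Type header because Elasticsearch 6.0 and later reject request bodies without one, whereas the 1.7.2 setup used in this article does not require it.

```python
import json
import urllib.request

def build_search_request(index, body, host="http://localhost:9200"):
    """Hypothetical helper: build (but do not send) a Query DSL search request."""
    url = f"{host}/{index}/_search?pretty"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},  # required from 6.0 on
        method="POST",
    )

req = build_search_request("bank", {"query": {"match_all": {}}})
print(req.get_method(), req.full_url)

# Sending it needs a running cluster, e.g.:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```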


    Introducing the Query Language

    Elasticsearch provides a JSON-style domain-specific language for expressing queries, called the Query DSL.

    {
      "query": { "match_all": {} }
    }

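    query: defines the query to run.
    match_all: the type of query; it simply matches all documents in the specified index.

    Besides the query parameter, other parameters can be passed to influence the search results.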

    • match_all, returning only the first document:
    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": { "match_all": {} },
      "size": 1
    }'
    Result:
    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 1000,
        "max_score" : 1.0,
        "hits" : [ {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "4",
          "_score" : 1.0,
          "_source":{"account_number":4,"balance":27658,"firstname":"Rodriquez","lastname":"Flores","age":31,"gender":"F","address":"986 Wyckoff Avenue","employer":"Tourmania","email":"rodriquezflores@tourmania.com","city":"Eastvale","state":"HI"}
        } ]
      }
    }

    If size is not specified, it defaults to 10.


    • match_all, returning documents 11 through 20:
    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": { "match_all": {} },
      "from": 10,
      "size": 10
    }'

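    from: the 0-based offset of the first document to return; defaults to 0.
    size: the number of documents to return, starting at from.
    Together they are useful for implementing paginated search.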
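Paging is just arithmetic on from and size. A sketch, assuming 1-based page numbers; the helper names page_params and paginate are hypothetical, and paginate only models locally what the server does with the ordered hit list.

```python
def page_params(page, page_size=10):
    """Hypothetical helper: map a 1-based page number to from/size."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {"from": (page - 1) * page_size, "size": page_size}

def paginate(hits, from_=0, size=10):
    """Local model of from/size: skip the first from_ hits, return the next size."""
    return hits[from_:from_ + size]

body = {"query": {"match_all": {}}, **page_params(2)}
print(body)  # {'query': {'match_all': {}}, 'from': 10, 'size': 10}

hits = [f"doc{i}" for i in range(1, 31)]  # 30 made-up hits, already ordered
page = paginate(hits, 10, 10)
print(page[0], page[-1])  # doc11 doc20
```

The trailing underscore in from_ avoids Python's from keyword. Deep paging gets expensive, and newer Elasticsearch versions cap from + size through index.max_result_window (10000 by default).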


    • match_all, sorted by account balance in descending order, returning 10 documents (the default size):
    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": { "match_all": {} },
      "sort": { "balance": { "order": "desc" } }
    }'
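As a local mental model of the sort request (not an Elasticsearch client), the sketch below orders documents by balance descending and keeps the first size hits. The balances come from the sample hits shown earlier; sort_desc_top is a hypothetical name.

```python
def sort_desc_top(docs, field, size=10):
    """Local model of "sort": {field: {"order": "desc"}} plus the default size."""
    return sorted(docs, key=lambda d: d[field], reverse=True)[:size]

accounts = [
    {"account_number": 1, "balance": 39225},
    {"account_number": 4, "balance": 27658},
    {"account_number": 6, "balance": 5686},
]
top = sort_desc_top(accounts, "balance", size=2)
print([a["account_number"] for a in top])  # [1, 4]
```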

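    Executing Searches

    By default, every search returns the full JSON document of each hit. This is called the source (the _source field in the search hits). If we don't want the entire source document returned, we can request only selected fields from it.

    • Return only account_number and balance: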
    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": { "match_all": {} },
      "_source": ["account_number", "balance"]
    }'

    This is similar to the field list in a SQL SELECT statement.
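_source filtering only trims what each hit returns; which documents match is unaffected. This simplified local model (hypothetical filter_source) keeps only the listed top-level fields; real _source filtering also accepts wildcard patterns, which the sketch does not handle.

```python
def filter_source(source, includes):
    """Local model of "_source": [...]: keep only the listed top-level fields."""
    return {k: source[k] for k in includes if k in source}

hit_source = {"account_number": 1, "balance": 39225, "firstname": "Amber",
              "lastname": "Duke", "state": "IL"}
print(filter_source(hit_source, ["account_number", "balance"]))
# {'account_number': 1, 'balance': 39225}
```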


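    Next is the match query, which serves as a basic search against a specific field (or set of fields).
    • Return the account whose account_number is 20: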

    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": { "match": { "account_number": 20 } }
    }'

    • Return all accounts whose address contains the term mill:
    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": { "match": { "address": "mill" } }
    }'

    • Return all accounts whose address contains the term mill or the term lane:
    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": { "match": { "address": "mill lane" } }
    }'

    • Return all accounts whose address contains the phrase "mill lane" (phrase match):
    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": { "match_phrase": { "address": "mill lane" } }
    }'
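The difference between match and match_phrase comes down to how the analyzed terms are combined. The sketch below approximates the analyzer by lowercasing and splitting on non-alphanumeric characters (an assumption; the real standard analyzer is more sophisticated), then treats match as an OR over the terms (its default operator) and match_phrase as "all terms, adjacent, in order". "198 Mill Lane" and "990 Mill Road" are made-up addresses for illustration.

```python
import re

def analyze(text):
    """Crude stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def match(field_text, query_text):
    """match with the default OR operator: any query term appears in the field."""
    terms = set(analyze(field_text))
    return any(t in terms for t in analyze(query_text))

def match_phrase(field_text, query_text):
    """match_phrase: all query terms appear consecutively and in order."""
    tokens, phrase = analyze(field_text), analyze(query_text)
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

addresses = ["198 Mill Lane", "880 Holmes Lane", "990 Mill Road"]
print([a for a in addresses if match(a, "mill lane")])         # all three
print([a for a in addresses if match_phrase(a, "mill lane")])  # ['198 Mill Lane']
```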

    Bool queries

    • Return all accounts whose address contains both mill and lane:
    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "address": "mill" } },
            { "match": { "address": "lane" } }
          ]
        }
      }
    }'

    must: all of the clauses must match (like &&).


    • Return all accounts whose address contains mill or lane:
    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": {
        "bool": {
          "should": [
            { "match": { "address": "mill" } },
            { "match": { "address": "lane" } }
          ]
        }
      }
    }'

    should: a document matches if any of the clauses matches (like ||).


    • Return all accounts whose address contains neither mill nor lane:
    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": {
        "bool": {
          "must_not": [
            { "match": { "address": "mill" } },
            { "match": { "address": "lane" } }
          ]
        }
      }
    }'

    must_not: none of the clauses may match (like !(A || B), i.e. !A && !B).


    • Return accounts where age is 40 but state is not ID:
    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "age": "40" } }
          ],
          "must_not": [
            { "match": { "state": "ID" } }
          ]
        }
      }
    }'
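The clause semantics above can be checked with a small local model of the bool query's matching logic (scoring is ignored). Each clause is a Python predicate standing in for a match query; the helpers, the crude tokenizer in has_term, and the documents are illustrative assumptions.

```python
import re

def has_term(doc, field, term):
    """Hypothetical helper: crude analyzed-text check (lowercase, split on non-alphanumerics)."""
    tokens = re.split(r"[^0-9a-z]+", str(doc[field]).lower())
    return term.lower() in tokens

def bool_match(doc, must=(), should=(), must_not=()):
    """Boolean logic of a bool query; each clause is a predicate doc -> bool."""
    if not all(clause(doc) for clause in must):      # must: every clause (AND)
        return False
    if any(clause(doc) for clause in must_not):      # must_not: no clause may match
        return False
    if should and not must:                          # should alone: at least one (OR)
        return any(clause(doc) for clause in should)
    return True  # beside must (query context), should only affects scoring

# Made-up documents containing only the fields used below.
docs = [
    {"address": "198 Mill Lane", "age": 40, "state": "ID"},
    {"address": "990 Mill Road", "age": 40, "state": "TX"},
    {"address": "880 Holmes Lane", "age": 32, "state": "IL"},
    {"address": "244 Columbus Place", "age": 29, "state": "CO"},
]
mill = lambda d: has_term(d, "address", "mill")
lane = lambda d: has_term(d, "address", "lane")
age_40 = lambda d: d["age"] == 40
state_id = lambda d: has_term(d, "state", "ID")

def addresses(**clauses):
    return [d["address"] for d in docs if bool_match(d, **clauses)]

print(addresses(must=[mill, lane]))                   # ['198 Mill Lane']
print(addresses(should=[mill, lane]))                 # all but '244 Columbus Place'
print(addresses(must_not=[mill, lane]))               # ['244 Columbus Place']
print(addresses(must=[age_40], must_not=[state_id]))  # ['990 Mill Road']
```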

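    Executing Filters

    In the results above we saw each document's score (the _score field in the search hits). The score is a numeric, relative measure of how well a document matches the search query: the higher the score, the more relevant the document; the lower the score, the less relevant.

    In Elasticsearch, queries always trigger relevance-score computation. When we don't need relevance scores, Elasticsearch offers another capability: filters.

    Filters are similar in concept to queries, but they are optimized for much faster execution, for two main reasons:
    1. Filters don't compute scores, so they execute faster than queries.
    2. Filters can be cached in memory, which makes repeated searches faster.

    To understand filters, let's first introduce the filtered query, which lets you combine a query (like match_all, match, bool, etc.) with a filter.
    As an example we use the range filter, which filters documents by a range of values; it is generally used for numeric or date filtering.

    The following filtered query returns accounts with balances from 20000 to 30000 inclusive; in other words, balance >= 20000 && balance <= 30000.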

    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "query": {
        "filtered": {
          "query": { "match_all": {} },
          "filter": {
            "range": {
              "balance": {
                "gte": 20000,
                "lte": 30000
              }
            }
          }
        }
      }
    }'
    
    Result:
    {
      "took" : 3,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 217,
        "max_score" : 1.0,
        "hits" : [ {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "4",
          "_score" : 1.0,
          "_source":{"account_number":4,"balance":27658,"firstname":"Rodriquez","lastname":"Flores","age":31,"gender":"F","address":"986 Wyckoff Avenue","employer":"Tourmania","email":"rodriquezflores@tourmania.com","city":"Eastvale","state":"HI"}
        }, {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "9",
          "_score" : 1.0,
          "_source":{"account_number":9,"balance":24776,"firstname":"Opal","lastname":"Meadows","age":39,"gender":"M","address":"963 Neptune Avenue","employer":"Cedward","email":"opalmeadows@cedward.com","city":"Olney","state":"OH"}
        }, {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "11",
          "_score" : 1.0,
          "_source":{"account_number":11,"balance":20203,"firstname":"Jenkins","lastname":"Haney","age":20,"gender":"M","address":"740 Ferry Place","employer":"Qimonk","email":"jenkinshaney@qimonk.com","city":"Steinhatchee","state":"GA"}
        }, {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "42",
          "_score" : 1.0,
          "_source":{"account_number":42,"balance":21137,"firstname":"Harding","lastname":"Hobbs","age":26,"gender":"F","address":"474 Ridgewood Place","employer":"Xth","email":"hardinghobbs@xth.com","city":"Heil","state":"ND"}
        }, {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "54",
          "_score" : 1.0,
          "_source":{"account_number":54,"balance":23406,"firstname":"Angel","lastname":"Mann","age":22,"gender":"F","address":"229 Ferris Street","employer":"Amtas","email":"angelmann@amtas.com","city":"Calverton","state":"WA"}
        }, {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "66",
          "_score" : 1.0,
          "_source":{"account_number":66,"balance":25939,"firstname":"Franks","lastname":"Salinas","age":28,"gender":"M","address":"437 Hamilton Walk","employer":"Cowtown","email":"frankssalinas@cowtown.com","city":"Chase","state":"VT"}
        }, {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "92",
          "_score" : 1.0,
          "_source":{"account_number":92,"balance":26753,"firstname":"Gay","lastname":"Brewer","age":34,"gender":"M","address":"369 Ditmars Street","employer":"Savvy","email":"gaybrewer@savvy.com","city":"Moquino","state":"HI"}
        }, {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "100",
          "_score" : 1.0,
          "_source":{"account_number":100,"balance":29869,"firstname":"Madden","lastname":"Woods","age":32,"gender":"F","address":"696 Ryder Avenue","employer":"Slumberia","email":"maddenwoods@slumberia.com","city":"Deercroft","state":"ME"}
        }, {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "105",
          "_score" : 1.0,
    

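    Dissecting the request above: the filtered query contains a match_all query (the query part) and a range filter (the filter part). Any other query can be substituted into the query part, and any other filter into the filter part. Here the range filter is the natural choice, because all documents falling into the range match "equally": no document is more relevant than another, which is also why every hit above has a _score of 1.0.

    In general, the simplest way to decide between a filter and a query is to ask whether you care about the relevance score. If relevance is not important, use a filter; otherwise, use a query.
    Queries and filters are similar in spirit to the WHERE clause of a SQL SELECT statement.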
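Two points are worth making concrete: gte/lte are inclusive bounds, and the filtered query used here belongs to Elasticsearch 1.x (it was deprecated in 2.0 and removed in 5.0, where a bool query with a filter clause plays the same role). The Python sketch below builds both body shapes and models the inclusive range check locally; all helper names are hypothetical.

```python
def range_filter(field, gte=None, lte=None):
    """Build a range clause; gte and lte are inclusive bounds."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    return {"range": {field: bounds}}

def filtered_query_1x(query, flt):
    """Body shape used in this article (Elasticsearch 1.x "filtered" query)."""
    return {"query": {"filtered": {"query": query, "filter": flt}}}

def bool_filter_query(query, flt):
    """Equivalent shape for 5.0+: non-scoring clauses go into bool.filter."""
    return {"query": {"bool": {"must": query, "filter": flt}}}

def in_range(value, gte=None, lte=None):
    """Local model of the inclusive gte/lte check."""
    return (gte is None or value >= gte) and (lte is None or value <= lte)

flt = range_filter("balance", gte=20000, lte=30000)
print(filtered_query_1x({"match_all": {}}, flt))
print(bool_filter_query({"match_all": {}}, flt))
print([b for b in (19999, 20000, 27658, 30000, 30001) if in_range(b, 20000, 30000)])
# [20000, 27658, 30000]
```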


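    Executing Aggregations

    Aggregations provide the ability to group your data and extract statistics from it.
    They are roughly equivalent to SQL GROUP BY and the SQL aggregate functions.

    In Elasticsearch, a single search can return both the hits and, separately, aggregated results over those hits, all in one response. This is powerful and efficient: through one concise API call you can run a query and multiple aggregations and get all the results back at once, avoiding extra network round trips.

    • Group all accounts by state and return the top 10 states (the default) ordered by count, descending: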
    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "size": 0,
      "aggs": {
        "group_by_state": {
          "terms": {
            "field": "state"
          }
        }
      }
    }'
    
    Result:
     "hits" : {
        "total" : 1000,
        "max_score" : 0.0,
        "hits" : [ ]
      },
      "aggregations" : {
        "group_by_state" : {
          "buckets" : [ {
            "key" : "al",
            "doc_count" : 21
          }, {
            "key" : "tx",
            "doc_count" : 17
          }, {
            "key" : "id",
            "doc_count" : 15
          }, {
            "key" : "ma",
            "doc_count" : 15
          }, {
            "key" : "md",
            "doc_count" : 15
          }, {
            "key" : "pa",
            "doc_count" : 15
          }, {
            "key" : "dc",
            "doc_count" : 14
          }, {
            "key" : "me",
            "doc_count" : 14
          }, {
            "key" : "mo",
            "doc_count" : 14
          }, {
            "key" : "nd",
            "doc_count" : 14
          } ]
        }
      }
    }

    This is roughly equivalent to the following SQL:

    SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC

    size=0 suppresses the search hits in the response, because we only want to see the aggregation results.
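A terms aggregation is essentially a group-by-and-count. The local Python model below (hypothetical terms_agg, made-up data) counts documents per key, orders buckets by doc_count descending, breaks ties by key ascending (consistent with the alphabetical order of equal counts in the output above), and keeps the top size buckets. On a real multi-shard index the counts can be approximate, which this sketch ignores. The lowercase keys mirror the output above, where state was an analyzed string field in 1.x; in newer versions you would typically aggregate on a keyword field such as state.keyword instead.

```python
from collections import Counter

def terms_agg(docs, field, size=10):
    """Local model of a terms aggregation: count docs per key, order by
    doc_count descending (ties by key ascending), keep the top `size`."""
    counts = Counter(d[field] for d in docs)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": c} for k, c in ordered[:size]]

# Made-up mini dataset.
docs = [{"state": s} for s in ["al", "tx", "al", "id", "ma", "tx", "al"]]
print(terms_agg(docs, "state", size=3))
# [{'key': 'al', 'doc_count': 3}, {'key': 'tx', 'doc_count': 2}, {'key': 'id', 'doc_count': 1}]
```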


    • Calculate the average account balance per state, again for the top 10 states ordered by count, descending:
    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "size": 0,
      "aggs": {
        "group_by_state": {
          "terms": {
            "field": "state"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }'

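    Notice how the average_balance aggregation is nested inside the group_by_state aggregation. This is a common pattern for all aggregations: you can nest aggregations inside aggregations arbitrarily to extract whatever pivoted summaries you need from your data.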
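The nesting can be read as "for each bucket of the outer aggregation, run the inner one over that bucket's documents". A local sketch under that reading, with a hypothetical helper name and made-up balances; the sub-aggregation name average_balance is hard-coded to mirror the request above.

```python
from collections import defaultdict

def terms_with_avg(docs, group_field, metric_field, size=10):
    """Local model of terms -> avg nesting: for each bucket key, report the
    document count and the mean of metric_field over that bucket."""
    buckets = defaultdict(list)
    for d in docs:
        buckets[d[group_field]].append(d[metric_field])
    ordered = sorted(buckets.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [
        {"key": k, "doc_count": len(v),
         "average_balance": {"value": sum(v) / len(v)}}
        for k, v in ordered[:size]
    ]

docs = [
    {"state": "al", "balance": 100},
    {"state": "al", "balance": 300},
    {"state": "tx", "balance": 1000},
]
print(terms_with_avg(docs, "state", "balance"))
```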


    • Sort the states by average balance in descending order:
    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "size": 0,
      "aggs": {
        "group_by_state": {
          "terms": {
            "field": "state",
            "order": {
              "average_balance": "desc"
            }
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }'
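Ordering by a sub-aggregation changes which value ranks the buckets: the average balance instead of the document count. In the made-up data below, al has more accounts but tx ranks first. This is a local model only; states_by_avg_balance is a hypothetical helper.

```python
from collections import defaultdict

def states_by_avg_balance(docs, size=10):
    """Local model of "order": {"average_balance": "desc"}: compute each
    state's average balance, then rank buckets by that value, not by count."""
    buckets = defaultdict(list)
    for d in docs:
        buckets[d["state"]].append(d["balance"])
    averages = {k: sum(v) / len(v) for k, v in buckets.items()}
    ranked = sorted(averages.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:size]

docs = [
    {"state": "al", "balance": 100},
    {"state": "al", "balance": 300},
    {"state": "al", "balance": 200},
    {"state": "tx", "balance": 1000},
]
print(states_by_avg_balance(docs))  # [('tx', 1000.0), ('al', 200.0)]
```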

    • Group by age bracket (ages 20-29, 30-39, and 40-49), then by gender, and finally compute the average balance for each age bracket and gender:
    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
    {
      "size": 0,
      "aggs": {
        "group_by_age": {
          "range": {
            "field": "age",
            "ranges": [
              {
                "from": 20,
                "to": 30
              },
              {
                "from": 30,
                "to": 40
              },
              {
                "from": 40,
                "to": 50
              }
            ]
          },
          "aggs": {
            "group_by_gender": {
              "terms": {
                "field": "gender"
              },
              "aggs": {
                "average_balance": {
                  "avg": {
                    "field": "balance"
                  }
                }
              }
            }
          }
        }
      }
    }'
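Note that a range bucket includes its from value and excludes its to value, which is why the ranges 20-30, 30-40, and 40-50 correspond to ages 20-29, 30-39, and 40-49. The local Python model below (hypothetical age_gender_avg; bucket labels such as "20-30" are this sketch's own format, not Elasticsearch's key format) applies that rule, then groups by gender and averages the balances. The first document reuses values from the sample schema; the others are made up.

```python
from collections import defaultdict

AGE_RANGES = [(20, 30), (30, 40), (40, 50)]

def age_gender_avg(docs, ranges=AGE_RANGES):
    """Local model of range -> terms -> avg. A range bucket includes
    `from` and excludes `to`, so (20, 30) covers ages 20..29."""
    result = {}
    for lo, hi in ranges:
        by_gender = defaultdict(list)
        for d in docs:
            if lo <= d["age"] < hi:
                by_gender[d["gender"]].append(d["balance"])
        result[f"{lo}-{hi}"] = {
            g: sum(v) / len(v) for g, v in sorted(by_gender.items())
        }
    return result

docs = [
    {"age": 29, "gender": "F", "balance": 16623},
    {"age": 30, "gender": "M", "balance": 1000},
    {"age": 32, "gender": "M", "balance": 39225},
    {"age": 40, "gender": "F", "balance": 500},
]
print(age_gender_avg(docs))
# {'20-30': {'F': 16623.0}, '30-40': {'M': 20112.5}, '40-50': {'F': 500.0}}
```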
  • Original article: https://www.cnblogs.com/youngerger/p/9030183.html