Elasticsearch Request Body Search

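Preface

In earlier notes I covered Elasticsearch's lightweight (query-string) search and explained why it is not recommended. This note records how to use Elasticsearch request body search.

Request body search does more than run the query itself: it also lets you highlight snippets in the results, aggregate over all or part of the results, and return did-you-mean suggestions that guide users quickly to what they are looking for.

Empty Search

An empty search specifies no parameters. It matches every document in every index (by default only the first 10 hits are returned):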
```
GET /_search
{}
```
Response:
```
{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 23,
    "successful" : 23,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : ".kibana_1",
        "_type" : "_doc",
        "_id" : "space:default",
        "_score" : 1.0,
        "_source" : {
          "space" : {
            "name" : "Default",
            "description" : "This is your default space!",
            "color" : "#00bfb3",
            "disabledFeatures" : [ ],
            "_reserved" : true
          },
          "type" : "space",
          "references" : [ ],
          "migrationVersion" : {
            "space" : "6.6.0"
          },
          "updated_at" : "2020-12-08T06:47:14.690Z"
        }
      },
      ...
    ]
  }
}
```
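Query DSL

The Query DSL is a flexible and expressive query language. Elasticsearch uses it to expose most of Lucene's functionality through a simple JSON interface. It is what you should use to write queries in your application: it makes queries more flexible, more precise, easier to read, and easier to debug.

To use it, pass a query in the query parameter: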
```
GET /_search
{
    "query": YOUR_QUERY_HERE
}
```
An empty search is equivalent to a match_all query, which matches all documents:
```
GET /_search
{
    "query": {
        "match_all": {}
    }
}
```
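The equivalence above can be sketched in Python by building the request body as a plain dictionary. The helper name `search_body` is hypothetical (it is not part of any Elasticsearch client); the sketch only shows the shape of the JSON that would be sent to `GET /_search`.

```python
import json


def search_body(query=None):
    """Build a request body for GET /_search (hypothetical helper).

    With no query the body is empty, which is the "empty search";
    otherwise the query is placed under the "query" parameter.
    """
    if query is None:
        return {}
    return {"query": query}


print(json.dumps(search_body()))                   # {}
print(json.dumps(search_body({"match_all": {}})))  # {"query": {"match_all": {}}}
```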
Structure of a Query Clause

A query clause typically has this structure:
```
{
    QUERY_NAME: {
        ARGUMENT: VALUE,
        ARGUMENT: VALUE,...
    }
}
```
If the clause targets a specific field, the structure is:
```
{
    QUERY_NAME: {
        FIELD_NAME: {
            ARGUMENT: VALUE,
            ARGUMENT: VALUE,...
        }
    }
}
```
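As a rough illustration, the two templates above map directly onto nested dictionaries. The helpers `clause` and `field_clause` are made-up names for this sketch, not Elasticsearch APIs.

```python
import json


def clause(query_name, **arguments):
    """{QUERY_NAME: {ARGUMENT: VALUE, ...}} (hypothetical helper)."""
    return {query_name: dict(arguments)}


def field_clause(query_name, field_name, **arguments):
    """{QUERY_NAME: {FIELD_NAME: {ARGUMENT: VALUE, ...}}} (hypothetical helper)."""
    return {query_name: {field_name: dict(arguments)}}


print(json.dumps(clause("match_all")))
# {"match_all": {}}
print(json.dumps(field_clause("range", "age", gte=20, lt=30)))
# {"range": {"age": {"gte": 20, "lt": 30}}}
```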
For example, you can use a match query clause to find tweets whose tweet field contains elasticsearch:
```
{
    "match": {
        "tweet": "elasticsearch"
    }
}
```
The full search request looks like this:
```
GET /_search
{
    "query": {
        "match": {
            "tweet": "elasticsearch"
        }
    }
}
```
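Combining Query Clauses

Query clauses are simple building blocks that can be combined with each other to form more complex queries. A clause takes one of two forms:
- Leaf clauses (such as the match clause) compare a query string against a field (or several fields).
- Compound clauses combine other query clauses. For example, a bool clause lets you combine other clauses as needed, whether they must match, must_not match, or should match, and it can also contain non-scoring filters: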
```
{
    "bool": {
        "must":     { "match": { "tweet": "elasticsearch" }},
        "must_not": { "match": { "name":  "mary" }},
        "should":   { "match": { "tweet": "full text" }},
        "filter":   { "range": { "age" : { "gt" : 30 }} }
    }
}
```
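A minimal sketch of how leaf clauses might be assembled into the compound clause above. The helper `bool_query` is hypothetical; it only accepts the four section names the text mentions and otherwise does nothing more than nest dictionaries.

```python
import json

# The four sections of a bool clause named in the text.
ALLOWED_SECTIONS = ("must", "must_not", "should", "filter")


def bool_query(**sections):
    """Wrap the given sections in a bool clause (hypothetical helper)."""
    unknown = set(sections) - set(ALLOWED_SECTIONS)
    if unknown:
        raise ValueError(f"unsupported bool sections: {sorted(unknown)}")
    # Emit the sections in a stable order, skipping the ones not supplied.
    return {"bool": {name: sections[name] for name in ALLOWED_SECTIONS if name in sections}}


query = bool_query(
    must={"match": {"tweet": "elasticsearch"}},
    must_not={"match": {"name": "mary"}},
    should={"match": {"tweet": "full text"}},
    filter={"range": {"age": {"gt": 30}}},
)
print(json.dumps(query, indent=4))
```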
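Commonly Used Queries

Elasticsearch ships with many queries, but only a handful are used regularly. The sections below briefly record how to use them.

match_all

The match_all query simply matches all documents. It is the default query when no query is specified: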
```
{ "match_all": {}}
```
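match

The match query is the standard query. On an exact-value field, it matches the given value exactly. On a full-text field, it analyzes the query string with the correct analyzer before running the search: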
```
{ "match": { "age":    26           }}
{ "match": { "date":   "2014-09-01" }}
{ "match": { "public": true         }}
{ "match": { "tag":    "full_text"  }}
```
Note:

For exact-value searches, prefer a filter clause over a query clause, because filter results can be cached.

multi_match

The multi_match query runs the same match query on multiple fields:
```
# Search for "full text search" in the title and body fields
{
    "multi_match": {
        "query":    "full text search",
        "fields":   [ "title", "body" ]
    }
}
```
range

The range query finds numbers or dates that fall within a specified range:
```
{
    "range": {
        "age": {
            "gte":  20,
            "lt":   30
        }
    }
}
```
The range query accepts the following operators:
- gt: greater than
- gte: greater than or equal to
- lt: less than
- lte: less than or equal to
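
To make the operator semantics concrete, here is a toy model of the four comparisons in Python. It is not how Elasticsearch evaluates ranges internally; `in_range` is a hypothetical function that only mirrors what each operator means.

```python
import operator

# Each range operator mapped to the comparison it stands for.
RANGE_OPERATORS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """Return True when value satisfies every bound, e.g. {"gte": 20, "lt": 30}."""
    return all(RANGE_OPERATORS[name](value, limit) for name, limit in bounds.items())


ages = [19, 20, 29, 30]
print([age for age in ages if in_range(age, {"gte": 20, "lt": 30})])  # [20, 29]
```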
term

The term query is used for exact-value matching. It does not analyze the input text, so it searches for exactly the value you supply:
```
{ "term": { "age":    26           }}
{ "term": { "date":   "2014-09-01" }}
{ "term": { "public": true         }}
{ "term": { "tag":    "full_text"  }}
```
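The difference between an analyzed match and an unanalyzed term lookup can be modeled with a toy analyzer. Everything here is an assumption made for illustration: `toy_analyze` just lowercases and splits on whitespace, whereas real analyzers do considerably more, and the field is assumed to be an analyzed full-text field.

```python
def toy_analyze(text):
    """Toy stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def toy_match(field_text, query_text):
    """match-style lookup: the query string is analyzed before comparison."""
    indexed_tokens = set(toy_analyze(field_text))
    return any(token in indexed_tokens for token in toy_analyze(query_text))


def toy_term(field_text, term):
    """term-style lookup: the term is compared as given, without analysis."""
    return term in toy_analyze(field_text)


indexed_value = "Full Text Search"
print(toy_match(indexed_value, "FULL text"))  # True
print(toy_term(indexed_value, "Full"))        # False
print(toy_term(indexed_value, "full"))        # True
```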
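terms

The terms query is the multi-value counterpart of term: it lets you specify several values to match. A document matches if the field contains any one of the specified values: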
```
{ "terms": { "tag": [ "search", "full_text", "nosql" ] }}
```
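The "any one of the values" rule can be sketched as a small predicate over in-memory documents. `matches_terms` is a hypothetical function for illustration; it treats a scalar field value like a one-element list.

```python
def matches_terms(document, field, wanted):
    """Return True if the field holds at least one of the wanted values."""
    values = document.get(field, [])
    if not isinstance(values, list):
        values = [values]  # treat a single value like a one-element list
    return any(value in wanted for value in values)


documents = [
    {"tag": ["search", "java"]},
    {"tag": "nosql"},
    {"tag": ["python"]},
    {},
]
wanted_tags = ["search", "full_text", "nosql"]
print([matches_terms(doc, "tag", wanted_tags) for doc in documents])
# [True, True, False, False]
```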
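exists and missing

The exists and missing queries find documents in which the specified field has a value (exists) or has no value (missing). Note that the standalone missing query was removed in Elasticsearch 5.0; on later versions, negate an exists query inside a bool query's must_not clause to get the same effect: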
```
{
    "exists":   {
        "field":    "title"
    }
}
```
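On versions without a standalone missing query, the usual replacement is an exists clause negated with must_not. The sketch below builds that body; `exists_query` and `missing_query` are hypothetical helper names, and only the resulting JSON shape matters.

```python
import json


def exists_query(field):
    """Documents in which the field has a value."""
    return {"exists": {"field": field}}


def missing_query(field):
    """Documents in which the field has no value: a negated exists clause."""
    return {"bool": {"must_not": exists_query(field)}}


print(json.dumps(missing_query("title")))
# {"bool": {"must_not": {"exists": {"field": "title"}}}}
```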
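Combining Queries

The queries above are common simple ones, but real business logic is rarely that simple.
To meet such requirements we use the bool query, which combines multiple queries into the boolean combination you need. It accepts the following parameters:
- must: clauses that a document must match to be included.
- must_not: clauses that a document must not match to be included.
- should: if any of these clauses match, the document's _score increases; otherwise they have no effect. They are mainly used to refine each document's relevance score.
- filter: clauses that must match, but that run in non-scoring, filtering mode.
The following query finds documents whose title field matches how to make millions and that are not tagged as spam. Documents tagged as starred or dated 2014 or later rank higher than the others, and documents that satisfy both rank higher still: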
```
{
    "bool": {
        "must":     { "match": { "title": "how to make millions" }},
        "must_not": { "match": { "tag":   "spam" }},
        "should": [
            { "match": { "tag": "starred" }},
            { "range": { "date": { "gte": "2014-01-01" }}}
        ]
    }
}
```
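Note: if a bool query has no must clause (and, in Elasticsearch 7.x, no filter clause either), at least one of its should clauses must match. If it has at least one must clause, no should clause is required to match.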
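
The matching rule in this note can be written down as a toy decision function. It is a simplified model, not Elasticsearch code: each argument is a list of booleans saying whether that clause matched a given document, scoring is ignored, and the default for should clauses (one required only when there are should clauses but no must or filter clauses) is assumed from the 7.x behavior.

```python
def bool_matches(must=(), must_not=(), should=(), filter_=()):
    """Decide whether a document matches a bool query (toy model, no scoring)."""
    if not all(must) or not all(filter_) or any(must_not):
        return False
    # Assumed default: a should clause is required only when should clauses
    # exist and there are no must or filter clauses.
    required_should = 1 if should and not must and not filter_ else 0
    return sum(should) >= required_should


print(bool_matches(should=[False, False]))            # False
print(bool_matches(should=[True, False]))             # True
print(bool_matches(must=[True], should=[False]))      # True
print(bool_matches(must=[True], must_not=[True]))     # False
```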

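Filters

If we don't want a document's date to affect its score, we can rewrite the previous example with a filter clause. The date range then becomes a required, non-scoring condition rather than an optional boost: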
```
{
    "bool": {
        "must":     { "match": { "title": "how to make millions" }},
        "must_not": { "match": { "tag":   "spam" }},
        "should": [
            { "match": { "tag": "starred" }}
        ],
        "filter": {
          "range": { "date": { "gte": "2014-01-01" }} 
        }
    }
}
```
If you need to filter documents by several different criteria, a bool query can itself be used as a non-scoring query inside the filter:
```
{
    "bool": {
        "must":     { "match": { "title": "how to make millions" }},
        "must_not": { "match": { "tag":   "spam" }},
        "should": [
            { "match": { "tag": "starred" }}
        ],
        "filter": {
          "bool": { 
              "must": [
                  { "range": { "date": { "gte": "2014-01-01" }}},
                  { "range": { "price": { "lte": 29.99 }}}
              ],
              "must_not": [
                  { "term": { "category": "ebooks" }}
              ]
          }
        }
    }
}
```
constant_score

The constant_score query applies the same constant score to every matching document. It is often used when you only need to run a filter and have no other query:
```
{
    "constant_score":   {
        "filter": {
            "term": { "category": "ebooks" } 
        }
    }
}
```
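A small sketch of building this body in Python. The helper `constant_score` is a hypothetical name; the `boost` parameter is not mentioned in the text above and is included here on the assumption that the constant score should be configurable, with 1.0 as the default.

```python
import json


def constant_score(filter_clause, boost=1.0):
    """Wrap a filter so that every matching document gets the same score.

    boost is an addition for illustration; it sets that constant score.
    """
    return {"constant_score": {"filter": filter_clause, "boost": boost}}


print(json.dumps(constant_score({"term": {"category": "ebooks"}})))
# {"constant_score": {"filter": {"term": {"category": "ebooks"}}, "boost": 1.0}}
```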
Validating Queries

When query logic becomes very complex, the validate query API is useful: it checks whether your query is valid:
```
GET /index_name/_validate/query?explain
{
   "query": {
      "match" : {
         "name" : "really powerful"
      }
   }
}
```
Response:
```
{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "valid" : true,
  "explanations" : [
    {
      "index" : "index_name",
      "valid" : true,
      "explanation" : "name:really name:powerful"
    }
  ]
}
```
If the query is invalid, an error message is returned. For example:
```
GET /index_name/_validate/query?explain
{
   "query": {
      "test" : {
         "name" : "really powerful"
      }
   }
}
```
Response:
```
{
  "valid" : false,
  "error" : "ParsingException[unknown query [test]]; nested: NamedObjectNotFoundException[[3:16] unknown field [test]];; org.elasticsearch.common.xcontent.NamedObjectNotFoundException: [3:16] unknown field [test]"
}
```
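Understanding the Validation Output

When validating, it is recommended to add the explain parameter as above, so that detailed information is returned whether or not validation passes.

With the error message returned above, we can quickly locate the problem.

The explanation shows that the match query for really powerful is rewritten into two single-term queries against the name field, one for each term that the query string was split into.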
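
The shape of that explanation string can be imitated with a few lines of Python. This is only a toy: `toy_explanation` is a hypothetical function that lowercases and splits on whitespace, which happens to reproduce the output for this input but is not the analyzer Elasticsearch actually runs.

```python
def toy_explanation(field, query_string):
    """Build one field:term clause per token of the query string."""
    tokens = query_string.lower().split()
    return " ".join(f"{field}:{token}" for token in tokens)


print(toy_explanation("name", "really powerful"))  # name:really name:powerful
```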
Author: 红雨
Source: https://www.cnblogs.com/52why
WeChat official account: 红雨python
Original article: https://www.cnblogs.com/52why/p/14431638.html