  • Elasticsearch Complex Search (Sorting, Pagination, Highlighting, Fuzzy Queries, Exact Queries)

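    If you are not familiar with basic Elasticsearch usage, see the previous article, "Basic Operations on Elasticsearch Indices and Documents".

    Before querying, you can batch-insert the sample documents with the Bulk API (data source).

    Querying data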

    match query

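    match runs the query text through the field's analyzer (tokenizer) first, then searches the index for the resulting terms. The index itself was built from the documents analyzed the same way at index time.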

    GET /student/_search
    {
      "query": {
        "match": {
          "name": "山西"
        }
      }
    }
    

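    A similar search can be written as a URI search with the q parameter (the quotes make it a phrase query; on this data it returns the same documents):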

    GET /student/_search?q=name:"山西"
    

    The result contains three students whose names contain "山西":

    {
      "took" : 0,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 3,
          "relation" : "eq"
        },
        "max_score" : 0.7133499,
        "hits" : [
          {
            "_index" : "student",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 0.7133499,
            "_source" : {
              "name" : "山西太原-张三",
              "age" : "23",
              "address" : {
                "city" : "太原",
                "province" : "山西"
              }
            }
          },
          {
            "_index" : "student",
            "_type" : "_doc",
            "_id" : "2",
            "_score" : 0.7133499,
            "_source" : {
              "name" : "山西长治-李四",
              "age" : "24",
              "address" : {
                "city" : "长治",
                "province" : "山西"
              }
            }
          },
          {
            "_index" : "student",
            "_type" : "_doc",
            "_id" : "3",
            "_score" : 0.7133499,
            "_source" : {
              "name" : "山西吕梁-王五",
              "age" : "25",
              "address" : {
                "city" : "吕梁",
                "province" : "山西"
              }
            }
          }
        ]
      }
    }
    

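    Explanation

    query: the query definition.

    match: the match condition.

    name: the field to search, with 山西 as the search text.

    hits --> total

    • value: the number of matching documents (3 here)
    • relation: eq, meaning value is the exact count

    max_score: the highest relevance score among the hits.

    hits: the matching documents themselves, each with its index, ID, score, and content.

    Each document's _score shows how well it matches the query (higher is more relevant), and _source holds the stored document, so together they tell you which results best fit what you expected.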
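    When the same search is issued from a program rather than the Kibana console, the fields described above are read from the parsed JSON. The following is a minimal sketch using only the Python standard library; the response string is a trimmed copy of the result shown above, not a live call to Elasticsearch.

```python
import json

# Trimmed copy of the response above, keeping only the fields read below
raw = """
{
  "hits": {
    "total": {"value": 3, "relation": "eq"},
    "max_score": 0.7133499,
    "hits": [
      {"_id": "1", "_score": 0.7133499, "_source": {"name": "山西太原-张三"}},
      {"_id": "2", "_score": 0.7133499, "_source": {"name": "山西长治-李四"}},
      {"_id": "3", "_score": 0.7133499, "_source": {"name": "山西吕梁-王五"}}
    ]
  }
}
"""

response = json.loads(raw)
hits = response["hits"]
total = hits["total"]["value"]
# "eq" means total is exact; "gte" would mean it is only a lower bound
exact = hits["total"]["relation"] == "eq"
# Without an explicit sort, hits arrive ordered by _score, highest first
names = [hit["_source"]["name"] for hit in hits["hits"]]

for hit in hits["hits"]:
    print(hit["_id"], hit["_score"], hit["_source"]["name"])
print("total:", total, "(exact)" if exact else "(lower bound)")
```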

    With match, the default operator is or, so the following two queries return the same results:

    GET student/_search
    {
        "query": {
            "match": {
                "name": {
                    "query": "山西长治",
                    "operator": "or"
                }
            }
        }
    }
    
    GET student/_search
    {
        "query": {
            "match": {
                "name": "山西长治"
            }
        }
    }
    

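    Because operator defaults to or, the query above returns every document that contains any one of the characters in 山西长治 (with the default standard analyzer, each Chinese character is indexed as a separate term).

    The minimum_should_match parameter sets the minimum number of terms that must match, for example: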

    GET student/_search
    {
        "query": {
            "match": {
                "name": {
                    "query": "山西长治",
                    "operator": "or",
                    "minimum_should_match": 3
                }
            }
        }
    }
    

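    Only documents that match at least three of the four characters in 山西长治 are returned.

    With operator set to and, every term must match, so only one document is returned: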

    GET student/_search
    {
      "query": {
        "match": {
          "name": {
            "query": "山西长治",
            "operator": "and"
          }
        }
      }
    }
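    The three match variants above can be reproduced locally with a small simulation. It assumes, as the results above suggest, that the standard analyzer indexes each Chinese character as its own term; tokenize and match are hypothetical helpers, not Elasticsearch APIs, and scoring is ignored.

```python
import re

# Assumed stand-in for the standard analyzer: one token per CJK character,
# ASCII letters/digits grouped into lowercase words, punctuation such as "-" dropped
TOKEN_PATTERN = r"[\u4e00-\u9fff]|[A-Za-z0-9]+"


def tokenize(text):
    return [t.lower() for t in re.findall(TOKEN_PATTERN, text)]


def match(docs, query, operator="or", minimum_should_match=None):
    """Return IDs of docs whose name satisfies a match query (no scoring)."""
    terms = set(tokenize(query))
    if operator == "and":
        required = len(terms)                  # every query term must appear
    else:
        required = minimum_should_match or 1   # "or": at least one term by default
    return [doc_id for doc_id, name in docs.items()
            if len(terms & set(tokenize(name))) >= required]


docs = {"1": "山西太原-张三", "2": "山西长治-李四", "3": "山西吕梁-王五"}
print(match(docs, "山西长治"))                          # ['1', '2', '3']
print(match(docs, "山西长治", minimum_should_match=3))  # ['2']
print(match(docs, "山西长治", operator="and"))          # ['2']
```

    Documents 1 and 3 share only 山 and 西 with the query, which is enough for or but falls short of three terms or of all four.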
    

    Ids query

    Retrieve multiple documents by ID in a single query:

    GET student/_search
    {
      "query": {
        "ids": {
          "values": [1,2,3]
        }
      }
    }
    

    The query above returns the documents with IDs 1, 2, and 3.
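    When the ID list comes from application code, the request body can be built programmatically. The sketch below uses a hypothetical helper, ids_query, and converts each ID to a string because a document's _id is a string in Elasticsearch.

```python
import json


def ids_query(ids):
    """Hypothetical helper: build an ids query body from any iterable of IDs."""
    # _id values are strings, so numeric IDs are converted explicitly
    return {"query": {"ids": {"values": [str(i) for i in ids]}}}


body = ids_query([1, 2, 3])
print(json.dumps(body))  # {"query": {"ids": {"values": ["1", "2", "3"]}}}
```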

    multi_match

    multi_match builds on the match query and lets you search several fields at once.

    The searches above each target a single field. Often you don't know which field contains the keyword; in that case, use multi_match.

    GET student/_search
    {
        "query": {
            "multi_match": {
                "query": "山西长治",
                "fields": [
                    "name",
                    "address.city^3",
                    "address.province"
                ],
                "type": "best_fields"
            }
        }
    }
    

    This searches the name, address.city, and address.province fields, and ^3 multiplies the score of matches in address.city by three. The result:

    {
        "took" : 0,
        "timed_out" : false,
        "_shards" : {
            "total" : 1,
            "successful" : 1,
            "skipped" : 0,
            "failed" : 0
        },
        "hits" : {
            "total" : {
                "value" : 3,
                "relation" : "eq"
            },
            "max_score" : 7.223837,
            "hits" : [
                {
                    "_index" : "student",
                    "_type" : "_doc",
                    "_id" : "2",
                    "_score" : 7.223837,
                    "_source" : {
                        "name" : "山西长治-李四",
                        "age" : "24",
                        "address" : {
                            "city" : "长治",
                            "province" : "山西"
                        }
                    }
                },
                {
                    "_index" : "student",
                    "_type" : "_doc",
                    "_id" : "1",
                    "_score" : 0.7133499,
                    "_source" : {
                        "name" : "山西太原-张三",
                        "age" : "23",
                        "address" : {
                            "city" : "太原",
                            "province" : "山西"
                        }
                    }
                },
                {
                    "_index" : "student",
                    "_type" : "_doc",
                    "_id" : "3",
                    "_score" : 0.7133499,
                    "_source" : {
                        "name" : "山西吕梁-王五",
                        "age" : "25",
                        "address" : {
                            "city" : "吕梁",
                            "province" : "山西"
                        }
                    }
                }
            ]
        }
    }
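    With "type": "best_fields" (the default type), Elasticsearch scores a document by the best-matching field: each field's score is multiplied by its boost, the highest one wins, and other matching fields only add tie_breaker times their score (tie_breaker defaults to 0). A minimal sketch of that combination step follows; the per-field scores are made-up numbers for illustration, not values taken from the result above.

```python
def best_fields_score(field_scores, boosts=None, tie_breaker=0.0):
    """Combine per-field scores the way best_fields (dis_max) does."""
    boosts = boosts or {}
    # Apply the per-field boost, e.g. address.city^3
    boosted = [score * boosts.get(field, 1.0) for field, score in field_scores.items()]
    if not boosted:
        return 0.0
    best = max(boosted)
    # The best field wins; the other fields contribute tie_breaker * their score
    return best + tie_breaker * (sum(boosted) - best)


# Hypothetical per-field scores for one document
scores = {"name": 1.0, "address.city": 2.0, "address.province": 0.5}
print(best_fields_score(scores, boosts={"address.city": 3}))  # 6.0
print(best_fields_score(scores, boosts={"address.city": 3}, tie_breaker=0.3))
```

    This is why the 长治 document jumps far ahead of the others: its city match is boosted threefold and becomes its best field.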
    

    Prefix query

    Returns documents that contain a specific prefix in the given field.

    GET student/_search
    {
        "query": {
            "prefix": {
                "address.city": {
                    "value": "吕"
                }
            }
        }
    }
    

    This returns the documents whose city starts with 吕:

    {
        "took" : 2,
        "timed_out" : false,
        "_shards" : {
            "total" : 1,
            "successful" : 1,
            "skipped" : 0,
            "failed" : 0
        },
        "hits" : {
            "total" : {
                "value" : 1,
                "relation" : "eq"
            },
            "max_score" : 1.0,
            "hits" : [
                {
                    "_index" : "student",
                    "_type" : "_doc",
                    "_id" : "3",
                    "_score" : 1.0,
                    "_source" : {
                        "name" : "山西吕梁-王五",
                        "age" : "25",
                        "address" : {
                            "city" : "吕梁",
                            "province" : "山西"
                        }
                    }
                }
            ]
        }
    }
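    A prefix query is not analyzed; it is compared against the terms stored in the index. The sketch below assumes the standard analyzer splits address.city into single characters, while the address.city.keyword sub-field stores the whole value. Under that assumption, the prefix 吕 matches the text field, but a longer prefix such as 吕梁 would only match the keyword field. tokenize and prefix_match are hypothetical helpers.

```python
import re

# Assumed stand-in for the standard analyzer: one token per CJK character
TOKEN_PATTERN = r"[\u4e00-\u9fff]|[A-Za-z0-9]+"


def tokenize(text):
    return [t.lower() for t in re.findall(TOKEN_PATTERN, text)]


def prefix_match(indexed_terms, prefix):
    # A document matches if any indexed term starts with the prefix
    return any(term.startswith(prefix) for term in indexed_terms)


city = "吕梁"
print(prefix_match(tokenize(city), "吕"))    # True: text-field terms are 吕 and 梁
print(prefix_match(tokenize(city), "吕梁"))  # False: no single term starts with 吕梁
print(prefix_match([city], "吕梁"))          # True: the keyword field holds the whole value
```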
    

    Term query

    term looks for an exact term in the given field. The query value is not analyzed, so you must supply the exact value to get correct results.

    GET /student/_search
    {
        "query": {
            "term": {
                "name.keyword": "山西太原-张三"
            }
        }
    }
    

    Here the name.keyword sub-field, which stores the whole unanalyzed value, is used to match "山西太原-张三" exactly:

    {
        "took" : 0,
        "timed_out" : false,
        "_shards" : {
            "total" : 1,
            "successful" : 1,
            "skipped" : 0,
            "failed" : 0
        },
        "hits" : {
            "total" : {
                "value" : 1,
                "relation" : "eq"
            },
            "max_score" : 1.2039728,
            "hits" : [
                {
                    "_index" : "student",
                    "_type" : "_doc",
                    "_id" : "1",
                    "_score" : 1.2039728,
                    "_source" : {
                        "name" : "山西太原-张三",
                        "age" : "23",
                        "address" : {
                            "city" : "太原",
                            "province" : "山西"
                        }
                    }
                }
            ]
        }
    }
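    The reason for querying name.keyword instead of name is easiest to see by comparing what each field indexes. In this sketch (assuming the standard analyzer indexes each Chinese character of name separately), a term query for the full string fails against the text field's terms and succeeds only against the keyword field. The helper names are hypothetical.

```python
import re

# Assumed stand-in for the standard analyzer: one token per CJK character
TOKEN_PATTERN = r"[\u4e00-\u9fff]|[A-Za-z0-9]+"


def tokenize(text):
    return [t.lower() for t in re.findall(TOKEN_PATTERN, text)]


def term_query(indexed_terms, value):
    # term is not analyzed: the value must equal one indexed term exactly
    return value in indexed_terms


name = "山西太原-张三"
text_terms = set(tokenize(name))  # what the `name` text field indexes (assumed)
keyword_terms = {name}            # what `name.keyword` indexes: the whole value

print(term_query(keyword_terms, "山西太原-张三"))  # True
print(term_query(text_terms, "山西太原-张三"))     # False: no such single term
print(term_query(text_terms, "张"))                # True: a single-character term
```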
    

    Terms query

    To match exactly against several values, use terms. It works like the IN clause in SQL:

    GET student/_search
    {
        "query": {
            "terms": {
                "address.city.keyword": [
                    "长治",
                    "广州"
                ]
            }
        }
    }
    

    The query above returns all documents whose address.city.keyword is 长治 or 广州.
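    The IN analogy can be written out directly: a document matches if its keyword value equals any of the listed values. The sketch below uses only the three documents shown in this article (the 广州 document's full contents are not shown here, so it is left out), and terms_query is a hypothetical helper.

```python
def terms_query(city_by_id, values):
    """Return IDs whose keyword value is one of `values` (like SQL IN)."""
    wanted = set(values)
    return [doc_id for doc_id, city in city_by_id.items() if city in wanted]


# address.city.keyword of the documents shown above
cities = {"1": "太原", "2": "长治", "3": "吕梁"}
print(terms_query(cities, ["长治", "广州"]))  # ['2']
```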

    Compound queries

    A compound query combines the individual queries above into a more complex one.

    The general form is:

    POST _search
    {
        "query": {
            "bool" : {
                "must" : {
                    "term" : { "user" : "kimchy" }
                },
                "filter": {
                    "term" : { "tag" : "tech" }
                },
                "must_not" : {
                    "range" : {
                        "age" : { "gte" : 10, "lte" : 20 }
                    }
                },
                "should" : [
                    { "term" : { "tag" : "wow" } },
                    { "term" : { "tag" : "elasticsearch" } }
                ],
                "minimum_should_match" : 1,
                "boost" : 1.0
            }
        }
    }
    

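    A bool query is made up of must, filter, must_not, and should clauses. minimum_should_match specifies the number or percentage of should clauses a document must match. If the bool query contains at least one should clause and no must or filter clauses, the default is 1; otherwise the default is 0.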
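    The matching rules of bool, including the default for minimum_should_match, can be sketched as a plain function. Clauses are modelled as Python predicates rather than real queries, percentages for minimum_should_match are not supported, and scoring is not modelled; all names here are hypothetical.

```python
def bool_matches(doc, must=(), filter_=(), must_not=(), should=(), minimum_should_match=None):
    """Decide whether `doc` matches a bool query whose clauses are predicates.

    Scoring is not modelled: in Elasticsearch, must and should clauses add to
    the score, while filter and must_not clauses do not.
    """
    if minimum_should_match is None:
        # Default: 1 if there are should clauses but no must/filter clauses, else 0
        minimum_should_match = 1 if should and not (must or filter_) else 0
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filter_):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    return sum(1 for clause in should if clause(doc)) >= minimum_should_match


students = [
    {"id": "1", "name": "山西太原-张三", "province": "山西"},
    {"id": "2", "name": "山西长治-李四", "province": "山西"},
    {"id": "3", "name": "山西吕梁-王五", "province": "山西"},
]


def in_shanxi(doc):
    return doc["province"] == "山西"


def named_li_si(doc):
    return "李四" in doc["name"]


# must + should: should becomes optional, so all three documents match
with_must = [d["id"] for d in students if bool_matches(d, must=[in_shanxi], should=[named_li_si])]
# should only: minimum_should_match defaults to 1, so only 李四 matches
should_only = [d["id"] for d in students if bool_matches(d, should=[named_li_si])]
print(with_must, should_only)  # ['1', '2', '3'] ['2']
```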

    must

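    must works like AND in SQL: a document must match every must clause.

    Use a compound query to find documents whose city is 长治 and whose age is 24: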

    GET student/_search
    {
        "query": {
            "bool": {
                "must": [
                    {
                        "match": {
                            "address.city": "长治"
                        }
                    },
                    {
                        "match": {
                            "age": "24"
                        }
                    }
                ]
            }
        }
    }
    

    must_not

    Find all documents whose province is not 山西. Only the document from 广州 remains in the result:

    GET student/_search
    {
        "query": {
            "bool": {
                "must_not": [
                    {
                        "match": {
                            "address.province": "山西"
                        }
                    }
                ]
            }
        }
    }
    

    filter

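    Use filter to keep documents whose age is between 24 and 25 (inclusive). Filter clauses must match but do not contribute to the score: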

    GET student/_search
    {
      "query": {
        "bool": {
          "filter": [
            {
              "range": {
                "age": {
                  "gte": 24,
                  "lte": 25
                }
              }
            }
          ]
        }
      }
    }
    
    • gt: greater than
    • gte: greater than or equal to
    • lt: less than
    • lte: less than or equal to
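    The four range operators map directly onto comparison functions. This sketch (a hypothetical in_range helper, with the ages treated as numbers) applies every bound that is given, just as a range query does.

```python
import operator

# Each range parameter and the comparison it performs
RANGE_OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def in_range(value, **bounds):
    """True if `value` satisfies every given bound, e.g. in_range(24, gte=24, lte=25)."""
    return all(RANGE_OPS[name](value, limit) for name, limit in bounds.items())


ages = {"1": 23, "2": 24, "3": 25}
print([doc_id for doc_id, age in ages.items() if in_range(age, gte=24, lte=25)])  # ['2', '3']
print(in_range(25, gt=24, lt=25))  # False: lt excludes the upper bound
```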

    should

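    should works like OR in SQL.

    Find documents whose province is 山西; documents whose name contains 李四 are more relevant and rank higher. Because a must clause is present, the should clause is optional here and only raises the score of the documents that match it.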

    GET student/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "address.province": "山西"
              }
            }
          ],
          "should": [
            {
              "match_phrase": {
                "name": "李四"
              }
            }
          ]
        }
      }
    }
    

    In the result, the document named 山西长治-李四 ranks first:

    {
      "took" : 0,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 3,
          "relation" : "eq"
        },
        "max_score" : 3.1212955,
        "hits" : [
          {
            "_index" : "student",
            "_type" : "_doc",
            "_id" : "2",
            "_score" : 3.1212955,
            "_source" : {
              "name" : "山西长治-李四",
              "age" : "24",
              "address" : {
                "city" : "长治",
                "province" : "山西"
              }
            }
          },
          {
            "_index" : "student",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 0.7133499,
            "_source" : {
              "name" : "山西太原-张三",
              "age" : "23",
              "address" : {
                "city" : "太原",
                "province" : "山西"
              }
            }
          },
          {
            "_index" : "student",
            "_type" : "_doc",
            "_id" : "3",
            "_score" : 0.7133499,
            "_source" : {
              "name" : "山西吕梁-王五",
              "age" : "25",
              "address" : {
                "city" : "吕梁",
                "province" : "山西"
              }
            }
          }
        ]
      }
    }
    

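    Wildcard query

    wildcard matches terms against a pattern in which * stands for any sequence of characters and ? for a single character, similar to LIKE in SQL: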

    GET student/_search
    {
        "query": {
            "wildcard": {
                "name": {
                    "value": "*王"
                }
            }
        }
    }
    

    Result:

    {
        "took" : 0,
        "timed_out" : false,
        "_shards" : {
            "total" : 1,
            "successful" : 1,
            "skipped" : 0,
            "failed" : 0
        },
        "hits" : {
            "total" : {
                "value" : 1,
                "relation" : "eq"
            },
            "max_score" : 1.0,
            "hits" : [
                {
                    "_index" : "student",
                    "_type" : "_doc",
                    "_id" : "3",
                    "_score" : 1.0,
                    "_source" : {
                        "name" : "山西吕梁-王五",
                        "age" : "25",
                        "address" : {
                            "city" : "吕梁",
                            "province" : "山西"
                        }
                    }
                }
            ]
        }
    }
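    On an analyzed text field such as name, the pattern is matched against individual terms, not the whole string. Assuming the standard analyzer indexes each Chinese character separately, *王 matches the single term 王, whereas against the whole keyword value 山西吕梁-王五 it would fail because that value ends with 五. The helpers below are hypothetical; the pattern translation handles only * and ?.

```python
import re

# Assumed stand-in for the standard analyzer: one token per CJK character
TOKEN_PATTERN = r"[\u4e00-\u9fff]|[A-Za-z0-9]+"


def tokenize(text):
    return [t.lower() for t in re.findall(TOKEN_PATTERN, text)]


def wildcard_to_regex(pattern):
    # Escape everything, then turn the escaped * and ? back into regex wildcards
    return re.compile(re.escape(pattern).replace(r"\*", ".*").replace(r"\?", "."))


def wildcard_match(indexed_terms, pattern):
    regex = wildcard_to_regex(pattern)
    return any(regex.fullmatch(term) for term in indexed_terms)


name = "山西吕梁-王五"
print(wildcard_match(tokenize(name), "*王"))  # True: the term 王 ends with 王
print(wildcard_match([name], "*王"))          # False: the whole value ends with 五
print(wildcard_match([name], "*王*"))         # True
```

    Patterns that begin with * or ? force Elasticsearch to examine many terms, so they can be slow on large indices.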
    

    Pagination and sorting

    Find documents whose province is 山西, sort them by age in descending order, and return one page of results:

    GET student/_search
    {
        "query": {
            "match": {
                "address.province": "山西"
            }
        },
        "sort": [
            {
                "age.keyword": {
                    "order": "desc"
                }
            }
        ],
        "from": 2,
        "size": 2
    }
    

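    from: the 0-based offset of the first hit to return, i.e. how many hits to skip (not a page number). For page n (starting at 1), from = (n - 1) * size.

    size: the number of hits per page.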
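    Converting a page number into from is simple arithmetic. The sketch below (page_to_from is a hypothetical helper) applies it to the three 山西 documents. Note that age.keyword sorts as a string, so the order "25", "24", "23" is correct here only because every age has two digits. By default, from + size cannot exceed index.max_result_window (10,000); for deeper paging, Elasticsearch recommends search_after.

```python
def page_to_from(page, size):
    """Convert a 1-based page number into the 0-based `from` offset."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    return (page - 1) * size


# IDs of the three 山西 documents sorted by age.keyword descending ("25", "24", "23")
sorted_ids = ["3", "2", "1"]
size = 2
offset = page_to_from(2, size)  # 2, the value used in the request above
page_two = sorted_ids[offset:offset + size]
print(offset, page_two)  # 2 ['1']
```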

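    Highlighting

    Use highlight to highlight matches in the chosen fields, and use pre_tags and post_tags to change the markup placed before and after each highlighted term (the defaults are <em> and </em>).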

    GET student/_search
    {
        "query": {
            "match": {
                "name": "张三"
            }
        },
        "highlight": {
            "pre_tags": "<br>", 
            "post_tags": "</br>", 
            "fields": {
                "name": {}
            }
        }
    }
    

    Result:

    {
        "took" : 0,
        "timed_out" : false,
        "_shards" : {
            "total" : 1,
            "successful" : 1,
            "skipped" : 0,
            "failed" : 0
        },
        "hits" : {
            "total" : {
                "value" : 1,
                "relation" : "eq"
            },
            "max_score" : 2.4079456,
            "hits" : [
                {
                    "_index" : "student",
                    "_type" : "_doc",
                    "_id" : "1",
                    "_score" : 2.4079456,
                    "_source" : {
                        "name" : "山西太原-张三",
                        "age" : 23,
                        "address" : {
                            "city" : "太原",
                            "province" : "山西"
                        }
                    },
                    "highlight" : {
                        "name" : [
                            "山西太原-<br>张</br><br>三</br>"
                        ]
                    }
                }
            ]
        }
    }
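    The highlighted fragment wraps 张 and 三 separately because each character is its own term. The sketch below reproduces that output, assuming the standard analyzer tokenizes Chinese one character at a time; highlight here is a hypothetical helper, not the Elasticsearch highlighter itself.

```python
import re

# Assumed stand-in for the standard analyzer: one token per CJK character
TOKEN_PATTERN = r"[\u4e00-\u9fff]|[A-Za-z0-9]+"


def tokenize(text):
    return [t.lower() for t in re.findall(TOKEN_PATTERN, text)]


def highlight(text, query, pre_tag="<em>", post_tag="</em>"):
    """Wrap every token of `text` that is also a query term in the given tags."""
    terms = set(tokenize(query))

    def wrap(m):
        token = m.group(0)
        return f"{pre_tag}{token}{post_tag}" if token.lower() in terms else token

    # Non-token characters such as "-" are left untouched
    return re.sub(TOKEN_PATTERN, wrap, text)


print(highlight("山西太原-张三", "张三", "<br>", "</br>"))  # 山西太原-<br>张</br><br>三</br>
```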
    
  • Original article: https://www.cnblogs.com/leizzige/p/14790672.html