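  • A Summary of Common Elasticsearch Search Techniques

        This post summarizes some of the Elasticsearch pitfalls I ran into in earlier work, written down here for future reference.

    0. Preface

    To illustrate the different types of Elasticsearch queries, we will search a collection of book documents with the following fields:

    1. title: the title;
    2. authors: the authors;
    3. summary: the summary;
    4. publish_date: the publication date;
    5. num_reviews: the number of reviews.

    First, let's create a new index and load the data with the bulk API.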

    PUT /bookdb_index
        { "settings": { "number_of_shards": 1 }}
    
    POST /bookdb_index/book/_bulk
        { "index": { "_id": 1 }}
        { "title": "Elasticsearch: The Definitive Guide", "authors": ["clinton gormley", "zachary tong"], "summary" : "A distibuted real-time search and analytics engine", "publish_date" : "2015-02-07", "num_reviews": 20, "publisher": "oreilly" }
        { "index": { "_id": 2 }}
        { "title": "Taming Text: How to Find, Organize, and Manipulate It", "authors": ["grant ingersoll", "thomas morton", "drew farris"], "summary" : "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "publish_date" : "2013-01-24", "num_reviews": 12, "publisher": "manning" }
        { "index": { "_id": 3 }}
        { "title": "Elasticsearch in Action", "authors": ["radu gheorge", "matthew lee hinman", "roy russo"], "summary" : "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "publish_date" : "2015-12-03", "num_reviews": 18, "publisher": "manning" }
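        { "index": { "_id": 4 }}
        { "title": "Solr in Action", "authors": ["trey grainger", "timothy potter"], "summary" : "Comprehensive guide to implementing a scalable search engine using Apache Solr", "publish_date" : "2014-04-05", "num_reviews": 23, "publisher": "manning" }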
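
    When the data is loaded from a script rather than a console, the bulk payload has to be assembled as newline-delimited JSON: one action line followed by one source line per document, with a trailing newline. A minimal sketch of that assembly follows. The helper name build_bulk_body is hypothetical, and the documents are trimmed to two fields for brevity.

```python
import json


def build_bulk_body(docs):
    """Return an NDJSON bulk payload: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API expects the payload to end with a newline.
    return "\n".join(lines) + "\n"


# Trimmed copies of two of the book documents (illustration only).
books = [
    (1, {"title": "Elasticsearch: The Definitive Guide", "publisher": "oreilly"}),
    (4, {"title": "Solr in Action", "publisher": "manning"}),
]
body = build_bulk_body(books)
print(body)
```

    The resulting string would be sent as the body of POST /bookdb_index/book/_bulk.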

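    1. Basic Match Query

    1.1 Full-text search

    There are two ways to run a full-text search:
    1) Use the search API with the query passed as a parameter in the URL.

    Example: the following runs a full-text search for "guide".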

    GET /bookdb_index/book/_search?q=guide
    [Results]
    "hits": [
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "1",
            "_score": 0.28168046,
            "_source": {
              "title": "Elasticsearch: The Definitive Guide",
              "authors": [
                "clinton gormley",
                "zachary tong"
              ],
              "summary": "A distibuted real-time search and analytics engine",
              "publish_date": "2015-02-07",
              "num_reviews": 20,
              "publisher": "oreilly"
            }
          },
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "4",
            "_score": 0.24144039,
            "_source": {
              "title": "Solr in Action",
              "authors": [
                "trey grainger",
                "timothy potter"
              ],
              "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
              "publish_date": "2014-04-05",
              "num_reviews": 23,
              "publisher": "manning"
            }
          }
        ]

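    2) Use the full Elasticsearch query DSL, with a JSON body as the request payload.
    This returns the same results as approach 1).

    POST /bookdb_index/book/_search
    {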
        "query": {
            "multi_match" : {
                "query" : "guide",
                "fields" : ["_all"]
            }
        }
    }

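    Explanation: multi_match is used here in place of match as a convenient shorthand for running the same query against multiple fields. The fields property specifies which fields to query; in this case, we want to query all fields in the document.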
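
    The two approaches can be put side by side in a short sketch. The helper names uri_search and dsl_search are hypothetical; they only build the request path and the request body, and do not contact a cluster.

```python
from urllib.parse import urlencode


def uri_search(index, doc_type, text):
    """Approach 1: the query travels in the URL as the q parameter."""
    return f"/{index}/{doc_type}/_search?{urlencode({'q': text})}"


def dsl_search(text, fields):
    """Approach 2: the same search expressed as a multi_match request body."""
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}


print(uri_search("bookdb_index", "book", "guide"))
print(dsl_search("guide", ["_all"]))
```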

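    1.2 Searching a specific field

    Both APIs also let you specify which fields to search. For example, to search for books with "in action" in the title field:
    1) URL search, as follows: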

    GET /bookdb_index/book/_search?q=title:in action
    
    
    [Results]
    "hits": [
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "4",
            "_score": 0.6259885,
            "_source": {
              "title": "Solr in Action",
              "authors": [
                "trey grainger",
                "timothy potter"
              ],
              "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
              "publish_date": "2014-04-05",
              "num_reviews": 23,
              "publisher": "manning"
            }
          },
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "3",
            "_score": 0.5975345,
            "_source": {
              "title": "Elasticsearch in Action",
              "authors": [
                "radu gheorge",
                "matthew lee hinman",
                "roy russo"
              ],
              "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
              "publish_date": "2015-12-03",
              "num_reviews": 18,
              "publisher": "manning"
            }
          }
        ]
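
    One practical detail of the URL form: the query string title:in action contains a space and a colon, so when the request is sent from a script rather than typed into a console, the value should be percent-encoded. A minimal sketch, assuming a hypothetical helper named fielded_uri_search:

```python
from urllib.parse import quote, urlencode


def fielded_uri_search(index, doc_type, field, text):
    """Build the request path for a field-scoped URL search with an encoded q value."""
    # quote_via=quote encodes the space as %20 instead of '+'.
    query_string = urlencode({"q": f"{field}:{text}"}, quote_via=quote)
    return f"/{index}/{doc_type}/_search?{query_string}"


print(fielded_uri_search("bookdb_index", "book", "title", "in action"))
```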

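    2) DSL search
    The full-body DSL, however, gives you more flexibility for building complex queries (as we will see later) and for specifying how results are returned. In the example below, we specify the number of results to return, the offset (useful for pagination), the document fields to return, and highlighting.
    Number of results: size;
    Offset: from;
    Fields to return: _source;
    Highlighting: highlight.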

    POST /bookdb_index/book/_search
    {
        "query": {
            "match" : {
                "title" : "in action"
            }
        },
        "size": 2,
        "from": 0,
        "_source": [ "title", "summary", "publish_date" ],
        "highlight": {
            "fields" : {
                "title" : {}
            }
        }
    }
    
    
    [Results]
    "hits": {
        "total": 2,
        "max_score": 0.9105287,
        "hits": [
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "3",
            "_score": 0.9105287,
            "_source": {
              "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
              "title": "Elasticsearch in Action",
              "publish_date": "2015-12-03"
            },
            "highlight": {
              "title": [
                "Elasticsearch <em>in</em> <em>Action</em>"
              ]
            }
          },
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "4",
            "_score": 0.9105287,
            "_source": {
              "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
              "title": "Solr in Action",
              "publish_date": "2014-04-05"
            },
            "highlight": {
              "title": [
                "Solr <em>in</em> <em>Action</em>"
              ]
            }
          }
        ]
      }
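
    Since from is an offset in documents rather than a page number, pagination code usually derives it from the page index. The sketch below shows one way to do that; the helper name paged_match and its 1-based page convention are assumptions made for illustration.

```python
def paged_match(field, text, page, page_size, source_fields):
    """Build a match request body for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {
        "query": {"match": {field: text}},
        "size": page_size,
        # Offset of the first hit on the requested page.
        "from": (page - 1) * page_size,
        "_source": list(source_fields),
        "highlight": {"fields": {field: {}}},
    }


first_page = paged_match("title", "in action", 1, 2, ["title", "summary", "publish_date"])
third_page = paged_match("title", "in action", 3, 2, ["title", "summary", "publish_date"])
print(first_page["from"], third_page["from"])
```

    With page 1 and a page size of 2, this produces the same body as the request shown above.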

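    Note: for multi-word queries, the match query lets you specify the "and" operator instead of the default "or" operator.
    You can also set the minimum_should_match option to tune the relevance of the returned results.
    See the Elasticsearch guide for details.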
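
    To use these options, the match clause takes an object with a query key instead of a plain string. A minimal sketch of both options follows; the helper name match_with_options is hypothetical, and the value "75%" is only an illustrative setting.

```python
def match_with_options(field, text, operator="or", minimum_should_match=None):
    """Build a match request body using the expanded (object) form of the clause."""
    clause = {"query": text, "operator": operator}
    if minimum_should_match is not None:
        clause["minimum_should_match"] = minimum_should_match
    return {"query": {"match": {field: clause}}}


# Require every term ("in" AND "action") to be present in the title.
strict = match_with_options("title", "in action", operator="and")

# Keep the default "or", but require most of the terms to match.
partial = match_with_options("summary", "scalable search engine guide", minimum_should_match="75%")
print(strict)
print(partial)
```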

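    2. Multi-field Search

    As we have already seen, to query more than one document field in a search (for example, searching for the same query string in both the title and the summary), use the multi_match query.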

    POST /bookdb_index/book/_search
    {
        "query": {
            "multi_match" : {
                "query" : "elasticsearch guide",
                "fields": ["title", "summary"]
            }
        }
    }
    
    
    [Results]
    "hits": {
        "total": 3,
        "max_score": 0.9448582,
        "hits": [
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "1",
            "_score": 0.9448582,
            "_source": {
              "title": "Elasticsearch: The Definitive Guide",
              "authors": [
                "clinton gormley",
                "zachary tong"
              ],
              "summary": "A distibuted real-time search and analytics engine",
              "publish_date": "2015-02-07",
              "num_reviews": 20,
              "publisher": "oreilly"
            }
          },
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "3",
            "_score": 0.17312013,
            "_source": {
              "title": "Elasticsearch in Action",
              "authors": [
                "radu gheorge",
                "matthew lee hinman",
                "roy russo"
              ],
              "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
              "publish_date": "2015-12-03",
              "num_reviews": 18,
              "publisher": "manning"
            }
          },
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "4",
            "_score": 0.14965448,
            "_source": {
              "title": "Solr in Action",
              "authors": [
                "trey grainger",
                "timothy potter"
              ],
              "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
              "publish_date": "2014-04-05",
              "num_reviews": 23,
              "publisher": "manning"
            }
          }
        ]
      }

    Note: the third hit (document 4) matches because the word "guide" appears in its summary.

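    3. Boosting

    Since we are searching across multiple fields, we may want to boost the score of a particular field. In the example below, we boost the summary field by a factor of 3 to increase its importance, which in turn raises the relevance of document 4.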

    POST /bookdb_index/book/_search
    {
        "query": {
            "multi_match" : {
                "query" : "elasticsearch guide",
                "fields": ["title", "summary^3"]
            }
        },
        "_source": ["title", "summary", "publish_date"]
    }
    
    
    [Results]
    "hits": [
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "1",
            "_score": 0.31495273,
            "_source": {
              "summary": "A distibuted real-time search and analytics engine",
              "title": "Elasticsearch: The Definitive Guide",
              "publish_date": "2015-02-07"
            }
          },
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "4",
            "_score": 0.14965448,
            "_source": {
              "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
              "title": "Solr in Action",
              "publish_date": "2014-04-05"
            }
          },
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "3",
            "_score": 0.13094766,
            "_source": {
              "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
              "title": "Elasticsearch in Action",
              "publish_date": "2015-12-03"
            }
          }
        ]

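    Note: boosting does not simply multiply the computed score by the boost factor. The boost value actually applied goes through normalization and some internal optimization. See the Elasticsearch guide for more.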
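
    The field^boost syntax is plain string formatting, so per-field weights kept in application code can be turned into the fields list mechanically. A small sketch, with hypothetical helper names boosted_fields and boosted_multi_match:

```python
def boosted_fields(boosts):
    """Turn {"field": boost} into multi_match field names such as "summary^3"."""
    fields = []
    for name, boost in boosts.items():
        # A boost of 1 is the default, so the suffix is omitted.
        fields.append(name if boost == 1 else f"{name}^{boost}")
    return fields


def boosted_multi_match(text, boosts, source_fields=None):
    """Build a multi_match request body with per-field boosts."""
    body = {"query": {"multi_match": {"query": text, "fields": boosted_fields(boosts)}}}
    if source_fields:
        body["_source"] = list(source_fields)
    return body


body = boosted_multi_match(
    "elasticsearch guide",
    {"title": 1, "summary": 3},
    source_fields=["title", "summary", "publish_date"],
)
print(body)
```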

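    4. Bool Query

    The AND / OR / NOT operators can be used to fine-tune a search query and return more relevant or more specific results.

    In the search API this is done with the bool query.
    The bool query accepts a must parameter (equivalent to AND), a must_not parameter (equivalent to NOT), and a should parameter (equivalent to OR).

    For example, to search for a book with "Elasticsearch" OR "Solr" in the title, AND authored by "clinton gormley", but NOT authored by "radu gheorge":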

    POST /bookdb_index/book/_search
    {
        "query": {
            "bool": {
                "must": [
                    { "bool" : { "should": [
                          { "match": { "title": "Elasticsearch" }},
                          { "match": { "title": "Solr" }} ] } },
                    { "match": { "authors": "clinton gormley" }}
                ],
                "must_not": { "match": {"authors": "radu gheorge" }}
            }
        }
    }
    
    
    [Results]
    "hits": [
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "1",
            "_score": 0.3672021,
            "_source": {
              "title": "Elasticsearch: The Definitive Guide",
              "authors": [
                "clinton gormley",
                "zachary tong"
              ],
              "summary": "A distibuted real-time search and analytics engine",
              "publish_date": "2015-02-07",
              "num_reviews": 20,
              "publisher": "oreilly"
            }
          }
        ]

    Note: as you can see, a bool query can wrap any other query type, including other bool queries, to build arbitrarily complex or deeply nested queries.
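
    Nested bool queries are easier to keep correct when they are composed from small builders. A JSON object cannot reliably carry two keys with the same name, so several AND conditions are passed as a list under a single must key. The sketch below rebuilds the example query this way; the helper names match and bool_query are hypothetical.

```python
def match(field, text):
    """A single match clause."""
    return {"match": {field: text}}


def bool_query(must=(), should=(), must_not=()):
    """Compose a bool clause, emitting only the occurrence types that were given."""
    clause = {}
    if must:
        clause["must"] = list(must)
    if should:
        clause["should"] = list(should)
    if must_not:
        clause["must_not"] = list(must_not)
    return {"bool": clause}


# (title has Elasticsearch OR Solr) AND author clinton gormley, NOT author radu gheorge
body = {
    "query": bool_query(
        must=[
            bool_query(should=[match("title", "Elasticsearch"), match("title", "Solr")]),
            match("authors", "clinton gormley"),
        ],
        must_not=[match("authors", "radu gheorge")],
    )
}
print(body)
```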

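    5. Fuzzy Queries

    Fuzzy matching can be enabled on match and multi_match queries to catch spelling errors. The degree of fuzziness is specified as a Levenshtein distance from the original term.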

    POST /bookdb_index/book/_search
    {
        "query": {
            "multi_match" : {
                "query" : "comprihensiv guide",
                "fields": ["title", "summary"],
                "fuzziness": "AUTO"
            }
        },
        "_source": ["title", "summary", "publish_date"],
        "size": 1
    }
    
    
    [Results]
    "hits": [
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "4",
            "_score": 0.5961596,
            "_source": {
              "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
              "title": "Solr in Action",
              "publish_date": "2014-04-05"
            }
          }
        ]

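    A fuzziness value of "AUTO" is equivalent to specifying a value of 2 when the term length is greater than 5. However, about 80% of human misspellings have an edit distance of 1, so setting fuzziness to 1 may improve overall search performance. For more information, see the "Typos and Misspellings" chapter of Elasticsearch: The Definitive Guide.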
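
    To make the edit-distance idea concrete, the sketch below computes the plain Levenshtein distance between the misspelled query term and the indexed term, and applies the default AUTO thresholds (terms of 1 to 2 characters must match exactly, 3 to 5 characters allow one edit, longer terms allow two). This is only an illustration of the rule, not how Elasticsearch evaluates fuzzy queries internally; the function names are hypothetical.

```python
def auto_fuzziness(term):
    """Maximum number of edits allowed for a term under the default AUTO thresholds."""
    length = len(term)
    if length <= 2:
        return 0
    if length <= 5:
        return 1
    return 2


def levenshtein(a, b):
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_matches(query_term, indexed_term, fuzziness="AUTO"):
    """Check whether indexed_term is within the allowed edit distance of query_term."""
    allowed = auto_fuzziness(query_term) if fuzziness == "AUTO" else int(fuzziness)
    return levenshtein(query_term, indexed_term) <= allowed


print(levenshtein("comprihensiv", "comprehensive"))
print(fuzzy_matches("comprihensiv", "comprehensive"))
print(fuzzy_matches("comprihensiv", "comprehensive", fuzziness=1))
```

    In this example "comprihensiv" is two edits away from "comprehensive" (one substitution and one insertion), so it matches under AUTO but would be missed with a fuzziness of 1.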

    Continued from section 6 onward at: https://blog.csdn.net/laoyang360/article/details/76769208

  • Original article: https://www.cnblogs.com/pyspark/p/8817707.html