  • Common Elasticsearch Search Techniques: A Summary

        This post summarizes some of the Elasticsearch pitfalls I ran into in earlier work, written down here for future reference.

    0. Preface

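    To illustrate the different kinds of Elasticsearch queries, we will search a collection of book documents with the following fields:

    1. title: the book title;
    2. authors: the authors;
    3. summary: a short summary;
    4. publish_date: the publication date;
    5. num_reviews: the number of reviews.

    First, let's create a new index and load the data with the bulk API.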

    PUT /bookdb_index
        { "settings": { "number_of_shards": 1 }}
    
    POST /bookdb_index/book/_bulk
        { "index": { "_id": 1 }}
        { "title": "Elasticsearch: The Definitive Guide", "authors": ["clinton gormley", "zachary tong"], "summary" : "A distibuted real-time search and analytics engine", "publish_date" : "2015-02-07", "num_reviews": 20, "publisher": "oreilly" }
        { "index": { "_id": 2 }}
        { "title": "Taming Text: How to Find, Organize, and Manipulate It", "authors": ["grant ingersoll", "thomas morton", "drew farris"], "summary" : "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "publish_date" : "2013-01-24", "num_reviews": 12, "publisher": "manning" }
        { "index": { "_id": 3 }}
        { "title": "Elasticsearch in Action", "authors": ["radu gheorge", "matthew lee hinman", "roy russo"], "summary" : "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "publish_date" : "2015-12-03", "num_reviews": 18, "publisher": "manning" }
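        { "index": { "_id": 4 }}
        { "title": "Solr in Action", "authors": ["trey grainger", "timothy potter"], "summary" : "Comprehensive guide to implementing a scalable search engine using Apache Solr", "publish_date" : "2014-04-05", "num_reviews": 23, "publisher": "manning" }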
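
The bulk request body is newline-delimited JSON: each action line is followed by the source of the document it applies to. Below is a minimal Python sketch of assembling such a body. The helper name `build_bulk_body` and the trimmed-down sample documents are illustrative assumptions, and the sketch only builds the string; it does not send it to a cluster.

```python
import json


def build_bulk_body(docs):
    """Return an NDJSON bulk body: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs:
        # Action line: index the following document under the given _id.
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Shortened sample documents (hypothetical subset of the fields used in this post).
books = [
    (1, {"title": "Elasticsearch: The Definitive Guide", "num_reviews": 20}),
    (2, {"title": "Taming Text: How to Find, Organize, and Manipulate It", "num_reviews": 12}),
]

body = build_bulk_body(books)
print(body)
```

The resulting string could then be sent as the body of `POST /bookdb_index/book/_bulk` with whatever HTTP client you prefer.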

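    1. Basic Match Query

    1.1 Full-text search

    There are two ways to run a full-text search:
    1) Use the search API with the query passed as a URL parameter.

    Example: the following runs a full-text search for "guide".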

    GET /bookdb_index/book/_search?q=guide
    [Results]
    "hits": [
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "1",
            "_score": 0.28168046,
            "_source": {
              "title": "Elasticsearch: The Definitive Guide",
              "authors": [
                "clinton gormley",
                "zachary tong"
              ],
              "summary": "A distibuted real-time search and analytics engine",
              "publish_date": "2015-02-07",
              "num_reviews": 20,
              "publisher": "oreilly"
            }
          },
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "4",
            "_score": 0.24144039,
            "_source": {
              "title": "Solr in Action",
              "authors": [
                "trey grainger",
                "timothy potter"
              ],
              "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
              "publish_date": "2014-04-05",
              "num_reviews": 23,
              "publisher": "manning"
            }
          }
        ]

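    2) Use the full Elasticsearch query DSL, with a JSON document as the request body.
    This returns the same results as method 1).

    POST /bookdb_index/book/_search
    {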
        "query": {
            "multi_match" : {
                "query" : "guide",
                "fields" : ["_all"]
            }
        }
    }

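    Explanation: the multi_match keyword is used in place of match as a convenient shorthand for running the same query against multiple fields. The fields property specifies which fields to query; in this case we want to query all fields in the document.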
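
To make the relationship between the two approaches concrete, here is a small Python sketch that builds the URI-search path and the equivalent DSL body. The function names `uri_search_path` and `multi_match_body` are hypothetical helpers introduced for illustration; only the request shapes come from the examples in this post.

```python
from urllib.parse import quote


def uri_search_path(index, doc_type, query_string):
    """Build the path for a URI search, percent-encoding the query string."""
    # quote() encodes spaces and reserved characters so the URL stays valid.
    return f"/{index}/{doc_type}/_search?q={quote(query_string)}"


def multi_match_body(text, fields):
    """Build a request body that runs the same query against several fields."""
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}


path = uri_search_path("bookdb_index", "book", "guide")
body = multi_match_body("guide", ["_all"])
print(path)
print(body)
```

Percent-encoding matters as soon as the query string contains a space or a colon, which is common once a field name is included in the `q` parameter.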

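    1.2 Searching specific fields

    Both APIs also let you specify which fields to search. For example, to search the title field for books containing "in action":
    1) URI search
    As shown below: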

    GET /bookdb_index/book/_search?q=title:in action
    
    
    [Results]
    "hits": [
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "4",
            "_score": 0.6259885,
            "_source": {
              "title": "Solr in Action",
              "authors": [
                "trey grainger",
                "timothy potter"
              ],
              "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
              "publish_date": "2014-04-05",
              "num_reviews": 23,
              "publisher": "manning"
            }
          },
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "3",
            "_score": 0.5975345,
            "_source": {
              "title": "Elasticsearch in Action",
              "authors": [
                "radu gheorge",
                "matthew lee hinman",
                "roy russo"
              ],
              "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
              "publish_date": "2015-12-03",
              "num_reviews": 18,
              "publisher": "manning"
            }
          }
        ]

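    2) DSL search
    The full-body DSL, however, gives you more flexibility to build complex queries (as we will see later) and to control what is returned. In the example below we specify the number of results, the offset (useful for pagination), the document fields to return, and highlighting of the matched field.
    Number of results: size;
    Offset: from;
    Fields to return: _source;
    Highlighting: highlight.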

    POST /bookdb_index/book/_search
    {
        "query": {
            "match" : {
                "title" : "in action"
            }
        },
        "size": 2,
        "from": 0,
        "_source": [ "title", "summary", "publish_date" ],
        "highlight": {
            "fields" : {
                "title" : {}
            }
        }
    }
    
    
    [Results]
    "hits": {
        "total": 2,
        "max_score": 0.9105287,
        "hits": [
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "3",
            "_score": 0.9105287,
            "_source": {
              "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
              "title": "Elasticsearch in Action",
              "publish_date": "2015-12-03"
            },
            "highlight": {
              "title": [
                "Elasticsearch <em>in</em> <em>Action</em>"
              ]
            }
          },
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "4",
            "_score": 0.9105287,
            "_source": {
              "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
              "title": "Solr in Action",
              "publish_date": "2014-04-05"
            },
            "highlight": {
              "title": [
                "Solr <em>in</em> <em>Action</em>"
              ]
            }
          }
        ]
      }
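
The size and from parameters translate directly into page-based pagination. The sketch below is one way to derive them from a 1-based page number; `search_page` is a hypothetical helper, and the title field and highlight settings simply mirror the request above.

```python
def search_page(text, page, page_size, source_fields):
    """Build a match query on title for the given 1-based page."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "query": {"match": {"title": text}},
        "size": page_size,                 # number of hits per page
        "from": (page - 1) * page_size,    # offset of the first hit on this page
        "_source": list(source_fields),    # only return these document fields
        "highlight": {"fields": {"title": {}}},
    }


first_page = search_page("in action", 1, 2, ["title", "summary", "publish_date"])
third_page = search_page("in action", 3, 2, ["title", "summary", "publish_date"])
print(first_page["from"], third_page["from"])
```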

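    Note: for multi-word queries, the match query lets you specify the 'and' operator instead of the default 'or' operator.
    You can also set the minimum_should_match option to tune the relevance of the returned results.
    See the Elasticsearch guide for details.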
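
As a rough illustration of that note, the sketch below builds a match query in its expanded form, where the field maps to an object carrying the query text plus the optional operator and minimum_should_match settings. The helper name `match_query` is an assumption made for this example.

```python
def match_query(field, text, operator=None, minimum_should_match=None):
    """Build a match query, optionally overriding operator or minimum_should_match."""
    options = {"query": text}
    if operator is not None:
        # "and" requires every term to match; the default is "or".
        options["operator"] = operator
    if minimum_should_match is not None:
        # For example "75%" or 2: how many of the terms must match.
        options["minimum_should_match"] = minimum_should_match
    return {"query": {"match": {field: options}}}


strict = match_query("title", "in action", operator="and")
relaxed = match_query("title", "elasticsearch in action", minimum_should_match="75%")
print(strict)
print(relaxed)
```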

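    2. Multi-field Search

    As we have already seen, to query more than one document field in a search (for example, to search for the same query string in both the title and the summary), use the multi_match query.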

    POST /bookdb_index/book/_search
    {
        "query": {
            "multi_match" : {
                "query" : "elasticsearch guide",
                "fields": ["title", "summary"]
            }
        }
    }
    
    
    [Results]
    "hits": {
        "total": 3,
        "max_score": 0.9448582,
        "hits": [
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "1",
            "_score": 0.9448582,
            "_source": {
              "title": "Elasticsearch: The Definitive Guide",
              "authors": [
                "clinton gormley",
                "zachary tong"
              ],
              "summary": "A distibuted real-time search and analytics engine",
              "publish_date": "2015-02-07",
              "num_reviews": 20,
              "publisher": "oreilly"
            }
          },
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "3",
            "_score": 0.17312013,
            "_source": {
              "title": "Elasticsearch in Action",
              "authors": [
                "radu gheorge",
                "matthew lee hinman",
                "roy russo"
              ],
              "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
              "publish_date": "2015-12-03",
              "num_reviews": 18,
              "publisher": "manning"
            }
          },
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "4",
            "_score": 0.14965448,
            "_source": {
              "title": "Solr in Action",
              "authors": [
                "trey grainger",
                "timothy potter"
              ],
              "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
              "publish_date": "2014-04-05",
              "num_reviews": 23,
              "publisher": "manning"
            }
          }
        ]
      }

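    Note: the third hit (document 4) matches because the word "guide" appears in its summary.

    3. Boosting

    Since we are searching across multiple fields, we may want to raise the score contributed by a particular field. In the example below, we boost the summary field by a factor of 3 to increase its importance, which in turn raises the relevance of document 4.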

    POST /bookdb_index/book/_search
    {
        "query": {
            "multi_match" : {
                "query" : "elasticsearch guide",
                "fields": ["title", "summary^3"]
            }
        },
        "_source": ["title", "summary", "publish_date"]
    }
    
    
    [Results]
    "hits": [
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "1",
            "_score": 0.31495273,
            "_source": {
              "summary": "A distibuted real-time search and analytics engine",
              "title": "Elasticsearch: The Definitive Guide",
              "publish_date": "2015-02-07"
            }
          },
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "4",
            "_score": 0.14965448,
            "_source": {
              "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
              "title": "Solr in Action",
              "publish_date": "2014-04-05"
            }
          },
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "3",
            "_score": 0.13094766,
            "_source": {
              "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
              "title": "Elasticsearch in Action",
              "publish_date": "2015-12-03"
            }
          }
        ]

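    Note: boosting does not simply multiply the calculated score by the boost factor. The boost that is actually applied goes through normalization and some internal optimization. See the Elasticsearch guide for more information.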
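
The `field^boost` notation used in the request above is easy to generate from a mapping of field names to boost factors. The following is a small sketch under that assumption; `boosted_fields` and `boosted_multi_match` are hypothetical helper names, not part of any client library.

```python
def boosted_fields(boosts):
    """Turn {"summary": 3} style boosts into the "field^boost" notation."""
    # A boost of 1 is the default, so the plain field name is enough.
    return [name if boost == 1 else f"{name}^{boost}" for name, boost in boosts.items()]


def boosted_multi_match(text, boosts):
    """Build a multi_match query whose fields carry per-field boosts."""
    return {"query": {"multi_match": {"query": text, "fields": boosted_fields(boosts)}}}


body = boosted_multi_match("elasticsearch guide", {"title": 1, "summary": 3})
print(body)
```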

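    4. Bool Query

    The AND / OR / NOT operators can be used to fine-tune a search query so that it returns more relevant or more specific results.

    In the search API this is done with the bool query.
    The bool query accepts a must parameter (equivalent to AND), a must_not parameter (equivalent to NOT), and a should parameter (equivalent to OR).

    For example, to search for a book with "Elasticsearch" OR "Solr" in the title, AND authored by "clinton gormley", but NOT authored by "radu gheorge":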

    POST /bookdb_index/book/_search
    {
        "query": {
            "bool": {
                "must": [
                    { "bool" : { "should": [
                          { "match": { "title": "Elasticsearch" }},
                          { "match": { "title": "Solr" }} ] } },
                    { "match": { "authors": "clinton gormley" }}
                ],
                "must_not": { "match": {"authors": "radu gheorge" }}
            }
        }
    }
    
    
    [Results]
    "hits": [
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "1",
            "_score": 0.3672021,
            "_source": {
              "title": "Elasticsearch: The Definitive Guide",
              "authors": [
                "clinton gormley",
                "zachary tong"
              ],
              "summary": "A distibuted real-time search and analytics engine",
              "publish_date": "2015-02-07",
              "num_reviews": 20,
              "publisher": "oreilly"
            }
          }
        ]

    Note: as you can see, a bool query can wrap any other query type, including other bool queries, to build arbitrarily complex or deeply nested queries.
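
Because bool queries nest, it can help to compose them from small building blocks. The sketch below assembles the "Elasticsearch OR Solr, AND clinton gormley, NOT radu gheorge" query in Python; `match` and `bool_query` are hypothetical helpers, and each occurrence type is given as a list of clauses.

```python
def match(field, text):
    """Build a single match clause."""
    return {"match": {field: text}}


def bool_query(must=(), should=(), must_not=()):
    """Build a bool clause, including only the occurrence types that were given."""
    clauses = {}
    if must:
        clauses["must"] = list(must)          # all of these must match (AND)
    if should:
        clauses["should"] = list(should)      # any of these may match (OR)
    if must_not:
        clauses["must_not"] = list(must_not)  # none of these may match (NOT)
    return {"bool": clauses}


title_is_es_or_solr = bool_query(should=[match("title", "Elasticsearch"), match("title", "Solr")])
body = {
    "query": bool_query(
        must=[title_is_es_or_solr, match("authors", "clinton gormley")],
        must_not=[match("authors", "radu gheorge")],
    )
}
print(body)
```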

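    5. Fuzzy Queries

    Fuzzy matching can be enabled on match and multi_match queries to catch spelling mistakes. The degree of fuzziness is specified as a Levenshtein distance from the original term.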

    POST /bookdb_index/book/_search
    {
        "query": {
            "multi_match" : {
                "query" : "comprihensiv guide",
                "fields": ["title", "summary"],
                "fuzziness": "AUTO"
            }
        },
        "_source": ["title", "summary", "publish_date"],
        "size": 1
    }
    
    
    [Results]
    "hits": [
          {
            "_index": "bookdb_index",
            "_type": "book",
            "_id": "4",
            "_score": 0.5961596,
            "_source": {
              "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
              "title": "Solr in Action",
              "publish_date": "2014-04-05"
            }
          }
        ]

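    A fuzziness of "AUTO" is equivalent to specifying 2 when the term is longer than 5 characters. However, about 80% of human misspellings have an edit distance of 1, so setting fuzziness to 1 may improve overall search performance. For more information, see the "Typos and Misspellings" chapter.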
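
To see why the misspelled "comprihensiv" still finds "Comprehensive", the sketch below computes the edit distance between the two terms and the number of edits that "AUTO" allows for a term of that length (0 edits for 1 to 2 characters, 1 edit for 3 to 5, 2 edits for longer terms). This is plain Levenshtein distance written for illustration; by default Elasticsearch additionally counts a transposition of two adjacent characters as a single edit, which this sketch does not model. The function names are assumptions.

```python
def auto_fuzziness(term):
    """Return the maximum number of edits that fuzziness "AUTO" allows for a term."""
    length = len(term)
    if length <= 2:
        return 0  # very short terms must match exactly
    if length <= 5:
        return 1
    return 2


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or keep
            ))
        previous = current
    return previous[-1]


distance = levenshtein("comprihensiv", "comprehensive")
allowed = auto_fuzziness("comprihensiv")
print(distance, allowed, distance <= allowed)
```

The misspelling is two edits away from the indexed term (one substitution and one missing trailing letter), which is within the two edits that "AUTO" allows for a 12-character term. With fuzziness set to 1 this particular typo would no longer match.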

    Sections 6 onward are covered in https://blog.csdn.net/laoyang360/article/details/76769208

  • Original article: https://www.cnblogs.com/pyspark/p/8817707.html