  • Using term + filter queries in ES (things to watch for when querying fields of type text)

    Insert test data

    PUT /forum/article/_bulk
    { "index": { "_id": 1 }}
    { "articleID" : "XHDK-A-1293-#fJ3", "userID" : 1, "hidden": false, "postDate": "2017-01-01" }
    { "index": { "_id": 2 }}
    { "articleID" : "KDKE-B-9947-#kL5", "userID" : 1, "hidden": false, "postDate": "2017-01-02" }
    { "index": { "_id": 3 }}
    { "articleID" : "JODL-X-1937-#pV7", "userID" : 2, "hidden": false, "postDate": "2017-01-01" }
    { "index": { "_id": 4 }}
    { "articleID" : "QQPX-R-3956-#aD8", "userID" : 2, "hidden": true, "postDate": "2017-01-02" }

    View the generated mapping:

    GET /forum/_mapping/article

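    Result (besides type, articleID also shows a fields entry):

    When dynamic mapping maps a string as type=text, it sets up two fields by default. One is the field itself (here articleID), which is analyzed. The other is field.keyword (here articleID.keyword), which is not analyzed. Because of ignore_above: 256, values longer than 256 characters are skipped rather than indexed in the keyword subfield.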
    {
      "forum": {
        "mappings": {
          "article": {
            "properties": {
              "articleID": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "hidden": {
                "type": "boolean"
              },
              "postDate": {
                "type": "date"
              },
              "userID": {
                "type": "long"
              }
            }
          }
        }
      }
    }

    Exact-match query for userID 1

    GET /forum/article/_search
    {
      "query": {
        "constant_score": {
          "filter": {
            "term": {
              "userID": "1"
            }
          },
          "boost": 1.2
        }
      }
    }

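    constant_score: skips relevance scoring and gives every matching document the same constant score, equal to boost (1.2 here).

    query + constant_score + filter + term: the combination used here for an exact-value lookup.

    Difference between term and terms: terms is the plural form of term, used as "terms": {"userID": ["1","2"]}. term matches exactly one value, whereas terms matches a document when the field equals any one of the listed values.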
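
    As a rough illustration of term versus terms inside constant_score, here is a minimal Python sketch. It is a toy in-memory model of the four test documents, not an Elasticsearch client; the helper names term, terms and constant_score are hypothetical and only mirror the query names.

```python
# Toy in-memory model of the four test documents (not a real Elasticsearch client).
DOCS = {
    1: {"articleID": "XHDK-A-1293-#fJ3", "userID": 1, "hidden": False, "postDate": "2017-01-01"},
    2: {"articleID": "KDKE-B-9947-#kL5", "userID": 1, "hidden": False, "postDate": "2017-01-02"},
    3: {"articleID": "JODL-X-1937-#pV7", "userID": 2, "hidden": False, "postDate": "2017-01-01"},
    4: {"articleID": "QQPX-R-3956-#aD8", "userID": 2, "hidden": True, "postDate": "2017-01-02"},
}


def term(field, value):
    """Predicate: the field equals exactly one value."""
    return lambda doc: doc[field] == value


def terms(field, values):
    """Predicate: the field equals any one of the listed values."""
    return lambda doc: doc[field] in values


def constant_score(predicate, boost=1.0):
    """Return (doc_id, score) pairs; every match gets the same score, the boost."""
    return [(doc_id, boost) for doc_id, doc in sorted(DOCS.items()) if predicate(doc)]


term_hits = constant_score(term("userID", 1), boost=1.2)
terms_hits = constant_score(terms("userID", [1, 2]), boost=1.2)
print(term_hits)
print(terms_hits)
```

    In this model the term filter selects documents 1 and 2, the terms filter selects all four, and every hit carries the score 1.2.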

    Query by articleID

    GET /forum/article/_search
    {
      "query": {
        "constant_score": {
          "filter": {
            "term": {
              "articleID":"XHDK-A-1293-#fJ3"
            }
          },
          "boost": 1.2
        }
      }
    }

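    Quoted explanation: the result is empty. articleID.keyword is a field that recent ES versions create automatically, and it is not analyzed. So when an articleID value arrives, it is indexed twice. Once as the field itself (articleID), which is analyzed, and the resulting tokens go into the inverted index. And once as articleID.keyword, which is not analyzed, so the complete string goes into the inverted index as a single term (provided it is no longer than 256 characters).

    So when applying a term filter to a text field, consider matching against the built-in field.keyword. One catch: by default it only covers values of up to 256 characters, so where possible it is still better to define the mapping yourself. Older versions required specifying not_analyzed; in recent versions of ES that is no longer needed, and setting type=keyword is enough.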
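
    To make the "define the mapping yourself" advice concrete, the sketch below builds a possible request body in Python and prints it. It assumes an ES version that still uses mapping types (article), as in this post, and it assumes the request is sent when the index is created, before the test data is inserted, since the type of an already mapped field cannot be changed. The variable name create_index_body is hypothetical.

```python
import json

# Hypothetical request body for creating the index with an explicit mapping.
# Assumption: an ES version that still uses mapping types ("article"), as in this post.
create_index_body = {
    "mappings": {
        "article": {
            "properties": {
                # keyword: the whole value is indexed as a single term; no
                # 256-character ignore_above applies unless one is set here.
                "articleID": {"type": "keyword"}
            }
        }
    }
}

# Print the request in the same console style the post uses.
print("PUT /forum")
print(json.dumps(create_index_body, indent=2))
```

    With a mapping like this, a term filter on articleID itself would match the full string, and the other fields would still be mapped dynamically.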

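    My own understanding: term is an exact lookup, so it searches the index for the literal term XHDK-A-1293-#fJ3.

    The problem is that when the document is indexed, a text field is analyzed by default and only the resulting tokens are indexed, so no such term exists and nothing is found.

    The keyword subfield, however, is indexed without analysis, so the same lookup against it does find the document.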

    Solution:

    GET /forum/article/_search
    {
      "query": {
        "constant_score": {
          "filter": {
            "term": {
              "articleID.keyword":"XHDK-A-1293-#fJ3"
            }
          },
          "boost": 1.2
        }
      }
    }

    For a deeper understanding, look at the tokens that analysis produces for the index:

    GET /forum/_analyze
    {
      "field": "articleID",
      "text": "XHDK-A-1293-#fJ3"
    }

    Result:

    {
      "tokens": [
        {
          "token": "xhdk",
          "start_offset": 0,
          "end_offset": 4,
          "type": "<ALPHANUM>",
          "position": 0
        },
        {
          "token": "a",
          "start_offset": 5,
          "end_offset": 6,
          "type": "<ALPHANUM>",
          "position": 1
        },
        {
          "token": "1293",
          "start_offset": 7,
          "end_offset": 11,
          "type": "<NUM>",
          "position": 2
        },
        {
          "token": "fj3",
          "start_offset": 13,
          "end_offset": 16,
          "type": "<ALPHANUM>",
          "position": 3
        }
      ]
    }
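
    The token list above is why the term filter on articleID came back empty. The following Python sketch models the two inverted indexes for document 1. The function rough_standard_analyze is a hypothetical stand-in that only splits on non-alphanumeric characters and lowercases; it is an assumption that happens to reproduce the tokens above for this sample value, not the real analyzer.

```python
import re


def rough_standard_analyze(text):
    """Rough stand-in for the analyzer: split on non-alphanumerics, lowercase.

    Assumption: adequate for this sample value only, not the real standard analyzer.
    """
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


value = "XHDK-A-1293-#fJ3"

# articleID (text): the inverted index holds the individual tokens.
text_index = {}
for tok in rough_standard_analyze(value):
    text_index.setdefault(tok, set()).add(1)

# articleID.keyword: the inverted index holds the whole string as one term.
keyword_index = {value: {1}}


def term_lookup(index, term):
    """A term query looks the search value up as-is, without analyzing it."""
    return sorted(index.get(term, set()))


print(sorted(text_index))                 # tokens stored for the text field
print(term_lookup(text_index, value))     # full string against the text field
print(term_lookup(keyword_index, value))  # full string against the keyword subfield
print(term_lookup(text_index, "xhdk"))    # a single lowercase token against the text field
```

    In this model the full string is absent from the text index, so the lookup returns nothing, while the same lookup against the keyword index returns document 1. A term lookup for the single token xhdk does hit the text index.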

    Multi-condition query 1: select * from forum.article where (post_date='2017-01-01' or article_id='XHDK-A-1293-#fJ3') and post_date!='2017-01-02'

    GET /forum/article/_search
    {
      "query": {
        "constant_score": {
          "filter": {
            "bool": {
              "should" : [
                {"term" : {"postDate" : "2017-01-01"}},
                {"term" : {"articleID.keyword" : "XHDK-A-1293-#fJ3"}}
              ],
              "must_not" : {
                "term" : {"postDate" : "2017-01-02"}
              }
            }
          }
        }
      }
    }
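
    To check which documents a should + must_not filter like the one above selects, the logic can be replayed in plain Python against the test data. This is a sketch of the boolean logic only, assuming that with no must clause at least one should clause has to match; the function name matches_query1 is hypothetical.

```python
# The four test documents, reduced to the fields this query uses.
DOCS = {
    1: {"articleID": "XHDK-A-1293-#fJ3", "postDate": "2017-01-01"},
    2: {"articleID": "KDKE-B-9947-#kL5", "postDate": "2017-01-02"},
    3: {"articleID": "JODL-X-1937-#pV7", "postDate": "2017-01-01"},
    4: {"articleID": "QQPX-R-3956-#aD8", "postDate": "2017-01-02"},
}


def matches_query1(doc):
    # should: at least one clause has to match (there is no must clause)
    should = doc["postDate"] == "2017-01-01" or doc["articleID"] == "XHDK-A-1293-#fJ3"
    # must_not: the document is excluded if this clause matches
    must_not = doc["postDate"] == "2017-01-02"
    return should and not must_not


query1_ids = [doc_id for doc_id, doc in sorted(DOCS.items()) if matches_query1(doc)]
print(query1_ids)
```

    On the test data this selects documents 1 and 3.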

    Multi-condition query 2: select * from forum.article where article_id='XHDK-A-1293-#fJ3' or (article_id='JODL-X-1937-#pV7' and post_date='2017-01-01')

    GET /forum/article/_search
    {
      "query": {
        "constant_score": {
          "filter": {
            "bool": {
              "should" : [
                {
                  "term" : {"articleID.keyword" : "XHDK-A-1293-#fJ3"}
                },
                {
                  "bool" : {
                    "must" : [
                      {"term" : {"articleID.keyword" : "JODL-X-1937-#pV7"}},
                      {"term" : {"postDate" : "2017-01-01"}}
                    ]
                  }
                }
              ]
            }   
          }
        }
      }
    }
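
    The nesting of bool inside should can be traced with a small recursive evaluator. The sketch below handles only term and bool clauses and runs against an in-memory copy of the test data. It is a hypothetical model of the filter logic, not how Elasticsearch executes queries, and it assumes the .keyword subfield holds the same raw string as the source field.

```python
# The four test documents, reduced to the fields this query uses.
DOCS = {
    1: {"articleID": "XHDK-A-1293-#fJ3", "postDate": "2017-01-01"},
    2: {"articleID": "KDKE-B-9947-#kL5", "postDate": "2017-01-02"},
    3: {"articleID": "JODL-X-1937-#pV7", "postDate": "2017-01-01"},
    4: {"articleID": "QQPX-R-3956-#aD8", "postDate": "2017-01-02"},
}

# The filter part of the request above, as a Python dict.
QUERY2_FILTER = {
    "bool": {
        "should": [
            {"term": {"articleID.keyword": "XHDK-A-1293-#fJ3"}},
            {"bool": {"must": [
                {"term": {"articleID.keyword": "JODL-X-1937-#pV7"}},
                {"term": {"postDate": "2017-01-01"}},
            ]}},
        ]
    }
}


def evaluate(clause, doc):
    """Evaluate a tiny subset of the query DSL (term and bool) against one document."""
    if "term" in clause:
        field, value = next(iter(clause["term"].items()))
        # Assumption: "field.keyword" holds the same raw string as "field".
        return doc[field.split(".")[0]] == value
    if "bool" in clause:
        body = clause["bool"]
        must_ok = all(evaluate(c, doc) for c in body.get("must", []))
        excluded = any(evaluate(c, doc) for c in body.get("must_not", []))
        should = body.get("should", [])
        # Assumption: with no must clause, at least one should clause has to match.
        should_ok = True
        if should and "must" not in body:
            should_ok = any(evaluate(c, doc) for c in should)
        return must_ok and should_ok and not excluded
    raise ValueError("unsupported clause")


query2_ids = [doc_id for doc_id, doc in sorted(DOCS.items()) if evaluate(QUERY2_FILTER, doc)]
print(query2_ids)
```

    On the test data this selects documents 1 and 3: document 1 through the first should clause, document 3 through the nested must.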

    Reference: https://www.jianshu.com/p/e1430282378d

  • Original article: https://www.cnblogs.com/parent-absent-son/p/11063765.html