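  • Querying with elasticsearch-dsl

    Continuing from the previous post, this post uses the Python elasticsearch-dsl library to query Elasticsearch.

    7. Queries

    Elasticsearch is a powerful search engine; the point of using it is to find the data you need quickly.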

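    Query categories:

    • Basic queries: use the query types built into Elasticsearch
    • Compound queries: combine several queries into one
    • Filters: narrow the results with filter conditions during a query, without affecting scoring

    7.1 Basic queries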

      • Create an index (the "table") before querying
        PUT chaxun
        {
          "mappings": {
            "job":{
              "properties": {
                "title":{
                  "store": true,
                  "type": "text",
                  "analyzer": "ik_max_word"
                },
                "company_name":{
                  "store": true,
                  "type": "keyword"
                },
                "desc":{
                  "type": "text"
                },
                "comments":{
                  "type":"integer"
                },
                "add_time":{
                  "type":"date",
                  "format": "yyyy-MM-dd"
                }
              }
            }
          }
        }


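      • match query
        GET chaxun/job/_search
        {
          "query": {
            "match": {
              "title": "python"
            }
          }
        }

        # Assumes a default connection is already registered (e.g. with connections.create_connection).
        from elasticsearch_dsl import Search

        s = Search(index='chaxun').query('match', title='python')
        response = s.execute()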
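      • term query

        A term query does not analyze (tokenize) the search string; it looks up the exact value in the index. Because title is an analyzed text field, "python爬虫" matches only if that exact token was produced at index time.

        GET chaxun/job/_search
        {
          "query": {
            "term":{
              "title":"python爬虫"
            }
          }
        }

        s = Search(index='chaxun').query('term', title='python爬虫')
        response = s.execute()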
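
The difference between match and term can be illustrated without a cluster. The sketch below is a toy model, not Elasticsearch itself: `toy_analyze` is a hypothetical stand-in for the `ik_max_word` analyzer that only lowercases the text and separates ASCII word runs from runs of other characters, and the two sample titles are made up.

```python
import re


def toy_analyze(text):
    """Hypothetical analyzer: lowercase, then split ASCII word runs
    from runs of other (e.g. CJK) characters."""
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = {}
    for doc_id, title in docs.items():
        for token in toy_analyze(title):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_query(index, value):
    """term: look the value up exactly as given, with no analysis."""
    return index.get(value, set())


def match_query(index, text):
    """match: analyze the text first, then OR the resulting tokens."""
    hits = set()
    for token in toy_analyze(text):
        hits |= index.get(token, set())
    return hits


docs = {1: "python爬虫工程师", 2: "java工程师"}  # made-up sample titles
index = build_index(docs)
print(term_query(index, "python爬虫"))   # no indexed token equals the whole string
print(match_query(index, "python爬虫"))  # the string is split into tokens first
```

With a real analyzer the tokens differ, but the principle is the same: term compares against indexed tokens as-is, while match runs the search string through the analyzer first.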
      • terms query
        GET chaxun/job/_search
        {
          "query": {
            "terms":{
              "title":["工程师", "django", "系统"]
            }
          }
        }

        s = Search(index='chaxun').query('terms', title=[u'工程师', 'django', u'系统'])
        response = s.execute()
      • Controlling the number of results returned
        GET chaxun/job/_search
        {
          "query": {
            "term":{
              "title":"python"
            }
          },
          "from":1,
          "size":2
        }

        # Slicing [1:3] is sent as from=1, size=2
        s = Search(index='chaxun').query('term', title='python')[1:3]
        response = s.execute()
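
Slicing a Search object is translated into the from and size parameters of the request. The mapping can be sketched as a plain function; `slice_to_paging` is a hypothetical helper for illustration only, not part of elasticsearch-dsl.

```python
def slice_to_paging(start, stop):
    """Translate a Python slice [start:stop] into from/size values."""
    if start < 0 or stop < start:
        raise ValueError("expected 0 <= start <= stop")
    return {"from": start, "size": stop - start}


paging = slice_to_paging(1, 3)
print(paging)
```

So a request with from 1 and size 2 corresponds to the slice [1:3], and the first two hits correspond to [0:2].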
      • match_all: match every document
        GET chaxun/job/_search
        {
          "query": {
            "match_all": {}
          }
        }

        s = Search(index='chaxun').query('match_all')
        response = s.execute()
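      • match_phrase: phrase query
        GET chaxun/job/_search
        {
          "query": {
            "match_phrase": {
              "title": {
                "query": "python系统",
                "slop": 3
              }
            }
          }
        }

        s = Search(index='chaxun').query('match_phrase', title={"query": u"python系统", "slop": 3})
        response = s.execute()

        Note: the search string "python系统" is analyzed into ["python", "系统"], and a document must contain all of these terms to match. "slop" sets the maximum distance allowed between the terms, counted in token positions. For example, "python打造推荐引擎系统" fails to match if slop is less than 6 (the exact threshold depends on the token positions the analyzer produces).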
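
As a rough illustration of slop: for a two-term phrase whose terms appear in query order, the slop needed equals the number of token positions sitting between the two terms. This is a simplification (Elasticsearch also handles reordered and repeated terms), `required_slop` is a hypothetical helper, and the token list is an assumed tokenization rather than real `ik_max_word` output.

```python
def required_slop(tokens, first, second):
    """Smallest slop for the phrase "first second", assuming both terms
    occur once and `first` precedes `second` in the token list."""
    gap = tokens.index(second) - tokens.index(first) - 1
    if gap < 0:
        raise ValueError("this sketch only covers terms in query order")
    return gap


# Assumed tokenization of a made-up title, for illustration only.
tokens = ["python", "分布式", "监控", "系统"]
needed = required_slop(tokens, "python", "系统")
print(needed, needed <= 3)  # does a query with slop 3 match?
```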

      • multi_match query
        GET chaxun/job/_search
        {
          "query": {
            "multi_match": {
              "query": "python",
              "fields": ["title^3", "desc"]
            }
          }
        }

        from elasticsearch_dsl import Q, Search

        q = Q('multi_match', query="python", fields=["title^3", "desc"])
        s = Search(index='chaxun').query(q)
        response = s.execute()

        Note: the query searches several fields; "^3" gives "title" three times the weight of "desc".

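      • Choosing which fields to return
        GET chaxun/job/_search
        {
          "stored_fields": ["title", "company_name"],
          "query": {
            "match": {
              "title": "python"
            }
          }
        }

        # .source() filters the _source document; to send stored_fields as in the
        # request above, .extra(stored_fields=['title', 'company_name']) can be used instead.
        s = Search(index='chaxun').query('match', title='python').source(['title', 'company_name'])
        response = s.execute()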
      • Sorting results with sort
        GET chaxun/job/_search
        {
          "query": {
            "match_all": {}
          },
          "sort": [
            {
              "comments": {
                "order": "desc"
              }
            }
          ]
        }

        s = Search(index='chaxun').query('match_all').sort({"comments": {"order": "desc"}})
        response = s.execute()
      • range: range query
        GET chaxun/job/_search
        {
          "query": {
            "range": {
              "comments": {
                "gte": 10,
                "lte": 50,
                "boost": 2.0
              }
            }
          }
        }

        s = Search(index='chaxun').query('range', comments={"gte": 10, "lte": 50, "boost": 2.0})
        response = s.execute()

        Note: "boost" sets the weight of this clause.
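
The gte and lte bounds are both inclusive. A minimal pure-Python sketch of that check follows; `in_range` is a hypothetical helper and the sample documents are made up.

```python
def in_range(value, gte=None, lte=None):
    """Return True if value lies within the inclusive bounds that are given."""
    if value is None:
        return False
    if gte is not None and value < gte:
        return False
    if lte is not None and value > lte:
        return False
    return True


# Made-up sample documents.
docs = [
    {"title": "a", "comments": 5},
    {"title": "b", "comments": 10},
    {"title": "c", "comments": 50},
    {"title": "d", "comments": 51},
]
hits = [d["title"] for d in docs if in_range(d.get("comments"), gte=10, lte=50)]
print(hits)  # ['b', 'c']
```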
      • wildcard query
        GET chaxun/job/_search
        {
          "query": {
            "wildcard": {
              "title": {
                "value": "pyth*n",
                "boost": 2
              }
            }
          }
        }

        s = Search(index='chaxun').query('wildcard', title={"value": "pyth*n", "boost": 2})
        response = s.execute()
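
In a wildcard pattern, `*` stands for any run of characters (including none) and `?` for exactly one character. The sketch below translates such a pattern into a regular expression to show which terms it accepts. `wildcard_matches` is a hypothetical helper that ignores escape sequences, and in Elasticsearch the pattern is compared against individual indexed terms, not against the whole field text.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern (* and ?) into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any run of characters, possibly empty
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_matches(term, pattern):
    """Return True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


for term in ["python", "pythn", "pythons"]:
    print(term, wildcard_matches(term, "pyth*n"))
```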

    7.2 Compound queries

      • Create a new index for these queries

      • bool query
    • The general form is:
      "bool": {
          "filter": [],
          "must": [],
          "should": [],
          "must_not": []
      }
      • The simplest filter query
        select * from testdb where salary=20

        GET bool/testdb/_search
        {
          "query": {
            "bool": {
              "must": {
                "match_all":{}
              },
              "filter": {
                "term":{
                  "salary":20
                }
              }
            }
          }
        }

        from elasticsearch_dsl import Q, Search

        # A filter-only bool returns the same documents as must: match_all plus the filter.
        s = Search(index='bool').query('bool', filter=[Q('term', salary=20)])
        response = s.execute()
      • Inspecting the analyzer's (tokenizer's) output
        GET _analyze
        {
          "analyzer": "ik_max_word",
          "text": "成都电子科技大学"
        }

        Note: "ik_max_word" performs fine-grained segmentation; "ik_smart" performs coarse-grained segmentation.

      • Combining filters with bool
        select * from testdb where (salary=20 or title=python) and (salary !=30)

        GET bool/testdb/_search
        {
          "query": {
            "bool": {
              "should": [
                {"term":{"salary":20}},
                {"term":{"title":"python"}}
              ],
              "must_not": [
                {"term":{"salary":30}}
              ]
            }
          }
        }

        from elasticsearch_dsl import Q, Search

        q = Q('bool', should=[Q('term', salary=20), Q('term', title='python')], must_not=[Q('term', salary=30)])
        s = Search(index='bool').query(q)
        response = s.execute()
      • Nested bool queries
        select * from testdb where title=python or (title=django and salary=30)

        GET bool/testdb/_search
        {
          "query": {
            "bool":{
              "should":[
                {"term":{"title":"python"}},
                {"bool":{
                  "must":[{"term":{"title":"django"}},
                          {"term":{"salary":30}}]
                }}
              ]
            }
          }
        }

        q = Q('bool', should=[Q('term', title='python'), Q('bool', must=[Q('term', title='django'), Q('term', salary=30)])])
        s = Search(index='bool').query(q)
        response = s.execute()
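
To check that a nested bool query expresses the intended boolean logic, it can help to evaluate it against a few documents in plain Python. The sketch below is a toy evaluator, not Elasticsearch: `matches` is a hypothetical function that supports only term and bool, expects every bool clause to be a list, ignores scoring and analysis, and runs on made-up documents.

```python
def matches(query, doc):
    """Evaluate a tiny subset of the query DSL (term and bool) against a dict."""
    (kind, body), = query.items()
    if kind == "term":
        (field, value), = body.items()
        return doc.get(field) == value
    if kind == "bool":
        required = body.get("must", []) + body.get("filter", [])
        should = body.get("should", [])
        must_not = body.get("must_not", [])
        if not all(matches(q, doc) for q in required):
            return False
        if any(matches(q, doc) for q in must_not):
            return False
        # Without must/filter clauses, at least one should clause has to match.
        if should and not required:
            return any(matches(q, doc) for q in should)
        return True
    raise ValueError("unsupported query type: " + kind)


# title = python OR (title = django AND salary = 30)
query = {"bool": {"should": [
    {"term": {"title": "python"}},
    {"bool": {"must": [{"term": {"title": "django"}},
                       {"term": {"salary": 30}}]}},
]}}

# Made-up sample documents.
docs = [
    {"id": 1, "title": "python", "salary": 10},
    {"id": 2, "title": "django", "salary": 30},
    {"id": 3, "title": "django", "salary": 20},
    {"id": 4, "title": "java", "salary": 30},
]
hits = [d["id"] for d in docs if matches(query, d)]
print(hits)  # [1, 2]
```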
      • Filtering null and non-null values
    • Create test data
       POST null/testdb2/_bulk
       {"index":{"_id":1}}
       {"tags":["search"]}
       {"index":{"_id":2}}
       {"tags":["search", "python"]}
       {"index":{"_id":3}}
       {"other_field":["some data"]}
       {"index":{"_id":4}}
       {"tags":null}
       {"index":{"_id":5}}
       {"tags":["search", null]}
    • Handling null values
      select tags from testdb2 where tags is not NULL

      GET null/testdb2/_search
      {
        "query": {
          "bool":{
            "filter": {
              "exists": {
                "field": "tags"
              }
            }
          }
        }
      }

      s = Search(index='null').query('bool', filter={"exists": {"field": "tags"}})
      response = s.execute()
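
For the test data above, the exists filter keeps a document only when the field holds at least one non-null value. A minimal sketch of that rule in plain Python follows; `has_value` is a hypothetical helper, and the dictionary mirrors the five bulk-indexed documents.

```python
def has_value(doc, field):
    """Return True if the field holds at least one non-null value."""
    value = doc.get(field)
    if value is None:
        return False
    if isinstance(value, list):
        return any(item is not None for item in value)
    return True


# The five documents from the bulk request, keyed by _id.
docs = {
    1: {"tags": ["search"]},
    2: {"tags": ["search", "python"]},
    3: {"other_field": ["some data"]},
    4: {"tags": None},
    5: {"tags": ["search", None]},
}
hits = sorted(doc_id for doc_id, doc in docs.items() if has_value(doc, "tags"))
print(hits)  # [1, 2, 5]
```

Document 5 is kept because its array still contains one real value next to the null.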

    7.3 Aggregation queries

    To be continued...

  • Original article: https://www.cnblogs.com/dowi/p/10097629.html