  • Elasticsearch Query Commands



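    Query methods

    Elasticsearch (ES) has its own query syntax, which differs from SQL; it also provides JDBC and similar packages for executing SQL.

    The examples here use ES's native query syntax.

    Insert data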

    curl -X POST 'http://localhost:9200/my_index/_doc'  -H 'Content-Type: application/json' -d '{
      "name": "Wang",
      "title": "software designer",
      "age": 35,
      "address": {"city": "guangzhou", "district": "tianhe"},
      "content": "I want to do some AI machine learning works"
    }'
    

    This automatically creates the index my_index, the type _doc, and a mapping for each field.

    View the index

    curl localhost:9200/my_index?pretty
    
    
    {
      "my_index" : {
        "aliases" : { },
        "mappings" : {
          "_doc" : {
            "properties" : {
              "address" : {
                "properties" : {
                  "city" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "district" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  }
                }
              },
              "age" : {
                "type" : "long"
              },
              "content" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              },
              "name" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              },
              "title" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              }
            }
          }
        },
        "settings" : {
          "index" : {
            "creation_date" : "1628737053188",
            "number_of_shards" : "5",
            "number_of_replicas" : "1",
            "uuid" : "-NHgaqt4R_SQs2KHd0aJwQ",
            "version" : {
              "created" : "6050199"
            },
            "provided_name" : "my_index"
          }
        }
      }
    }
    

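    The response lists the index settings and mappings.

    term query

    A term query requires an exact match: the query value is not analyzed. (Text data, by contrast, is split into tokens by default when it is indexed.)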

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "query": {
         "term":{
            "content":"machine"
         }
       }
    }'
    

    This returns a result, because machine is one of the indexed tokens.

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "query": {
         "term":{
            "content":"machine learning"
         }
       }
    }'
    

    No result is returned.

    The query value "machine learning" must match a token exactly, but the data is indexed token by token and there is no single token "machine learning".

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "query": {
         "term":{
            "content.keyword":"machine"
         }
       }
    }'
    

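    No result is returned.

    The keyword sub-field holds the original, unanalyzed value rather than the tokens, and the original value is not exactly "machine".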

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "query": {
         "term":{
            "content.keyword":"I want to do some AI machine learning works"
         }
       }
    }'
    

    This returns a result.

    The query value matches the original value exactly.

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "query": {
         "term":{
            "address.city":"guangzhou"
         }
       }
    }'
    

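    Fields inside an object can be queried with dot notation, as address.city is here.

    Token case

    term queries on text fields need lowercase values, because the default standard analyzer lowercases tokens at index time.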
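
    The term behaviour described above can be sketched in plain Python. This is only an approximation: analyze is a hypothetical stand-in for the standard analyzer (split on non-alphanumeric characters, then lowercase), and the two helper functions are illustrative names, not part of any Elasticsearch client.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer: split on non-alphanumerics, lowercase."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def term_matches_text(term, text):
    """term on a text field: the query value is compared as-is with each indexed token."""
    return term in analyze(text)

def term_matches_keyword(term, text):
    """term on the .keyword sub-field: compared with the whole original value."""
    return term == text

content = "I want to do some AI machine learning works"
print(analyze(content))
print(term_matches_text("machine", content))           # a single indexed token
print(term_matches_text("machine learning", content))  # no such single token
print(term_matches_text("AI", content))                # the indexed token is "ai"
print(term_matches_keyword(content, content))          # whole value matches
```

    The sketch ignores details such as ignore_above on the keyword sub-field.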

    terms query

    Matches if any one of the given terms matches exactly.

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "query": {
         "terms":{
            "content": ["machine", "learning"]
         }
       }
    }'
    

    This returns a result; both tokens, machine and learning, match.

    hits result

    Every matching document is printed in the hits field.

      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 1.9932617,
        "hits" : [
          {
            "_index" : "my_index",
            "_type" : "_doc",
            "_id" : "anbGPnsBSeHJMrQRVMQK",
            "_score" : 1.9932617,
            "_source" : {
              "name" : "Wang",
              "title" : "software designer",
              "age" : 35,
              "address" : {
                "city" : "guangzhou",
                "district" : "tianhe"
              },
              "content" : "I want to do some AI machine learning works",
              "kpi" : 3.2,
              "date" : "2021-01-01T08:00:00Z"
            }
          }
        ]
      }
    

    The actual data is under hits -> hits -> _source.

    count query

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_count?pretty -d '{
      "query": {
         "term":{
            "title": "software"
         }
       }
    }'
    

    Response

      "count" : 2,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      }
    

    Only the count is returned.

    query_string query

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "query": {
         "query_string":{
            "query": "(machine learning) AND (K8S works)",
            "default_field": "content"
         }
       }
    }'
    

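    (machine learning) and (K8S works) are analyzed as two groups of tokens; a document matches only if it contains at least one token from each group.

    match query

    Analyzed matching: the query text is tokenized, and by default a document matches if any one of the tokens matches.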

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "query": {
         "match":{
            "content": "machine learning"
         }
       }
    }'
    

    This returns a result; both tokens match.

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "query": {
         "match":{
            "content": "works machine AI"
         }
       }
    }'
    

    This returns a result; all the tokens match, and their order does not matter.

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "query": {
         "match":{
            "content": "machine factory"
         }
       }
    }'
    

    This returns a result; it is enough that one token, machine, matches.
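
    As a rough illustration of the three match examples above, the following sketch tokenizes both sides and checks for any shared token. analyze is a hypothetical approximation of the standard analyzer; the operator argument mirrors the match query's operator option, with "or" as the default.

```python
import re

def analyze(text):
    """Approximate the standard analyzer: split on non-alphanumerics, lowercase."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def match(query, text, operator="or"):
    """Analyze the query, then require any ("or") or all ("and") of its tokens."""
    doc_tokens = set(analyze(text))
    hits = [token in doc_tokens for token in analyze(query)]
    return all(hits) if operator == "and" else any(hits)

content = "I want to do some AI machine learning works"
print(match("machine learning", content))
print(match("works machine AI", content))
print(match("machine factory", content))
print(match("machine factory", content, operator="and"))
```

    Relevance scoring is not modelled here; the sketch only decides whether a document matches.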

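    match_phrase query

    The query text is treated as one phrase: its tokens must appear in the field consecutively and in the same order.
    (By contrast, term compares the unanalyzed query value with a single indexed token, whereas match_phrase only requires the field to contain the phrase.)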

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "query": {
         "match_phrase":{
            "content": "machine factory"
         }
       }
    }'
    

    No result is returned, because the phrase machine factory does not occur.

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "query": {
         "match_phrase":{
            "content": "works machine AI"
         }
       }
    }'
    

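    No result is returned. The original text contains all three tokens, but match_phrase treats works machine AI as one phrase, and the tokens do not appear in that order.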

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "query": {
         "match_phrase":{
            "content": "AI machine learning"
         }
       }
    }'
    

    This returns a result, because AI machine learning appears as a complete, contiguous phrase.

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "query": {
         "match_phrase":{
            "content": {
               "query": "some AI works",
               "slop" : 2
            }
         }
       }
    }'
    

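    This returns a result.

    Although some AI works does not appear as a contiguous phrase, slop 2 lets the query tokens be moved by up to two positions; here works sits two positions later than the phrase expects, because machine and learning come in between.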

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "query": {
         "match_phrase":{
            "content": {
               "query": "some AI works",
               "slop" : 1
            }
         }
       }
    }'
    

    No result is returned; moving a token by only one position is not enough to match.
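
    The slop behaviour in the match_phrase examples can be approximated as follows. This is a simplified model, not Lucene's actual phrase matcher: it assumes that a phrase matches when the spread between each token's document position and its query position is at most slop. analyze and match_phrase are illustrative names.

```python
import re
from itertools import product

def analyze(text):
    """Approximate the standard analyzer: split on non-alphanumerics, lowercase."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def match_phrase(query, text, slop=0):
    doc = analyze(text)
    query_tokens = analyze(query)
    # For every query token, collect all positions where it occurs in the document.
    positions = [[i for i, t in enumerate(doc) if t == qt] for qt in query_tokens]
    if not query_tokens or any(not p for p in positions):
        return False
    for combo in product(*positions):
        # Offset of each token relative to where the phrase expects it.
        offsets = [doc_pos - q_pos for q_pos, doc_pos in enumerate(combo)]
        if max(offsets) - min(offsets) <= slop:
            return True
    return False

content = "I want to do some AI machine learning works"
print(match_phrase("AI machine learning", content))
print(match_phrase("works machine AI", content))
print(match_phrase("some AI works", content, slop=2))
print(match_phrase("some AI works", content, slop=1))
```

    With slop 0 the offsets must all be equal, which is the same as requiring the tokens to be adjacent and in order.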

    multi_match

    Runs a match query against several fields; the document matches if any one field matches.

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "query": {
         "multi_match":{
            "query": "machine learning",
            "fields" : ["title", "content"]
         }
       }
    }'
    

    This returns a result; content satisfies the query.

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "query": {
         "multi_match":{
            "query": "designer",
            "fields" : ["title", "content"]
         }
       }
    }'
    

    This returns a result; title satisfies the query.

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "query": {
         "multi_match":{
            "query": "manager",
            "fields" : ["title", "content"]
         }
       }
    }'
    

    No result is returned; neither title nor content satisfies the query.

    match_all query

    Returns all documents.

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "query": {
         "match_all":{}
       }
    }'
    

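    No condition is specified.

    bool query

    A compound query that combines several clauses; a document matches only if the combination as a whole is satisfied.

    Each clause can be must, filter, should, or must_not:

    must: the clause must match, and it contributes to the score
    filter: the clause must match, but it does not contribute to the score
    should: at least the number of should clauses given by minimum_should_match must match; matching clauses contribute to the score
    must_not: the clause must not match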

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "query": {
         "bool":{
            "must": [
                {
                    "term": {"address.city": "guangzhou"}
                },
                {
                    "match": {"content": "machine learning"}
                }
            ],
            "must_not": {
                "range": {"age": {"gt": 35}}
            },
            "filter": {
                "match": {"title": "designer"}
            },
            "should": [
                {
                    "term": {"name": "Li"}
                },
                {
                    "match_phrase": {"content": "AI machine"}
                }
            ],
            "minimum_should_match" : 1
         }
       }
    }'
    

    This returns a result, because every clause under bool is satisfied.

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "query": {
         "bool":{
            "must": [
                {
                    "term": {"address.city": "guangzhou"}
                },
                {
                    "match": {"content": "machine learning"}
                }
            ],
            "must_not": {
                "range": {"age": {"from": 35, "to": 40}}
            },
            "filter": {
                "match": {"title": "designer"}
            },
            "should": [
                {
                    "term": {"name": "Li"}
                },
                {
                    "match_phrase": {"content": "AI machine"}
                }
            ],
            "minimum_should_match" : 1
         }
       }
    }'
    

    No result is returned, because the must_not clause excludes the document: the range 35 to 40 includes age 35.
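
    The way the four clause types combine can be sketched as a small Python function. This models matching only, not scoring; bool_query and the lambda predicates are hypothetical stand-ins for the real clauses, and the default for minimum_should_match follows the documented rule (1 when there are should clauses but no must or filter, otherwise 0).

```python
def bool_query(doc, must=(), filter_=(), should=(), must_not=(),
               minimum_should_match=None):
    """Return True if doc satisfies the combined clauses (scoring is ignored)."""
    if minimum_should_match is None:
        minimum_should_match = 1 if should and not (must or filter_) else 0
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filter_):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    return sum(1 for clause in should if clause(doc)) >= minimum_should_match

def tokens(text):
    return text.lower().split()

wang = {"name": "Wang", "title": "software designer", "age": 35,
        "address": {"city": "guangzhou"},
        "content": "I want to do some AI machine learning works"}

common = dict(
    must=[lambda d: d["address"]["city"] == "guangzhou",
          lambda d: bool({"machine", "learning"} & set(tokens(d["content"])))],
    filter_=[lambda d: "designer" in tokens(d["title"])],
    # "Li" is compared with lowercase tokens, as a term query would do.
    should=[lambda d: "Li" in tokens(d["name"]),
            lambda d: "ai machine" in d["content"].lower()],  # crude phrase check
    minimum_should_match=1,
)

first = bool_query(wang, must_not=[lambda d: d["age"] > 35], **common)
second = bool_query(wang, must_not=[lambda d: 35 <= d["age"] <= 40], **common)
print(first, second)
```

    The first call mirrors the query with "gt": 35 and the second the query with the range 35 to 40.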


    bool queries can be nested

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "query": {
         "bool":{
            "must": [
                {
                    "bool": {
                        "should": [
                            {
                                "term": {"name": "Li"}
                            },
                            {
                                "match_phrase": {"content": "AI machine"}
                            }
                        ]
                    }
                },
                {
                    "bool": {
                        "filter": {
                            "match": {"title": "designer"}
                        }
                    }
                }
            ]
         }
       }
    }'
    

    The must clause contains several bool queries.

    Limiting the number of hits

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "from":0,
      "size":2,
      "query": {
         "term":{
            "title":"designer"
         }
       }
    }'
    

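    Start from the first hit (offset 0) and return at most 2 hits.

    Selecting returned fields

    This works like choosing specific columns in a SQL select.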

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "_source":["name","age"],
      "query": {
         "term":{
            "title":"designer"
         }
       }
    }'
    

    Only the name and age fields of the matching documents are returned.

    Sorting

    Specify the field to sort by

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "_source":["name","age"],
      "query": {
         "term":{
            "title":"designer"
         }
      },
      "sort": [
         {
             "age": {"order": "desc"}
         }
      ]
    }'
    

    Results are sorted by age in descending order.

    Range query

    Supports from, to, gte, lte, gt, lt, and so on, for example

    {
        "query": {
            "range": {
                "date": {
                    "gte": "2021-08-01",
                    "lte": "2021-08-02",
                    "relation": "within",
                    "format": "yyyy-MM-dd"
                }
            }
        }
    }
    

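    relation can also be CONTAINS or INTERSECTS (the default)

    relation matters when the date field is itself a range (a range field type), for example

    "date": {"gte":"2021-08-01","lte":"2021-08-03"}
    

    within means the field's range lies entirely inside the query range
    contains means the field's range entirely contains the query range
    intersects means the field's range overlaps the query range

    The date format can be specified with format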
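
    The three relation values can be pictured as interval comparisons. The sketch below is an illustration only; range_relation_matches is a hypothetical helper, both ranges are taken as inclusive (gte/lte) at day granularity, and the dates are the ones from the examples above.

```python
from datetime import date

def range_relation_matches(field_range, query_range, relation="intersects"):
    """Compare a stored range with a query range; both are inclusive (gte, lte) pairs."""
    f_lo, f_hi = field_range
    q_lo, q_hi = query_range
    if relation == "within":
        return q_lo <= f_lo and f_hi <= q_hi
    if relation == "contains":
        return f_lo <= q_lo and q_hi <= f_hi
    if relation == "intersects":
        return f_lo <= q_hi and q_lo <= f_hi
    raise ValueError(f"unknown relation: {relation}")

stored = (date(2021, 8, 1), date(2021, 8, 3))   # the document's date range
query = (date(2021, 8, 1), date(2021, 8, 2))    # the range in the query

for relation in ("within", "contains", "intersects"):
    print(relation, range_relation_matches(stored, query, relation))
```

    Here the stored range is wider than the query range, so it contains and intersects the query but is not within it.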

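    Wildcard query

    Supports * and ?

    * matches zero or more characters
    ? matches any single character

    Fuzzy query

    Finds terms that are similar to the query term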

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "query": {
         "fuzzy":{
            "title":"desinger"
         }
       }
    }'
    

    Although desinger is misspelled, the document is still found.
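
    Fuzzy matching is based on edit distance. The sketch below computes an edit distance in which swapping two adjacent characters counts as one edit, and applies the AUTO fuzziness thresholds (0 edits for terms of 1 to 2 characters, 1 edit for 3 to 5, 2 edits for longer terms). It illustrates the idea under those assumptions and does not reproduce Lucene's implementation; both function names are hypothetical.

```python
def edit_distance(a, b):
    """Edit distance with insert, delete, substitute, and adjacent transposition."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]

def auto_fuzziness(term):
    """Maximum number of edits allowed under the AUTO setting."""
    n = len(term)
    return 0 if n <= 2 else 1 if n <= 5 else 2

query = "desinger"
for indexed in ("designer", "manager"):
    dist = edit_distance(query, indexed)
    print(indexed, dist, dist <= auto_fuzziness(query))
```

    The misspelling differs from designer by one swapped pair of letters, so it falls within the allowed distance.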

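    Scoring

    Each hit has a _score field that indicates how relevant the document is to the query.

    Factors that determine the score include

    Term frequency: the more often the term appears in the document, the higher the weight
    Inverse document frequency: the more documents the term appears in, the lower the weight, as with common words such as and/the
    Field length: the longer the field, the lower the weight, since a match in a short field is considered more significant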
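
    The three factors can be seen in the textbook BM25 formula, which is the default similarity in recent Elasticsearch versions. The function below is a sketch of that textbook formula with the usual k1 = 1.2 and b = 0.75; it is not guaranteed to reproduce the exact _score values that Elasticsearch prints, and bm25_term_score is an illustrative name.

```python
import math

def bm25_term_score(freq, field_len, avg_field_len, n_docs, docs_with_term,
                    k1=1.2, b=0.75):
    """Textbook BM25 score of one term in one field."""
    # Inverse document frequency: rarer terms weigh more.
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Term frequency, damped and normalised by field length.
    norm = k1 * (1 - b + b * field_len / avg_field_len)
    tf = freq * (k1 + 1) / (freq + norm)
    return idf * tf

base = bm25_term_score(freq=1, field_len=10, avg_field_len=10, n_docs=3, docs_with_term=1)
more_frequent = bm25_term_score(2, 10, 10, 3, 1)
more_common = bm25_term_score(1, 10, 10, 3, 3)
longer_field = bm25_term_score(1, 20, 10, 3, 1)
print(base, more_frequent, more_common, longer_field)
```

    Raising the term frequency raises the score, while a more common term or a longer field lowers it.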

    aggregation query

    Insert more data

    curl -X POST 'http://localhost:9200/my_index/_doc'  -H 'Content-Type: application/json' -d '{
      "name": "Wang",
      "title": "software designer",
      "age": 35,
      "address": {"city": "guangzhou", "district": "tianhe"},
      "content": "I want to do some AI machine learning works",
      "kpi": 3.2,
      "date": "2021-01-01T08:00:00Z"
    }'
    
    curl -X POST 'http://localhost:9200/my_index/_doc'  -H 'Content-Type: application/json' -d '{
      "name": "Li",
      "title": "senior software designer",
      "age": 30,
      "address": {"city": "guangzhou", "district": "tianhe"},
      "content": "I want to do some K8S works",
      "kpi": 4.0,
      "date": "2021-01-01T10:00:00Z"
    }'
    
    curl -X POST 'http://localhost:9200/my_index/_doc'  -H 'Content-Type: application/json' -d '{
      "name": "Zhang",
      "title": "Test Engineer",
      "age": 25,
      "address": {"city": "guangzhou", "district": "tianhe"},
      "content": "I want to do some auto-test works",
      "kpi": 4.5,
      "date": "2021-06-01T09:00:00Z"
    }'
    

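    Aggregation queries fall into the following categories

    Bucket aggregations: group documents into buckets and count the documents in each bucket
    Metrics aggregations: compute metrics such as the average or maximum over the documents in each group
    Pipeline aggregations: perform further computation on the output of other aggregations

    Some examples of aggregation operations follow.

    For all aggregations, see https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html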

    metrics aggregation : avg

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "size": 0,
      "query": {
         "match":{
            "title": "designer Engineer"
         }
      },
      "aggs": {
         "kpi_avg": {
            "avg": { 
               "field": "kpi",
               "missing": 3.5
            } 
         }
      }
    }'
    

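    aggs is a keyword; it can also be written as aggregations
    kpi_avg is a user-defined name that appears in the result
    avg is a keyword that selects the avg operation; field names the field to average, and missing is the default value used for documents that lack the field

    If no query is given, the aggregation runs over all documents

    If size is not set to 0, the matching documents are printed in addition to the aggregation result
    With size set to 0, only the aggregation result is printed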

      "aggregations" : {
        "kpi_avg" : {
          "value" : 3.900000015894572
        }
      }
    

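    Several aggregations can be specified in one request

    metrics aggregation : avg & histogram

    If the data is of the histogram type (which must be declared in the mapping when the index is created)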

    curl -X PUT 'http://localhost:9200/my_index_histogram' -H 'Content-Type: application/json' -d '{
      "mappings" : {
        "properties" : {
          "my_histogram" : {
            "type" : "histogram"
          }
        }
      }
    }'
    
    curl -X POST 'http://localhost:9200/my_index_histogram/_doc'  -H 'Content-Type: application/json' -d '{
      "name": "Zhao",
      "title": "manager",
      "age": 30,
      "address": {"city": "guangzhou", "district": "tianhe"},
      "my_histogram": {
          "values" : [3.5, 4.0, 4.5], 
          "counts" : [1, 2, 3] 
      }
    }'
    

    The avg is computed as (3.5*1 + 4.0*2 + 4.5*3) / (1+2+3)

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index_histogram/_search?pretty -d '{
      "size": 0,
      "aggs": {
         "score_avg": {
            "avg": { 
               "field": "my_histogram"
            } 
         }
      }
    }'
    

    Result

      "aggregations" : {
        "score_avg" : {
          "value" : 4.166666666666667
        }
      }
    

    Fields in an automatically created index are not of the histogram type
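
    The average over a histogram field is a count-weighted mean. As a quick check of the result above, the following sketch recomputes it from the values and counts that were indexed; histogram_avg is an illustrative helper name.

```python
def histogram_avg(values, counts):
    """Weighted mean: each value contributes as many times as its count."""
    total = sum(v * c for v, c in zip(values, counts))
    return total / sum(counts)

avg = histogram_avg([3.5, 4.0, 4.5], [1, 2, 3])
print(avg)  # (3.5*1 + 4.0*2 + 4.5*3) / 6
```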

    metrics aggregation : max/min/sum

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "size": 0,
      "aggs": {
         "kpi_max": {
            "max": { 
               "field": "kpi"
            } 
         }
      }
    }'
    
    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "size": 0,
      "aggs": {
         "kpi_min": {
            "min": { 
               "field": "kpi"
            } 
         }
      }
    }'
    
    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "size": 0,
      "aggs": {
         "kpi_sum": {
            "sum": { 
               "field": "kpi"
            } 
         }
      }
    }'
    

    These compute the max, min, and sum, and so on

    metrics aggregation : boxplot

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "size": 0,
      "query": {
         "match":{
            "title": "designer Engineer"
         }
      },
      "aggs": {
         "kpi_avg": {
            "boxplot": { 
               "field": "kpi",
               "missing": 3.5
            }
         }
      }
    }'
    

    Result

      "aggregations" : {
        "kpi_avg" : {
          "min" : 3.200000047683716,
          "max" : 4.5,
          "q1" : 3.400000035762787,
          "q2" : 4.0,
          "q3" : 4.375,
          "lower" : 3.200000047683716,
          "upper" : 4.5
        }
      }
    

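    Box plot

    q1 : lower quartile (25%)
    q2 : median (50%)
    q3 : upper quartile (75%)
    lower : the smallest value that is not less than q1 - 1.5 * (q3 - q1)
    upper : the largest value that is not greater than q3 + 1.5 * (q3 - q1)
    min : minimum value
    max : maximum value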
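
    The lower and upper definitions can be checked against the result above. The sketch takes q1 and q3 from the response as given (Elasticsearch estimates the quartiles itself) and only applies the 1.5 * (q3 - q1) rule; whiskers is an illustrative name, and the value 9.0 in the second call is a made-up outlier.

```python
def whiskers(values, q1, q3):
    """Return (lower, upper): the extreme values that stay within the 1.5 * IQR fences."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    lower = min(v for v in values if v >= low_fence)
    upper = max(v for v in values if v <= high_fence)
    return lower, upper

kpis = [3.2, 4.0, 4.5]
print(whiskers(kpis, q1=3.4, q3=4.375))
# With a hypothetical outlier added, max would change but upper would not.
print(whiskers(kpis + [9.0], q1=3.4, q3=4.375))
```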

    metrics aggregation : cardinality

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "size": 0,
      "query": {
         "match":{
            "title": "designer Engineer"
         }
      },
      "aggs": {
         "kpi_count": {
            "cardinality": { 
               "field": "kpi"
            }
         }
      }
    }'
    

    Result

      "aggregations" : {
        "kpi_count" : {
          "value" : 3
        }
      }
    

    Equivalent to count(distinct field)

    metrics aggregation : extended_stats

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "size": 0,
      "query": {
         "match":{
            "title": "designer Engineer"
         }
      },
      "aggs": {
         "kpi_stats": {
            "extended_stats": { 
               "field": "kpi"
            }
         }
      }
    }'
    

    Result

      "aggregations" : {
        "kpi_stats" : {
          "count" : 3,
          "min" : 3.200000047683716,
          "max" : 4.5,
          "avg" : 3.900000015894572,
          "sum" : 11.700000047683716,
          "sum_of_squares" : 46.49000030517578,
          "variance" : 0.28666664441426565,
          "variance_population" : 0.28666664441426565,
          "variance_sampling" : 0.4299999666213985,
          "std_deviation" : 0.5354125926930237,
          "std_deviation_population" : 0.5354125926930237,
          "std_deviation_sampling" : 0.6557438269792545,
          "std_deviation_bounds" : {
            "upper" : 4.970825201280619,
            "lower" : 2.8291748305085243,
            "upper_population" : 4.970825201280619,
            "lower_population" : 2.8291748305085243,
            "upper_sampling" : 5.2114876698530805,
            "lower_sampling" : 2.588512361936063
          }
        }
      }
    

    This returns a range of statistics
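
    Most of the extended_stats values can be reproduced with Python's statistics module, which helps to tell the population figures from the sampling figures. The sketch assumes the three kpi values inserted earlier and the default of 2 standard deviations for std_deviation_bounds; small differences from the response come from the field being stored as a 32-bit float.

```python
import statistics

kpis = [3.2, 4.0, 4.5]

avg = statistics.mean(kpis)
variance_population = statistics.pvariance(kpis)   # divides by n
variance_sampling = statistics.variance(kpis)      # divides by n - 1
std_population = statistics.pstdev(kpis)
std_sampling = statistics.stdev(kpis)

# std_deviation_bounds: avg plus or minus 2 standard deviations.
upper = avg + 2 * std_population
lower = avg - 2 * std_population

print(avg, variance_population, variance_sampling)
print(std_population, std_sampling, upper, lower)
```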

    metrics aggregation : geo

    Create data of the geo type

    curl -X PUT 'http://localhost:9200/my_index_geo' -H 'Content-Type: application/json' -d '{
      "mappings" : {
        "properties" : {
          "location" : {
            "type" : "geo_point"
          }
        }
      }
    }'
    
    curl -X POST 'http://localhost:9200/my_index_geo/_doc'  -H 'Content-Type: application/json' -d '{
      "location": "52.374081,4.912350"
    }'
    
    curl -X POST 'http://localhost:9200/my_index_geo/_doc'  -H 'Content-Type: application/json' -d '{
      "location": "52.369219,4.901618"
    }'
    
    curl -X POST 'http://localhost:9200/my_index_geo/_doc'  -H 'Content-Type: application/json' -d '{
      "location": "52.371667,4.914722"
    }'
    
    curl -X POST 'http://localhost:9200/my_index_geo/_doc'  -H 'Content-Type: application/json' -d '{
      "location": "51.222900,4.405200"
    }'
    

    The centroid can be computed

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index_geo/_search?pretty -d '{
      "size": 0,
      "aggs": {
         "centroid": {
            "geo_centroid": { 
               "field": "location"
            }
         }
      }
    }'
    

    Result

      "aggregations" : {
        "centroid" : {
          "location" : {
            "lat" : 52.08446673466824,
            "lon" : 4.783472470007837
          },
          "count" : 4
        }
      }
    

    Bounds and other geo metrics can be obtained as well
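
    For the four points indexed above, the centroid in the response agrees with the plain arithmetic mean of the latitudes and longitudes. The sketch below shows that calculation; it is an observation about this example, not a description of how Elasticsearch computes the centroid internally.

```python
points = [
    (52.374081, 4.912350),
    (52.369219, 4.901618),
    (52.371667, 4.914722),
    (51.222900, 4.405200),
]

# Average latitude and longitude separately.
lat = sum(p[0] for p in points) / len(points)
lon = sum(p[1] for p in points) / len(points)
print({"location": {"lat": lat, "lon": lon}, "count": len(points)})
```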

    metrics aggregation : matrix_stats

    Computes the mean, variance, covariance, correlation, and so on

    bucket aggregation : adjacency_matrix

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "size": 0,
      "aggs": {
         "people_group": {
            "adjacency_matrix": { 
               "filters": {
                  "grpA" : { "terms" : { "title" : ["designer", "engineer"] }},
                  "grpB" : { "terms" : { "title" : ["senior", "software"] }},
                  "grpC" : { "terms" : { "content" : ["test", "pv"] }}
               }
            }
         }
      }
    }'
    

    Result

      "aggregations" : {
        "people_group" : {
          "buckets" : [
            {
              "key" : "grpA",
              "doc_count" : 3
            },
            {
              "key" : "grpA&grpB",
              "doc_count" : 2
            },
            {
              "key" : "grpA&grpC",
              "doc_count" : 1
            },
            {
              "key" : "grpB",
              "doc_count" : 2
            },
            {
              "key" : "grpC",
              "doc_count" : 1
            }
          ]
        }
      }
    

    Counts the documents that match each filter, and each pair of filters that has documents in common
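
    The bucket list above can be reproduced with a small simulation over the three documents inserted earlier. This is a sketch under assumptions: analyze is a hypothetical approximation of the standard analyzer, each filter is modelled as a set of terms on one field, and empty intersections are dropped, as in the response.

```python
import re
from itertools import combinations

def analyze(text):
    """Approximate the standard analyzer: split on non-alphanumerics, lowercase."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

docs = [
    {"title": "software designer", "content": "I want to do some AI machine learning works"},
    {"title": "senior software designer", "content": "I want to do some K8S works"},
    {"title": "Test Engineer", "content": "I want to do some auto-test works"},
]

filters = {
    "grpA": ("title", {"designer", "engineer"}),
    "grpB": ("title", {"senior", "software"}),
    "grpC": ("content", {"test", "pv"}),
}

def in_group(doc, name):
    field, terms = filters[name]
    return bool(terms & set(analyze(doc[field])))

names = sorted(filters)
buckets = {name: sum(in_group(d, name) for d in docs) for name in names}
for a, b in combinations(names, 2):
    buckets[f"{a}&{b}"] = sum(in_group(d, a) and in_group(d, b) for d in docs)

# Buckets with no documents are not returned.
buckets = {key: count for key, count in buckets.items() if count > 0}
print(buckets)
```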

    bucket aggregation : composite

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "size": 0,
      "aggs": {
         "composite_group": {
            "composite": { 
               "sources": [
                  {"title" : { "terms" : { "field" : "title.keyword"}}},
                  {"age" : { "terms" : { "field" : "age"}}}
               ]
            }
         }
      }
    }'
    

    The keyword sub-field, which is not analyzed, must be used for title; otherwise the aggregation fails

    Result

      "aggregations" : {
        "composite_group" : {
          "after_key" : {
            "title" : "software designer",
            "age" : 35
          },
          "buckets" : [
            {
              "key" : {
                "title" : "Test Engineer",
                "age" : 25
              },
              "doc_count" : 1
            },
            {
              "key" : {
                "title" : "senior software designer",
                "age" : 30
              },
              "doc_count" : 1
            },
            {
              "key" : {
                "title" : "software designer",
                "age" : 35
              },
              "doc_count" : 1
            }
          ]
        }
      }
    

    As you can see, the result is similar to

    select 
      field_a, field_b, count(*)
    group by
      field_a, field_b
    

    Besides terms, a source can also be Histogram, Date histogram, or GeoTile grid

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "size": 0,
      "aggs": {
        "my_buckets": {
          "composite": {
            "sources": [
              {
                "date": {
                  "date_histogram": {
                    "field": "date",
                    "calendar_interval": "1d"
                  }
                }
              }
            ]
          }
        }
      }
    }'
    

    For example, this Date histogram source truncates the date field to the day and groups by day

      "aggregations" : {
        "my_buckets" : {
          "after_key" : {
            "date" : 1622505600000
          },
          "buckets" : [
            {
              "key" : {
                "date" : 1609459200000
              },
              "doc_count" : 2
            },
            {
              "key" : {
                "date" : 1622505600000
              },
              "doc_count" : 1
            }
          ]
        }
      }
    

    The date format can be specified with the format field

    bucket aggregation : composite - after_key

    Used for paginated queries

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "size": 0,
      "aggs": {
         "composite_group": {
            "composite": { 
               "size": 2,
               "sources": [
                  {"title" : { "terms" : { "field" : "title.keyword"}}},
                  {"age" : { "terms" : { "field" : "age"}}}
               ]
            }
         }
      }
    }'
    

    Return only two buckets

    The response is as follows

      "aggregations" : {
        "composite_group" : {
          "after_key" : {
            "title" : "senior software designer",
            "age" : 30
          },
          "buckets" : [
            {
              "key" : {
                "title" : "Test Engineer",
                "age" : 25
              },
              "doc_count" : 1
            },
            {
              "key" : {
                "title" : "senior software designer",
                "age" : 30
              },
              "doc_count" : 1
            }
          ]
        }
      }
    

    Only two buckets are returned, and the response includes after_key, the key of the last bucket in this page

    Pass the contents of after_key in the next query

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "size": 0,
      "aggs": {
         "composite_group": {
            "composite": { 
               "size": 2,
               "sources": [
                  {"title" : { "terms" : { "field" : "title.keyword"}}},
                  {"age" : { "terms" : { "field" : "age"}}}
               ],
               "after": {
                  "title" : "senior software designer",
                  "age" : 30
               }
            }
         }
      }
    }'
    

    This continues with the next page of up to two buckets
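
    The paging scheme can be sketched as: group by the composite key, sort the keys, return size buckets, and resume strictly after the key passed in after. composite_page is a hypothetical helper, and plain Python tuple ordering stands in for the composite ordering; it happens to give the same order as the responses above.

```python
from collections import Counter

rows = [
    ("software designer", 35),
    ("senior software designer", 30),
    ("Test Engineer", 25),
]

def composite_page(rows, size, after=None):
    """Return (buckets, after_key) for one page of a group-by on the full tuple."""
    counts = Counter(rows)
    keys = sorted(k for k in counts if after is None or k > after)
    page = keys[:size]
    buckets = [{"key": k, "doc_count": counts[k]} for k in page]
    after_key = page[-1] if page else None
    return buckets, after_key

page1, after1 = composite_page(rows, size=2)
page2, after2 = composite_page(rows, size=2, after=after1)
print(page1, after1)
print(page2, after2)
```

    A client would keep requesting pages until a page comes back empty.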

    bucket aggregation : date_histogram/auto_date_histogram

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "size": 0,
      "aggs": {
        "my_buckets": {
          "date_histogram": {
             "field": "date",
             "calendar_interval": "1d"
          }
        }
      }
    }'
    

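    Counts per day. The buckets run from the minimum to the maximum value; in this example there is one bucket for every day from 2021-01-01 to 2021-06-01, even when its count is 0

    calendar_interval accepts only a single unit such as 1d; a multiple of days must be given with fixed_interval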

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "size": 0,
      "aggs": {
        "my_buckets": {
          "date_histogram": {
             "field": "date",
             "fixed_interval": "2d"
          }
        }
      }
    }'
    

    auto_date_histogram is similar to date_histogram, but you specify a target number of buckets and the system chooses the interval automatically to get as close to that target as it can

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "size": 0,
      "aggs": {
        "my_buckets": {
          "auto_date_histogram": {
             "field": "date",
             "buckets": 3
          }
        }
      }
    }'
    

    Result

      "aggregations" : {
        "my_buckets" : {
          "buckets" : [
            {
              "key_as_string" : "2021-01-01T00:00:00.000Z",
              "key" : 1609459200000,
              "doc_count" : 2
            },
            {
              "key_as_string" : "2021-04-01T00:00:00.000Z",
              "key" : 1617235200000,
              "doc_count" : 1
            }
          ],
          "interval" : "3M"
        }
      }
    

    The system automatically chose 3M as the interval

    bucket aggregation : terms/filter/filters

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "size": 0,
      "aggs": {
        "my_buckets": {
          "terms": {
             "field": "age"
          }
        }
      }
    }'
    

    Counts documents per value of a field, equivalent to select age, count(*) from table group by age

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "size": 0,
      "aggs": {
        "my_buckets": {
          "filter": { "term": {"title": "designer"}}
        }
      }
    }'
    

    Counts documents that have a given value in a field, roughly equivalent to select count(*) from table where title like '%designer%'

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "size": 0,
      "aggs": {
        "my_buckets": {
          "filters": {
             "filters": {
                "title": { "match": {"title": "designer"}},
                "age": { "match": {"age": 35}}
            }
          }
        }
      }
    }'
    

    Counts each filter separately, equivalent to running two filter queries

    bucket aggregation : range/date range

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "size": 0,
      "aggs": {
        "age_range": {
          "range": {
             "field": "age",
             "ranges": [
               { "to": 35 },
               { "from": 30, "to": 35 },
               { "from": 35 }
             ]
          }
        }
      }
    }'
    

    Result

      "aggregations" : {
        "age_range" : {
          "buckets" : [
            {
              "key" : "*-35.0",
              "to" : 35.0,
              "doc_count" : 2
            },
            {
              "key" : "30.0-35.0",
              "from" : 30.0,
              "to" : 35.0,
              "doc_count" : 1
            },
            {
              "key" : "35.0-*",
              "from" : 35.0,
              "doc_count" : 1
            }
          ]
        }
      }
    
    

    Counts the documents in each age range (from is inclusive, to is exclusive)
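
    The counts above follow from the bucket boundaries: each range includes its from value and excludes its to value, and ranges may overlap. A minimal sketch with the three ages from the sample data, using None for an open end (range_counts is an illustrative name):

```python
def range_counts(values, ranges):
    """Count values per (from, to) range; from is inclusive, to is exclusive."""
    counts = []
    for low, high in ranges:
        n = sum(1 for v in values
                if (low is None or v >= low) and (high is None or v < high))
        counts.append(n)
    return counts

ages = [35, 30, 25]
ranges = [(None, 35), (30, 35), (35, None)]
print(range_counts(ages, ranges))
```

    Age 35 falls only in the last bucket, because the first two buckets exclude their upper bound.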

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "size": 0,
      "aggs": {
        "my_range": {
          "date_range": {
             "field": "date",
             "ranges": [
               { "to": "now-3M/M" },
               { "from": "now-3M/M" }
             ]
          }
        }
      }
    }'
    

    Result

      "aggregations" : {
        "my_range" : {
          "buckets" : [
            {
              "key" : "*-2021-05-01T00:00:00.000Z",
              "to" : 1.6198272E12,
              "to_as_string" : "2021-05-01T00:00:00.000Z",
              "doc_count" : 2
            },
            {
              "key" : "2021-05-01T00:00:00.000Z-*",
              "from" : 1.6198272E12,
              "from_as_string" : "2021-05-01T00:00:00.000Z",
              "doc_count" : 1
            }
          ]
        }
      }
    

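    Counts the documents in each time range; now-3M/M means three months before now, rounded down to the start of that month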
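
    The boundary 2021-05-01T00:00:00.000Z in the result comes from the date math expression now-3M/M. The sketch below reproduces that calculation in Python; the helper name months_ago_floor and the value of now (some day in August 2021) are assumptions for illustration, and only the month arithmetic is modeled, in UTC.

```python
from datetime import datetime, timezone

def months_ago_floor(now, months):
    """Approximate `now-<months>M/M`: go back whole months, then round down to the month start (UTC)."""
    month_index = now.year * 12 + (now.month - 1) - months
    year, month_zero_based = divmod(month_index, 12)
    return datetime(year, month_zero_based + 1, 1, tzinfo=timezone.utc)

# Assumed query time; any day in August 2021 gives the same boundary.
now = datetime(2021, 8, 13, tzinfo=timezone.utc)
boundary = months_ago_floor(now, 3)

print(boundary.isoformat())                # 2021-05-01T00:00:00+00:00
print(int(boundary.timestamp() * 1000))    # 1619827200000, i.e. 1.6198272E12
```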

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "size": 0,
      "aggs": {
        "my_range": {
          "date_range": {
             "field": "date",
             "ranges": [
               { "key": "Newer", "from":"2021-05-01" },
               { "key": "Older", "to":"2021-05-01" }
             ]
          }
        }
      }
    }'
    

    Uses explicit dates instead of date math; key sets a custom label for each bucket

    sub-aggregation : for example, counting a group (count) and computing its average (avg) in the same request

    curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
      "size": 0,
      "aggs": {
        "my_buckets": {
          "filter": { "term": {"title": "designer"}},
          "aggs": {
             "avg_age": { "avg": { "field": "age" } }
          }
        }
      }
    }'
    

    Result

      "aggregations" : {
        "my_buckets" : {
          "doc_count" : 2,
          "avg_age" : {
            "value" : 32.5
          }
        }
      }
    

    The response contains both the bucket's document count and the average computed over that bucket
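
    A sub-aggregation runs only over the documents that landed in its parent bucket. A minimal Python sketch of the filter plus avg combination, assuming hypothetical documents whose values were picked to reproduce the result above, with term on title approximated as a whole-word test:

```python
from statistics import mean

# Hypothetical documents; the article does not list the index contents.
docs = [
    {"title": "software designer", "age": 35},
    {"title": "ui designer", "age": 30},
    {"title": "manager", "age": 25},
]

# Parent bucket: the documents that pass the filter.
matched = [d for d in docs if "designer" in d["title"].split()]

# Sub-aggregation: the average is computed over the bucket, not the whole index.
result = {
    "doc_count": len(matched),
    "avg_age": {"value": mean([d["age"] for d in matched])},
}
print(result)  # {'doc_count': 2, 'avg_age': {'value': 32.5}}
```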

    Pipeline aggregations : avg_bucket

    POST _search
    {
      "size": 0,
      "aggs": {
        "sales_per_month": {
          "date_histogram": {
            "field": "date",
            "calendar_interval": "month"
          },
          "aggs": {
            "sales": {
              "sum": {
                "field": "price"
              }
            }
          }
        },
        "avg_monthly_sales": {
          "avg_bucket": {
            "buckets_path": "sales_per_month>sales",
            "gap_policy": "skip",
            "format": "#,##0.00;(#,##0.00)"
          }
        }
      }
    }
    

    Result

      "aggregations": {
        "sales_per_month": {
          "buckets": [
            {
              "key_as_string": "2015/01/01 00:00:00",
              "key": 1420070400000,
              "doc_count": 3,
              "sales": {
                "value": 550.0
              }
            },
            {
              "key_as_string": "2015/02/01 00:00:00",
              "key": 1422748800000,
              "doc_count": 2,
              "sales": {
                "value": 60.0
              }
            },
            {
              "key_as_string": "2015/03/01 00:00:00",
              "key": 1425168000000,
              "doc_count": 2,
              "sales": {
                "value": 375.0
              }
            }
          ]
        },
        "avg_monthly_sales": {
          "value": 328.33333333333333,
          "value_as_string": "328.33"
        }
      }
    

    Computes the sum for each monthly bucket, then averages those sums across buckets: (550 + 60 + 375) / 3 ≈ 328.33
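
    The avg_bucket value can be checked by hand from the monthly sums in the result. A short Python sketch of that arithmetic; the function name avg_bucket is only a label here, and treating a missing bucket value as None is an assumption used to illustrate "gap_policy": "skip":

```python
def avg_bucket(values):
    """Average the per-bucket values, skipping gaps (None) as with gap_policy "skip"."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

# Monthly sums taken from the result above.
monthly_sales = [550.0, 60.0, 375.0]
average = avg_bucket(monthly_sales)
print(f"{average:.2f}")  # 328.33

# A month with no value is left out of both the sum and the count.
print(avg_bucket([550.0, None, 375.0]))  # 462.5
```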

    Pipeline aggregations : cumulative_sum

    POST /sales/_search
    {
      "size": 0,
      "aggs": {
        "sales_per_month": {
          "date_histogram": {
            "field": "date",
            "calendar_interval": "month"
          },
          "aggs": {
            "sales": {
              "sum": {
                "field": "price"
              }
            },
            "cumulative_sales": {
              "cumulative_sum": {
                "buckets_path": "sales" 
              }
            }
          }
        }
      }
    }
    

    Result

       "aggregations": {
          "sales_per_month": {
             "buckets": [
                {
                   "key_as_string": "2015/01/01 00:00:00",
                   "key": 1420070400000,
                   "doc_count": 3,
                   "sales": {
                      "value": 550.0
                   },
                   "cumulative_sales": {
                      "value": 550.0
                   }
                },
                {
                   "key_as_string": "2015/02/01 00:00:00",
                   "key": 1422748800000,
                   "doc_count": 2,
                   "sales": {
                      "value": 60.0
                   },
                   "cumulative_sales": {
                      "value": 610.0
                   }
                },
                {
                   "key_as_string": "2015/03/01 00:00:00",
                   "key": 1425168000000,
                   "doc_count": 2,
                   "sales": {
                      "value": 375.0
                   },
                   "cumulative_sales": {
                      "value": 985.0
                   }
                }
             ]
          }
       }
    

    Computes the sum for each bucket, then the running total of those sums up to and including each bucket
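
    The cumulative_sales values are a running total over the monthly sums, in bucket order. The same arithmetic in Python, using the sums from the result above:

```python
from itertools import accumulate

# Monthly sums taken from the result above, in bucket (date) order.
monthly_sales = [550.0, 60.0, 375.0]

# Each entry is the total of all sums up to and including that month.
cumulative_sales = list(accumulate(monthly_sales))
print(cumulative_sales)  # [550.0, 610.0, 985.0]
```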

    Pipeline aggregations : max_bucket

    POST /sales/_search
    {
      "size": 0,
      "aggs": {
        "sales_per_month": {
          "date_histogram": {
            "field": "date",
            "calendar_interval": "month"
          },
          "aggs": {
            "sales": {
              "sum": {
                "field": "price"
              }
            }
          }
        },
        "max_monthly_sales": {
          "max_bucket": {
            "buckets_path": "sales_per_month>sales" 
          }
        }
      }
    }
    

    Result

       "aggregations": {
          "sales_per_month": {
             "buckets": [
                {
                   "key_as_string": "2015/01/01 00:00:00",
                   "key": 1420070400000,
                   "doc_count": 3,
                   "sales": {
                      "value": 550.0
                   }
                },
                {
                   "key_as_string": "2015/02/01 00:00:00",
                   "key": 1422748800000,
                   "doc_count": 2,
                   "sales": {
                      "value": 60.0
                   }
                },
                {
                   "key_as_string": "2015/03/01 00:00:00",
                   "key": 1425168000000,
                   "doc_count": 2,
                   "sales": {
                      "value": 375.0
                   }
                }
             ]
          },
          "max_monthly_sales": {
              "keys": ["2015/01/01 00:00:00"], 
              "value": 550.0
          }
       }
    

    Computes the sum for each bucket, then returns the largest sum together with the key(s) of the bucket(s) that hold it
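
    The keys field in the max_bucket result is a list because more than one bucket can share the maximum. A small Python sketch of that selection, using the bucket keys and sums from the result above:

```python
# (key_as_string, monthly sum) pairs taken from the result above.
buckets = [
    ("2015/01/01 00:00:00", 550.0),
    ("2015/02/01 00:00:00", 60.0),
    ("2015/03/01 00:00:00", 375.0),
]

# Find the largest sum, then collect every bucket key that reaches it.
top_value = max(value for _, value in buckets)
max_monthly_sales = {
    "keys": [key for key, value in buckets if value == top_value],
    "value": top_value,
}
print(max_monthly_sales)  # {'keys': ['2015/01/01 00:00:00'], 'value': 550.0}
```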

    Script execution

    Scripted queries are also supported; they are not covered here





  • Original article: https://www.cnblogs.com/moonlight-lin/p/15138710.html