
Elasticsearch Query Commands

Query methods
Inserting data
Viewing an index
term query
Token case
terms query
hits results
count query
query_string query
match query
match_phrase query
multi_match
match_all query
bool query
Limiting the number of results
Selecting returned fields
Sorting
Range query
Wildcard query
Fuzzy query
Scoring
Aggregation queries
metrics aggregation : avg
metrics aggregation : avg & histogram
metrics aggregation : max/min/sum
metrics aggregation : boxplot
metrics aggregation : cardinality
metrics aggregation : extended_stats
metrics aggregation : geo
metrics aggregation : matrix_stats
bucket aggregation : adjacency_matrix
bucket aggregation : composite
bucket aggregation : composite - after_key
bucket aggregation : date_histogram/auto_date_histogram
bucket aggregation : terms/filter/filters
bucket aggregation : range/date range
sub-aggregation : e.g. group counts (count) together with an average (avg)
Pipeline aggregations : avg_bucket
Pipeline aggregations : cumulative_sum
Pipeline aggregations : max_bucket
Scripts

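Query methods

Elasticsearch has its own query syntax (the Query DSL), which differs from SQL; it also provides packages such as a JDBC driver for running SQL

The examples here use Elasticsearch's own query syntax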

Inserting data

curl -X POST 'http://localhost:9200/my_index/_doc'  -H 'Content-Type: application/json' -d '{
  "name": "Wang",
  "title": "software designer",
  "age": 35,
  "address": {"city": "guangzhou", "district": "tianhe"},
  "content": "I want to do some AI machine learning works"
}'

This automatically creates the my_index index, the _doc type, and a mapping for each field

Viewing an index

curl localhost:9200/my_index?pretty


{
  "my_index" : {
    "aliases" : { },
    "mappings" : {
      "_doc" : {
        "properties" : {
          "address" : {
            "properties" : {
              "city" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              },
              "district" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              }
            }
          },
          "age" : {
            "type" : "long"
          },
          "content" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "title" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1628737053188",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "-NHgaqt4R_SQs2KHd0aJwQ",
        "version" : {
          "created" : "6050199"
        },
        "provided_name" : "my_index"
      }
    }
  }
}

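The response lists the index's settings and mappings

term query

Requires an exact match: the query string is not analyzed, whereas text fields are split into tokens when indexed by default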

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "term":{
        "content":"machine"
     }
   }
}'

This returns a result

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "term":{
        "content":"machine learning"
     }
   }
}'

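This returns nothing

The query string "machine learning" must match a token exactly, but the field was indexed as separate tokens and there is no single token "machine learning"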

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "term":{
        "content.keyword":"machine"
     }
   }
}'

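This returns nothing

The keyword sub-field holds the original, unanalyzed value rather than the tokens, and the original value is not exactly "machine"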

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "term":{
        "content.keyword":"I want to do some AI machine learning works"
     }
   }
}'

This returns a result

The query string equals the original value exactly

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "term":{
        "address.city":"guangzhou"
     }
   }
}'

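Fields inside an object, such as address.city, can be queried with dot notation

Token case

term queries need lowercase input here: the default analyzer lowercases tokens at index time, while a term query does not analyze the query string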
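
The term query behaviour above can be illustrated with a rough stand-in for the default analyzer. This is a minimal sketch, assuming the field uses the standard analyzer and approximating it as "lowercase, then split on non-word characters"; simple_analyze and term_matches are hypothetical helpers for illustration, not part of any Elasticsearch client.

```python
import re

def simple_analyze(text):
    """Rough approximation of the standard analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())

def term_matches(indexed_values, term):
    """A term query compares the query string, unanalyzed, against the indexed values."""
    return term in indexed_values

content = "I want to do some AI machine learning works"
tokens = simple_analyze(content)

print(tokens)
print(term_matches(tokens, "machine"))           # "machine" is an indexed token
print(term_matches(tokens, "machine learning"))  # no single token equals the phrase
print(term_matches(tokens, "AI"))                # the token was indexed as "ai"
print(term_matches([content], content))          # the keyword sub-field keeps the whole string
```

The last call models content.keyword, where the only indexed value is the original string itself.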

terms query

Matches when any one of several terms matches exactly

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "terms":{
        "content": ["machine", "learning"]
     }
   }
}'

This returns a result; the tokens machine and learning both match

hits results

Each matching document is listed in the hits field

  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.9932617,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "anbGPnsBSeHJMrQRVMQK",
        "_score" : 1.9932617,
        "_source" : {
          "name" : "Wang",
          "title" : "software designer",
          "age" : 35,
          "address" : {
            "city" : "guangzhou",
            "district" : "tianhe"
          },
          "content" : "I want to do some AI machine learning works",
          "kpi" : 3.2,
          "date" : "2021-01-01T08:00:00Z"
        }
      }
    ]
  }

The actual data is under hits -> hits -> _source

count query

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_count?pretty -d '{
  "query": {
     "term":{
        "title": "software"
     }
   }
}'

  "count" : 2,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }

Only the count is returned

query_string query

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "query_string":{
        "query": "(machine learning) AND (K8S works)",
        "default_field": "content"
     }
   }
}'

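(machine learning) and (K8S works) are each split into a group of tokens; a document matches only if it matches at least one token from each group

match query

Analyzed matching: the query string is split into tokens, and a document matches if any one of the tokens matches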

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "match":{
        "content": "machine learning"
     }
   }
}'

This returns a result; both tokens match

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "match":{
        "content": "works machine AI"
     }
   }
}'

This returns a result; all tokens match, and their order does not matter

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "match":{
        "content": "machine factory"
     }
   }
}'

This returns a result; it is enough that one token, machine, matches
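
The three match examples can be mimicked in a few lines. This is a sketch under the same simplifying assumption about the analyzer (lowercase and split on non-word characters); match_any and match_all_terms are hypothetical helpers, and the second one models a match query with "operator": "and".

```python
import re

def simple_analyze(text):
    """Rough approximation of the standard analyzer."""
    return re.findall(r"\w+", text.lower())

def match_any(field_text, query_text):
    """Default OR semantics: at least one analyzed query token occurs in the field."""
    return bool(set(simple_analyze(field_text)) & set(simple_analyze(query_text)))

def match_all_terms(field_text, query_text):
    """AND semantics: every analyzed query token occurs in the field."""
    return set(simple_analyze(query_text)) <= set(simple_analyze(field_text))

content = "I want to do some AI machine learning works"

for query in ("machine learning", "works machine AI", "machine factory"):
    print(query, match_any(content, query), match_all_terms(content, query))
```

Only "machine factory" differs between the two modes: one of its tokens is present, so it matches under OR but not under AND.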

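match_phrase query

The query string is treated as one phrase: a document matches only if the field contains the tokens consecutively and in the same order
(By contrast, a term query on the keyword sub-field matches only when the original value equals the query string; match_phrase is a containment relation)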

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "match_phrase":{
        "content": "machine factory"
     }
   }
}'

This returns nothing because the phrase machine factory does not occur

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "match_phrase":{
        "content": "works machine AI"
     }
   }
}'

This returns nothing: the field contains all three tokens, but match_phrase treats works machine AI as a single phrase with a fixed order

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "match_phrase":{
        "content": "AI machine learning"
     }
   }
}'

This returns a result because AI machine learning occurs as a contiguous phrase

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "match_phrase":{
        "content": {
           "query": "some AI works",
           "slop" : 2
        }
     }
   }
}'

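This returns a result

some AI works does not occur as a contiguous phrase, but slop 2 lets tokens be moved by up to two positions in total; here works sits two positions further away because machine learning is in between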

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "match_phrase":{
        "content": {
           "query": "some AI works",
           "slop" : 1
        }
     }
   }
}'

This returns nothing; moving by only one position is not enough to match
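
The slop needed for a phrase can be estimated by comparing where each query token sits in the phrase with where it sits in the field. This is a simplified sketch, not Lucene's actual matcher: it assumes every phrase token occurs exactly once in the field and reuses the rough analyzer approximation; required_slop is a hypothetical helper.

```python
import re

def simple_analyze(text):
    """Rough approximation of the standard analyzer."""
    return re.findall(r"\w+", text.lower())

def required_slop(field_text, phrase):
    """Smallest slop for the phrase, or None when a token is missing or repeated."""
    doc_tokens = simple_analyze(field_text)
    shifts = []
    for offset, term in enumerate(simple_analyze(phrase)):
        if doc_tokens.count(term) != 1:
            return None  # outside the scope of this simplified sketch
        # How far the token's field position is from its position in the phrase.
        shifts.append(doc_tokens.index(term) - offset)
    return max(shifts) - min(shifts)

content = "I want to do some AI machine learning works"

print(required_slop(content, "AI machine learning"))  # contiguous phrase
print(required_slop(content, "some AI works"))        # works is two positions too far
print(required_slop(content, "machine factory"))      # factory is absent
```

A phrase matches when the configured slop is at least the returned value, which is consistent with slop 2 succeeding and slop 1 failing for some AI works.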

multi_match

Runs a match query against several fields; the document matches if any one field matches

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "multi_match":{
        "query": "machine learning",
        "fields" : ["title", "content"]
     }
   }
}'

This returns a result; content satisfies the query

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "multi_match":{
        "query": "designer",
        "fields" : ["title", "content"]
     }
   }
}'

This returns a result; title satisfies the query

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "multi_match":{
        "query": "manager",
        "fields" : ["title", "content"]
     }
   }
}'

This returns nothing; neither title nor content satisfies the query

match_all query

Returns all documents

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "match_all":{}
   }
}'

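No conditions are specified

bool query

A compound query: a document matches only when the combined clauses are all satisfied

Each clause can be must, filter, should, or must_not

must: the document must match the must clauses; they contribute to the score
filter: the document must match the filter clauses; they do not contribute to the score
should: the document must match at least the number of should clauses given by minimum_should_match; they contribute to the score
must_not: the document must not match the must_not clauses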

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "bool":{
        "must": [
            {
                "term": {"address.city": "guangzhou"}
            },
            {
                "match": {"content": "machine learning"}
            }
        ],
        "must_not": {
            "range": {"age": {"gt": 35}}
        },
        "filter": {
            "match": {"title": "designer"}
        },
        "should": [
            {
                "term": {"name": "Li"}
            },
            {
                "match_phrase": {"content": "AI machine"}
            }
        ],
        "minimum_should_match" : 1
     }
   }
}'

This returns a result because every condition under bool is satisfied

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "bool":{
        "must": [
            {
                "term": {"address.city": "guangzhou"}
            },
            {
                "match": {"content": "machine learning"}
            }
        ],
        "must_not": {
            "range": {"age": {"from": 35, "to": 40}}
        },
        "filter": {
            "match": {"title": "designer"}
        },
        "should": [
            {
                "term": {"name": "Li"}
            },
            {
                "match_phrase": {"content": "AI machine"}
            }
        ],
        "minimum_should_match" : 1
     }
   }
}'

This returns nothing because the must_not condition is violated (age 35 falls inside the excluded range)

bool queries can be nested

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "bool":{
        "must": [
            {
                "bool": {
                    "should": [
                        {
                            "term": {"name": "Li"}
                        },
                        {
                            "match_phrase": {"content": "AI machine"}
                        }
                    ]
                }
            },
            {
                "bool": {
                    "filter": {
                        "match": {"title": "designer"}
                    }
                }
            }
        ]
     }
   }
}'

Here must contains several bool queries

Limiting the number of results

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "from":0,
  "size":2,
  "query": {
     "term":{
        "title":"designer"
     }
   }
}'

Start from the first hit and return at most 2

Selecting returned fields

Like selecting specific columns with select in SQL

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "_source":["name","age"],
  "query": {
     "term":{
        "title":"designer"
     }
   }
}'

Only the name and age fields of the matching documents are returned

Sorting

Specify the field to sort by

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "_source":["name","age"],
  "query": {
     "term":{
        "title":"designer"
     }
  },
  "sort": [
     {
         "age": {"order": "desc"}
     }
  ]
}'

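Results are sorted by age in descending order

Range query

Supports from, to, gte, lte, gt, lt and so on (gte, lte, gt and lt are the documented forms), for example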

{
    "query": {
        "range": {
            "date": {
                "gte": "2021-08-01",
                "lte": "2021-08-02",
                "relation": "within",
                "format": "yyyy-MM-dd"
            }
        }
    }
}

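relation can also be CONTAINS or INTERSECTS (the default)

relation applies when the field is mapped as a range type (for example date_range), so that the stored value is itself a range, for example

"date": {"gte":"2021-08-01","lte":"2021-08-03"}

WITHIN: the field's range lies entirely inside the query range
CONTAINS: the field's range entirely contains the query range
INTERSECTS: the field's range overlaps the query range

The date format can be specified with format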

Wildcard query

Supports * and ?

* stands for zero or more characters
? stands for any single character
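
The original gives no wildcard example, so here is a small sketch that builds a wildcard query body for the title field used elsewhere in this post and previews which tokens a pattern would accept. The helper name wildcard_query is hypothetical; fnmatchcase is only a local stand-in for the matching rules (it also gives [ ] a special meaning, which this query does not), and the patterns are illustrative.

```python
import json
from fnmatch import fnmatchcase

def wildcard_query(field, pattern):
    """Build a wildcard query body for one field."""
    return {"query": {"wildcard": {field: {"value": pattern}}}}

body = wildcard_query("title", "des*er")
print(json.dumps(body, indent=2))

# Preview the * and ? semantics against the indexed token "designer".
for pattern in ("des*er", "d?signer", "des?er"):
    print(pattern, fnmatchcase("designer", pattern))
```

The printed body can be sent with the same curl command as the other queries. Like term, the pattern is compared with indexed tokens, so lowercase patterns are the safe choice on analyzed text fields.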

Fuzzy query

Finds words similar to the query term

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "query": {
     "fuzzy":{
        "title":"desinger"
     }
   }
}'

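Although desinger is misspelled, the document is still found

Scoring

Each hit has a _score field that expresses how relevant the document is to the query

Factors that determine the score include

Term frequency: the more often the term appears in the document, the higher the weight
Inverse document frequency: the more documents the term appears in, the lower the weight, for example words such as and/the
Field length: the shorter the field, the higher the weight (a match in a short field counts for more than the same match in a long one)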
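
These three factors can be seen in the BM25 formula. The following is a minimal sketch, assuming the default BM25 similarity with its usual parameters k1 = 1.2 and b = 0.75; bm25_term_score is a hypothetical helper that scores a single term, and real scores also depend on boosts and per-shard statistics.

```python
import math

def bm25_term_score(freq, field_len, avg_field_len, doc_count, docs_with_term,
                    k1=1.2, b=0.75):
    """Score of one term in one field under a textbook BM25 formula."""
    # Inverse document frequency: rarer terms weigh more.
    idf = math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Term frequency, normalised by field length relative to the average.
    norm = k1 * (1 - b + b * field_len / avg_field_len)
    tf = freq * (k1 + 1) / (freq + norm)
    return idf * tf

base = bm25_term_score(freq=1, field_len=10, avg_field_len=10, doc_count=3, docs_with_term=1)
more_frequent = bm25_term_score(freq=2, field_len=10, avg_field_len=10, doc_count=3, docs_with_term=1)
more_common = bm25_term_score(freq=1, field_len=10, avg_field_len=10, doc_count=3, docs_with_term=3)
longer_field = bm25_term_score(freq=1, field_len=20, avg_field_len=10, doc_count=3, docs_with_term=1)

print(more_frequent > base)  # higher term frequency raises the score
print(more_common < base)    # a term found in every document scores lower
print(longer_field < base)   # the same match in a longer field scores lower
```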

Aggregation queries

Insert more data

curl -X POST 'http://localhost:9200/my_index/_doc'  -H 'Content-Type: application/json' -d '{
  "name": "Wang",
  "title": "software designer",
  "age": 35,
  "address": {"city": "guangzhou", "district": "tianhe"},
  "content": "I want to do some AI machine learning works",
  "kpi": 3.2,
  "date": "2021-01-01T08:00:00Z"
}'

curl -X POST 'http://localhost:9200/my_index/_doc'  -H 'Content-Type: application/json' -d '{
  "name": "Li",
  "title": "senior software designer",
  "age": 30,
  "address": {"city": "guangzhou", "district": "tianhe"},
  "content": "I want to do some K8S works",
  "kpi": 4.0,
  "date": "2021-01-01T10:00:00Z"
}'

curl -X POST 'http://localhost:9200/my_index/_doc'  -H 'Content-Type: application/json' -d '{
  "name": "Zhang",
  "title": "Test Engineer",
  "age": 25,
  "address": {"city": "guangzhou", "district": "tianhe"},
  "content": "I want to do some auto-test works",
  "kpi": 4.5,
  "date": "2021-06-01T09:00:00Z"
}'

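Aggregation queries fall into the following categories

Bucket aggregations: group documents into buckets and count the documents in each bucket
Metrics aggregations: compute the average, maximum and so on over the documents in each group
Pipeline aggregations: perform further computation on the results of other aggregations

Below are examples of some of the aggregations

For all aggregations see https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html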

metrics aggregation : avg

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "query": {
     "match":{
        "title": "designer Engineer"
     }
  },
  "aggs": {
     "kpi_avg": {
        "avg": { 
           "field": "kpi",
           "missing": 3.5
        } 
     }
  }
}'

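aggs is a keyword; it can also be written as aggregations
kpi_avg is a user-defined name that appears in the result
avg is a keyword meaning an average is computed; field names the field to average, and missing gives the default value used for documents that lack the field

If no query is specified, the aggregation runs over all documents

If size is not set to 0, the matching documents are returned in addition to the aggregation result
With size set to 0, only the aggregation result is returned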

  "aggregations" : {
    "kpi_avg" : {
      "value" : 3.900000015894572
    }
  }

Several aggregations can be specified in one request

metrics aggregation : avg & histogram

If the data is of type histogram (this must be specified when the index is created)

curl -X PUT 'http://localhost:9200/my_index_histogram' -H 'Content-Type: application/json' -d '{
  "mappings" : {
    "properties" : {
      "my_histogram" : {
        "type" : "histogram"
      }
    }
  }
}'

curl -X POST 'http://localhost:9200/my_index_histogram/_doc'  -H 'Content-Type: application/json' -d '{
  "name": "Zhao",
  "title": "manager",
  "age": 30,
  "address": {"city": "guangzhou", "district": "tianhe"},
  "my_histogram": {
      "values" : [3.5, 4.0, 4.5], 
      "counts" : [1, 2, 3] 
  }
}'

avg is computed as (3.5*1 + 4.0*2 + 4.5*3) / (1+2+3)
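
The same weighted average can be checked locally. This is just the arithmetic for the document above, with each value weighted by its count; it involves no Elasticsearch call.

```python
values = [3.5, 4.0, 4.5]
counts = [1, 2, 3]

# Each value contributes as many times as its count says.
weighted_sum = sum(v * c for v, c in zip(values, counts))
average = weighted_sum / sum(counts)

print(round(average, 4))
```

The result, about 4.1667, agrees with the value the avg aggregation returns for this histogram field.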

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index_histogram/_search?pretty -d '{
  "size": 0,
  "aggs": {
     "score_avg": {
        "avg": { 
           "field": "my_histogram"
        } 
     }
  }
}'

Result

  "aggregations" : {
    "score_avg" : {
      "value" : 4.166666666666667
    }
  }

Fields in an automatically created index are never mapped as histogram by default

metrics aggregation : max/min/sum

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
     "kpi_max": {
        "max": { 
           "field": "kpi"
        } 
     }
  }
}'

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
     "kpi_min": {
        "min": { 
           "field": "kpi"
        } 
     }
  }
}'

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
     "kpi_sum": {
        "sum": { 
           "field": "kpi"
        } 
     }
  }
}'

Computes min/max/sum and so on

metrics aggregation : boxplot

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "query": {
     "match":{
        "title": "designer Engineer"
     }
  },
  "aggs": {
     "kpi_avg": {
        "boxplot": { 
           "field": "kpi",
           "missing": 3.5
        }
     }
  }
}'

Result

  "aggregations" : {
    "kpi_avg" : {
      "min" : 3.200000047683716,
      "max" : 4.5,
      "q1" : 3.400000035762787,
      "q2" : 4.0,
      "q3" : 4.375,
      "lower" : 3.200000047683716,
      "upper" : 4.5
    }
  }

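Box plot

q1 : lower quartile (25%)
q2 : median (50%)
q3 : upper quartile (75%)
lower : the smallest value that is not less than q1 - 1.5 * (q3 - q1)
upper : the largest value that is not greater than q3 + 1.5 * (q3 - q1)
min : minimum
max : maximum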
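
The lower and upper definitions can be checked against the response above. This sketch takes q1 and q3 from that response as given (Elasticsearch estimates quartiles approximately, so a local quantile function may give slightly different numbers) and only reproduces the whisker rule; whiskers is a hypothetical helper.

```python
def whiskers(values, q1, q3):
    """Return (lower, upper) whiskers using the 1.5 * IQR rule."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    lower = min(v for v in values if v >= low_fence)
    upper = max(v for v in values if v <= high_fence)
    return lower, upper

kpis = [3.2, 4.0, 4.5]
lower, upper = whiskers(kpis, q1=3.4, q3=4.375)
print(lower, upper)
```

With only three values and no outliers, the whiskers coincide with min and max, as in the response.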

metrics aggregation : cardinality

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "query": {
     "match":{
        "title": "designer Engineer"
     }
  },
  "aggs": {
     "kpi_count": {
        "cardinality": { 
           "field": "kpi"
        }
     }
  }
}'

Result

  "aggregations" : {
    "kpi_count" : {
      "value" : 3
    }
  }

Roughly equivalent to count(distinct field); the value is an approximation when the field has many distinct values

metrics aggregation : extended_stats

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "query": {
     "match":{
        "title": "designer Engineer"
     }
  },
  "aggs": {
     "kpi_stats": {
        "extended_stats": { 
           "field": "kpi"
        }
     }
  }
}'

Result

  "aggregations" : {
    "kpi_stats" : {
      "count" : 3,
      "min" : 3.200000047683716,
      "max" : 4.5,
      "avg" : 3.900000015894572,
      "sum" : 11.700000047683716,
      "sum_of_squares" : 46.49000030517578,
      "variance" : 0.28666664441426565,
      "variance_population" : 0.28666664441426565,
      "variance_sampling" : 0.4299999666213985,
      "std_deviation" : 0.5354125926930237,
      "std_deviation_population" : 0.5354125926930237,
      "std_deviation_sampling" : 0.6557438269792545,
      "std_deviation_bounds" : {
        "upper" : 4.970825201280619,
        "lower" : 2.8291748305085243,
        "upper_population" : 4.970825201280619,
        "lower_population" : 2.8291748305085243,
        "upper_sampling" : 5.2114876698530805,
        "lower_sampling" : 2.588512361936063
      }
    }
  }

Various statistics are returned
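
To see how the population and sampling figures relate, the main numbers can be recomputed from the three kpi values. This is a sketch using Python's statistics module; it assumes the bounds are the mean plus or minus 2 standard deviations (the default sigma), and the last digits differ from the response because the index stores kpi as single-precision floats.

```python
import statistics

kpis = [3.2, 4.0, 4.5]

mean = statistics.mean(kpis)
var_population = statistics.pvariance(kpis)  # divides by n
var_sampling = statistics.variance(kpis)     # divides by n - 1
std_population = statistics.pstdev(kpis)

# std_deviation_bounds: mean +/- sigma * std_deviation, assuming sigma = 2
upper = mean + 2 * std_population
lower = mean - 2 * std_population

print(round(mean, 4), round(var_population, 4), round(var_sampling, 4))
print(round(upper, 4), round(lower, 4))
```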

metrics aggregation : geo

Create data of a geo type

curl -X PUT 'http://localhost:9200/my_index_geo' -H 'Content-Type: application/json' -d '{
  "mappings" : {
    "properties" : {
      "location" : {
        "type" : "geo_point"
      }
    }
  }
}'

curl -X POST 'http://localhost:9200/my_index_geo/_doc'  -H 'Content-Type: application/json' -d '{
  "location": "52.374081,4.912350"
}'

curl -X POST 'http://localhost:9200/my_index_geo/_doc'  -H 'Content-Type: application/json' -d '{
  "location": "52.369219,4.901618"
}'

curl -X POST 'http://localhost:9200/my_index_geo/_doc'  -H 'Content-Type: application/json' -d '{
  "location": "52.371667,4.914722"
}'

curl -X POST 'http://localhost:9200/my_index_geo/_doc'  -H 'Content-Type: application/json' -d '{
  "location": "51.222900,4.405200"
}'

The centroid can be computed

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index_geo/_search?pretty -d '{
  "size": 0,
  "aggs": {
     "centroid": {
        "geo_centroid": { 
           "field": "location"
        }
     }
  }
}'

Result

  "aggregations" : {
    "centroid" : {
      "location" : {
        "lat" : 52.08446673466824,
        "lon" : 4.783472470007837
      },
      "count" : 4
    }
  }

Bounds and so on can also be computed

metrics aggregation : matrix_stats

Computes the mean, variance, skewness, kurtosis, covariance, correlation and so on across several numeric fields

bucket aggregation : adjacency_matrix

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
     "people_group": {
        "adjacency_matrix": { 
           "filters": {
              "grpA" : { "terms" : { "title" : ["designer", "engineer"] }},
              "grpB" : { "terms" : { "title" : ["senior", "software"] }},
              "grpC" : { "terms" : { "content" : ["test", "pv"] }}
           }
        }
     }
  }
}'

Result

  "aggregations" : {
    "people_group" : {
      "buckets" : [
        {
          "key" : "grpA",
          "doc_count" : 3
        },
        {
          "key" : "grpA&grpB",
          "doc_count" : 2
        },
        {
          "key" : "grpA&grpC",
          "doc_count" : 1
        },
        {
          "key" : "grpB",
          "doc_count" : 2
        },
        {
          "key" : "grpC",
          "doc_count" : 1
        }
      ]
    }
  }

Counts the documents in each group defined by filters, and in each non-empty intersection of two groups
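
The buckets in that response can be reproduced with plain set intersections. This is an illustrative sketch only: the sets below list, by name, which of the three sample documents each filter matches, and adjacency_matrix is a hypothetical local helper rather than a client call.

```python
from itertools import combinations

def adjacency_matrix(groups):
    """Count each non-empty group and each non-empty pairwise intersection."""
    buckets = {name: len(docs) for name, docs in groups.items() if docs}
    for (a, docs_a), (b, docs_b) in combinations(groups.items(), 2):
        both = docs_a & docs_b
        if both:  # empty intersections such as grpB&grpC produce no bucket
            buckets[f"{a}&{b}"] = len(both)
    return dict(sorted(buckets.items()))

groups = {
    "grpA": {"Wang", "Li", "Zhang"},  # title contains designer or engineer
    "grpB": {"Wang", "Li"},           # title contains senior or software
    "grpC": {"Zhang"},                # content contains test or pv
}

result = adjacency_matrix(groups)
for key, doc_count in result.items():
    print(key, doc_count)
```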

bucket aggregation : composite

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
     "composite_group": {
        "composite": { 
           "sources": [
              {"title" : { "terms" : { "field" : "title.keyword"}}},
              {"age" : { "terms" : { "field" : "age"}}}
           ]
        }
     }
  }
}'

The keyword sub-field must be used so that the value is not analyzed; text fields cannot be aggregated on by default

Result

  "aggregations" : {
    "composite_group" : {
      "after_key" : {
        "title" : "software designer",
        "age" : 35
      },
      "buckets" : [
        {
          "key" : {
            "title" : "Test Engineer",
            "age" : 25
          },
          "doc_count" : 1
        },
        {
          "key" : {
            "title" : "senior software designer",
            "age" : 30
          },
          "doc_count" : 1
        },
        {
          "key" : {
            "title" : "software designer",
            "age" : 35
          },
          "doc_count" : 1
        }
      ]
    }
  }

As can be seen, the result is similar to

select
  field_a, field_b, count(*)
from
  table
group by
  field_a, field_b

Besides terms, a source can also be histogram, date_histogram, or geotile_grid

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "composite": {
        "sources": [
          {
            "date": {
              "date_histogram": {
                "field": "date",
                "calendar_interval": "1d"
              }
            }
          }
        ]
      }
    }
  }
}'

For example, with this date_histogram source the date field is truncated to the day and documents are grouped by day

  "aggregations" : {
    "my_buckets" : {
      "after_key" : {
        "date" : 1622505600000
      },
      "buckets" : [
        {
          "key" : {
            "date" : 1609459200000
          },
          "doc_count" : 2
        },
        {
          "key" : {
            "date" : 1622505600000
          },
          "doc_count" : 1
        }
      ]
    }
  }

The date format can be set with the format field

bucket aggregation : composite - after_key

Used for paginated queries

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
     "composite_group": {
        "composite": { 
           "size": 2,
           "sources": [
              {"title" : { "terms" : { "field" : "title.keyword"}}},
              {"age" : { "terms" : { "field" : "age"}}}
           ]
        }
     }
  }
}'

Return only two buckets per request

The response is as follows

  "aggregations" : {
    "composite_group" : {
      "after_key" : {
        "title" : "senior software designer",
        "age" : 30
      },
      "buckets" : [
        {
          "key" : {
            "title" : "Test Engineer",
            "age" : 25
          },
          "doc_count" : 1
        },
        {
          "key" : {
            "title" : "senior software designer",
            "age" : 30
          },
          "doc_count" : 1
        }
      ]
    }
  }

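Only two buckets are returned, and the response includes after_key, which is the key of the last bucket returned this time

On the next request, pass the contents of after_key in the after parameter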

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
     "composite_group": {
        "composite": { 
           "size": 2,
           "sources": [
              {"title" : { "terms" : { "field" : "title.keyword"}}},
              {"age" : { "terms" : { "field" : "age"}}}
           ],
           "after": {
              "title" : "senior software designer",
              "age" : 30
           }
        }
     }
  }
}'

This continues from that point and returns the next page of up to two buckets
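
The two requests above generalise to a loop that keeps passing after_key back until no buckets remain. The sketch below is transport-agnostic: search is assumed to be any callable that takes a request body as a dict and returns the parsed JSON response as a dict, and iter_composite_buckets is a hypothetical helper, not part of an official client.

```python
def iter_composite_buckets(search, agg_name, sources, page_size=2):
    """Yield every composite bucket, following after_key page by page."""
    after = None
    while True:
        composite = {"size": page_size, "sources": sources}
        if after is not None:
            composite["after"] = after  # resume after the last bucket seen
        body = {"size": 0, "aggs": {agg_name: {"composite": composite}}}
        result = search(body)["aggregations"][agg_name]
        buckets = result["buckets"]
        if not buckets:
            break
        yield from buckets
        after = result.get("after_key")
        if after is None:
            break

# The sources used in the requests above.
sources = [
    {"title": {"terms": {"field": "title.keyword"}}},
    {"age": {"terms": {"field": "age"}}},
]
```

A caller would write something like `for bucket in iter_composite_buckets(my_search, "composite_group", sources): ...`, where my_search posts the body to my_index/_search.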

bucket aggregation : date_histogram/auto_date_histogram

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "date_histogram": {
         "field": "date",
         "calendar_interval": "1d"
      }
    }
  }
}'

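Counts per day. Buckets are produced for every day from the minimum to the maximum value, so this example yields one bucket per day from 2021-01-01 to 2021-06-01, even for days with a count of 0

calendar_interval only accepts a single unit such as 1d; a multiple of days must be specified with fixed_interval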

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "date_histogram": {
         "field": "date",
         "fixed_interval": "2d"
      }
    }
  }
}'

auto_date_histogram is similar to date_histogram, but you specify a target number of buckets and the system chooses the interval automatically to get as close to that target as it can

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "auto_date_histogram": {
         "field": "date",
         "buckets": 3
      }
    }
  }
}'

Result

  "aggregations" : {
    "my_buckets" : {
      "buckets" : [
        {
          "key_as_string" : "2021-01-01T00:00:00.000Z",
          "key" : 1609459200000,
          "doc_count" : 2
        },
        {
          "key_as_string" : "2021-04-01T00:00:00.000Z",
          "key" : 1617235200000,
          "doc_count" : 1
        }
      ],
      "interval" : "3M"
    }
  }

The system chose 3M as the interval

bucket aggregation : terms/filter/filters

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "terms": {
         "field": "age"
      }
    }
  }
}'

Counts documents grouped by a field, equivalent to select age, count(*) from table group by age

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "filter": { "term": {"title": "designer"}}
    }
  }
}'

Counts documents matching one value of a field, roughly equivalent to select count(*) from table where title like '%designer%'

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "filters": {
         "filters": {
            "title": { "match": {"title": "designer"}},
            "age": { "match": {"age": 35}}
        }
      }
    }
  }
}'

Counts each of the two named filters separately, equivalent to running two filter aggregations

bucket aggregation : range/date range

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
    "age_range": {
      "range": {
         "field": "age",
         "ranges": [
           { "to": 35 },
           { "from": 30, "to": 35 },
           { "from": 35 }
         ]
      }
    }
  }
}'

Result

  "aggregations" : {
    "age_range" : {
      "buckets" : [
        {
          "key" : "*-35.0",
          "to" : 35.0,
          "doc_count" : 2
        },
        {
          "key" : "30.0-35.0",
          "from" : 30.0,
          "to" : 35.0,
          "doc_count" : 1
        },
        {
          "key" : "35.0-*",
          "from" : 35.0,
          "doc_count" : 1
        }
      ]
    }
  }

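Counts the documents in each age range; from is inclusive and to is exclusive, which is why age 35 is counted only in the last bucket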
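
The bucket counts follow directly from the half-open ranges. Here is a small local check with the three sample ages, where from is treated as inclusive and to as exclusive; range_counts is a hypothetical helper, not an Elasticsearch API.

```python
def range_counts(values, ranges):
    """Count values per range, with "from" inclusive and "to" exclusive."""
    counts = []
    for r in ranges:
        low = r.get("from", float("-inf"))
        high = r.get("to", float("inf"))
        counts.append(sum(1 for v in values if low <= v < high))
    return counts

ages = [35, 30, 25]
ranges = [{"to": 35}, {"from": 30, "to": 35}, {"from": 35}]

print(range_counts(ages, ranges))  # [2, 1, 1]
```

Because ranges may overlap, a document can be counted in more than one bucket: age 30 appears in both of the first two.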

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
    "my_range": {
      "date_range": {
         "field": "date",
         "ranges": [
           { "to": "now-3M/M" },
           { "from": "now-3M/M" }
         ]
      }
    }
  }
}'

Result

  "aggregations" : {
    "my_range" : {
      "buckets" : [
        {
          "key" : "*-2021-05-01T00:00:00.000Z",
          "to" : 1.6198272E12,
          "to_as_string" : "2021-05-01T00:00:00.000Z",
          "doc_count" : 2
        },
        {
          "key" : "2021-05-01T00:00:00.000Z-*",
          "from" : 1.6198272E12,
          "from_as_string" : "2021-05-01T00:00:00.000Z",
          "doc_count" : 1
        }
      ]
    }
  }

Counts the documents in each time period

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
    "my_range": {
      "date_range": {
         "field": "date",
         "ranges": [
           { "key": "Newer", "from":"2021-05-01" },
           { "key": "Older", "to":"2021-05-01" }
         ]
      }
    }
  }
}'

Specific dates can be given, and each range can be labelled with key

sub-aggregation : e.g. group counts (count) together with an average (avg)

curl -X GET -H "Content-Type: application/json"  localhost:9200/my_index/_search?pretty -d '{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "filter": { "term": {"title": "designer"}},
      "aggs": {
         "avg_age": { "avg": { "field": "age" } }
      }
    }
  }
}'

Result

  "aggregations" : {
    "my_buckets" : {
      "doc_count" : 2,
      "avg_age" : {
        "value" : 32.5
      }
    }
  }

As shown, the result contains both the document count of the group and the average computed over that group

Pipeline aggregations : avg_bucket

POST /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "price"
          }
        }
      }
    },
    "avg_monthly_sales": {
      "avg_bucket": {
        "buckets_path": "sales_per_month>sales",
        "gap_policy": "skip",
        "format": "#,##0.00;(#,##0.00)"
      }
    }
  }
}

Result

  "aggregations": {
    "sales_per_month": {
      "buckets": [
        {
          "key_as_string": "2015/01/01 00:00:00",
          "key": 1420070400000,
          "doc_count": 3,
          "sales": {
            "value": 550.0
          }
        },
        {
          "key_as_string": "2015/02/01 00:00:00",
          "key": 1422748800000,
          "doc_count": 2,
          "sales": {
            "value": 60.0
          }
        },
        {
          "key_as_string": "2015/03/01 00:00:00",
          "key": 1425168000000,
          "doc_count": 2,
          "sales": {
            "value": 375.0
          }
        }
      ]
    },
    "avg_monthly_sales": {
      "value": 328.33333333333333,
      "value_as_string": "328.33"
    }
  }

Computes the sum for each bucket, then the average of those bucket sums
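
The avg_bucket value can be verified from the three monthly sums in the response. This is plain arithmetic on the numbers shown above, not a call to Elasticsearch.

```python
# Monthly "sales" sums from the response above (January to March 2015).
monthly_sales = [550.0, 60.0, 375.0]

# avg_bucket averages the per-bucket values, not the underlying documents.
avg_monthly_sales = sum(monthly_sales) / len(monthly_sales)

print(f"{avg_monthly_sales:.2f}")  # 328.33
```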

Pipeline aggregations : cumulative_sum

POST /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "price"
          }
        },
        "cumulative_sales": {
          "cumulative_sum": {
            "buckets_path": "sales" 
          }
        }
      }
    }
  }
}

Result

   "aggregations": {
      "sales_per_month": {
         "buckets": [
            {
               "key_as_string": "2015/01/01 00:00:00",
               "key": 1420070400000,
               "doc_count": 3,
               "sales": {
                  "value": 550.0
               },
               "cumulative_sales": {
                  "value": 550.0
               }
            },
            {
               "key_as_string": "2015/02/01 00:00:00",
               "key": 1422748800000,
               "doc_count": 2,
               "sales": {
                  "value": 60.0
               },
               "cumulative_sales": {
                  "value": 610.0
               }
            },
            {
               "key_as_string": "2015/03/01 00:00:00",
               "key": 1425168000000,
               "doc_count": 2,
               "sales": {
                  "value": 375.0
               },
               "cumulative_sales": {
                  "value": 985.0
               }
            }
         ]
      }
   }

Computes the sum for each bucket, then the running total of the bucket sums up to each bucket
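
The cumulative_sales column is a running total over the bucket sums in order. A short local check with the same numbers, using itertools.accumulate as a stand-in for what cumulative_sum computes:

```python
from itertools import accumulate

# Monthly "sales" sums from the response above, in bucket order.
monthly_sales = [550.0, 60.0, 375.0]

cumulative_sales = list(accumulate(monthly_sales))
print(cumulative_sales)  # [550.0, 610.0, 985.0]
```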

Pipeline aggregations : max_bucket

POST /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "price"
          }
        }
      }
    },
    "max_monthly_sales": {
      "max_bucket": {
        "buckets_path": "sales_per_month>sales" 
      }
    }
  }
}

Result

   "aggregations": {
      "sales_per_month": {
         "buckets": [
            {
               "key_as_string": "2015/01/01 00:00:00",
               "key": 1420070400000,
               "doc_count": 3,
               "sales": {
                  "value": 550.0
               }
            },
            {
               "key_as_string": "2015/02/01 00:00:00",
               "key": 1422748800000,
               "doc_count": 2,
               "sales": {
                  "value": 60.0
               }
            },
            {
               "key_as_string": "2015/03/01 00:00:00",
               "key": 1425168000000,
               "doc_count": 2,
               "sales": {
                  "value": 375.0
               }
            }
         ]
      },
      "max_monthly_sales": {
          "keys": ["2015/01/01 00:00:00"], 
          "value": 550.0
      }
   }

Computes the sum for each bucket, then returns the bucket(s) with the largest sum

Scripts

Script queries are supported; they are omitted here

Original article: https://www.cnblogs.com/moonlight-lin/p/15138710.html