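- Query methods
- Inserting data
- Viewing the index
- term query
- Token case
- terms query
- hits results
- count query
- query_string query
- match query
- match_phrase query
- multi_match
- match_all query
- bool query
- Limiting the number of results
- Selecting returned fields
- Sorting
- Range query
- Wildcard query
- Fuzzy query
- Scoring
- aggregation query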
- metrics aggregation : avg
- metrics aggregation : avg & histogram
- metrics aggregation : max/min/sum
- metrics aggregation : boxplot
- metrics aggregation : cardinality
- metrics aggregation : extended_stats
- metrics aggregation : geo
- metrics aggregation : matrix_stats
- bucket aggregation : adjacency_matrix
- bucket aggregation : composite
- bucket aggregation : composite - after_key
- bucket aggregation : date_histogram/auto_date_histogram
- bucket aggregation : terms/filter/filters
- bucket aggregation : range/date range
- sub-aggregation : e.g. both counting per group (count) and computing the average (avg)
- Pipeline aggregations : avg_bucket
- Pipeline aggregations : cumulative_sum
- Pipeline aggregations : max_bucket
- Script execution
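Query methods
ES has its own query syntax (the Query DSL), which differs from SQL. It also provides packages such as a JDBC driver for running the corresponding SQL.
The examples here use ES's own query syntax.
Inserting data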
curl -X POST 'http://localhost:9200/my_index/_doc' -H 'Content-Type: application/json' -d '{
"name": "Wang",
"title": "software designer",
"age": 35,
"address": {"city": "guangzhou", "district": "tianhe"},
"content": "I want to do some AI machine learning works"
}'
This automatically creates my_index, the _doc type, and every field.
Viewing the index
curl localhost:9200/my_index?pretty
{
"my_index" : {
"aliases" : { },
"mappings" : {
"_doc" : {
"properties" : {
"address" : {
"properties" : {
"city" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"district" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"age" : {
"type" : "long"
},
"content" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
},
"settings" : {
"index" : {
"creation_date" : "1628737053188",
"number_of_shards" : "5",
"number_of_replicas" : "1",
"uuid" : "-NHgaqt4R_SQs2KHd0aJwQ",
"version" : {
"created" : "6050199"
},
"provided_name" : "my_index"
}
}
}
}
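Lists the settings and mappings.
term query
Requires an exact match: the query value is not analyzed, whereas text data is indexed as analyzed tokens by default.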
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"query": {
"term":{
"content":"machine"
}
}
}'
Returns a result.
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"query": {
"term":{
"content":"machine learning"
}
}
}'
Returns no result.
The query value "machine learning" must match a token exactly, but the data is indexed as individual tokens and there is no single token "machine learning".
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"query": {
"term":{
"content.keyword":"machine"
}
}
}'
Returns no result.
The keyword subfield holds the original, unanalyzed value rather than the tokens, and the original value is not exactly "machine".
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"query": {
"term":{
"content.keyword":"I want to do some AI machine learning works"
}
}
}'
Returns a result.
The query value is exactly equal to the original value.
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"query": {
"term":{
"address.city":"guangzhou"
}
}
}'
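Returns a result: fields of an inner object can be queried by their dotted path.
Token case
term queries against text fields need lowercase values: the default standard analyzer lowercases tokens at index time, while a term query does not analyze its input.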
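The term-query behaviour described so far can be mimicked in a few lines of Python. This is only a rough sketch: `analyze` is a hypothetical stand-in for the standard analyzer (lowercase, split on non-alphanumeric characters), and `term_query` is an illustrative helper, not Elasticsearch code.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())

content = "I want to do some AI machine learning works"
text_index = set(analyze(content))  # what a text field stores: lowercase tokens
keyword_index = {content}           # what the .keyword subfield stores: the raw value

def term_query(index, value):
    # A term query does not analyze `value`; it is looked up as-is.
    return value in index

print(term_query(text_index, "machine"))           # True
print(term_query(text_index, "machine learning"))  # False: no such single token
print(term_query(text_index, "AI"))                # False: the token was lowercased to "ai"
print(term_query(keyword_index, content))          # True: exact raw value
```

The third call shows why an uppercase value fails against a text field while the same value in lowercase would succeed.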
terms query
Requires that at least one of several values matches exactly.
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"query": {
"terms":{
"content": ["machine", "learning"]
}
}
}'
Returns a result; the tokens machine and learning both match.
hits results
Every matching document is listed in the hits field.
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.9932617,
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "anbGPnsBSeHJMrQRVMQK",
"_score" : 1.9932617,
"_source" : {
"name" : "Wang",
"title" : "software designer",
"age" : 35,
"address" : {
"city" : "guangzhou",
"district" : "tianhe"
},
"content" : "I want to do some AI machine learning works",
"kpi" : 3.2,
"date" : "2021-01-01T08:00:00Z"
}
}
]
}
The actual data is under hits -> hits -> _source.
count query
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_count?pretty -d '{
"query": {
"term":{
"title": "software"
}
}
}'
Returns
"count" : 2,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
}
Only the count is returned.
query_string query
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"query": {
"query_string":{
"query": "(machine learning) AND (K8S works)",
"default_field": "content"
}
}
}'
(machine learning) and (K8S works) are each analyzed into a group of tokens. A document matches only if it contains at least one token from each of the two groups.
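A small Python sketch of that grouping logic, applied to the content of two of the sample documents used in these notes. `analyze` and `matches` are hypothetical helpers that only approximate the analyzer and the default OR operator inside each group.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches(content, groups):
    tokens = set(analyze(content))
    # AND across the groups, OR between the tokens inside each group.
    return all(any(tok in tokens for tok in analyze(group)) for group in groups)

groups = ["machine learning", "K8S works"]
wang = "I want to do some AI machine learning works"
li = "I want to do some K8S works"

print(matches(wang, groups))  # True: "machine" from group 1, "works" from group 2
print(matches(li, groups))    # False: nothing from group 1
```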
match query
Analyzed matching: the query text is analyzed into tokens, and a document matches if any one token matches.
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"query": {
"match":{
"content": "machine learning"
}
}
}'
Returns a result; both tokens match.
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"query": {
"match":{
"content": "works machine AI"
}
}
}'
Returns a result; all tokens match, regardless of order.
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"query": {
"match":{
"content": "machine factory"
}
}
}'
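Returns a result; a single matching token (machine) is enough.
match_phrase query
The query text is treated as one phrase: the document matches only if the query tokens appear consecutively and in the same order.
(By contrast, term against a keyword field matches only when the original value equals the query value; match_phrase is a containment relationship.)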
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"query": {
"match_phrase":{
"content": "machine factory"
}
}
}'
Returns no result, because the phrase machine factory does not occur.
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"query": {
"match_phrase":{
"content": "works machine AI"
}
}
}'
Returns no result: the original data contains all three tokens, but match_phrase treats works machine AI as one ordered phrase.
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"query": {
"match_phrase":{
"content": "AI machine learning"
}
}
}'
Returns a result, because AI machine learning occurs as one contiguous phrase.
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"query": {
"match_phrase":{
"content": {
"query": "some AI works",
"slop" : 2
}
}
}
}'
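Returns a result.
some AI works does not occur as a contiguous phrase, but slop 2 allows the query tokens to be moved by up to two positions in total; here works is shifted past machine and learning.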
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"query": {
"match_phrase":{
"content": {
"query": "some AI works",
"slop" : 1
}
}
}
}'
Returns no result; a slop of one position is still not enough.
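The slop arithmetic can be checked with a short Python sketch. It is a simplification that assumes every query token occurs exactly once in the document; `required_slop` is a hypothetical helper, not an Elasticsearch API. For each query token it computes how far the token's document position is shifted relative to its position in the query, and the spread of those shifts is the slop needed.

```python
def required_slop(doc_tokens, query_tokens):
    """Smallest slop at which the phrase matches, or None if a token is missing.

    Simplification: assumes each query token occurs exactly once in the document.
    """
    try:
        shifts = [doc_tokens.index(tok) - i for i, tok in enumerate(query_tokens)]
    except ValueError:
        return None  # a query token does not occur at all
    return max(shifts) - min(shifts)

doc = "i want to do some ai machine learning works".split()

print(required_slop(doc, "ai machine learning".split()))  # 0
print(required_slop(doc, "some ai works".split()))        # 2
print(required_slop(doc, "works machine ai".split()))     # 5
print(required_slop(doc, "machine factory".split()))      # None
```

A phrase matches when the required slop is less than or equal to the slop given in the query, which is why slop 2 succeeds and slop 1 fails for some AI works.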
multi_match
Runs a match query against several fields; the document matches if any one field matches.
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"query": {
"multi_match":{
"query": "machine learning",
"fields" : ["title", "content"]
}
}
}'
Returns a result; content satisfies the query.
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"query": {
"multi_match":{
"query": "designer",
"fields" : ["title", "content"]
}
}
}'
Returns a result; title satisfies the query.
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"query": {
"multi_match":{
"query": "manager",
"fields" : ["title", "content"]
}
}
}'
Returns no result; neither title nor content satisfies the query.
match_all query
Returns all documents.
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"query": {
"match_all":{}
}
}'
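No condition is specified.
bool query
A compound query: a document matches only when all of the clause types below are satisfied together.
Each clause can be must, filter, should, or must_not:
must: the clause must match, and it contributes to the score
filter: the clause must match, but it does not contribute to the score
should: at least as many should clauses as minimum_should_match specifies must match; matching clauses contribute to the score
must_not: the clause must not match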
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"query": {
"bool":{
"must": [
{
"term": {"address.city": "guangzhou"}
},
{
"match": {"content": "machine learning"}
}
],
"must_not": {
"range": {"age": {"gt": 35}}
},
"filter": {
"match": {"title": "designer"}
},
"should": [
{
"term": {"name": "Li"}
},
{
"match_phrase": {"content": "AI machine"}
}
],
"minimum_should_match" : 1
}
}
}'
Returns a result, because every clause under bool is satisfied.
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"query": {
"bool":{
"must": [
{
"term": {"address.city": "guangzhou"}
},
{
"match": {"content": "machine learning"}
}
],
"must_not": {
"range": {"age": {"from": 35, "to": 40}}
},
"filter": {
"match": {"title": "designer"}
},
"should": [
{
"term": {"name": "Li"}
},
{
"match_phrase": {"content": "AI machine"}
}
],
"minimum_should_match" : 1
}
}
}'
Returns no result, because the must_not clause is violated: age 35 falls inside the excluded range 35 to 40.
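The way the four clause types combine can be sketched in Python. This only models matching, not scoring; `bool_match` and the lambda predicates are illustrative stand-ins for the real term, match, and range clauses, and plain substring checks replace analysis.

```python
def bool_match(doc, must=(), filter_=(), must_not=(), should=(), minimum_should_match=0):
    """Each clause is a predicate doc -> bool. Scoring is ignored in this sketch."""
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filter_):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    return sum(clause(doc) for clause in should) >= minimum_should_match

wang = {
    "name": "Wang", "city": "guangzhou", "age": 35,
    "title": "software designer",
    "content": "I want to do some AI machine learning works",
}
clauses = dict(
    must=[lambda d: d["city"] == "guangzhou", lambda d: "machine" in d["content"]],
    filter_=[lambda d: "designer" in d["title"]],
    should=[lambda d: d["name"] == "Li", lambda d: "AI machine" in d["content"]],
    minimum_should_match=1,
)

first = bool_match(wang, must_not=[lambda d: d["age"] > 35], **clauses)
second = bool_match(wang, must_not=[lambda d: 35 <= d["age"] <= 40], **clauses)
print(first, second)  # True False
```

Only the must_not clause differs between the two calls, mirroring the two curl requests.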
bool queries can be nested.
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"query": {
"bool":{
"must": [
{
"bool": {
"should": [
{
"term": {"name": "Li"}
},
{
"match_phrase": {"content": "AI machine"}
}
]
}
},
{
"bool": {
"filter": {
"match": {"title": "designer"}
}
}
}
]
}
}
}'
Here must contains several bool queries.
Limiting the number of results
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"from":0,
"size":2,
"query": {
"term":{
"title":"designer"
}
}
}'
Starts at the first hit (offset 0) and returns at most 2 hits.
Selecting returned fields
Similar to choosing specific columns with select in SQL.
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"_source":["name","age"],
"query": {
"term":{
"title":"designer"
}
}
}'
Only the name and age fields of the matching documents are returned.
Sorting
Specify the field to sort by.
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"_source":["name","age"],
"query": {
"term":{
"title":"designer"
}
},
"sort": [
{
"age": {"order": "desc"}
}
]
}'
Results are sorted by age in descending order.
Range query
Supports from, to, gte, lte, gt, lt, and so on. For example:
{
"query": {
"range": {
"date": {
"gte": "2021-08-01",
"lte": "2021-08-02",
"relation": "within",
"format": "yyyy-MM-dd"
}
}
}
}
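relation can also be contains or intersects (the default).
relation applies because the date field can itself store a range (a range field type such as date_range), for example
"date": {"gte":"2021-08-01","lte":"2021-08-03"}
within: the field's range lies inside the query range
contains: the field's range contains the query range
intersects: the field's range overlaps the query range
format specifies the date format used in the query.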
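The three relations reduce to simple interval comparisons. The Python sketch below uses the example field range and query range from this section; `relation_matches` is a hypothetical helper and treats all bounds as inclusive (gte/lte).

```python
from datetime import date

def relation_matches(field, query, relation="intersects"):
    """field and query are (start, end) tuples with inclusive bounds."""
    f_lo, f_hi = field
    q_lo, q_hi = query
    if relation == "within":      # field range lies entirely inside the query range
        return q_lo <= f_lo and f_hi <= q_hi
    if relation == "contains":    # field range entirely covers the query range
        return f_lo <= q_lo and q_hi <= f_hi
    if relation == "intersects":  # the two ranges overlap somewhere
        return f_lo <= q_hi and q_lo <= f_hi
    raise ValueError(f"unknown relation: {relation}")

field = (date(2021, 8, 1), date(2021, 8, 3))  # stored range: 2021-08-01 .. 2021-08-03
query = (date(2021, 8, 1), date(2021, 8, 2))  # query range:  2021-08-01 .. 2021-08-02

for rel in ("within", "contains", "intersects"):
    print(rel, relation_matches(field, query, rel))
```

With these values the stored range is wider than the query range, so within is False while contains and intersects are True.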
Wildcard query
Supports * and ?
* matches zero or more characters
? matches any single character
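The notes give no request for this query type, so here is a hedged Python sketch. `body` shows what a wildcard request body could look like for the title field of the sample data (the pattern des* is an assumed example), and `wildcard_match` illustrates the * and ? semantics by translating a pattern to a regular expression. On a text field the pattern is compared against individual tokens, not the whole original value.

```python
import json
import re

# Assumed example request body for POST my_index/_search
body = {"query": {"wildcard": {"title": {"value": "des*"}}}}
print(json.dumps(body))

def wildcard_match(pattern, term):
    """Illustrates * (zero or more characters) and ? (exactly one character)."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), term, re.DOTALL) is not None

print(wildcard_match("des*", "designer"))      # True
print(wildcard_match("desi?ner", "designer"))  # True
print(wildcard_match("des?", "designer"))      # False: ? is exactly one character
```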
Fuzzy query
Matches terms that are similar to the query term.
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"query": {
"fuzzy":{
"title":"desinger"
}
}
}'
Even though desinger is misspelled, the document is still found.
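Similarity here is measured as an edit distance. The fuzzy query counts a swap of two adjacent characters as a single edit by default, and the default fuzziness (AUTO) allows up to two edits for terms longer than five characters. The Python sketch below is a plain textbook implementation of that distance (optimal string alignment), not Elasticsearch's own code.

```python
def osa_distance(a, b):
    """Edit distance where insert, delete, substitute and adjacent swap each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]

print(osa_distance("desinger", "designer"))  # 1: "ng" and "gn" are swapped
```

One edit is well within the allowed two, so desinger still matches the token designer.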
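Scoring
Each hit has a _score field that expresses how relevant the document is to the query.
Factors that determine the score include:
Term frequency: the more often the term occurs in the document, the higher the weight
Inverse document frequency: the more documents the term occurs in, the lower the weight (for example and/the)
Field length: the longer the field, the lower the weight, since a match in a short field is more significant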
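These three factors can be seen in the textbook BM25 formula, the similarity Elasticsearch uses by default. The Python sketch below is that textbook form with the usual defaults k1 = 1.2 and b = 0.75. The exact constants inside Lucene may differ, so the absolute values are not meant to reproduce any _score shown in these notes; only the direction of each effect is. Note that under BM25 a longer field lowers the score.

```python
import math

def bm25(tf, field_len, avg_field_len, n_docs, docs_with_term, k1=1.2, b=0.75):
    """Textbook BM25 score of one term for one field value."""
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    norm = tf + k1 * (1 - b + b * field_len / avg_field_len)
    return idf * tf * (k1 + 1) / norm

base = bm25(tf=1, field_len=9, avg_field_len=9, n_docs=3, docs_with_term=1)

print(bm25(2, 9, 9, 3, 1) > base)   # True: higher term frequency raises the score
print(bm25(1, 9, 9, 3, 3) < base)   # True: a term found in every document is worth less
print(bm25(1, 30, 9, 3, 1) < base)  # True: a longer field lowers the score
```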
aggregation query
Insert more data
curl -X POST 'http://localhost:9200/my_index/_doc' -H 'Content-Type: application/json' -d '{
"name": "Wang",
"title": "software designer",
"age": 35,
"address": {"city": "guangzhou", "district": "tianhe"},
"content": "I want to do some AI machine learning works",
"kpi": 3.2,
"date": "2021-01-01T08:00:00Z"
}'
curl -X POST 'http://localhost:9200/my_index/_doc' -H 'Content-Type: application/json' -d '{
"name": "Li",
"title": "senior software designer",
"age": 30,
"address": {"city": "guangzhou", "district": "tianhe"},
"content": "I want to do some K8S works",
"kpi": 4.0,
"date": "2021-01-01T10:00:00Z"
}'
curl -X POST 'http://localhost:9200/my_index/_doc' -H 'Content-Type: application/json' -d '{
"name": "Zhang",
"title": "Test Engineer",
"age": 25,
"address": {"city": "guangzhou", "district": "tianhe"},
"content": "I want to do some auto-test works",
"kpi": 4.5,
"date": "2021-06-01T09:00:00Z"
}'
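Aggregation queries fall into the following categories:
Bucket aggregations: group documents into buckets and count the documents in each bucket
Metrics aggregations: compute metrics such as the average or maximum over a set of documents
Pipeline aggregations: perform further calculations on the output of other aggregations
Examples of some of the agg operations follow.
For the full list of agg operations, see https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html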
metrics aggregation : avg
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"size": 0,
"query": {
"match":{
"title": "designer Engineer"
}
},
"aggs": {
"kpi_avg": {
"avg": {
"field": "kpi",
"missing": 3.5
}
}
}
}'
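aggs is a keyword; aggregations is also accepted.
kpi_avg is a user-defined name that appears in the result.
avg is a keyword that selects the avg operation; field names the field to average, and missing gives the default value to use for documents that lack the field.
If no query is given, the agg runs over all documents.
If size is not set to 0, the matching documents are returned along with the agg result.
With size set to 0, only the agg result is returned.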
"aggregations" : {
"kpi_avg" : {
"value" : 3.900000015894572
}
}
Several aggs can be requested at once.
metrics aggregation : avg & histogram
If the data is of type histogram (this must be declared when the index is created):
curl -X PUT 'http://localhost:9200/my_index_histogram' -H 'Content-Type: application/json' -d '{
"mappings" : {
"properties" : {
"my_histogram" : {
"type" : "histogram"
}
}
}
}'
curl -X POST 'http://localhost:9200/my_index_histogram/_doc' -H 'Content-Type: application/json' -d '{
"name": "Zhao",
"title": "manager",
"age": 30,
"address": {"city": "guangzhou", "district": "tianhe"},
"my_histogram": {
"values" : [3.5, 4.0, 4.5],
"counts" : [1, 2, 3]
}
}'
The avg is computed as (3.5*1 + 4.0*2 + 4.5*3) / (1+2+3)
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index_histogram/_search?pretty -d '{
"size": 0,
"aggs": {
"score_avg": {
"avg": {
"field": "my_histogram"
}
}
}
}'
Result
"aggregations" : {
"score_avg" : {
"value" : 4.166666666666667
}
}
Fields in an automatically created index are never mapped as histogram by default.
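The weighted average over a histogram field can be verified by hand. This minimal Python sketch uses the values and counts from the document inserted in this section.

```python
values = [3.5, 4.0, 4.5]
counts = [1, 2, 3]

# Each value is weighted by its count: 3.5*1 + 4.0*2 + 4.5*3 = 25.0
weighted_sum = sum(v * c for v, c in zip(values, counts))
avg = weighted_sum / sum(counts)
print(avg)  # 4.166666666666667
```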
metrics aggregation : max/min/sum
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"size": 0,
"aggs": {
"kpi_avg": {
"max": {
"field": "kpi"
}
}
}
}'
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"size": 0,
"aggs": {
"kpi_avg": {
"min": {
"field": "kpi"
}
}
}
}'
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"size": 0,
"aggs": {
"kpi_avg": {
"sum": {
"field": "kpi"
}
}
}
}'
Computes min, max, sum, and so on.
metrics aggregation : boxplot
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"size": 0,
"query": {
"match":{
"title": "designer Engineer"
}
},
"aggs": {
"kpi_avg": {
"boxplot": {
"field": "kpi",
"missing": 3.5
}
}
}
}'
Result
"aggregations" : {
"kpi_avg" : {
"min" : 3.200000047683716,
"max" : 4.5,
"q1" : 3.400000035762787,
"q2" : 4.0,
"q3" : 4.375,
"lower" : 3.200000047683716,
"upper" : 4.5
}
}
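Box plot:
q1 : lower quartile (25%)
q2 : median (50%)
q3 : upper quartile (75%)
lower : the smallest value that is not less than q1 - 1.5 * (q3 - q1)
upper : the largest value that is not greater than q3 + 1.5 * (q3 - q1)
min : minimum
max : maximum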
metrics aggregation : cardinality
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"size": 0,
"query": {
"match":{
"title": "designer Engineer"
}
},
"aggs": {
"kpi_count": {
"cardinality": {
"field": "kpi"
}
}
}
}'
Result
"aggregations" : {
"kpi_count" : {
"value" : 3
}
}
Equivalent to count(distinct field); the value is approximate for high-cardinality fields.
metrics aggregation : extended_stats
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"size": 0,
"query": {
"match":{
"title": "designer Engineer"
}
},
"aggs": {
"kpi_stats": {
"extended_stats": {
"field": "kpi"
}
}
}
}'
Result
"aggregations" : {
"kpi_stats" : {
"count" : 3,
"min" : 3.200000047683716,
"max" : 4.5,
"avg" : 3.900000015894572,
"sum" : 11.700000047683716,
"sum_of_squares" : 46.49000030517578,
"variance" : 0.28666664441426565,
"variance_population" : 0.28666664441426565,
"variance_sampling" : 0.4299999666213985,
"std_deviation" : 0.5354125926930237,
"std_deviation_population" : 0.5354125926930237,
"std_deviation_sampling" : 0.6557438269792545,
"std_deviation_bounds" : {
"upper" : 4.970825201280619,
"lower" : 2.8291748305085243,
"upper_population" : 4.970825201280619,
"lower_population" : 2.8291748305085243,
"upper_sampling" : 5.2114876698530805,
"lower_sampling" : 2.588512361936063
}
}
}
Various statistics.
metrics aggregation : geo
Create geo-typed data
curl -X PUT 'http://localhost:9200/my_index_geo' -H 'Content-Type: application/json' -d '{
"mappings" : {
"properties" : {
"location" : {
"type" : "geo_point"
}
}
}
}'
curl -X POST 'http://localhost:9200/my_index_geo/_doc' -H 'Content-Type: application/json' -d '{
"location": "52.374081,4.912350"
}'
curl -X POST 'http://localhost:9200/my_index_geo/_doc' -H 'Content-Type: application/json' -d '{
"location": "52.369219,4.901618"
}'
curl -X POST 'http://localhost:9200/my_index_geo/_doc' -H 'Content-Type: application/json' -d '{
"location": "52.371667,4.914722"
}'
curl -X POST 'http://localhost:9200/my_index_geo/_doc' -H 'Content-Type: application/json' -d '{
"location": "51.222900,4.405200"
}'
The centroid can be computed:
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index_geo/_search?pretty -d '{
"size": 0,
"aggs": {
"centroid": {
"geo_centroid": {
"field": "location"
}
}
}
}'
Result
"aggregations" : {
"centroid" : {
"location" : {
"lat" : 52.08446673466824,
"lon" : 4.783472470007837
},
"count" : 4
}
}
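Bounds and other values can be obtained as well.
metrics aggregation : matrix_stats
Computes mean, variance, skewness, kurtosis, covariance, correlation, and so on across a set of fields.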
bucket aggregation : adjacency_matrix
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"size": 0,
"aggs": {
"people_group": {
"adjacency_matrix": {
"filters": {
"grpA" : { "terms" : { "title" : ["designer", "engineer"] }},
"grpB" : { "terms" : { "title" : ["senior", "software"] }},
"grpC" : { "terms" : { "content" : ["test", "pv"] }}
}
}
}
}
}'
Result
"aggregations" : {
"people_group" : {
"buckets" : [
{
"key" : "grpA",
"doc_count" : 3
},
{
"key" : "grpA&grpB",
"doc_count" : 2
},
{
"key" : "grpA&grpC",
"doc_count" : 1
},
{
"key" : "grpB",
"doc_count" : 2
},
{
"key" : "grpC",
"doc_count" : 1
}
]
}
}
Counts the documents matching each filter, and each pairwise intersection of filters; empty buckets are omitted.
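The bucket counts above can be reproduced offline with a small Python sketch over the three sample documents. `analyze` is a hypothetical stand-in for the standard analyzer, and the set intersections play the role of the terms filters.

```python
import re
from itertools import combinations

def analyze(text):
    """Rough stand-in for the standard analyzer, returning a set of tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

docs = [
    {"title": "software designer", "content": "I want to do some AI machine learning works"},
    {"title": "senior software designer", "content": "I want to do some K8S works"},
    {"title": "Test Engineer", "content": "I want to do some auto-test works"},
]
filters = {
    "grpA": ("title", {"designer", "engineer"}),
    "grpB": ("title", {"senior", "software"}),
    "grpC": ("content", {"test", "pv"}),
}

# For each filter, the set of matching document indexes.
hits = {name: {i for i, d in enumerate(docs) if analyze(d[field]) & terms}
        for name, (field, terms) in filters.items()}

buckets = {name: len(ids) for name, ids in hits.items()}
for a, b in combinations(sorted(hits), 2):
    buckets[f"{a}&{b}"] = len(hits[a] & hits[b])
buckets = {k: v for k, v in sorted(buckets.items()) if v > 0}  # drop empty buckets

print(buckets)
```

grpB&grpC is absent because no document matches both filters.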
bucket aggregation : composite
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"size": 0,
"aggs": {
"composite_group": {
"composite": {
"sources": [
{"title" : { "terms" : { "field" : "title.keyword"}}},
{"age" : { "terms" : { "field" : "age"}}}
]
}
}
}
}'
The keyword subfield must be used so that the value is not analyzed; otherwise the aggregation fails.
Result
"aggregations" : {
"composite_group" : {
"after_key" : {
"title" : "software designer",
"age" : 35
},
"buckets" : [
{
"key" : {
"title" : "Test Engineer",
"age" : 25
},
"doc_count" : 1
},
{
"key" : {
"title" : "senior software designer",
"age" : 30
},
"doc_count" : 1
},
{
"key" : {
"title" : "software designer",
"age" : 35
},
"doc_count" : 1
}
]
}
}
As shown, the result is similar to
select
field_a, field_b, count(*)
from table
group by
field_a, field_b
Besides terms, a source can also be Histogram, Date histogram, or GeoTile grid.
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"size": 0,
"aggs": {
"my_buckets": {
"composite": {
"sources": [
{
"date": {
"date_histogram": {
"field": "date",
"calendar_interval": "1d"
}
}
}
]
}
}
}
}'
For example, this Date histogram source truncates the date field to the day and then groups by day.
"aggregations" : {
"my_buckets" : {
"after_key" : {
"date" : 1622505600000
},
"buckets" : [
{
"key" : {
"date" : 1609459200000
},
"doc_count" : 2
},
{
"key" : {
"date" : 1622505600000
},
"doc_count" : 1
}
]
}
}
The format field can be used to specify the date format.
bucket aggregation : composite - after_key
Used for paginated queries.
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"size": 0,
"aggs": {
"composite_group": {
"composite": {
"size": 2,
"sources": [
{"title" : { "terms" : { "field" : "title.keyword"}}},
{"age" : { "terms" : { "field" : "age"}}}
]
}
}
}
}'
Requests only two buckets per page.
The response is as follows:
"aggregations" : {
"composite_group" : {
"after_key" : {
"title" : "senior software designer",
"age" : 30
},
"buckets" : [
{
"key" : {
"title" : "Test Engineer",
"age" : 25
},
"doc_count" : 1
},
{
"key" : {
"title" : "senior software designer",
"age" : 30
},
"doc_count" : 1
}
]
}
}
Only two buckets are returned, and the result includes after_key, which identifies the last bucket of this page.
Pass the content of after_key in the next request:
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"size": 0,
"aggs": {
"composite_group": {
"composite": {
"size": 2,
"sources": [
{"title" : { "terms" : { "field" : "title.keyword"}}},
{"age" : { "terms" : { "field" : "age"}}}
],
"after": {
"title" : "senior software designer",
"age" : 30
}
}
}
}
}'
This returns the next page of up to two buckets.
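The paging loop can be sketched in Python against an in-memory copy of the three (title, age) pairs. `composite_page` is a hypothetical stand-in for one composite request: it sorts the group keys, skips everything up to and including `after`, and returns at most `size` buckets plus the key to resume from.

```python
from collections import Counter

docs = [("software designer", 35), ("senior software designer", 30), ("Test Engineer", 25)]

def composite_page(docs, size, after=None):
    """Return (buckets, after_key) for one page of (title, age) groups."""
    counts = Counter(docs)
    keys = sorted(k for k in counts if after is None or k > after)
    page = keys[:size]
    buckets = [{"key": k, "doc_count": counts[k]} for k in page]
    return buckets, (page[-1] if page else None)

after = None
pages = []
while True:
    buckets, after = composite_page(docs, size=2, after=after)
    if not buckets:  # an empty page means every bucket has been seen
        break
    pages.append([b["key"] for b in buckets])

print(pages)
```

The loop stops on the first empty page, which is a common way to detect the end when paging with after.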
bucket aggregation : date_histogram/auto_date_histogram
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"size": 0,
"aggs": {
"my_buckets": {
"date_histogram": {
"field": "date",
"calendar_interval": "1d"
}
}
}
}'
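Counts per day. Buckets run from the minimum to the maximum value, so this example produces one bucket per day from 2021-01-01 to 2021-06-01, even for days with a count of 0.
calendar_interval accepts only a single unit such as 1d; a multiple of days requires fixed_interval.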
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"size": 0,
"aggs": {
"my_buckets": {
"date_histogram": {
"field": "date",
"fixed_interval": "2d"
}
}
}
}'
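auto_date_histogram is similar to date_histogram, but you specify a target number of buckets and the system picks an interval that comes as close as possible to that target without exceeding it.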
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"size": 0,
"aggs": {
"my_buckets": {
"auto_date_histogram": {
"field": "date",
"buckets": 3
}
}
}
}'
Result
"aggregations" : {
"my_buckets" : {
"buckets" : [
{
"key_as_string" : "2021-01-01T00:00:00.000Z",
"key" : 1609459200000,
"doc_count" : 2
},
{
"key_as_string" : "2021-04-01T00:00:00.000Z",
"key" : 1617235200000,
"doc_count" : 1
}
],
"interval" : "3M"
}
}
The system chose 3M as the interval.
bucket aggregation : terms/filter/filters
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"size": 0,
"aggs": {
"my_buckets": {
"terms": {
"field": "age"
}
}
}
}'
Counts documents per value of a field, equivalent to select age, count(*) from table group by age
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"size": 0,
"aggs": {
"my_buckets": {
"filter": { "term": {"title": "designer"}}
}
}
}'
Counts documents that have a specific value in a field, roughly equivalent to select count(*) from table where title like '%designer%'
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"size": 0,
"aggs": {
"my_buckets": {
"filters": {
"filters": {
"title": { "match": {"title": "designer"}},
"age": { "match": {"age": 35}}
}
}
}
}
}'
Counts the two fields separately, equivalent to running two filter queries.
bucket aggregation : range/date range
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"size": 0,
"aggs": {
"age_range": {
"range": {
"field": "age",
"ranges": [
{ "to": 35 },
{ "from": 30, "to": 35 },
{ "from": 35 }
]
}
}
}
}'
Result
"aggregations" : {
"age_range" : {
"buckets" : [
{
"key" : "*-35.0",
"to" : 35.0,
"doc_count" : 2
},
{
"key" : "30.0-35.0",
"from" : 30.0,
"to" : 35.0,
"doc_count" : 1
},
{
"key" : "35.0-*",
"from" : 35.0,
"doc_count" : 1
}
]
}
}
Counts the documents in each age range; from is inclusive and to is exclusive.
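The bucket counts follow from the half-open interval rule: from is included, to is excluded. A minimal Python check using the three ages in the sample data (`in_range` is an illustrative helper):

```python
ages = [35, 30, 25]
ranges = [(None, 35), (30, 35), (35, None)]  # (from, to); None means unbounded

def in_range(value, lo, hi):
    # from is inclusive, to is exclusive
    return (lo is None or value >= lo) and (hi is None or value < hi)

counts = [sum(in_range(age, lo, hi) for age in ages) for lo, hi in ranges]
print(counts)  # [2, 1, 1]
```

Age 35 lands only in the last bucket because the first two ranges exclude their upper bound.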
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"size": 0,
"aggs": {
"my_range": {
"date_range": {
"field": "date",
"ranges": [
{ "to": "now-3M/M" },
{ "from": "now-3M/M" }
]
}
}
}
}'
Result
"aggregations" : {
"my_range" : {
"buckets" : [
{
"key" : "*-2021-05-01T00:00:00.000Z",
"to" : 1.6198272E12,
"to_as_string" : "2021-05-01T00:00:00.000Z",
"doc_count" : 2
},
{
"key" : "2021-05-01T00:00:00.000Z-*",
"from" : 1.6198272E12,
"from_as_string" : "2021-05-01T00:00:00.000Z",
"doc_count" : 1
}
]
}
}
Counts the documents in each time period.
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"size": 0,
"aggs": {
"my_range": {
"date_range": {
"field": "date",
"ranges": [
{ "key": "Newer", "from":"2021-05-01" },
{ "key": "Older", "to":"2021-05-01" }
]
}
}
}
}'
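Ranges can also use explicit dates, and each range can be given a key.
sub-aggregation : e.g. both counting per group (count) and computing the average (avg)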
curl -X GET -H "Content-Type: application/json" localhost:9200/my_index/_search?pretty -d '{
"size": 0,
"aggs": {
"my_buckets": {
"filter": { "term": {"title": "designer"}},
"aggs": {
"avg_age": { "avg": { "field": "age" } }
}
}
}
}'
Result
"aggregations" : {
"my_buckets" : {
"doc_count" : 2,
"avg_age" : {
"value" : 32.5
}
}
}
As shown, this both counts the documents in the group and computes an average over the group.
Pipeline aggregations : avg_bucket
POST _search
{
"size": 0,
"aggs": {
"sales_per_month": {
"date_histogram": {
"field": "date",
"calendar_interval": "month"
},
"aggs": {
"sales": {
"sum": {
"field": "price"
}
}
}
},
"avg_monthly_sales": {
"avg_bucket": {
"buckets_path": "sales_per_month>sales",
"gap_policy": "skip",
"format": "#,##0.00;(#,##0.00)"
}
}
}
}
Result
"aggregations": {
"sales_per_month": {
"buckets": [
{
"key_as_string": "2015/01/01 00:00:00",
"key": 1420070400000,
"doc_count": 3,
"sales": {
"value": 550.0
}
},
{
"key_as_string": "2015/02/01 00:00:00",
"key": 1422748800000,
"doc_count": 2,
"sales": {
"value": 60.0
}
},
{
"key_as_string": "2015/03/01 00:00:00",
"key": 1425168000000,
"doc_count": 2,
"sales": {
"value": 375.0
}
}
]
},
"avg_monthly_sales": {
"value": 328.33333333333333,
"value_as_string": "328.33"
}
}
Computes the sum for each bucket, then the average of those bucket sums.
Pipeline aggregations : cumulative_sum
POST /sales/_search
{
"size": 0,
"aggs": {
"sales_per_month": {
"date_histogram": {
"field": "date",
"calendar_interval": "month"
},
"aggs": {
"sales": {
"sum": {
"field": "price"
}
},
"cumulative_sales": {
"cumulative_sum": {
"buckets_path": "sales"
}
}
}
}
}
}
Result
"aggregations": {
"sales_per_month": {
"buckets": [
{
"key_as_string": "2015/01/01 00:00:00",
"key": 1420070400000,
"doc_count": 3,
"sales": {
"value": 550.0
},
"cumulative_sales": {
"value": 550.0
}
},
{
"key_as_string": "2015/02/01 00:00:00",
"key": 1422748800000,
"doc_count": 2,
"sales": {
"value": 60.0
},
"cumulative_sales": {
"value": 610.0
}
},
{
"key_as_string": "2015/03/01 00:00:00",
"key": 1425168000000,
"doc_count": 2,
"sales": {
"value": 375.0
},
"cumulative_sales": {
"value": 985.0
}
}
]
}
}
Computes the sum for each bucket, then the running total of those bucket sums at each step.
Pipeline aggregations : max_bucket
POST /sales/_search
{
"size": 0,
"aggs": {
"sales_per_month": {
"date_histogram": {
"field": "date",
"calendar_interval": "month"
},
"aggs": {
"sales": {
"sum": {
"field": "price"
}
}
}
},
"max_monthly_sales": {
"max_bucket": {
"buckets_path": "sales_per_month>sales"
}
}
}
}
Result
"aggregations": {
"sales_per_month": {
"buckets": [
{
"key_as_string": "2015/01/01 00:00:00",
"key": 1420070400000,
"doc_count": 3,
"sales": {
"value": 550.0
}
},
{
"key_as_string": "2015/02/01 00:00:00",
"key": 1422748800000,
"doc_count": 2,
"sales": {
"value": 60.0
}
},
{
"key_as_string": "2015/03/01 00:00:00",
"key": 1425168000000,
"doc_count": 2,
"sales": {
"value": 375.0
}
}
]
},
"max_monthly_sales": {
"keys": ["2015/01/01 00:00:00"],
"value": 550.0
}
}
Computes the sum for each bucket, then picks the bucket with the largest sum.
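All three pipeline aggregations operate on the per-bucket sums only. The Python sketch below recomputes them from the monthly sales values shown in the results; the shortened month keys are an illustrative simplification.

```python
from itertools import accumulate

# Per-bucket sums produced by the "sales" sub-aggregation
monthly_sales = {"2015/01": 550.0, "2015/02": 60.0, "2015/03": 375.0}

avg_bucket = sum(monthly_sales.values()) / len(monthly_sales)      # avg_bucket
cumulative = list(accumulate(monthly_sales.values()))              # cumulative_sum
max_value = max(monthly_sales.values())                            # max_bucket value
max_keys = [k for k, v in monthly_sales.items() if v == max_value] # max_bucket keys

print(round(avg_bucket, 2))  # 328.33
print(cumulative)            # [550.0, 610.0, 985.0]
print(max_keys, max_value)   # ['2015/01'] 550.0
```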
Script execution
Script queries are supported; omitted here.