
Elasticsearch aggregation functions

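1. Simple aggregations

Bucket: put simply, a collection of documents that meet a particular criterion.

Metric: most metrics are simple mathematical operations (for example min, mean, max, and sum) computed from the values in the documents.

Buckets let us partition documents into meaningful sets, but what we ultimately need is some metric computed over the documents in each bucket. Bucketing is a means to an end: it gives us a way to group documents so that we can calculate the metrics we care about. In practice, metrics let you compute figures such as average salary, maximum sale price, or 95th-percentile query latency.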

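For example, buckets and metrics map roughly onto a SQL query:

SELECT COUNT(color)   -- roughly the metric
FROM table
GROUP BY color        -- roughly the bucket

Conceptually, buckets resemble SQL grouping (GROUP BY), while metrics resemble statistical functions such as COUNT(), SUM(), and MAX().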
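To make the bucket/metric split concrete, here is a minimal pure-Python sketch of the same idea, with no Elasticsearch involved. The sample documents, the price field, and the helper names (bucket_by, count_metric, avg_metric) are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample documents; the field names are illustrative only.
docs = [
    {"color": "red", "price": 10000},
    {"color": "red", "price": 20000},
    {"color": "blue", "price": 15000},
    {"color": "green", "price": 12000},
    {"color": "red", "price": 30000},
]


def bucket_by(documents, field):
    """Bucketing: group documents by the value of one field (like GROUP BY)."""
    buckets = defaultdict(list)
    for doc in documents:
        buckets[doc[field]].append(doc)
    return dict(buckets)


def count_metric(bucket):
    """Metric: number of documents in a bucket (like COUNT())."""
    return len(bucket)


def avg_metric(bucket, field):
    """Metric: mean of a numeric field within a bucket (like AVG())."""
    return sum(doc[field] for doc in bucket) / len(bucket)


buckets = bucket_by(docs, "color")
summary = {
    color: {"doc_count": count_metric(b), "avg_price": avg_metric(b, "price")}
    for color, b in buckets.items()
}
for color, figures in summary.items():
    print(color, figures)
```

The grouping step decides which documents belong together; the metric step only ever looks inside one group. Elasticsearch keeps the same separation, which is why one bucket definition can be combined with different metrics.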

curl -XGET 'http://192.9.8.222:9200/wymlib/ym_literature/_search?pretty=true' -d '
{
  "size": 0,
  "aggregations": {
    "popular_author": {
      "terms": {
        "field": "author"
      }
    }
  }
}'

The result:
{
  "took" : 2803,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 25,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {     //1
    "popular_author" : { //2
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {    //3
        "key" : "王阳明",
        "doc_count" : 4
      }, {
        "key" : "阳明",
        "doc_count" : 4
      }, {
        "key" : "胡",
        "doc_count" : 2
      }, {
        "key" : "大大",
        "doc_count" : 1
      }, {
        "key" : "建",
        "doc_count" : 1
      }, {
        "key" : "徐",
        "doc_count" : 1
      }, {
        "key" : "杰",
        "doc_count" : 1
      }, {
        "key" : "闯",
        "doc_count" : 1
      } ]
    }
  }
}

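//1 Aggregations go under the top-level parameter aggregations, as in the request above; the short form aggs is equally valid in a request.

//2 Next, we give the aggregation whatever name we like; in this example it is popular_author.

//3 Finally, the request defines a single bucket type, terms; in the response, the resulting buckets appear in the buckets array.

Note: you may have noticed that size is set to 0. We do not care about the search hits themselves, so we set the number of returned documents to 0 to make the query faster. Setting size: 0 is equivalent to using the count search type in Elasticsearch 1.x.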
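Reading such a response in code mostly means walking down to the buckets array. The sketch below does that in Python on an abbreviated copy of the response above (first three buckets only, other fields trimmed); the helper name terms_buckets is an assumption, not part of any Elasticsearch client.

```python
import json

# Abbreviated copy of the response shown above (first three buckets only).
response_text = """
{
  "hits": {"total": 25, "max_score": 0.0, "hits": []},
  "aggregations": {
    "popular_author": {
      "buckets": [
        {"key": "王阳明", "doc_count": 4},
        {"key": "阳明", "doc_count": 4},
        {"key": "胡", "doc_count": 2}
      ]
    }
  }
}
"""


def terms_buckets(response, agg_name):
    """Return (key, doc_count) pairs for a named terms aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]


response = json.loads(response_text)
pairs = terms_buckets(response, "popular_author")
for key, count in pairs:
    print(f"{key}: {count}")
```

Because size was 0, hits.hits is an empty list even though hits.total still reports 25 matching documents; everything of interest sits under aggregations.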

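2. Metric-based aggregations

I do not know this area well yet, so for now I take "metric" here to mean the metrics (指标) described above; I will come back and correct this if it turns out to be wrong.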

curl -XGET 'http://192.9.8.222:9200/test_es_order_index/test_es_order_type/_search?pretty=true' -d '
{
  "aggregations": {
    "sum_age": {
      "sum": {
        "field": "age"
      }
    }
  }
}'

Result (one document has age 29 and the other 21, so the sum is 50):

"aggregations" : {
    "sum_age" : {
      "value" : 50.0
    }
  }
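A single-field metric request like this one has a fixed shape: an aggregation name, a metric type, and a field. As a rough sketch, the body can be generated in Python rather than typed by hand. The helper metric_request is hypothetical, and it defaults size to 0, which the sum request above does not set; that default is an assumption borrowed from the earlier note on size.

```python
import json


def metric_request(agg_name, metric, field, size=0):
    """Build a search body holding one single-field metric aggregation."""
    return {
        "size": size,  # 0 suppresses the hits; only the aggregation comes back
        "aggregations": {agg_name: {metric: {"field": field}}},
    }


body = metric_request("sum_age", "sum", "age")
print(json.dumps(body, indent=2))
```

Passing "avg", "min", "max", or "stats" as the metric argument yields the corresponding request; the printed JSON can be used as the -d payload of the curl command.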

The stats aggregation:

curl -XGET 'http://192.9.8.222:9200/test_es_order_index/test_es_order_type/_search?pretty=true' -d '
{
  "size": 0,
  "aggregations": {
    "stats_age": {
      "stats": {
        "field": "age"
      }
    }
  }
}'

Result (a single request returns several statistics at once):

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 7,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "stats_age" : {
      "count" : 2,
      "min" : 21.0,
      "max" : 29.0,
      "avg" : 25.0,
      "sum" : 50.0
    }
  }
}
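As a sanity check, the five figures can be recomputed locally from the two age values mentioned earlier (29 and 21). This is a plain-Python sketch of the arithmetic, not how Elasticsearch computes it internally; the function name stats is chosen only to mirror the aggregation.

```python
def stats(values):
    """Compute the five figures that the stats aggregation reports."""
    values = [float(v) for v in values]
    if not values:
        raise ValueError("stats() needs at least one value")
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


ages = [29, 21]  # the two age values mentioned in the text
result = stats(ages)
print(result)
```

The output matches the stats_age block above. Note that hits.total is 7 while count is 2, which would be consistent with only two of the matched documents carrying an age value.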



Original article: https://www.cnblogs.com/hoojjack/p/7709951.html