  • day3: Elasticsearch aggregation queries

    Thanks to the original author for this material: https://juejin.im/post/6844904032398475278#heading-1

    Aggregation basics:
    https://juejin.im/post/6844904032398475278#heading-1

    Further reading on aggregations:
    Elasticsearch: introduction to aggregation
    Elasticsearch: introduction to pipeline aggregation
    Elasticsearch: a thorough understanding of Bucket aggregation in Elasticsearch


    Group users into age ranges:

    GET twitter/_search

    {
    	"size": 0,
    	"aggs": {
    		"age": {
    			"range": {
    				"field": "age",
    				"ranges": [{
    						"from": 20,
    						"to": 30
    					},
    					{
    						"from": 30,
    						"to": 40
    					},
    					{
    						"from": 40,
    						"to": 50
    					}
    				]
    			}
    		}
    	}
    }
    

      

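    This uses a range aggregation.

    Above we defined several age ranges; the query returns one bucket per range. In each range, from is inclusive and to is exclusive. The result is shown below. Because size is 0, hits.hits is empty, and the document count for each range appears under aggregations.age.buckets: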

    {
    	"took": 4,
    	"timed_out": false,
    	"_shards": {
    		"total": 1,
    		"successful": 1,
    		"skipped": 0,
    		"failed": 0
    	},
    	"hits": {
    		"total": {
    			"value": 5,
    			"relation": "eq"
    		},
    		"max_score": null,
    		"hits": []
    	},
    	"aggregations": {
    		"age": {
    			"buckets": [{
    					"key": "20.0-30.0",
    					"from": 20.0,
    					"to": 30.0,
    					"doc_count": 0
    				},
    				{
    					"key": "30.0-40.0",
    					"from": 30.0,
    					"to": 40.0,
    					"doc_count": 3
    				},
    				{
    					"key": "40.0-50.0",
    					"from": 40.0,
    					"to": 50.0,
    					"doc_count": 0
    				}
    			]
    		}
    	}
    }
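
To make the bucket boundaries concrete, here is a minimal Python sketch of how a range aggregation assigns values to buckets. It runs locally and does not call Elasticsearch; the function name `range_buckets` and the sample ages are hypothetical, chosen only so the counts match the response above.

```python
from typing import Dict, List, Tuple


def range_buckets(ages: List[float], ranges: List[Tuple[float, float]]) -> Dict[str, int]:
    """Count values per range; `from` is inclusive and `to` is exclusive."""
    counts = {}
    for lo, hi in ranges:
        key = f"{float(lo)}-{float(hi)}"  # mirrors keys such as "30.0-40.0"
        counts[key] = sum(1 for age in ages if lo <= age < hi)
    return counts


# Hypothetical ages for five documents (not the real twitter index data).
sample_ages = [18, 30, 35, 39, 55]
buckets = range_buckets(sample_ages, [(20, 30), (30, 40), (40, 50)])
print(buckets)
```

An age of exactly 30 lands in the 30-40 bucket, not in 20-30, and values outside every range (18 and 55 here) are counted in no bucket.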
    

      

    Count how often each keyword value occurs:
    Built-in keywords: aggs, terms, field, keyword
    curl -H 'Content-type: application/json' -XGET 'http://localhost:10290/apollo/_search?pretty' -d '{"aggs":{"number_of_cities":{"terms":{"field":"city.keyword"}}}, "size":0}'

    {
    	"aggs": {
    		"number_of_cities": {
    			"terms": {
    				"field": "city.keyword"
    			}
    		}
    	},
    	"size": 0
    }

    Result:

    {
    	"took": 3,
    	"timed_out": false,
    	"_shards": {
    		"total": 5,
    		"successful": 5,
    		"skipped": 0,
    		"failed": 0
    	},
    	"hits": {
    		"total": 71150,
    		"max_score": 0.0,
    		"hits": []
    	},
    	"aggregations": {
    		"number_of_cities": {
    			"doc_count_error_upper_bound": 116,
    			"sum_other_doc_count": 16983,
    			"buckets": [{
    					"key": "合肥",
    					"doc_count": 30017
    				},
    				{
    					"key": "",
    					"doc_count": 16761
    				},
    				{
    					"key": "columbia",
    					"doc_count": 1546
    				}
    			]
    		}
    	}
    }
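
As a rough mental model of the response above, the following Python sketch counts values locally and keeps only the most frequent ones, the way a terms aggregation returns its top buckets (10 by default). The function `terms_aggregation` and the sample values are hypothetical. A real cluster computes the counts per shard, which is why its response also carries doc_count_error_upper_bound; this sketch does not model that.

```python
from collections import Counter
from typing import Dict, List


def terms_aggregation(values: List[str], size: int = 10) -> Dict[str, object]:
    """Return the `size` most frequent values plus the count left out of the buckets."""
    counts = Counter(values)
    top = counts.most_common(size)
    buckets = [{"key": key, "doc_count": n} for key, n in top]
    other = sum(counts.values()) - sum(n for _, n in top)
    return {"sum_other_doc_count": other, "buckets": buckets}


# Hypothetical city values; the empty string stands for documents with a blank city.
cities = ["合肥", "合肥", "合肥", "", "", "columbia"]
result = terms_aggregation(cities, size=2)
print(result)
```

The blank city gets its own bucket with key "", just as in the real response, and sum_other_doc_count is the number of documents whose value did not make it into the returned buckets.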

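    Count the number of distinct cities:
    To find out how many different cities there are, use the built-in cardinality aggregation (the count it returns is approximate).
    GET _search { "size": 0, "aggs": { "number_of_cities": { "cardinality": { "field": "city.keyword" } } } }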

    {
    	"size": 0,
    	"aggs": {
    		"number_of_cities": {
    			"cardinality": {
    				"field": "city.keyword"
    			}
    		}
    	}
    }
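
For contrast with the terms aggregation, this small Python sketch shows what cardinality measures: a single number of distinct values rather than a count per value. It computes the exact answer with a set, whereas Elasticsearch returns an approximation; `distinct_count` and the sample data are hypothetical.

```python
from typing import Iterable, Optional


def distinct_count(values: Iterable[Optional[str]]) -> int:
    """Exact number of distinct non-missing values."""
    return len({v for v in values if v is not None})


# Hypothetical city values; None stands for a document without the field.
cities = ["合肥", "合肥", "", "columbia", None, "columbia"]
print(distinct_count(cities))
```

Note that the empty string counts as a value of its own here, which is consistent with the "" bucket seen in the terms result.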

    Compute the average user age:
    Built-in aggregation: avg
    GET twitter/_search { "size": 0, "aggs": { "average_age": { "avg": { "field": "age" } } } }

    Compute the average score with avg; use max for the highest score, min for the lowest, and sum for the total:
    curl -H 'Content-type: application/json' -XGET 'http://localhost:10290/apollo/_search?pretty' -d '{"aggs":{"average_score":{"avg":{"field":"os_score"}}}, "size":0}'

    {
    	"aggs": {
    		"average_score": {
    			"avg": {
    				"field": "os_score"
    			}
    		}
    	},
    	"size": 0
    }
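
Since avg, max, min, and sum all take the same shape, one request can carry all four as sibling aggregations. The Python sketch below only builds and prints such a request body; it sends nothing. The helper `metric_aggs` and the aggregation names (avg_score, max_score, and so on) are hypothetical.

```python
import json


def metric_aggs(field: str) -> dict:
    """Build a search body with one sibling aggregation per metric."""
    metrics = ["avg", "max", "min", "sum"]
    aggs = {f"{metric}_score": {metric: {"field": field}} for metric in metrics}
    return {"size": 0, "aggs": aggs}


body = metric_aggs("os_score")
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with -d in the same way as the requests above. Elasticsearch also has a stats aggregation that returns count, min, max, avg, and sum in one go, which may be simpler when all of them are needed.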

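    Use a script to recompute the aggregation result:
    Here the maximum score is multiplied by 0.8 using *. Likewise, use / to divide (for example by 2), + to add a number, and - to subtract one.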
    curl -H 'Content-type: application/json' -XGET 'http://localhost:10290/apollo/_search?pretty' -d '{"aggs":{"average_score":{"max":{"field":"os_score", "script":{"source":"_value * params.correction", "params":{"correction": 0.8}}}}}, "size":0}'

    {
    	"size": 0,
    	"aggs": {
    		"average_score": {
    			"max": {
    				"field": "os_score",
    				"script": {
    					"source": "_value * params.correction",
    					"params": {
    						"correction": 0.8
    					}
    				}
    			}
    		}
    	}
    }
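
What the script does can be pictured locally: every field value is passed through `_value * params.correction` before the max is taken. The Python sketch below mimics that under the assumption that the script is applied per value; `scripted_max` and the sample scores are hypothetical.

```python
from typing import Iterable


def scripted_max(values: Iterable[float], correction: float) -> float:
    """Apply `_value * correction` to each value, then take the max."""
    return max(value * correction for value in values)


# Hypothetical os_score values (not real apollo index data).
scores = [90.0, 92.0, 100.0]
scaled = scripted_max(scores, 0.8)
print(scaled)
```

For a positive factor this equals the plain max multiplied by the factor; with a negative factor the result would instead come from the smallest original value, so the order of "transform, then aggregate" matters.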
    

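    Aggregate with a script directly, without field:
    This should have an effect equivalent to the approach above, but my attempt did not succeed (possibly because the request targets twitter, while os_score was queried in apollo in the earlier examples).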
    GET twitter/_search

    {
    	"size": 0,
    	"aggs": {
    		"average_2_times_os_score": {
    			"avg": {
    				"script": {
    					"source": "doc['os_score'].value * params.times",
    					"params": {
    						"times": 2.0
    					}
    				}
    			}
    		}
    	}
    }

    Percentile aggregation
    A percentiles aggregation. The query below helps spot outliers in os_score by returning the score at the 25th, 50th, 75th, and 100th percentiles.

    {
    	"size": 0,
    	"aggs": {
    		"os_score_quartiles": {
    			"percentiles": {
    				"field": "os_score",
    				"percents": [
    					25,
    					50,
    					75,
    					100
    				]
    			}
    		}
    	}
    }
    

      

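    The result is shown below. It tells us that:
    25% of the scores are at or below 90
    50% of the scores are at or below 92
    75% of the scores are at or below 100
    the highest score is 100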

    {
    	"took": 8,
    	"timed_out": false,
    	"_shards": {
    		"total": 5,
    		"successful": 5,
    		"skipped": 0,
    		"failed": 0
    	},
    	"hits": {
    		"total": 71150,
    		"max_score": 0.0,
    		"hits": []
    	},
    	"aggregations": {
    		"os_score_quartiles": {
    			"values": {
    				"25.0": 90.0,
    				"50.0": 92.0,
    				"75.0": 100.0,
    				"100.0": 100.0
    			}
    		}
    	}
    }
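
To show how such values relate to the underlying scores, here is a minimal Python sketch using the nearest-rank definition of a percentile. This is a swapped-in exact method for illustration only: Elasticsearch computes percentiles approximately, so its numbers can differ on real data. The function `percentile` and the sample scores are hypothetical, picked so the output has the same shape as the response above.

```python
import math
from typing import List


def percentile(sorted_values: List[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    if not sorted_values:
        raise ValueError("percentile of an empty list is undefined")
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


# Hypothetical os_score values (not real apollo index data).
scores = sorted([85.0, 90.0, 91.0, 92.0, 95.0, 100.0, 100.0, 100.0])
quartiles = {str(float(p)): percentile(scores, p) for p in (25, 50, 75, 100)}
print(quartiles)
```

When the 75th and 100th percentiles are equal, as here, at least a quarter of the documents share the top score, which is how the real result above should be read as well.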

    analyzer

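    One reason searches return within seconds: documents are indexed at the time they are stored. An analyzer splits the text into tokens, and those tokens are what gets indexed.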

    curl -H 'Content-type: application/json' -XGET 'http://localhost:10290/apollo/_analyze?pretty' -d '{"text":["我是一个兵"], "analyzer":"standard"}'

    {
    	"text": ["我是一个兵"],
    	"analyzer": "standard"
    }
    

      

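    The result is shown below: five tokens, one per character, because the standard analyzer splits Chinese text into single characters.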

    {
    	"tokens": [{
    			"token": "我",
    			"start_offset": 0,
    			"end_offset": 1,
    			"type": "<IDEOGRAPHIC>",
    			"position": 0
    		},
    		{
    			"token": "是",
    			"start_offset": 1,
    			"end_offset": 2,
    			"type": "<IDEOGRAPHIC>",
    			"position": 1
    		},
    		{
    			"token": "一",
    			"start_offset": 2,
    			"end_offset": 3,
    			"type": "<IDEOGRAPHIC>",
    			"position": 2
    		},
    		{
    			"token": "个",
    			"start_offset": 3,
    			"end_offset": 4,
    			"type": "<IDEOGRAPHIC>",
    			"position": 3
    		},
    		{
    			"token": "兵",
    			"start_offset": 4,
    			"end_offset": 5,
    			"type": "<IDEOGRAPHIC>",
    			"position": 4
    		}
    	]
    }
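
The shape of this response is easy to reproduce for this particular input. The Python sketch below emits one token per character with the same offsets and positions; it assumes the text consists only of CJK ideographs and is not a reimplementation of the standard analyzer. The function `char_tokens` is hypothetical.

```python
from typing import Dict, List, Union


def char_tokens(text: str) -> List[Dict[str, Union[str, int]]]:
    """Emit one token per character, assuming every character is a CJK ideograph."""
    return [
        {
            "token": ch,
            "start_offset": i,
            "end_offset": i + 1,
            "type": "<IDEOGRAPHIC>",
            "position": i,
        }
        for i, ch in enumerate(text)
    ]


tokens = char_tokens("我是一个兵")
print(len(tokens), [t["token"] for t in tokens])
```

Single-character tokens mean a search for a multi-character word matches on its individual characters, which is a common reason to pick a dedicated Chinese analyzer instead of standard.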
    

      

  • Original article: https://www.cnblogs.com/zhanghaibin16/p/13878805.html