zoukankan      html  css  js  c++  java
  • Elasticsearch的聚合操作

    ES的聚合:

    Metrics简单的对过滤出来的数据集进行avg,max等操作,是一个单一的数值。
    bucket 可以理解为将过滤出来的数据集按条件分成多个小数据集,然后Metrics会分别作用在这些小数据集上

    metric很像SQL中的avg、max、min等方法,而bucket就有点类似group by

    导入数据汽车销售数据:

    curl -XPOST http://hadoop01:9200/cars/transactions/_bulk -d '
    { "index": {}}
    { "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
    { "index": {}}
    { "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
    { "index": {}}
    { "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
    { "index": {}}
    { "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
    { "index": {}}
    { "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
    { "index": {}}
    { "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
    { "index": {}}
    { "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
    { "index": {}}
    { "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
    '

    1:Bucket

    1.1:按时间统计(date_histogram时间直方图聚合)

    date_histogram是专门用来给时间格式的数据进行聚合的

    时间字段sold按照月份统计:

    curl -XGET 'hadoop01:9200/cars/transactions/_search?pretty' -d '
    {
       "aggs" : {
           "agg_time" : {   #给聚合的字段起名字
               "date_histogram" : {  #聚合方式,区间上支持了日期的表达式

                   "field" : "sold",
                   "interval": "month"
                }
            }
      }
    }'

    1.2:返回价格区间柱形图(histogram直方图聚合)

    统计区间的price值,看他落在那个区间,数据间隔是5000

    curl -XGET 'hadoop01:9200/cars/transactions/_search?pretty' -d '
    {
       "aggs" : {
           "prices" : {
               "histogram" : {
                   "field" : "price",
                   "interval" : 5000
              }
          }
      }
    }'

    1.3:查看每种颜色的销量

    curl -XPUT 'hadoop01:9200/cars/_mapping/transactions' -d  '  
    {
      "properties": {
        "color": {
          "type": "text",
          "fielddata": true
        }
      }
    }'

    Fielddata:会把字段加载到内存中,在聚合的时候,通过内存可以找到这个字段,否则聚合出错(Fielddata缓存数据的大小是无限制的,不要把无关的数据也缓存起来

    GET /cars/transactions/_search?pretty
    {

       "aggs" : {

           "agg-color" : {     #聚合的名称(自定义)

               "terms" : { "field" : "color" }

          }

      }

    }

    2:Metric

    metric很像SQL中的avg``、max、min等方法,而bucket就有点类似group by

    metric的聚合按照值的返回类型可以分为两种:单值聚合 和 多值聚合

    2.1:单值聚合

    2.1.1: sum求和

    例子:求cars索引中,所有汽车订单的销售总额

    curl -XGET 'hadoop01:9200/cars/transactions/_search?pretty' -d '
    {
       "aggs" : {
           "genres" : {
               "sum" : { "field" : "price" }
          }
      }
    }'
    2.1.2: Min最小值

    例子:求cars索引中price值最小的

    curl -XGET 'hadoop01:9200/cars/transactions/_search?pretty' -d '
    {
       "aggs" : {
           "genres" : {
               "min" : { "field" : "price" }
          }
      }
    }'
    2.1.3: max最大值

    求cars索引中price的最大值:

    curl -XGET 'hadoop01:9200/cars/transactions/_search?pretty' -d '
    {
       "aggs" : {
           "genres" : {
               "max" : { "field" : "price" }
          }
      }
    }'
    2.1.4: avg求平均值

    求cars索引中price价格的平均值

    curl -XGET 'hadoop01:9200/cars/transactions/_search?pretty' -d '
    {
       "aggs" : {
           "genres" : {
               "avg" : { "field" : "price" }
          }
      }
    }'

    3:多值聚合

    导入数据:

    curl -XPOST http://hadoop01:9200/sanguo/dahan/_bulk -d '
    { "index": {}}
    { "studentNo" : 1, "name" : "刘备", "male" : "男", "age" : 24 , "birthday" : "1985-02-03" , "classNo" : 1 , "address" : "湖南省长沙市" , "isLeader" : true}
    { "index": {}}
    { "studentNo" : 2, "name" : "关羽", "male" : "男", "age" : 22 , "birthday" : "1987-08-23" , "classNo" : 2, "address" : "四川省成都市" , "isLeader" : false}
    { "index": {}}
    { "studentNo" : 3, "name" : "糜夫人", "male" : "女", "age" : 19 , "birthday" : "1990-06-12" , "classNo" : 1 , "address" : "上海市" , "isLeader" : false}
    { "index": {}}
    { "studentNo" : 4, "name" : "张飞", "male" : "男", "age" : 20 , "birthday" : "1989-07-30" , "classNo" : 3 , "address" : "北京市" , "isLeader" : false}
    { "index": {}}
    { "studentNo" : 5, "name" : "诸葛亮", "male" : "男", "age" : 18 , "birthday" : "1992-04-27" , "classNo" : 2 , "address" : "江苏省南京市" , "isLeader" : true}
    { "index": {}}
    { "studentNo" : 6, "name" : "孙尚香", "male" : "女", "age" : 16 , "birthday" : "1994-05-21" , "classNo" : 3 , "address" : "广东省深圳市" , "isLeader" : false}
    { "index": {}}
    { "studentNo" : 7, "name" : "马超", "male" : "男", "age" : 19 , "birthday" : "1991-10-20" , "classNo" : 1 , "address" : "黑龙江省哈尔滨市" , "isLeader" : false}
    { "index": {}}
    { "studentNo" : 8, "name" : "赵云", "male" : "男", "age" : 23 , "birthday" : "1986-10-26 " , "classNo" : 2 , "address" : "浙江省杭州市" , "isLeader" : false}
    '

    3.1:stats 统计

    统计查询,一次性统计出某个字段上的常用统计值

    curl -XPOST "hadoop01:9200/sanguo/dahan/_search?pretty" -d '
    {
     "aggs": {
       "stats_age": {
         "stats": {
           "field": "age"
        }
      }
    }
    }
    '

    Stats可以把min、max、avg、sum全部展现出来

    3.2:Top hits Aggregation

    取符合条件的前n条数据记录,就是SQL中所谓的分组取topN操作

    例子:查询sanguo索引中年龄age前3名的姓名和年龄

     

    select name , age from table order by age desc limit 3

    curl -XPOST "hadoop01:9200/sanguo/dahan/_search?pretty" -d '
    {
      "aggs": {
        "top_age": {
          "top_hits": {
            "sort": [               
              {
                "age": {            
                  "order": "desc"
                }
              }],
            "_source": {
              "include": [           
                "name",
                "age"
              ]
            },
            "size": 3             
          }
        }
      }
    }
    '
    

     

    3.3:嵌套使用

    聚合操作是可以嵌套使用的, 通过嵌套,可以使得metric类型的聚合操作作用在每一“桶”上。我们可以使用ES的嵌套聚合操作来完成稍微复杂一点的统计功能

    例如:查询sanguo索引中每个classNo中年龄最大的

    select name , age from table gruop by classNo order by age limit 1

    curl -XPOST "hadoop01:9200/sanguo/dahan/_search?pretty" -d '
    {
      "aggs": {
        "m": {
          "terms": {
            "field": "classNo"
          },
          "aggs": {                 
            "max_age": {
              "max": {              
                "field": "age"
              }
            }
          }
        }
      }
    }
    '

  • 相关阅读:
    Docker02 Docker初识:第一个Docker容器和Docker镜像
    Docker01 CentOS配置Docker
    Jenkins02:Jenkins+maven+svn集成
    Junit01 新建Maven项目
    Junit02 Junit创建及简单实现
    Jenkins01:linux+jenkins+ant+jmeter集成
    Jenkins初识03:构建定时任务
    python 协程
    python之socket 网络编程
    python 面向对象
  • 原文地址:https://www.cnblogs.com/niutao/p/10909121.html
Copyright © 2011-2022 走看看