zoukankan      html  css  js  c++  java
  • elasticsearchdsl聚合2

    接续上篇,本篇介绍elasticsearch聚合查询,使用python库elasticsearch-dsl进行聚合查询操作。

    条形图

    聚合有一个令人激动的特性就是能够十分容易地将数据转换成图表和图形。

      • 创建直方图需要指定一个区间,如果我们要为售价创建一个直方图,可以将间隔设为 20,000。这样做将会在每个 $20,000 档创建一个新桶,然后文档会被分到对应的桶中。
         1 GET cars/transactions/_search
         2 {
         3   "size": 0,
         4   "aggs": {
         5     "price": {
         6       "histogram": {
         7         "field": "price",
         8         "interval": 20000
         9       },
        10       "aggs": {
        11         "revenue": {
        12           "sum": {
        13             "field": "price"
        14           }
        15         }
        16       }
        17     }
        18   }
        19 }
        1 s = Search(index='cars')
        2 s.aggs.bucket("price", "histogram", field="price", interval=20000).metric("revenue", "sum", field="price")
        3 response = s.execute()

        图形化表示

      • 更强大的统计
         1 GET /cars/transactions/_search
         2 {
         3   "size" : 0,
         4   "aggs": {
         5     "makes": {
         6       "terms": {
         7         "field": "make",
         8         "size": 10
         9       },
        10       "aggs": {
        11         "stats": {
        12           "extended_stats": {
        13             "field": "price"
        14           }
        15         }
        16       }
        17     }
        18   }
        19 }
        1 s = Search(index='cars')
        2 s.aggs.bucket("makes", "terms", field="make", size=10).metric("stats", "extended_stats", field="price")
        3 response = s.execute()
      • 按时间统计(date_histogram),每月销售了多少台汽车?
         1 GET cars/transactions/_search
         2 {
         3   "size": 0,
         4   "aggs": {
         5     "sales": {
         6       "date_histogram": {
         7         "field": "sold",
         8         "interval": "month",
         9         "format": "yyyy-MM-dd",
        10         "extended_bounds": {
        11           "min": "2014-01-01",
        12           "max": "2014-12-31"
        13         }
        14       }
        15     }
        16   }
        17 }
        1 s = Search(index='cars')
        2 s.aggs.bucket("sales", "date_histogram", field="sold", interval="month",
        3                   format="yyyy-MM-dd", extended_bounds={"min": "2014-01-01", "max": "2014-12-31"})
        4 response = s.execute()
      • 计算每个季度所有汽车品牌的销售总额以及每种汽车品牌的销售总额
         1 GET cars/transactions/_search
         2 {
         3   "size": 0,
         4   "aggs": {
         5     "sales": {
         6       "date_histogram": {
         7         "field": "sold",
         8         "interval": "quarter",
         9         "format": "yyyy-MM-dd",
        10         "extended_bounds": {
        11           "min": "2014-01-01",
        12           "max": "2014-12-31"
        13         }
        14       },
        15       "aggs": {
        16         "per_make_sum": {
        17           "terms": {
        18             "field": "make"
        19           },
        20           "aggs": {
        21             "sum_price": {
        22               "sum": {
        23                 "field": "price"
        24               }
        25             }
        26           }
        27         },
        28         "total_sum": {
        29           "sum": {
        30             "field": "price"
        31           }
        32         }
        33       }
        34     }
        35   }
        36 }
        1 s = Search(index='cars')
        2 a1 = A("date_histogram", field="sold", interval="quarter", format="yyyy-MM-dd",
        3            extended_bounds={"min": "2014-01-01", "max": "2014-12-31"})
        4 a2 = A("terms", field="make")
        5 s.aggs.bucket("sales", a1).bucket("per_make_sum", a2).metric("sum_price", "sum", field="price")
        6 s.aggs["sales"].metric("total_sum", "sum", field="price")
        7 response = s.execute()
      • 限定范围的聚合,福特在售车有多少种颜色?
         1 GET cars/transactions/_search
         2 {
         3   "query": {
         4     "match": {
         5       "make": "ford"
         6     }
         7   },
         8   "aggs": {
         9     "colors": {
        10       "terms": {
        11         "field": "make"
        12       }
        13     }
        14   }
        15 }
        1 s = Search(index="cars").query("match", make="ford")
        2 s.aggs.bucket("colors", "terms", field="make")
        3 response = s.execute()
      • 全局桶(全局桶包含所有的文档,它无视查询的范围),比方说我们想知道福特汽车与所有汽车平均售价的比较
         1 GET cars/transactions/_search
         2 {
         3   "query": {
         4     "match": {
         5       "make": "ford"
         6     }
         7   },
         8   "aggs": {
         9     "single_avg_price": {
        10       "avg": {
        11         "field": "price"
        12       }
        13     },
        14     "all": {
        15       "global": {},         --global忽略过滤条件
        16       "aggs": {
        17         "avg_price": {
        18           "avg": {
        19             "field": "price"
        20           }
        21         }
        22       }
        23     }
        24   }
        25 }
        1 s = Search(index="cars").query("match", make="ford")
        2 s.aggs.metric("single_avg_price", "avg", field="price")
        3 s.aggs.bucket("all", "global").metric("avg_price", "avg", field="price")
        4 response = s.execute()
      • 过滤,找到售价在 $10,000 美元之上的所有汽车同时也为这些车计算平均售价
         1 GET cars/transactions/_search
         2 {
         3   "query": {
         4     "constant_score": {
         5       "filter": {
         6         "range": {
         7           "price": {
         8             "gte": 10000
         9           }
        10         }
        11       }
        12     }
        13   },
        14   "aggs": {
        15     "single_avg_price": {
        16       "avg": {
        17         "field": "price"
        18       }
        19     }
        20   }
        21 }
        1 s = Search(index="cars").query("range", price={"gte": 10000})
        2 s.aggs.metric("single_avg_price", "avg", field="price")
        3 response = s.execute()
      • 过滤桶(一种特殊桶),搜索福特汽车在2014年上半年销售汽车的均价
         1 GET /cars/transactions/_search
         2 {
         3    "size" : 0,
         4    "query":{
         5       "match": {
         6          "make": "ford"
         7       }
         8    },
         9    "aggs":{
        10       "recent_sales": {
        11          "filter": { 
        12             "range": {
        13                "sold": {
        14                   "from": "2014-01-01",
        15                   "to": "2014-06-30"
        16                }
        17             }
        18          },
        19          "aggs": {
        20             "average_price":{
        21                "avg": {
        22                   "field": "price" 
        23                }
        24             }
        25          }
        26       }
        27    }
        28 }
        1 s = Search(index="cars").query("match", make="ford")
        2 q = Q("range", sold={"from": "2014-01-01", "to": "2014-06-30"})
        3 s.aggs.bucket("recent_sales", "filter", q).metric("average_price", "avg", field="price")
        4 response = s.execute()
      • 后过滤器(post_filter),只过滤搜索结果,不过滤聚合结果,对聚合没有影响
         1 GET cars/transactions/_search
         2 {
         3   
         4   "query": {
         5     "match": {
         6       "make": "ford"
         7     }
         8   },
         9   "post_filter": {
        10     "term": {
        11       "color": "green"
        12     }
        13   },
        14   "aggs": {
        15     "all_colors": {
        16       "terms": {
        17         "field": "color"
        18       }
        19     }
        20   }
        21 }
        1 s = Search(index="cars").query("match", make="ford").post_filter("term", color="green")
        2 s.aggs.bucket("all_colors", "terms", field="color")
        3 response = s.execute()

    内置排序

    • _count:按文档数排序。对 terms 、 histogram 、 date_histogram 有效
    • _term:按词项的字符串值的字母顺序排序。只在 terms 内使用
    • _key:按每个桶的键值数值排序(理论上与 _term 类似)。 只在 histogram 和 date_histogram 内使用
      • 让我们做一个 terms 聚合但是按 doc_count 值的升序排序
         1 GET cars/transactions/_search
         2 {
         3   "size": 0,
         4   "aggs": {
         5     "colors": {
         6       "terms": {
         7         "field": "color",
         8         "order": {
         9           "_count": "asc"
        10         }
        11       }
        12     }
        13   }
        14 }
        1 s = Search(index="cars")
        2 s.aggs.bucket("colors", "terms", field="color", order={"_count": "asc"})
        3 response = s.execute()
      • 按度量排序,按照汽车颜色分类,再按照汽车平均售价升序排列
         1 GET cars/transactions/_search
         2 {
         3   "size": 0,
         4   "aggs": {
         5     "colors": {
         6       "terms": {
         7         "field": "color",
         8         "order": {
         9           "avg_price": "asc"
        10         }
        11       },
        12       "aggs": {
        13         "avg_price": {
        14           "avg": {
        15             "field": "price"
        16           }
        17         }
        18       }
        19     }
        20   }
        21 }
        1 s = Search(index="cars")
        2 s.aggs.bucket("colors", "terms", field="color", order={"avg_price": "asc"}).metric("avg_price", "avg", field="price")
        3 response = s.execute()
      • 基于“深度”度量排序

    我们可以定义更深的路径,将度量用尖括号( > )嵌套起来,像这样: my_bucket>another_bucket>metric

    需要提醒的是嵌套路径上的每个桶都必须是 单值 的。 filter 桶生成 一个单值桶:所有与过滤条件匹配的文档都在桶中。 多值桶(如:terms )动态生成许多桶,无法通过指定一个确定路径来识别。

    目前,只有三个单值桶: filter global reverse_nested

      • 让我们快速用示例说明,创建一个汽车售价的直方图,但是按照红色和绿色(不包括蓝色)车各自的方差来排序
         1 GET /cars/transactions/_search
         2 {
         3     "size" : 0,
         4     "aggs" : {
         5         "colors" : {
         6             "histogram" : {
         7               "field" : "price",
         8               "interval": 20000,
         9               "order": {
        10                 "red_green_cars>stats.variance" : "asc" 
        11               }
        12             },
        13             "aggs": {
        14                 "red_green_cars": {
        15                     "filter": { "terms": {"color": ["red", "green"]}}, 
        16                     "aggs": {
        17                         "stats": {"extended_stats": {"field" : "price"}} 
        18                     }
        19                 }
        20             }
        21         }
        22     }
        23 }
        1 s = Search(index="cars")
        2 a = A("histogram", field="price", interval=20000, order={"red_green_cars>stats.variance": "asc"})
        3 q = A("filter", filter={"terms": {"color": ["red", "green"]}})
        4 s.aggs.bucket("colors", a).bucket("red_green_cars", q).metric("stats", "extended_stats", field="price")
        5 response = s.execute()

     

     

  • 相关阅读:
    nrm安装与配置
    vue-cli2升级到webpack4
    npm和yarn的区别
    Vite和Vue CLI的优劣
    基于vue+element+select实现的自定义控件selectTree
    Extjs 隐藏tabpanel的header / title
    await/async 总结
    泛型
    jpg图片打包生成pdf文件的几种方式
    varchar(100)和varchar(10)的区别
  • 原文地址:https://www.cnblogs.com/dowi/p/10101185.html
Copyright © 2011-2022 走看看