zoukankan      html  css  js  c++  java
  • Flink基础(123):FLINK-SQL语法 (17) DQL(9) OPERATIONS(6) 窗口 (4)Over Aggregation

    0 Over Aggregation(简介)

    Batch Streaming

    OVER aggregates compute an aggregated value for every input row over a range of ordered rows. In contrast to GROUP BY aggregates, OVER aggregates do not reduce the number of result rows to a single row for every group. Instead OVER aggregates produce an aggregated value for every input row.

    The following query computes for every order the sum of amounts of all orders for the same product that were received within one hour before the current order.

    SELECT order_id, order_time, amount,
      SUM(amount) OVER (
        PARTITION BY product
        ORDER BY order_time
        RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW
      ) AS one_hour_prod_amount_sum
    FROM Orders

    The syntax for an OVER window is summarized below.

    SELECT
      agg_func(agg_col) OVER (
        [PARTITION BY col1[, col2, ...]]
        ORDER BY time_col
        range_definition),
      ...
    FROM ...

    You can define multiple OVER window aggregates in a SELECT clause. However, for streaming queries, the OVER windows for all aggregates must be identical due to current limitation.

    1 ORDER BY 

    OVER windows are defined on an ordered sequence of rows. Since tables do not have an inherent order, the ORDER BYclause is mandatory. For streaming queries, Flink currently only supportsOVER` windows that are defined with an ascending time attributes order. Additional orderings are not supported.

    2 PARTITION BY 

    OVER windows can be defined on a partitioned table. In presence of a PARTITION BY clause, the aggregate is computed for each input row only over the rows of its partition.

    3 Range Definitions 

    The range definition specifies how many rows are included in the aggregate. The range is defined with a BETWEEN clause that defines a lower and an upper boundary. All rows between these boundaries are included in the aggregate. Flink only supports CURRENT ROW as the upper boundary.

    There are two options to define the range, ROWS intervals and RANGE intervals.

    3.1 RANGE intervals

    RANGE interval is defined on the values of the ORDER BY column, which is in case of Flink always a time attribute. The following RANGE interval defines that all rows with a time attribute of at most 30 minutes less than the current row are included in the aggregate.

    RANGE BETWEEN INTERVAL '30' MINUTE PRECEDING AND CURRENT ROW

    3.2 ROW intervals 

    ROWS interval is a count-based interval. It defines exactly how many rows are included in the aggregate. The following ROWS interval defines that the 10 rows preceding the current row and the current row (so 11 rows in total) are included in the aggregate.

    ROWS BETWEEN 10 PRECEDING AND CURRENT ROW
    WINDOW

    The WINDOW clause can be used to define an OVER window outside of the SELECT clause. It can make queries more readable and also allows us to reuse the window definition for multiple aggregates.

    SELECT order_id, order_time, amount,
      SUM(amount) OVER w AS sum_amount,
      AVG(amount) OVER w AS avg_amount
    FROM Orders
    WINDOW w AS (
      PARTITION BY product
      ORDER BY order_time
      RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW)

    本文来自博客园,作者:秋华,转载请注明原文链接:https://www.cnblogs.com/qiu-hua/p/15195541.html

  • 相关阅读:
    Vue使用watch监听一个对象中的属性
    小程序 显示对话框 确定-取消
    【微信小程序】 wx:if 与 hidden(隐藏元素)区别
    vue项目移植tinymce踩坑
    XMLHttpRequest.withCredentials 解决跨域请求头无Cookie的问题
    appJSON["window"]["navigationBarTextStyle"] 字段需为 black 或 white
    Java写 插入 选择 冒泡 快排
    编码表理解
    Centos yum安装java jdk1.8
    Java Hibernate和.Net EntityFramework 如何在提交事务之前 就拿到需要新增实体的Id
  • 原文地址:https://www.cnblogs.com/qiu-hua/p/15195541.html
Copyright © 2011-2022 走看看