zoukankan      html  css  js  c++  java
  • [Hive

     
    Skip to end of metadata
     
    Go to start of metadata
     

    Windowing and Analytics Functions

    Enhancements to Hive QL

    Version

    Icon

    Introduced in Hive version 0.11.

    This section introduces the Hive QL enhancements for windowing and analytics functions. See "Windowing Specifications in HQL" (attached to HIVE-4197) for details. HIVE-896 has more information, including links to earlier documentation in the initial comments.

    All of the windowing and analytics functions operate as per the SQL standard.

    The current release supports the following functions for windowing and analytics:

    1. Windowing functions
      • LEAD
        • The number of rows to lead can optionally be specified. If the number of rows to lead is not specified, the lead is one row.
        • Returns null when the lead for the current row extends beyond the end of the window.
      • LAG
        • The number of rows to lag can optionally be specified. If the number of rows to lag is not specified, the lag is one row.
        • Returns null when the lag for the current row extends before the beginning of the window.
      • FIRST_VALUE
      • LAST_VALUE
    2. The OVER clause
      • OVER with standard aggregates:
        • COUNT
        • SUM
        • MIN
        • MAX
        • AVG
      • OVER with a PARTITION BY statement with one or more partitioning columns of any primitive datatype.
      • OVER with PARTITION BY and ORDER BY with one or more partitioning and/or ordering columns of any datatype.
        • OVER with a window specification. Windows can be defined separately in a WINDOW clause. Window specifications support these standard options:

          ROWS ((CURRENT ROW) | (UNBOUNDED | [num]) PRECEDING) AND (UNBOUNDED | [num]) FOLLOWING
          
          Icon

          The OVER clause supports the following functions, but it does not support a window with them (see HIVE-4797):

          Ranking functions: Rank, NTile, DenseRank, CumeDist, PercentRank.

          Lead and Lag functions.

    3. Analytics functions
      • RANK
      • ROW_NUMBER
      • DENSE_RANK
      • CUME_DIST
      • PERCENT_RANK
      • NTILE

    Examples

    This section provides examples of how to use the Hive QL windowing and analytics functions in SELECT statements. See HIVE-896 for additional examples.

    PARTITION BY with one partitioning column, no ORDER BY or window specification

    SELECT a, COUNT(b) OVER (PARTITION BY c)
    FROM T;

    PARTITION BY with two partitioning columns, no ORDER BY or window specification

    SELECT a, COUNT(b) OVER (PARTITION BY c, d)
    FROM T;

    PARTITION BY with one partitioning column, one ORDER BY column, and no window specification

    SELECT a, SUM(b) OVER (PARTITION BY ORDER BY d)
    FROM T;

    PARTITION BY with two partitioning columns, two ORDER BY columns, and no window specification

    SELECT a, SUM(b) OVER (PARTITION BY c, d ORDER BY e, f)
    FROM T;

    PARTITION BY with partitioning, ORDER BY, and window specification

    SELECT a, SUM(b) OVER (PARTITION BY ORDER BY ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    FROM T;
    SELECT a, AVG(b) OVER (PARTITION BY ORDER BY ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
    FROM T;
    SELECT a, AVG(b) OVER (PARTITION BY ORDER BY ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING)
    FROM T;
    SELECT a, AVG(b) OVER (PARTITION BY ORDER BY ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
    FROM T;

    There can be multiple OVER clauses in a single query. A single OVER clause only applies to the immediately preceding function call. In this example, the first OVER clause applies to COUNT(b) and the second OVER clause applies to SUM(b):

    SELECT 
     a,
     COUNT(b) OVER (PARTITION BY c),
     SUM(b) OVER (PARTITION BY c)
    FROM T;

    Aliases can be used as well, with or without the keyword AS:

    SELECT 
     a,
     COUNT(b) OVER (PARTITION BY c) AS b_count,
     SUM(b) OVER (PARTITION BY c) b_sum
    FROM T;

    WINDOW clause

    SELECT a, SUM(b) OVER w
    FROM T;
    WINDOW w AS (PARTITION BY ORDER BY ROWS UNBOUNDED PRECEDING)

    LEAD using default 1 row lead and not specifying default value

    SELECT a, LEAD(a) OVER (PARTITION BY ORDER BY ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING)
    FROM T;

    LAG specifying a lag of 3 rows and default value of 0

    SELECT a, LAG(a, 3, 0) OVER (PARTITION BY ORDER BY ROWS 3 PRECEDING)
    FROM T;
     
    谨言慎行,专注思考 , 工作与生活同乐
  • 相关阅读:
    requests
    urllib
    爬虫初入
    Django之admin
    Django之FileField字段
    边学边练之博客园----登录验证
    边学边练之博客园----设计表
    Django之中间件
    计算机基础
    win—命令行窗口一层一层的打开文件
  • 原文地址:https://www.cnblogs.com/tmeily/p/4249962.html
Copyright © 2011-2022 走看看