  • Hive SQL study notes: window functions

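    RANK(): tied rows share the same rank and the following ranks are skipped, so the numbering stays aligned with the row count (1, 2, 2, 4).
    DENSE_RANK(): tied rows share the same rank and no rank is skipped, so the highest rank is smaller than the row count (1, 2, 2, 3).
    ROW_NUMBER(): numbers the rows sequentially in sort order, even when values tie (1, 2, 3, 4).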

    Test data

    cookieid        creattime           pv

    cookie1,   2017-12-10,    1
    cookie1,   2017-12-11,    5
    cookie1,   2017-12-12,    7
    cookie1,   2017-12-13,    3
    cookie1,   2017-12-14,    2
    cookie1,   2017-12-15,    4
    cookie1,   2017-12-16,    4

    cookie2,   2017-12-12,    7
    cookie2,   2017-12-16,    6
    cookie2,   2017-12-24,    1

    cookie3,   2017-12-22,    5

    a,         2017-12-01,    3
    b,         2017-12-00,    3

    Experiment 1:

    SELECT
    cookieid,
    creattime,
    pv,
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY creattime) AS pv1, -- default frame: running sum of pv from the start of the partition to the current row
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY creattime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- explicit frame from the partition start to the current row; same result as pv1
    ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY creattime) AS rn
    FROM
    dim.test_stu_info_study;

    Results:

    cookieid  creattime   pv  pv1  pv2  rn

    cookie3   2017-12-22  5   5    5    1
    cookie1   2017-12-10  1   1    1    1
    cookie1   2017-12-11  5   6    6    2
    cookie1   2017-12-12  7   13   13   3
    cookie1   2017-12-13  3   16   16   4
    cookie1   2017-12-14  2   18   18   5
    cookie1   2017-12-15  4   22   22   6  (rn keeps counting up even though pv ties with the next row)
    cookie1   2017-12-16  4   26   26   7
    b         2017-12-00  3   3    3    1
    cookie2   2017-12-12  7   7    7    1
    cookie2   2017-12-16  6   13   13   2
    cookie2   2017-12-24  1   14   14   3
    a         2017-12-01  3   3    3    1
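
    pv1 and pv2 agree here because creattime is unique within each cookieid. When ORDER BY is given without a frame, Hive's documented default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which also takes in rows tied with the current row on the sort key. The sketch below runs against the same table but orders by pv (a choice made here only to create ties), so the two forms can diverge:

    ```sql
    -- Sketch: default (RANGE) frame versus an explicit ROWS frame when the sort key has ties.
    -- Ordering by pv is an illustrative assumption; cookie1 has two rows with pv = 4.
    SELECT
    cookieid,
    creattime,
    pv,
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY pv) AS range_sum, -- tied rows are peers and share one running total
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY pv ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum -- accumulates strictly row by row
    FROM
    dim.test_stu_info_study
    WHERE cookieid = 'cookie1';
    ```

    With pv sorted as 1, 2, 3, 4, 4, 5, 7, range_sum should come out as 1, 3, 6, 14, 14, 19, 26 and rows_sum as 1, 3, 6, 10, 14, 19, 26. Which of the two tied rows receives 10 is not deterministic.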

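    Experiment 2:

    SELECT
    cookieid,
    creattime,
    pv,
    AVG(pv) OVER(PARTITION BY cookieid ORDER BY creattime) AS pv1, -- default frame: running average, i.e. the sum of pv from the partition start to the current row divided by the number of those rows
    AVG(pv) OVER(PARTITION BY cookieid ORDER BY creattime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- explicit frame from the partition start to the current row; same result as pv1
    ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY creattime) AS rn
    FROM
      dim.test_stu_info_study;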

    Results (averages truncated to two decimal places):

    cookieid  creattime   pv  pv1   pv2   rn

    cookie3   2017-12-22  5   5.0   5.0   1
    cookie1   2017-12-10  1   1.0   1.0   1
    cookie1   2017-12-11  5   3.0   3.0   2
    cookie1   2017-12-12  7   4.33  4.33  3
    cookie1   2017-12-13  3   4.0   4.0   4
    cookie1   2017-12-14  2   3.6   3.6   5
    cookie1   2017-12-15  4   3.66  3.66  6
    cookie1   2017-12-16  4   3.71  3.71  7
    b         2017-12-00  3   3.0   3.0   1
    cookie2   2017-12-12  7   7.0   7.0   1
    cookie2   2017-12-16  6   6.5   6.5   2
    cookie2   2017-12-24  1   4.66  4.66  3
    a         2017-12-01  3   3.0   3.0   1

    Experiment 3:

    SELECT
    cookieid,
    creattime,
    pv,
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY creattime) AS pv1, -- default frame: running sum of pv from the start of the partition to the current row
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY creattime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- explicit frame from the partition start to the current row; same result as pv1
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY creattime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv3, -- current row plus the 3 preceding rows (up to 4 rows summed)
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY creattime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv4, -- 3 preceding rows, the current row, and 1 following row (up to 5 rows summed)
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY creattime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv5, -- current row plus all following rows (the first row holds the partition total, and the value shrinks row by row)
    ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY creattime) AS rn
    FROM
    dim.test_stu_info_study;

    Experiment 3 results:

    cookieid  creattime   pv  pv1  pv2  pv3  pv4  pv5  rn

    cookie3   2017-12-22  5   5    5    5    5    5    1
    cookie1   2017-12-10  1   1    1    1    6    26   1
    cookie1   2017-12-11  5   6    6    6    13   25   2
    cookie1   2017-12-12  7   13   13   13   16   20   3
    cookie1   2017-12-13  3   16   16   16   18   13   4
    cookie1   2017-12-14  2   18   18   17   21   10   5
    cookie1   2017-12-15  4   22   22   16   20   8    6
    cookie1   2017-12-16  4   26   26   13   13   4    7
    b         2017-12-00  3   3    3    3    3    3    1
    cookie2   2017-12-12  7   7    7    7    13   14   1
    cookie2   2017-12-16  6   13   13   13   14   7    2
    cookie2   2017-12-24  1   14   14   14   14   1    3
    a         2017-12-01  3   3    3    3    3    3    1
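
    The same ROWS frames work with other aggregates. As a minimal sketch (the aliases pv_avg3 and pv_in_frame are made up for illustration), a centered 3-row moving average over the same table could look like this:

    ```sql
    -- Sketch: centered moving average over the previous row, the current row, and the next row.
    SELECT
    cookieid,
    creattime,
    pv,
    AVG(pv) OVER(PARTITION BY cookieid ORDER BY creattime ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS pv_avg3, -- fewer rows are averaged at the partition edges
    COUNT(pv) OVER(PARTITION BY cookieid ORDER BY creattime ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS pv_in_frame -- number of non-NULL pv values inside the frame
    FROM
    dim.test_stu_info_study;
    ```

    For cookie1 the frame holds 2, 3, 3, 3, 3, 3, 2 rows, so pv_avg3 works out to 3.0, 4.33, 5.0, 4.0, 3.0, 3.33, 4.0 (rounded to two decimals).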

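    Notes:

    Window functions differ from aggregate functions used with GROUP BY.

    As a window function, SUM() returns a value for every row based on that row's window, so there are as many SUM values as there are rows,

    whereas GROUP BY computes one SUM per group, so each group yields a single value.

    With ORDER BY, SUM() accumulates row by row in sorted order within the partition, because the default frame then runs from the start of the partition to the current row.

    Without ORDER BY, the partition is not sorted and SUM() covers the pv of the whole partition.

    Note: MAX() follows the same frame rules. Without ORDER BY it returns the maximum of the whole partition, but with ORDER BY and no explicit frame it returns the running maximum up to the current row.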
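
    The effect of ORDER BY on SUM() can be checked by putting both forms side by side. A minimal sketch against the same table (the aliases running_pv, total_pv and pv_share are hypothetical):

    ```sql
    -- Sketch: the same aggregate with and without ORDER BY in the OVER clause.
    SELECT
    cookieid,
    creattime,
    pv,
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY creattime) AS running_pv, -- accumulates in creattime order
    SUM(pv) OVER(PARTITION BY cookieid) AS total_pv, -- whole partition, same value on every row
    pv / SUM(pv) OVER(PARTITION BY cookieid) AS pv_share -- each row's fraction of its partition total
    FROM
    dim.test_stu_info_study;
    ```

    For cookie1, total_pv is 26 on all seven rows while running_pv climbs 1, 6, 13, 16, 18, 22, 26.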

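    Experiment 4:

    SELECT
    cookieid,
    creattime,
    pv,
    RANK() OVER(PARTITION BY cookieid ORDER BY pv) AS pv2, -- ties share a rank and the next rank is skipped
    DENSE_RANK() OVER(PARTITION BY cookieid ORDER BY pv ) AS pv3, -- ties share a rank and no rank is skipped
    ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY pv) AS rn -- sequential number; the order among tied rows is arbitrary
    FROM
    dim.test_stu_info_study;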

    Experiment 4 results (pv2 = RANK, pv3 = DENSE_RANK):

    cookieid  creattime   pv  pv2  pv3  rn

    cookie3   2017-12-22  5   1    1    1
    cookie1   2017-12-10  1   1    1    1
    cookie1   2017-12-14  2   2    2    2
    cookie1   2017-12-13  3   3    3    3
    cookie1   2017-12-15  4   4    4    4
    cookie1   2017-12-16  4   4    4    5
    cookie1   2017-12-11  5   6    5    6
    cookie1   2017-12-12  7   7    6    7
    b         2017-12-00  3   1    1    1
    cookie2   2017-12-24  1   1    1    1
    cookie2   2017-12-16  6   2    2    2
    cookie2   2017-12-12  7   3    3    3
    a         2017-12-01  3   1    1    1
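
    One common use of these ranking columns is keeping the top N rows per partition. Window functions cannot appear in WHERE, so the ranking goes into a subquery first. A minimal sketch, assuming N = 2 and descending pv (both chosen only for illustration; the alias t is arbitrary):

    ```sql
    -- Sketch: keep the 2 rows with the highest pv for each cookieid.
    SELECT cookieid, creattime, pv
    FROM (
      SELECT
      cookieid,
      creattime,
      pv,
      ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY pv DESC) AS rn -- 1 = highest pv in the partition
      FROM dim.test_stu_info_study
    ) t
    WHERE rn <= 2;
    ```

    For cookie1 this keeps the rows with pv 7 and 5. Swapping ROW_NUMBER() for RANK() or DENSE_RANK() would keep every tied row instead of cutting ties arbitrarily.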

     

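    Difference between window functions and aggregate functions

    select ename,sal,sum(sal) over (partition by ename order by sal,empno) as running_total
    from emp1
    order by 2

    This returns a running total of sal within each ename.


    The difference is that an aggregate with OVER() returns multiple rows per group (one per input row), whereas a plain aggregate function with GROUP BY returns only one row per group.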


    SQL> select e.empno,e.ename,e.job,e.sal,e.deptno, sum(e.sal) over (partition by e.deptno) as total_sal
      2  from emp e;

         EMPNO ENAME      JOB              SAL     DEPTNO  TOTAL_SAL
    ---------- ---------- --------- ---------- ---------- ----------
          7782 CLARK      MANAGER         2450         10       8750
          7839 KING       PRESIDENT       5000         10       8750
          7934 MILLER     CLERK           1300         10       8750
          7566 JONES      MANAGER         2975         20      10875
          7902 FORD       ANALYST         3000         20      10875
          7876 ADAMS      CLERK           1100         20      10875
          7369 SMITH      CLERK            800         20      10875
          7788 SCOTT      ANALYST         3000         20      10875
          7521 WARD       SALESMAN        1250         30       9400
          7844 TURNER     SALESMAN        1500         30       9400
          7499 ALLEN      SALESMAN        1600         30       9400
          7900 JAMES      CLERK            950         30       9400
          7698 BLAKE      MANAGER         2850         30       9400
          7654 MARTIN     SALESMAN        1250         30       9400

    14 rows selected.


    Aggregate function:
    SQL> select sum(sal) ,deptno from emp group by deptno;

      SUM(SAL)     DEPTNO
    ---------- ----------
          9400         30
         10875         20
          8750         10
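
    To get the per-row output of the window version using only GROUP BY, the grouped totals have to be joined back to the detail rows. A minimal sketch on the same emp table (the alias t is arbitrary, and deptno is assumed to be non-NULL):

    ```sql
    -- Sketch: reproduce sum(sal) over (partition by deptno) with GROUP BY plus a join.
    select e.empno, e.ename, e.job, e.sal, e.deptno, t.total_sal
    from emp e
    join (
      select deptno, sum(sal) as total_sal -- one row per department
      from emp
      group by deptno
    ) t on e.deptno = t.deptno;
    ```

    This should return the same 14 rows with total_sal of 8750, 10875 and 9400 for departments 10, 20 and 30. The window form gets there in one pass without the self-join.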

  • Original article: https://www.cnblogs.com/pengpenghuhu/p/11713158.html