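RANK(): tied rows get the same rank and the following rank skips ahead, leaving gaps (1, 2, 2, 4), so ranks stay aligned with row positions.
DENSE_RANK(): tied rows get the same rank and no ranks are skipped (1, 2, 2, 3), so the highest rank can be lower than the row count.
ROW_NUMBER(): numbers the rows 1, 2, 3, 4, ... in sort order; tied rows still receive distinct numbers.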
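The three ranking rules above can be mimicked in a few lines of Python. This is a minimal sketch, not Hive's implementation. It assumes a single partition whose sort keys are already in ORDER BY order, and the function names rank, dense_rank and row_number are illustrative only.

```python
def rank(keys):
    """RANK(): ties share a rank; the next rank skips past the tied rows."""
    out = []
    for i, k in enumerate(keys):
        if i > 0 and k == keys[i - 1]:
            out.append(out[-1])  # same key as the previous row -> same rank
        else:
            out.append(i + 1)    # otherwise rank = 1-based position
    return out


def dense_rank(keys):
    """DENSE_RANK(): ties share a rank; ranks stay consecutive (no gaps)."""
    out = []
    for i, k in enumerate(keys):
        if i > 0 and k == keys[i - 1]:
            out.append(out[-1])
        else:
            out.append(out[-1] + 1 if out else 1)
    return out


def row_number(keys):
    """ROW_NUMBER(): 1, 2, 3, ... regardless of ties."""
    return list(range(1, len(keys) + 1))


# cookie1's pv values from the test data, sorted as ORDER BY pv would
pv_sorted = sorted([1, 5, 7, 3, 2, 4, 4])   # [1, 2, 3, 4, 4, 5, 7]
print(rank(pv_sorted))        # [1, 2, 3, 4, 4, 6, 7]
print(dense_rank(pv_sorted))  # [1, 2, 3, 4, 4, 5, 6]
print(row_number(pv_sorted))  # [1, 2, 3, 4, 5, 6, 7]
```

The two tied 4s show the difference: RANK() jumps from 4 to 6, DENSE_RANK() continues with 5.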
Test data
cookieid creattime pv
cookie1, 2017-12-10, 1
cookie1, 2017-12-11, 5
cookie1, 2017-12-12, 7
cookie1, 2017-12-13, 3
cookie1, 2017-12-14, 2
cookie1, 2017-12-15, 4
cookie1, 2017-12-16, 4
cookie2, 2017-12-12, 7
cookie2, 2017-12-16, 6
cookie2, 2017-12-24, 1
cookie3, 2017-12-22, 5
a, 2017-12-01, 3
b, 2017-12-00, 3
Experiment 1:
SELECT
cookieid,
creattime,
pv,
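SUM(pv) OVER(PARTITION BY cookieid ORDER BY creattime) AS pv1, -- default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): sum of pv from the start of the partition to the current row
SUM(pv) OVER(PARTITION BY cookieid ORDER BY creattime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- from the start to the current row; same as pv1 because creattime has no ties within a cookieid
ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY creattime) AS rn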
FROM
dim.test_stu_info_study;
Results:
cookieid creattime pv pv1 pv2 rn
cookie3 2017-12-22 5 5 5 1
cookie1 2017-12-10 1 1 1 1
cookie1 2017-12-11 5 6 6 2
cookie1 2017-12-12 7 13 13 3
cookie1 2017-12-13 3 16 16 4
cookie1 2017-12-14 2 18 18 5
cookie1 2017-12-15 4 22 22 6 (same pv as the next row, but the rows are still numbered sequentially)
cookie1 2017-12-16 4 26 26 7
b 2017-12-00 3 3 3 1
cookie2 2017-12-12 7 7 7 1
cookie2 2017-12-16 6 13 13 2
cookie2 2017-12-24 1 14 14 3
a 2017-12-01 3 3 3 1
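A minimal Python sketch of the two frames used in Experiment 1, assuming one partition that is already sorted by its ORDER BY key (running_sum_rows and running_sum_range are hypothetical helper names). It also shows why pv1 and pv2 agree here: the default frame with ORDER BY is RANGE-based, so rows with an equal ORDER BY key (peers) are summed together, and creattime has no ties within cookie1.

```python
from itertools import accumulate


def running_sum_rows(values):
    """ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: plain prefix sums."""
    return list(accumulate(values))


def running_sum_range(keys, values):
    """RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (the default frame
    when ORDER BY is given): the frame also includes later rows whose ORDER BY
    key equals the current row's key (its peers). keys must be sorted."""
    prefix = list(accumulate(values))
    out = []
    for i, key in enumerate(keys):
        j = i
        while j + 1 < len(keys) and keys[j + 1] == key:
            j += 1  # move to the last peer of row i
        out.append(prefix[j])
    return out


# cookie1 from the test data, ordered by creattime
dates = ["2017-12-10", "2017-12-11", "2017-12-12", "2017-12-13",
         "2017-12-14", "2017-12-15", "2017-12-16"]
pvs = [1, 5, 7, 3, 2, 4, 4]
print(running_sum_rows(pvs))          # [1, 6, 13, 16, 18, 22, 26]  (pv2)
print(running_sum_range(dates, pvs))  # [1, 6, 13, 16, 18, 22, 26]  (pv1)

# Ordering by pv instead makes the two 4s peers, and the frames diverge.
by_pv = sorted(pvs)                     # [1, 2, 3, 4, 4, 5, 7]
print(running_sum_rows(by_pv))          # [1, 3, 6, 10, 14, 19, 26]
print(running_sum_range(by_pv, by_pv))  # [1, 3, 6, 14, 14, 19, 26]
```

So pv1 and pv2 only coincide because the ORDER BY column is unique inside each partition.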
Experiment 2:
SELECT
cookieid,
creattime,
pv,
AVG(pv) OVER(PARTITION BY cookieid ORDER BY creattime) AS pv1, -- default frame: sum of pv from the start of the partition to the current row, divided by the number of those rows
AVG(pv) OVER(PARTITION BY cookieid ORDER BY creattime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- from the start to the current row; same as pv1
ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY creattime) AS rn
FROM
dim.test_stu_info_study;
Results (averages rounded to two decimals):
cookieid creattime pv pv1 pv2 rn
cookie3 2017-12-22 5 5.0 5.0 1
cookie1 2017-12-10 1 1.0 1.0 1
cookie1 2017-12-11 5 3.0 3.0 2
cookie1 2017-12-12 7 4.33 4.33 3
cookie1 2017-12-13 3 4.0 4.0 4
cookie1 2017-12-14 2 3.6 3.6 5
cookie1 2017-12-15 4 3.67 3.67 6
cookie1 2017-12-16 4 3.71 3.71 7
b 2017-12-00 3 3.0 3.0 1
cookie2 2017-12-12 7 7.0 7.0 1
cookie2 2017-12-16 6 6.5 6.5 2
cookie2 2017-12-24 1 4.67 4.67 3
a 2017-12-01 3 3.0 3.0 1
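The averages in Experiment 2 are the running sum divided by the running row count. A minimal Python sketch under that reading, for a single partition already ordered by creattime (running_avg is a hypothetical helper); values are rounded to two decimals only for display.

```python
from itertools import accumulate


def running_avg(values):
    """AVG over ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW:
    running sum divided by the number of rows seen so far."""
    return [s / n for n, s in enumerate(accumulate(values), start=1)]


cookie1_pv = [1, 5, 7, 3, 2, 4, 4]  # ordered by creattime
cookie2_pv = [7, 6, 1]
print([round(x, 2) for x in running_avg(cookie1_pv)])
# [1.0, 3.0, 4.33, 4.0, 3.6, 3.67, 3.71]
print([round(x, 2) for x in running_avg(cookie2_pv)])
# [7.0, 6.5, 4.67]
```

For example, 22/6 = 3.666... and 14/3 = 4.666..., which round to 3.67 and 4.67.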
Experiment 3:
SELECT
cookieid,
creattime,
pv,
SUM(pv) OVER(PARTITION BY cookieid ORDER BY creattime) AS pv1, -- default frame: sum of pv from the start of the partition to the current row
SUM(pv) OVER(PARTITION BY cookieid ORDER BY creattime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- from the start to the current row; same as pv1
SUM(pv) OVER(PARTITION BY cookieid ORDER BY creattime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv3, -- current row plus the 3 preceding rows (up to 4 rows)
SUM(pv) OVER(PARTITION BY cookieid ORDER BY creattime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv4, -- 3 preceding rows, the current row and 1 following row (up to 5 rows)
SUM(pv) OVER(PARTITION BY cookieid ORDER BY creattime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv5, -- current row plus all following rows (the first row holds the partition total and the values shrink toward the end)
row_number() OVER(PARTITION BY cookieid ORDER BY creattime) AS rn
FROM
dim.test_stu_info_study;
Experiment 3 results:
cookieid creattime pv pv1 pv2 pv3 pv4 pv5 rn
cookie3 2017-12-22 5 5 5 5 5 5 1
cookie1 2017-12-10 1 1 1 1 6 26 1
cookie1 2017-12-11 5 6 6 6 13 25 2
cookie1 2017-12-12 7 13 13 13 16 20 3
cookie1 2017-12-13 3 16 16 16 18 13 4
cookie1 2017-12-14 2 18 18 17 21 10 5
cookie1 2017-12-15 4 22 22 16 20 8 6
cookie1 2017-12-16 4 26 26 13 13 4 7
b 2017-12-00 3 3 3 3 3 3 1
cookie2 2017-12-12 7 7 7 7 13 14 1
cookie2 2017-12-16 6 13 13 13 14 7 2
cookie2 2017-12-24 1 14 14 14 14 1 3
a 2017-12-01 3 3 3 3 3 3 1
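The ROWS BETWEEN frames of Experiment 3 can be reproduced with a small Python sketch. Assumptions: frame_sum and windowed are hypothetical helpers, None stands for UNBOUNDED and 0 for CURRENT ROW, and only the cookie1 and cookie2 rows of the test data are included.

```python
from collections import defaultdict


def frame_sum(values, preceding, following):
    """SUM over ROWS BETWEEN <preceding> PRECEDING AND <following> FOLLOWING.

    None means UNBOUNDED and 0 means CURRENT ROW. The frame is clipped to the
    partition, so rows near either edge sum fewer values.
    """
    n = len(values)
    out = []
    for i in range(n):
        lo = 0 if preceding is None else max(0, i - preceding)
        hi = n - 1 if following is None else min(n - 1, i + following)
        out.append(sum(values[lo:hi + 1]))
    return out


def windowed(rows, preceding, following):
    """PARTITION BY cookieid ORDER BY creattime, then frame_sum per partition."""
    parts = defaultdict(list)
    for cookieid, creattime, pv in rows:
        parts[cookieid].append((creattime, pv))
    result = {}
    for cookieid, items in parts.items():
        items.sort()  # 'YYYY-MM-DD' strings sort chronologically
        result[cookieid] = frame_sum([pv for _, pv in items], preceding, following)
    return result


rows = [
    ("cookie1", "2017-12-10", 1), ("cookie1", "2017-12-11", 5),
    ("cookie1", "2017-12-12", 7), ("cookie1", "2017-12-13", 3),
    ("cookie1", "2017-12-14", 2), ("cookie1", "2017-12-15", 4),
    ("cookie1", "2017-12-16", 4),
    ("cookie2", "2017-12-12", 7), ("cookie2", "2017-12-16", 6),
    ("cookie2", "2017-12-24", 1),
]

print(windowed(rows, 3, 0)["cookie1"])     # pv3: [1, 6, 13, 16, 17, 16, 13]
print(windowed(rows, 3, 1)["cookie1"])     # pv4: [6, 13, 16, 18, 21, 20, 13]
print(windowed(rows, 0, None)["cookie1"])  # pv5: [26, 25, 20, 13, 10, 8, 4]
print(windowed(rows, None, 0)["cookie2"])  # pv1/pv2: [7, 13, 14]
```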
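Notes:
How window functions differ from aggregate functions:
a windowed SUM() returns a value for every row, computed over that row's own window, so there are as many sum values as there are rows,
whereas GROUP BY computes one sum per group, so each group yields a single value.
With ORDER BY, SUM() accumulates pv row by row in the partition's sort order, so the result depends on the ORDER BY.
Without ORDER BY, the partition is not sorted and SUM() returns the total pv of the whole partition on every row.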
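Note: the same frame rule applies to MAX(). With ORDER BY, the default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) makes it a running maximum; it returns the maximum of the whole partition only without ORDER BY, or with a frame that covers the whole partition (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).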
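A short Python sketch of these cases for cookie1 (pv ordered by creattime). It assumes the default frame rule given in the Hive windowing documentation: an aggregate with ORDER BY but no explicit frame covers RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Because creattime has no ties here, a plain prefix accumulation models that frame; the variable names are illustrative.

```python
from itertools import accumulate

cookie1_pv = [1, 5, 7, 3, 2, 4, 4]  # ordered by creattime

# SUM(pv) OVER (PARTITION BY cookieid ORDER BY creattime): running total
running_sum = list(accumulate(cookie1_pv))
# MAX(pv) OVER (PARTITION BY cookieid ORDER BY creattime): running maximum
running_max = list(accumulate(cookie1_pv, max))
# No ORDER BY: the frame is the whole partition, same value on every row
whole_sum = [sum(cookie1_pv)] * len(cookie1_pv)
whole_max = [max(cookie1_pv)] * len(cookie1_pv)
# GROUP BY cookieid: a single value for the whole group
group_by_sum = sum(cookie1_pv)

print(running_sum)   # [1, 6, 13, 16, 18, 22, 26]
print(running_max)   # [1, 5, 7, 7, 7, 7, 7]
print(whole_sum)     # [26, 26, 26, 26, 26, 26, 26]
print(whole_max)     # [7, 7, 7, 7, 7, 7, 7]
print(group_by_sum)  # 26
```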
Experiment 4:
SELECT
cookieid,
creattime,
pv,
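RANK() OVER(PARTITION BY cookieid ORDER BY pv) AS rnk, -- tied pv values share a rank; the next rank skips ahead
DENSE_RANK() OVER(PARTITION BY cookieid ORDER BY pv) AS dense_rnk, -- tied pv values share a rank; no gaps
ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY pv) AS rn -- distinct numbers even for ties; the order among tied rows is not guaranteed
FROM
dim.test_stu_info_study;
Experiment 4 results:
cookieid creattime pv rnk dense_rnk rn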
cookie3 2017-12-22 5 1 1 1
cookie1 2017-12-10 1 1 1 1
cookie1 2017-12-14 2 2 2 2
cookie1 2017-12-13 3 3 3 3
cookie1 2017-12-15 4 4 4 4
cookie1 2017-12-16 4 4 4 5
cookie1 2017-12-11 5 6 5 6
cookie1 2017-12-12 7 7 6 7
b 2017-12-00 3 1 1 1
cookie2 2017-12-24 1 1 1 1
cookie2 2017-12-16 6 2 2 2
cookie2 2017-12-12 7 3 3 3
a 2017-12-01 3 1 1 1
Window functions vs. aggregate functions
select ename,sal,sum(sal) over (partition by ename order by sal,empno) as running_total
from emp1
order by 2
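Running total of sal within each ename partition, accumulated in order of sal and then empno.
The difference between an OVER() window function and an aggregate function is that the window function returns one row for every row in the group, while the aggregate function returns only one row per group.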
SQL> select e.empno, e.ename, e.job, e.sal, e.deptno, sum(e.sal) over (partition by e.deptno) as total_sal
from emp e;
EMPNO ENAME JOB SAL DEPTNO TOTAL_SAL
---------- ---------- --------- ---------- ---------- ----------
7782 CLARK MANAGER 2450 10 8750
7839 KING PRESIDENT 5000 10 8750
7934 MILLER CLERK 1300 10 8750
7566 JONES MANAGER 2975 20 10875
7902 FORD ANALYST 3000 20 10875
7876 ADAMS CLERK 1100 20 10875
7369 SMITH CLERK 800 20 10875
7788 SCOTT ANALYST 3000 20 10875
7521 WARD SALESMAN 1250 30 9400
7844 TURNER SALESMAN 1500 30 9400
7499 ALLEN SALESMAN 1600 30 9400
7900 JAMES CLERK 950 30 9400
7698 BLAKE MANAGER 2850 30 9400
7654 MARTIN SALESMAN 1250 30 9400
14 rows selected.
Aggregate function:
SQL> select sum(sal) ,deptno from emp group by deptno;
SUM(SAL) DEPTNO
---------- ----------
9400 30
10875 20
8750 10
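The same contrast in plain Python, using the emp rows from the SQL*Plus output above (the job column is omitted and the variable names are illustrative). GROUP BY collapses each department to one row, while SUM() OVER (PARTITION BY deptno) keeps all 14 rows and attaches the department total to each.

```python
from collections import defaultdict

# (empno, ename, sal, deptno) taken from the emp output above
emp = [
    (7782, "CLARK", 2450, 10), (7839, "KING", 5000, 10), (7934, "MILLER", 1300, 10),
    (7566, "JONES", 2975, 20), (7902, "FORD", 3000, 20), (7876, "ADAMS", 1100, 20),
    (7369, "SMITH", 800, 20), (7788, "SCOTT", 3000, 20),
    (7521, "WARD", 1250, 30), (7844, "TURNER", 1500, 30), (7499, "ALLEN", 1600, 30),
    (7900, "JAMES", 950, 30), (7698, "BLAKE", 2850, 30), (7654, "MARTIN", 1250, 30),
]

# GROUP BY deptno: one output row per department
totals = defaultdict(int)
for _, _, sal, deptno in emp:
    totals[deptno] += sal
grouped = sorted(totals.items())

# SUM(sal) OVER (PARTITION BY deptno): every input row kept, total attached
per_row = [(empno, ename, sal, deptno, totals[deptno])
           for empno, ename, sal, deptno in emp]

print(len(grouped), grouped)  # 3 [(10, 8750), (20, 10875), (30, 9400)]
print(len(per_row))           # 14
```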