在拙文 https://www.cnblogs.com/xiandedanteng/p/12327809.html 中,我提到过求学生单科最新成绩的四种sql语句,它们是下面四种:
--Rank-- select * from (select tb_scoreflow.*,rank() over(partition by stuid,sbjid order by cdate desc) as seq from tb_scoreflow) a where a.seq=1 --Left join-- SELECT a.* from tb_scoreflow a left JOIN tb_scoreflow b on a.stuid = b.stuid and a.sbjid = b.sbjid and b.cdate > a.cdate where b.cdate IS NULL order by a.stuid,a.sbjid --group by-- select a.* from tb_scoreflow a , (select stuid,sbjid,max(cdate) as cdate from tb_scoreflow group by stuid,sbjid) b where a.stuid=b.stuid and a.sbjid=b.sbjid and a.cdate=b.cdate order by a.stuid,a.sbjid --Exist anti join-- select a.* from tb_scoreflow a where not exists ( select null from tb_scoreflow b where b.stuid=a.stuid and b.sbjid=a.sbjid and b.cdate>a.cdate) order by a.stuid,a.sbjid
如果让Oracle解释计划来评价这四种SQL,哪一种会得到较高的评价呢,让我们看看:
首先是采用分析函数Rank方案的:
SQL> select * from (select tb_scoreflow.*,rank() over(partition by stuid,sbjid order by cdate desc) as seq from tb_scoreflow) a where a.seq=1; 已用时间: 00: 00: 00.04 执行计划 ---------------------------------------------------------- Plan hash value: 2391432598 ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 50 | 3700 | 4 (25)| 00:00:01 | |* 1 | VIEW | | 50 | 3700 | 4 (25)| 00:00:01 | |* 2 | WINDOW SORT PUSHED RANK| | 50 | 1000 | 4 (25)| 00:00:01 | | 3 | TABLE ACCESS FULL | TB_SCOREFLOW | 50 | 1000 | 3 (0)| 00:00:01 | ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("A"."SEQ"=1) 2 - filter(RANK() OVER ( PARTITION BY "STUID","SBJID" ORDER BY INTERNAL_FUNCTION("CDATE") DESC )<=1)
左联结的:
SQL> SELECT a.* from tb_scoreflow a left JOIN tb_scoreflow b on a.stuid = b.stuid and a.sbjid = b.sbjid and b.cdate > a.cdate where b.cdate IS NULL order by a.stuid,a.sbjid; 已用时间: 00: 00: 00.01 执行计划 ---------------------------------------------------------- Plan hash value: 2451855310 ------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 34 | 8 (25)| 00:00:01 | | 1 | SORT ORDER BY | | 1 | 34 | 8 (25)| 00:00:01 | |* 2 | FILTER | | | | | | |* 3 | HASH JOIN OUTER | | 1 | 34 | 7 (15)| 00:00:01 | | 4 | TABLE ACCESS FULL| TB_SCOREFLOW | 50 | 1000 | 3 (0)| 00:00:01 | | 5 | TABLE ACCESS FULL| TB_SCOREFLOW | 50 | 700 | 3 (0)| 00:00:01 | ------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("B"."CDATE" IS NULL) 3 - access("A"."SBJID"="B"."SBJID"(+) AND "A"."STUID"="B"."STUID"(+)) filter("B"."CDATE"(+)>"A"."CDATE")
group by的:
SQL> select a.* from tb_scoreflow a , (select stuid,sbjid,max(cdate) as cdate from tb_scoreflow group by stuid,sbjid) b where a.stuid=b.stuid and a.sbjid=b.sbjid and a.cdate=b.cdate order by a.stuid,a.sbjid; 已用时间: 00: 00: 00.00 执行计划 ---------------------------------------------------------- Plan hash value: 3190377201 ------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 34 | 8 (25)| 00:00:01 | |* 1 | FILTER | | | | | | | 2 | SORT GROUP BY | | 1 | 34 | 8 (25)| 00:00:01 | |* 3 | HASH JOIN | | 50 | 1700 | 7 (15)| 00:00:01 | | 4 | TABLE ACCESS FULL| TB_SCOREFLOW | 50 | 1000 | 3 (0)| 00:00:01 | | 5 | TABLE ACCESS FULL| TB_SCOREFLOW | 50 | 700 | 3 (0)| 00:00:01 | ------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("A"."CDATE"=MAX("CDATE")) 3 - access("A"."STUID"="STUID" AND "A"."SBJID"="SBJID")
反联结的:
SQL> select a.* from tb_scoreflow a where not exists ( select null from tb_scoreflow b where b.stuid=a.stuid and b.sbjid=a.sbjid and b.cdate>a.cdate) order by a.stuid,a.sbjid; 已用时间: 00: 00: 00.00 执行计划 ---------------------------------------------------------- Plan hash value: 3319093165 ------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 34 | 8 (25)| 00:00:01 | | 1 | SORT ORDER BY | | 1 | 34 | 8 (25)| 00:00:01 | |* 2 | HASH JOIN ANTI | | 1 | 34 | 7 (15)| 00:00:01 | | 3 | TABLE ACCESS FULL| TB_SCOREFLOW | 50 | 1000 | 3 (0)| 00:00:01 | | 4 | TABLE ACCESS FULL| TB_SCOREFLOW | 50 | 700 | 3 (0)| 00:00:01 | ------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("B"."STUID"="A"."STUID" AND "B"."SBJID"="A"."SBJID") filter("B"."CDATE">"A"."CDATE")
从最重要的Cost指标来看,Rank方案以4胜出,其它三种方案以8不分伯仲。
但是,这是在目前50条记录,且Oracle假定数据均衡下做出的判断,在真实数据下跑产生的速度差异可能和上面的结论大相径庭。
下面是tb_scoreflow表的信息:
SQL> desc tb_scoreflow; 名称 是否为空? 类型 ----------------------------------------------------- -------- ------------------------------------ ID NOT NULL NUMBER(6) STUID NOT NULL NUMBER(6) SBJID NOT NULL NUMBER(6) SCORE NOT NULL NUMBER(3) CDATE DATE SQL> select count(*) from tb_scoreflow; COUNT(*) ---------- 50 已用时间: 00: 00: 00.04
表格和数据的产生请参考:https://www.cnblogs.com/xiandedanteng/p/12327809.html
--2020年3月15日--