有 时候我们需要找出在一组连续的号码中,有哪些是缺失的部分。具体的需求又分为两种情况
- 查找 连续部分的最大和最小值
- 查找缺少的部分
经过构造的数据如下
-- preparation
drop table test1 purge;
create table test1(id int,dt date);
insert into test1
(id, dt)
select rownum as id, trunc(add_months(sysdate, rownum), 'mm') as dt
from dual
connect by rownum <= 20;
delete test1 where id in (4,7,13,19);
commit;
SQL> select * from test1;
ID DT
---- -----------
1 2009-4-1
2 2009-5-1
3 2009-6-1
5 2009-8-1
6 2009-9-1
8 2009-11-1
9 2009-12-1
10 2010-1-1
11 2010-2-1
12 2010-3-1
14 2010-5-1
15 2010-6-1
16 2010-7-1
17 2010-8-1
18 2010-9-1
20 2010-11-1
16 rows selected
查找 连续部分的最大和最小值
下面的场景是连续部分的起始和终点,使用NUMBER类型
-- case 1 number type + rownum
select min(v2.id) as n1, max(v2.id) as n2
from (select v1.id, (v1.id - rownum) as delta
from (select id from test1 order by id) v1) v2
group by v2.delta
order by 1;
首先需要对取出的id值进行排序
用这个id值和rownum相减,如果差值是相同的,说明这些id是连续的
N1 N2
---------- ----------
1 3
5 6
8 12
14 18
20 20
然后使用分析函数实现 同样的功能。
-- case 2 number type + analyse func
with v as
(select id,
lag(id) over(order by id) as lag,
lead(id) over(order by id) as lead
from test1
order by id)
select v1.id, v2.id
from (select v.id, rownum as rn
from v
where (v.id - v.lag) != 1
or v.lag is null) v1,
(select v.id, rownum as rn
from v
where (v.lead - v.id) != 1
or v.lead is null) v2
where v1.rn = v2.rn;
这里使用WITH构造一个视图,避免了对表的多次扫描
如果一个ID值和它的上一个值之间的绝对值差不是1,表示发生了跳跃。
这个ID是某个连续区间的最小或者最大
相同的需求,不同的是这里使用日期类型。注意,因为构造的数据是以MONTH为单位的,所以这里需要使用months_between()来计算差值。
-- case 3 date type + rownum
select min(v2.dt) as n1, max(v2.dt) as n2
from (select v1.dt,
months_between(v1.dt, trunc(add_months(sysdate, rownum), 'mm')) as delta
from (select dt from test1 order by id) v1) v2
group by v2.delta
order by 1;
N1 N2
----------- -----------
2009-4-1 2009-6-1
2009-8-1 2009-9-1
2009-11-1 2010-3-1
2010-5-1 2010-9-1
2010-11-1 2010-11-1
查找缺少的部分
查找缺少的部分,基本思路是在找到连续部分的基础上,构造缺少的值
查找缺少的部分,基本思路是在找到连续部分的基础上,构造缺少的值
-- case 4 find lost value
with v as
(select min(v2.id) as n1,
max(v2.id) as n2,
lead(min(v2.id)) over(order by min(v2.id)) as n3
from (select v1.id, (v1.id - rownum) as delta
from (select id from test1 order by id) v1) v2
group by v2.delta
order by 1)
select t1.sid + t2.rn as lost
from (select v.n2 as sid, v.n3 as eid from v) t1,
(select rownum rn from dual connect by rownum <= 50) t2
where t1.sid + t2.rn < t1.eid;
使用case1的算法得到连续部分的最大n2,最小值n1
使用lead函数把某个最大值n2和下一个最小值n3 放到同一行
使用rownum构造连续的缺少部分
LOST
----------
4
7
13
19
另一种查找缺少的方 法,使用外连接
-- case 5 find lost value
with v as
(select min(id) as n1, max(id) as n2 from test1)
select t2.rn as lost
from v, test1 t1, (select rownum rn from dual connect by rownum <= 50) t2
where t2.rn between v.n1 and v.n2
and t1.id(+) = t2.rn
and t1.id is null
order by t2.rn;
取得表的最大n2和最小n1,这里是全部数据的集合
构造大表,并且只取得最大最小之间的rownum
使用外连接取得缺少的值
注意:和case4相比,这个方法要扫描两次表
Note: 更复杂的情况,可以参考Partition Outter Join