数据岛问题:
有表:
create table dataisland (id int)
insert into dataisland values(1),(2),(3),(7),(11),(12),(13)
找出数据连续的区域:
先对数据进行分组,组名为该组内最大的那个值。如1 2 3 的组名为3 。7的组名为7。
分组:
select d1.id,(select min(d2.id) from dataisland d2 where d2.id>=d1.id and not exists (select d3.id from dataisland d3 where d3.id=d2.id+1)) as grp from dataisland d1
select min(d.id),d.grp from (select d1.id,(select min(d2.id) from dataisland d2 where d2.id>=d1.id and not exists (select d3.id from dataisland d3 where d3.id=d2.id+1)) as grp from dataisland d1 ) as d group by d.grp
结果:
1 3
7 7
11 13
如果用窗口函数来解答:
select d1.id,ROW_NUMBER() over(order by d1.id) from dataisland d1
1 1
2 2
3 3
7 4
11 5
12 6
13 7
两列都在增长,不同的是第一列增长快,而第二列固定+1。所以如果用第一列减去第二列会得到在连续的地方差数相等。
select d1.id,d1.id-ROW_NUMBER() over(order by d1.id) as grp from dataisland d1
1 0
2 0
3 0
7 3
11 6
12 6
13 6
再找出组内的最大值和最小值即可。
select min(d.id),max(d.id) from ( select d1.id,d1.id-ROW_NUMBER() over(order by d1.id) as grp from dataisland d1) as d group by d.grp