其实这个问题就是数据中的“island”孤岛数据的问题,只是要求连续长度是3。首先避开这个问题不谈,看看下来的数字:
1 2 3 5 7 8 9 11 13 18 22 27 28 29 30 40 100 102 107 108 109
现在有如下需求:找出连续的数字开始和结束的位置。
很显然,比划比划这个结果就是:
1 3
7 9
27 30
107 109
那么现在寻找一种方式,把这个结果用sql查出来。首先造数据:
;WITH Num(N) AS( SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 5 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 11 UNION ALL SELECT 13 UNION ALL SELECT 18 UNION ALL SELECT 22 UNION ALL SELECT 27 UNION ALL SELECT 28 UNION ALL SELECT 29 UNION ALL SELECT 30 UNION ALL SELECT 40 UNION ALL SELECT 100 UNION ALL SELECT 102 UNION ALL SELECT 107 UNION ALL SELECT 108 UNION ALL SELECT 109)
先把这些数字当中连续的边界找出来(1,3,5,7,9,11,13,18,22,27,30,40,100,102,107,109)。查找的条件如下:不存在比当前小的数或者比当前大的数。如下:
cte1(N) AS( SELECT N FROM Num n1 WHERE NOT EXISTS( SELECT 'x' FROM Num n2 WHERE n2.N = n1.N - 1) OR NOT EXISTS( SELECT 'x' FROM Num n3 WHERE n3.N = n1.N + 1))
既然找到了边界,接下来我对两个边界进行配对,只不过这对值的第二值是比第一边界值大但是最小的那个。
SELECT N, ( SELECT MIN(N) FROM cte1 c2 WHERE c2.N > c1.N ) Big FROM cte1 c1
配对成功,接下了我只需要找存在配对当中的行就可以了。如下:
SELECT *, Big - N + 1 AS count FROM( SELECT N, ( SELECT MIN(N) FROM cte1 c2 WHERE c2.N > c1.N ) Big FROM cte1 c1) AS dr1 WHERE EXISTS( SELECT 'x' FROM Num c1 WHERE c1.N > dr1.N AND c1.N < dr1.Big )
这样就得到想要的结果了。现在回过来看看“连续3年获奖的人”,用一样的原理可以得到第一种方式(最慢的方式)。
首先造点需要的数据:
;WITH Prize(Name, Year) AS( SELECT 'Kim', '2001' UNION ALL SELECT 'Tim', '2002' UNION ALL SELECT 'Tang', '2002' UNION ALL SELECT 'Jack', '2001' UNION ALL SELECT 'Juicy', '2001' UNION ALL SELECT 'Peff', '2002' UNION All SELECT 'Juicy', '2002' UNION ALL SELECT 'Kim', '2002' UNION ALL SELECT 'Juicy', '2003' UNION ALL SELECT 'Peff', '2003' UNION ALL SELECT 'Kim', '2003' UNION ALL SELECT 'Tim', '2004' UNION ALL SELECT 'Juicy', '2004' UNION ALL SELECT 'Jack', '2005' UNION ALL SELECT 'Eric', '2005' UNION ALL SELECT 'Tim', '2005' UNION ALL SELECT 'Eric', '2006' UNION ALL SELECT 'Peff', '2006' UNION ALL SELECT 'Juicy', '2007' UNION ALL SELECT 'Eric', '2007' UNION ALL SELECT 'Tang', '2007' UNION ALL SELECT 'Tang', '2008' UNION ALL SELECT 'Peff', '2008' UNION ALL SELECT 'Kim', '2008' UNION ALL SELECT 'Jack', '2009' UNION ALL SELECT 'Tang', '2009' UNION ALL SELECT 'Kim', '2009')
第一种方式为:
,cte1 AS( SELECT * FROM Prize p1 WHERE NOT EXISTS( SELECT 'x' FROM Prize p2 WHERE p1.Name = p2.Name AND p2.Year = p1.Year - 1) OR NOT EXISTS( SELECT 'x' FROM Prize p3 WHERE p3.Name = p1.Name AND p3.Year = p1.Year + 1) ) SELECT * FROM ( SELECT Name, Year, ( SELECT MIN(Year) FROM cte1 c2 WHERE c2.Name = c1.Name AND c2.Year > c1.Year ) AS Bigger FROM cte1 c1 --ORDER BY Name, Year ) AS Dr1 WHERE EXISTS( SELECT 'x' FROM Prize p WHERE p.Name = Dr1.Name AND p.Year > Dr1.Year AND p.Year < Dr1.Bigger ) AND CAST(Bigger AS INT) - CAST(Year AS INT) + 1 >= 3 ORDER BY Name, Year
第二种方式,凡是连续三年获奖的人,那么year上一定存在三个连续的值,那么我就用两个exists条件去查询,如下:
SELECT DISTINCT Name FROM Prize p1 WHERE EXISTS( SELECT 'x' FROM Prize p2 WHERE p2.Name = p1.Name AND p2.Year = p1.Year + 1) AND EXISTS( SELECT 'x' FROM Prize p3 WHERE p3.Name = p1.Name AND p3.Year = p1.Year + 2)第三种方式,这种方式从《Microsoft SQL Server 2005技术内幕:T-SQL查询》中需来,博客园有篇文章对此总结,不再赘述。
SELECT DISTINCT Name FROM ( SELECT *, (SELECT MIN(p2.Year) FROM Prize p2 WHERE p2.Name = p.Name and p2.Year >= p.Year AND NOT EXISTS( SElECT 'x' FROM Prize p3 WHERE p3.Name = p2.Name AND p3.Year = p2.Year + 1)) Big FROM Prize p ) AS Dr WHERE CAST(Dr.Big AS INT) - CAST(Dr.Year AS INT) + 1 >= 3
第四种方式,如果我按照‘年’对表进行旋转,会得到什么结果?首先试试:
SELECT * FROM Prize P PIVOT( COUNT(Year) FOR Year IN( [2001], [2002], [2003], [2004], [2005], [2006], [2007], [2008], [2009]) ) AS PD
那么我可以得到结果:
既然有了个结果,接下来就好处理了,我把三个连续的年列组成一组,凡是都为1的一定是3年都获奖的人。如下:
SELECT Name FROM ( SELECT * FROM Prize P PIVOT( COUNT(Year) FOR Year IN( [2001], [2002], [2003], [2004], [2005], [2006], [2007], [2008], [2009]) ) AS PD ) AS D WHERE (D.[2001] = 1 AND D.[2002] = 1 AND D.[2003] = 1) OR (D.[2002] = 1 AND D.[2003] = 1 AND D.[2004] = 1) OR (D.[2003] = 1 AND D.[2004] = 1 AND D.[2005] = 1) OR (D.[2004] = 1 AND D.[2005] = 1 AND D.[2006] = 1) OR (D.[2005] = 1 AND D.[2006] = 1 AND D.[2007] = 1) OR (D.[2006] = 1 AND D.[2007] = 1 AND D.[2008] = 1) OR (D.[2007] = 1 AND D.[2008] = 1 AND D.[2009] = 1)
这种方式是非常快的,比以上任何方式都要快,但是有它的局限性,以上的[2001]……[2009]全部都是hard的。如过表中的年很多,需要在IN后面加上年份,然后在WHERE条件
中进行连续3年的组合,如果表中有n个年份,那么where条件中一种有n-2个条件。有时为了效率,多写点代码是值得的。