zoukankan      html  css  js  c++  java
  • mysql

    初始化数据

    # 创建表
    DROP TABLE IF EXISTS g;
    CREATE TABLE g(
        a INT
    )ENGINE=INNODB;
    
    # 初始化数据
    INSERT INTO g SELECT 1;
    INSERT INTO g SELECT 2;
    INSERT INTO g SELECT 3;
    INSERT INTO g SELECT 100;
    INSERT INTO g SELECT 101;
    INSERT INTO g SELECT 103;
    INSERT INTO g SELECT 104;
    INSERT INTO g SELECT 105;
    INSERT INTO g SELECT 106;

    对于g表其缺失范围如4-16所示

    对于g表其连续范围如4-17所示

    对于缺失范围的问题,可以通过下列步骤来解决

    1)找到间断点之前的值,然后对该值加1,即为start_range;

    2)找到间断点之前的值,然后对该值减1,即为end_range;

    对于间断点之前的值,可以用如下sql:

    SELECT a
    FROM g AS A
    WHERE NOT EXISTS(
        SELECT *
        FROM g AS B WHERE A.a+1=B.a
    )

    查出的106是无用的,因为它是表中的最大值,所以将其过滤掉。断点之前的值,对该值加1操作,即为start_range,可以通过以下sql语句得到:

    SELECT a+1 start_range
    FROM g AS A
    WHERE NOT EXISTS(
        SELECT *
        FROM g AS B WHERE A.a+1=B.a
    ) AND a < (SELECT MAX(a) FROM g)

     

    最后通过子查询为每个最小间断点返回表g中下一个已有的值并减一,即得到间断点end_range,最终sql语句如下所示:

    SELECT a+1 start_range, (
        SELECT MIN(a)-1
        FROM g C WHERE C.a > A.a
    ) AS end_range
    FROM g AS A
    WHERE NOT EXISTS(
        SELECT *
        FROM g AS B WHERE A.a+1=B.a
    ) AND a < (SELECT MAX(a) FROM g)

     这只是该问题的解决方案之一,更为简单直观的方法是,将表g中的数据进行移位匹配,如果是连续的值,那么其差值应该为1,如果不是连续的值就应该大于1

    对于表g,进行移位匹配后应该得到如表4-18所示的内容。

    可以看出next - cur的值等于1表示连续的值,否则不连续,不连续的值为(3,100)、(101,103),而我们要求的不连续范围为(4,99)、(102,102),也就是(cur+1,next-1)就是我们要的确实范围,要得到图4-18所示的内容,可以执行下述SQL:

    SELECT A.a AS cur,(
        SELECT MIN(a)
        FROM g AS B WHERE B.a > A.a
    ) AS next
    FROM g AS A

    而要得到最终的结果,只需要对cur加1,next减1即可。该好处另外一个好处是无需处理最大值,因为最大数next的值为NULL。该解决方案的SQL语句如下所示:

    SELECT cur+1 AS start_range, NEXT-1 AS end_range
    FROM (
        SELECT A.a AS cur,(
            SELECT MIN(a)
            FROM g AS B WHERE B.a > A.a
        ) AS NEXT
        FROM g AS A
    ) AS C
    WHERE NEXT-cur > 1;

    连续范围,如果采用子查询方案,我们要手动创建一个列,并对这个列进行分组。这个列应该是每个连续分组的最大值,对于{1,2,3}来说,这个最大值就应该是3。计算一组连续组中最大值所依据的原理是:返回大于或等于当前值且后面一个值为间断点的最小值。下面是该子查询的SQL:

    SELECT a,(
        SELECT MIN(a)
        FROM g AS A
        WHERE NOT EXISTS(
            SELECT *
            FROM g AS B
            WHERE A.a+1 = B.a
        ) AND A.a >= C.a
    ) AS MAX
    FROM g AS C

    剩下的工作就就简单了,在上一步查询中执行如下语句对max列进行分组,得到分组中的最小值和最大值,这就是我们要的连续范围。

    SELECT MIN(a) AS start_range, MAX(a) AS end_range
    FROM(
        SELECT a,(
            SELECT MIN(a)
            FROM g AS A
            WHERE NOT EXISTS(
                SELECT *
                FROM g AS B
                WHERE A.a+1 = B.a
            ) AND A.a >= C.a
        ) AS MAX
        FROM g AS C
    ) AS D
    GROUP BY MAX;

    上述查询给出了连续范围问题的解决方案,但其性能是值得商榷的。这里的扫描成本是O(N²)。对于表中数据量非常大的情况,其性能又会变得十分糟糕。因此解决连续范围问题的最优方案是采用行号方法。

    SELECT MIN(a) AS start_range, MAX(a) AS end_range
    FROM
    (
        SELECT a, rn, a-rn AS diff
        FROM
        (
            SELECT a, @a:=@a+1 rn FROM g,
            (SELECT @a:=0) AS aa
        ) AS b
    ) AS c
    GROUP BY diff;

  • 相关阅读:
    c++的一些个学习及总结
    简单回合制游戏流程
    c++11 符号修饰与函数签名、函数指针、匿名函数、仿函数、std::function与std::bind
    二叉搜索树、平衡二叉树、红黑树
    STL之空间配置器
    centos 7 安装mysql及常用操作
    生产服务器问题定位
    Linux常用命令
    Garbage First(G1) 垃圾收集器
    垃圾回收算法和垃圾收集器
  • 原文地址:https://www.cnblogs.com/frank-quan/p/5843149.html
Copyright © 2011-2022 走看看