  • Comparing a single explicit transaction with a CTE for inserting data

    Sometimes you need to generate rows to use as test data, like this:

    IF OBJECT_ID(N'T14') IS NOT NULL
    BEGIN
        DROP TABLE T14
    END
    GO
    CREATE TABLE T14 (t14_id INT)
    GO
    
    DECLARE @i INT = 1
    WHILE @i <= 1000
    BEGIN
        INSERT INTO T14 (t14_id)
        SELECT @i
        SET @i = @i + 1
    END
    GO

    code-1

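    There is a problem here: in autocommit mode each INSERT runs as its own transaction and is committed as soon as it finishes, so the loop commits once per row. With a small amount of data this does no harm, but what if you need to insert 1 million, 2 million, 10 million rows or more? Since every INSERT statement commits implicitly, wrapping the loop in one explicit transaction significantly improves insert performance. Another approach is to use a CTE to insert all the rows in a single statement, which also improves performance. The rest of this post compares the insert performance of the two approaches. Before looking at the results, take a guess: which one is faster, or are they about the same?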
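
    To make the autocommit behaviour concrete, here is a minimal sketch that inspects @@TRANCOUNT after a plain INSERT and again inside an explicit transaction. It is not part of the original tests. The temp table #demo is a hypothetical name, and the sketch assumes the session default SET IMPLICIT_TRANSACTIONS OFF and no transaction already open.

    ```sql
    -- Hypothetical scratch table, used only for this illustration
    CREATE TABLE #demo (id INT);

    -- Autocommit: the statement is its own transaction and is committed when it completes
    INSERT INTO #demo (id) SELECT 1;
    SELECT @@TRANCOUNT AS open_tran_after_autocommit;  -- 0 under the stated assumptions

    -- Explicit transaction: nothing is committed until COMMIT TRAN
    BEGIN TRAN;
    INSERT INTO #demo (id) SELECT 2;
    SELECT @@TRANCOUNT AS open_tran_inside_explicit;   -- 1 under the stated assumptions
    COMMIT TRAN;

    DROP TABLE #demo;
    GO
    ```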

    First, the explicit transaction, inserting 1 million rows:

    IF OBJECT_ID(N'T14') IS NOT NULL
    BEGIN
        DROP TABLE T14
    END
    GO
    CREATE TABLE T14 (t14_id INT)
    GO
    
    DBCC FREESESSIONCACHE
    DBCC DROPCLEANBUFFERS
    GO
    
    SET NOCOUNT ON;
    BEGIN TRAN
    DECLARE @i INT = 1
    WHILE @i <= 1000000
    BEGIN
        INSERT INTO T14 (t14_id)
        SELECT @i
        SET @i = @i + 1
    END
    COMMIT TRAN;
    SET NOCOUNT OFF;
    GO

    code-2

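    Averaged over several runs on my machine, inserting 1 million rows took about 22 seconds, which is reasonably fast. (How long would it take without the explicit transaction? Interested readers can try it.)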
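
    The post does not say how the elapsed time was measured. One possible way, shown here only as a sketch, is to capture SYSDATETIME() before the loop and take DATEDIFF after the COMMIT. The variable @start and the column alias elapsed_ms are hypothetical names; T14 is assumed to have been recreated as in code-2.

    ```sql
    SET NOCOUNT ON;

    -- Hypothetical timing variable; DATETIME2 and SYSDATETIME() require SQL Server 2008 or later
    DECLARE @start DATETIME2 = SYSDATETIME();

    BEGIN TRAN;
    DECLARE @i INT = 1;
    WHILE @i <= 1000000
    BEGIN
        INSERT INTO T14 (t14_id)
        SELECT @i;
        SET @i = @i + 1;
    END
    COMMIT TRAN;

    -- Wall-clock time for the whole loop, including the final commit
    SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS elapsed_ms;

    SET NOCOUNT OFF;
    GO
    ```

    Running the batch several times, recreating T14 before each run, and averaging elapsed_ms would mirror the "average of several runs" described above.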

    Next, using a CTE:

    IF OBJECT_ID(N'T15') IS NOT NULL
    BEGIN
        DROP TABLE T15
    END
    GO
    CREATE TABLE T15 (t15_id INT)
    GO
    
    DBCC FREESESSIONCACHE
    DBCC DROPCLEANBUFFERS
    GO
    
    WITH CTE1 AS ( 
    SELECT a.[object_id] FROM master.sys.all_objects AS a, master.sys.all_objects AS b
    )
    ,CTE2 AS (
    SELECT ROW_NUMBER() OVER (ORDER BY [object_id]) as row_no FROM CTE1
    )
    
    INSERT INTO T15 (t15_id)
    SELECT row_no  FROM CTE2 WHERE row_no <= 1000000
    GO

    code-3

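    Again averaged over several runs, this finished in about 5 seconds, far faster than I expected! Now let's insert 10 million rows and see what happens. For the first approach, just change 1000000 in code-2 to 10000000 and run it again. For the second, CTE1 does not produce enough rows, so it needs two UNION ALLs; the code is as follows: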

    IF OBJECT_ID(N'T15') IS NOT NULL
    BEGIN
        DROP TABLE T15
    END
    GO
    CREATE TABLE T15 (t15_id INT)
    GO
    
    DBCC FREESESSIONCACHE
    DBCC DROPCLEANBUFFERS
    GO
    
    WITH CTE1 AS ( 
    SELECT a.[object_id] FROM master.sys.all_objects AS a, master.sys.all_objects AS b
    UNION ALL
    SELECT a.[object_id] FROM master.sys.all_objects AS a, master.sys.all_objects AS b
    UNION ALL
    SELECT a.[object_id] FROM master.sys.all_objects AS a, master.sys.all_objects AS b
    )
    ,CTE2 AS (
    SELECT ROW_NUMBER() OVER (ORDER BY [object_id]) as row_no FROM CTE1
    )
    
    INSERT INTO T15 (t15_id)
    SELECT row_no  FROM CTE2 WHERE row_no <= 10000000
    GO

    code-4

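    Results: the insert wrapped in a transaction took a little over 3 minutes, while the CTE finished in under a minute and a half. So the CTE is the more efficient of the two! During the tests memory usage stayed low, but CPU usage rose noticeably. In addition, when inserting large volumes of data into a table, whether the table has indexes and which recovery model the database uses also affect insert performance.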
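
    Since indexes and the recovery model influence the numbers, it may help to record both alongside any timing. The following is a sketch using the catalog views sys.databases and sys.indexes; it assumes it is run in the database that holds T15.

    ```sql
    -- Recovery model of the current database (FULL, BULK_LOGGED or SIMPLE)
    SELECT name, recovery_model_desc
    FROM sys.databases
    WHERE name = DB_NAME();

    -- Indexes on the target table; a table with no indexes returns one row with type_desc = 'HEAP'
    SELECT i.name, i.type_desc
    FROM sys.indexes AS i
    WHERE i.[object_id] = OBJECT_ID(N'T15');
    GO
    ```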

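    ------- Addendum -----
    A note on how the rows in CTE1 are generated. If you only need about 1 million rows, CROSS JOINing master.sys.all_objects with itself once is enough (as in code-3). Better still, pick two tables whose CROSS JOIN yields a row count close to what you need; if it falls short, UNION ALL the result a few times. If you need 10 million rows or more, you can CROSS JOIN one more small table on top of that, for example: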

    ;WITH CTE3 AS ( 
    SELECT a.[object_id] FROM master.sys.all_objects AS a, master.sys.all_objects AS b, master.sys.databases AS c
    )
    
    SELECT COUNT(*) AS counts,LEN(COUNT(*)) AS counts_length FROM CTE3
    GO

    code-5

    figure-1

    On my machine this produced a little over 110 million rows.
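
    Counting the rows of a three-way CROSS JOIN means generating every one of them. As a cheaper check, the size of the join can be computed from the row counts of its inputs, since a CROSS JOIN returns the product of those counts. This is a sketch rather than part of the original tests; the column aliases are hypothetical names.

    ```sql
    -- Row count of CTE3 = rows(all_objects) * rows(all_objects) * rows(databases)
    SELECT
        (SELECT COUNT_BIG(*) FROM master.sys.all_objects) AS all_objects_rows,
        (SELECT COUNT_BIG(*) FROM master.sys.databases)   AS databases_rows,
        (SELECT COUNT_BIG(*) FROM master.sys.all_objects)
      * (SELECT COUNT_BIG(*) FROM master.sys.all_objects)
      * (SELECT COUNT_BIG(*) FROM master.sys.databases)   AS cte3_rows_estimate;
    GO
    ```

    The counts depend on the SQL Server version and on how many databases the instance hosts, so the result will differ from machine to machine.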

  • Original article: https://www.cnblogs.com/fishparadise/p/4781035.html