  • Using CDC (Change Data Capture) in SSIS 2012

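    With a little slack in my latest project, I started looking into some of the BI features in SQL Server 2012 and 2014. Following an existing sample (see the references at the end), this article tries out CDC (Change Data Capture) in SSIS.

    Note: for background on CDC in SQL Server 2008, see http://blog.csdn.net/downmoon/article/details/7443627. This article assumes the reader already has some understanding of how CDC works. ^_^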

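    We will complete the example in three steps:

    1. Prepare the base data;

    2. Design an initial load package;

    3. Design an incremental load package that builds on step 2.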

    First, install the following prerequisites:

    (1) Visual Studio 2012, or the Visual Studio 2012 Shell (Isolated) Redistributable Package

    http://www.microsoft.com/en-us/download/details.aspx?id=30678

    http://www.microsoft.com/en-us/download/details.aspx?id=30670

    (2) SQL Server Data Tools - Business Intelligence for Visual Studio 2012

    http://www.microsoft.com/zh-cn/download/details.aspx?id=36843

    (3) SQL Server 2012 Enterprise or Developer edition

    http://www.microsoft.com/en-us/download/details.aspx?id=29066

    (4) The AdventureWorksDW2012 sample database (required to follow this article as written; not needed if you build your own tables)

    http://msftdbprodsamples.codeplex.com/releases/view/55330

    Now for step one:

    /*
    -- =============================================
    -- Create the test database and table, using the AdventureWorksDW2012 sample database
    ---Generate By downmoon(邀月),3w@live.cn
    -- =============================================
    */
    -- Run the statements below once (uncomment them) to create CDCTest and seed
    -- the source table; the rest of the script assumes both already exist.
    --Create database CDCTest
    --GO
    --USE [CDCTest]
    --GO
    
    --SELECT * INTO DimCustomer_CDC
    --FROM [AdventureWorksDW2012].[dbo].[DimCustomer]
    --WHERE CustomerKey < 11500;
    
    --select * from DimCustomer_CDC;

    /*
    -- =============================================
    -- Enable CDC at the database level (Enterprise and Developer editions only)
    ---Generate By downmoon(邀月),3w@live.cn
    -- =============================================
    */
    USE [CDCTest]
    GO
    
    EXEC sys.sp_cdc_enable_db
    GO
    
    -- add a primary key to the DimCustomer_CDC table so we can enable support for net changes
    IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = 
    OBJECT_ID(N'[dbo].[DimCustomer_CDC]') AND name = N'PK_DimCustomer_CDC')
      ALTER TABLE [dbo].[DimCustomer_CDC] ADD CONSTRAINT 
    [PK_DimCustomer_CDC] PRIMARY KEY CLUSTERED 
    (
        [CustomerKey] ASC
    )
    GO
    
    /*
    -- =============================================
    -- Enable CDC at the table level
    ---Generate By downmoon(邀月),3w@live.cn
    -- =============================================
    */
    EXEC sys.sp_cdc_enable_table 
    @source_schema = N'dbo',
    @source_name = N'DimCustomer_CDC',
    @role_name = N'cdc_admin',
    @supports_net_changes = 1
    
    GO

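    /*
    -- =============================================
    -- Create a destination table with the same structure as the source table
    -- Note: in production the destination can be a different instance or server;
    -- for convenience this example uses the same database on the same instance
    ---Generate By downmoon(邀月),3w@live.cn
    -- =============================================
    */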
    SELECT TOP 0 * INTO DimCustomer_Destination
    FROM DimCustomer_CDC
    --select @@version;
    select * from DimCustomer_Destination;
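
    Before moving on, it can help to confirm that CDC was actually enabled. The queries below are a minimal sketch of such a check, assuming the database and table names used in the script above and the default capture instance that sys.sp_cdc_enable_table creates when no name is supplied. Note also that the CDC capture job runs under SQL Server Agent, so the Agent service needs to be running for changes to be harvested from the log.

    ```sql
    USE [CDCTest];
    GO

    -- Database level: is_cdc_enabled is 1 once sys.sp_cdc_enable_db has run
    SELECT name, is_cdc_enabled
    FROM sys.databases
    WHERE name = N'CDCTest';

    -- Table level: is_tracked_by_cdc is 1 once sys.sp_cdc_enable_table has run
    SELECT name, is_tracked_by_cdc
    FROM sys.tables
    WHERE name = N'DimCustomer_CDC';

    -- Capture instance details, including whether net changes are supported
    -- (net changes rely on the primary key added earlier)
    SELECT capture_instance, supports_net_changes, index_name
    FROM cdc.change_tables;
    ```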


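    Step 2: create the initial load package

    -- =============================================
    -- The example uses two packages: an initial package that performs the initial data load,
    -- and an incremental package that captures and applies subsequent changes
    ---Generate By downmoon(邀月),3w@live.cn
    -- =============================================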
    
    

    The initial package contains the following logic:
    (1) Use the CDC Control Task to mark the initial load start LSN
    (2) Transfer all of the data from the source table into the destination table
    (3) Use the CDC Control Task to mark the initial load end LSN

    Sample: http://code.msdn.microsoft.com/My-First-Integration-fa41c0b1

    Create a new SSIS project and add a package named "Initial Load", as shown below:

    [Figure: the "Initial Load" package]

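    Add two CDC Control Tasks, named "CDC Control Task Start" and "CDC Control Task End", and set their CDC control operation to "Mark initial load start" and "Mark initial load end" respectively.

    Both use ADO.NET connection managers; the remaining properties are shown below:

    [Figures: CDC Control Task editor settings for the start and end tasks]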

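    Add a "Data Flow Task" between the two, leaving its properties at their defaults.

    [Figure: control flow with the Data Flow Task between the two CDC Control Tasks]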

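    Now run the package. The cdc_states table will contain the initial load marker.

    [Figure: contents of cdc_states after the initial load]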
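
    The marker can also be read with a plain query. This is a small sketch that assumes the state table was created by the CDC Control Task in the dbo schema with its default name and state columns, as [dbo].[cdc_states]; if you created the table yourself with different columns, adjust accordingly.

    ```sql
    USE [CDCTest];
    GO

    -- One row per CDC state name; the state column holds an encoded string
    -- describing the load phase and the LSN reached so far
    SELECT name, state
    FROM dbo.cdc_states;
    ```

    The name value returned here is the state name that the incremental package has to reuse.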


    Step 3: create the incremental load package

    The incremental package's CDC Control Task is configured as follows:
    (1) Create a connection manager for the Source database
    (2) Set the CDC Control Operation to Get processing range
    (3) Create a new CDC state variable (CDC_state)
    (4) Create a connection manager for the Destination database
    (5) Select the state table created by the Initial Load package: [dbo].[cdc_states]
    (6) Set the state name; it must match the one used in the Initial Load package (CDC_State)
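
    To get a feel for what a "processing range" is, the sketch below reads the widest LSN range currently available for the capture instance. It is only an illustration: the CDC Control Task keeps its own starting point in the state table rather than calling these functions this way, and the capture instance name dbo_DimCustomer_CDC is an assumption based on the default naming (schema_table) used when none is specified.

    ```sql
    USE [CDCTest];
    GO

    -- Lowest LSN still available for this capture instance
    DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_DimCustomer_CDC');
    -- Highest LSN the capture job has processed so far
    DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

    -- Show the range together with the commit times the LSNs map to
    SELECT @from_lsn                             AS from_lsn,
           sys.fn_cdc_map_lsn_to_time(@from_lsn) AS from_time,
           @to_lsn                               AS to_lsn,
           sys.fn_cdc_map_lsn_to_time(@to_lsn)   AS to_time;
    ```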

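    Create a new package in the project and name it "Incremental Load".

    In the package's "Control Flow" view, add the tasks by hand from top to bottom in the order shown below. Apart from the three tasks already used above (two CDC Control Tasks and one Data Flow Task), the rest are Execute SQL Tasks.

    [Figure: control flow of the "Incremental Load" package]

    Note: the CDC control operation of CDC Control Task Start is "Get processing range", and that of CDC Control Task End is "Mark processed range".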

    The SQL statements for the four Execute SQL Tasks are as follows:

    -- Task 1: create the staging tables if they do not already exist
    IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[stg_DimCustomer_UPDATES]') AND type IN (N'U'))
    BEGIN
       SELECT TOP 0 * INTO [dbo].[stg_DimCustomer_UPDATES]
       FROM [dbo].[DimCustomer_Destination];
    END
    
    IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[stg_DimCustomer_DELETES]') AND type IN (N'U'))
    BEGIN
       SELECT TOP 0 * INTO [dbo].[stg_DimCustomer_DELETES]
       FROM [dbo].[DimCustomer_Destination];
    END

    -- Task 2: batch update, applying the staged updates to the destination
    UPDATE dest
    SET 
        dest.FirstName = stg.FirstName, 
        dest.MiddleName = stg.MiddleName,
        dest.LastName = stg.LastName, 
        dest.YearlyIncome = stg.YearlyIncome
    FROM [dbo].[DimCustomer_Destination] AS dest
    INNER JOIN [dbo].[stg_DimCustomer_UPDATES] AS stg
        ON stg.[CustomerKey] = dest.[CustomerKey];

    -- Task 3: batch delete, removing rows whose keys were staged as deletes
    DELETE FROM [dbo].[DimCustomer_Destination]
    WHERE [CustomerKey] IN 
    (
        SELECT [CustomerKey]
        FROM [dbo].[stg_DimCustomer_DELETES]
    );

    -- Task 4: empty the staging tables ready for the next run
    TRUNCATE TABLE [dbo].[stg_DimCustomer_DELETES];
    TRUNCATE TABLE [dbo].[stg_DimCustomer_UPDATES];
    

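    Now the most important step. Open the Data Flow Task that follows CDC Control Task Start, switch to the Data Flow tab, and from top to bottom add a CDC Source, a CDC Splitter transformation, and three ADO.NET Destinations, as shown below:

    [Figure: data flow with CDC Source, CDC Splitter, and three ADO.NET Destinations]

    The three destinations write to [DimCustomer_Destination] (inserts), stg_DimCustomer_DELETES (deletes), and stg_DimCustomer_UPDATES (updates) respectively.

    [Figures: ADO.NET Destination settings]

    The connection manager properties of the CDC Source are shown below:

    [Figure: CDC Source editor settings]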
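
    The routing performed by the CDC Source and CDC Splitter can be approximated in T-SQL once the source has changes to report (such as those made by the test script later in this article). The sketch below is an illustration rather than what the SSIS components execute internally. It assumes the default capture instance name dbo_DimCustomer_CDC and reads the full available LSN range instead of the range stored in cdc_states. With the N'all' row filter, the net changes function reports __$operation as 1 for a delete, 2 for an insert, and 4 for an update.

    ```sql
    USE [CDCTest];
    GO

    DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_DimCustomer_CDC');
    DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

    -- One row per changed key, labelled with the table the package would route it to
    SELECT
        CASE __$operation
            WHEN 1 THEN 'delete -> stg_DimCustomer_DELETES'
            WHEN 2 THEN 'insert -> DimCustomer_Destination'
            WHEN 4 THEN 'update -> stg_DimCustomer_UPDATES'
        END AS routed_to,
        CustomerKey, FirstName, LastName, YearlyIncome
    FROM cdc.fn_cdc_get_net_changes_dbo_DimCustomer_CDC(@from_lsn, @to_lsn, N'all')
    ORDER BY CustomerKey;
    ```

    If the LSN range is NULL or invalid (for example, because the capture job has not yet processed anything), the function raises an error instead of returning an empty result.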

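    The incremental package can now be run, but it will not produce any visible result yet, because no rows have been inserted or updated in the source table since the initial load.

    The following script makes some changes so we can test the effect: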

    -- =============================================
    -- Modify some data to demonstrate CDC in SSIS 2012
    ---Generate By downmoon(邀月),3w@live.cn
    -- =============================================
    
    USE [CDCTest]
    GO
     
    -- Insert one customer row that was not part of the initial load (CustomerKey 11502)
    SET IDENTITY_INSERT DimCustomer_CDC ON
     
    INSERT INTO DimCustomer_CDC
    (
           CustomerKey, GeographyKey, CustomerAlternateKey, Title, FirstName, 
           MiddleName, LastName, NameStyle, BirthDate, MaritalStatus, 
           Suffix, Gender, EmailAddress, YearlyIncome, TotalChildren, 
           NumberChildrenAtHome, EnglishEducation, SpanishEducation,
           FrenchEducation, EnglishOccupation, SpanishOccupation, 
           FrenchOccupation, HouseOwnerFlag, NumberCarsOwned, AddressLine1, 
           AddressLine2, Phone, DateFirstPurchase, CommuteDistance
    )
    SELECT CustomerKey, GeographyKey, CustomerAlternateKey, Title, FirstName, 
           MiddleName, LastName, NameStyle, BirthDate, MaritalStatus, 
           Suffix, Gender, EmailAddress, YearlyIncome, TotalChildren, 
           NumberChildrenAtHome, EnglishEducation, SpanishEducation,
           FrenchEducation, EnglishOccupation, SpanishOccupation, 
           FrenchOccupation, HouseOwnerFlag, NumberCarsOwned, AddressLine1, 
           AddressLine2, Phone, DateFirstPurchase, CommuteDistance
    FROM [AdventureWorksDW2012].[dbo].[DimCustomer]
    WHERE CustomerKey =11502
     
    SET IDENTITY_INSERT DimCustomer_CDC OFF
    GO
     
    -- give the customers with keys 11000 through 11010 a raise
    UPDATE DimCustomer_CDC 
    SET 
        YearlyIncome = YearlyIncome + 10
    WHERE
        CustomerKey >= 11000 AND CustomerKey <= 11010
     
    GO

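    Run the incremental package again, and the captured changes show up in the result:

    [Figure: data flow row counts after the incremental run]

    If that is not visual enough, turn on "Enable Data Viewer":

    [Figures: Data Viewer output for the captured changes]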
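
    A quick way to double-check the outcome outside SSIS is to compare the two tables directly. The query below is a sketch that compares only the key and the four columns maintained by the batch update statement shown earlier; it should return no rows once the incremental package has applied every pending change.

    ```sql
    USE [CDCTest];
    GO

    -- Row counts on both sides
    SELECT (SELECT COUNT(*) FROM dbo.DimCustomer_CDC)         AS source_rows,
           (SELECT COUNT(*) FROM dbo.DimCustomer_Destination) AS destination_rows;

    -- Source rows that are missing from, or differ from, the destination
    SELECT CustomerKey, FirstName, MiddleName, LastName, YearlyIncome
    FROM dbo.DimCustomer_CDC
    EXCEPT
    SELECT CustomerKey, FirstName, MiddleName, LastName, YearlyIncome
    FROM dbo.DimCustomer_Destination;
    ```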

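    That concludes this walkthrough of CDC in SSIS 2012. For further study, head over to MSDN; the links are listed below. The sample project is also provided with this article for study purposes.

    Project file download 1 | Project file download 2

    References: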

    http://msdn.microsoft.com/en-us/library/bb895315.aspx

    http://www.mattmasson.com/index.php/2011/12/cdc-in-ssis-for-sql-server-2012-2/?utm_source=rss&utm_medium=rss&utm_campaign=cdc-in-ssis-for-sql-server-2012-2


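    Note from 邀月 (downmoon): copyright of this article is held jointly by 邀月 and CSDN. Please credit the source when reposting.
    Helping others is helping yourself!   3w@live.cn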



  • Original article: https://www.cnblogs.com/zfyouxi/p/3790719.html