zoukankan      html  css  js  c++  java
  • Oracle Golden Gate 系列十八 GG 多对一 realtime data warehousing 说明 与 示例

    一.官网说明

    A datawarehousing configuration is a many-to-one configuration. Multiple sourcedatabases send data to one target warehouse database. Oracle GoldenGatesupports like-to-like or heterogeneous transfer of data, with capabilities forfiltering and conversion on any system in the configuration (support varies bydatabase platform).

           --多个Source DB发送到一个Target warehouse DB。

          


    1.1 Considerations for a data warehousing configuration

    1.1.1 Isolation of data records

    Thisconfiguration assumes that each source database contributes different recordsto the target system. If the same record exists in the same table on two ormore source systems and can be changed on any of those systems, conflictresolution routines are needed to resolve conflicts when changes to that recordare made on both sources at the same time and replicated to the target table.

    这里假设每个source DB 发送给TargetDB的都是不同的record,如果是相同的record,可能就会在TargetDB 上产生冲突。

     

    1.1.2 Data storage

    You can dividethe data storage between the source systems and the target system to reduce theneed for massive amounts of disk space on the target system. This is accomplishedby using a data pump on each source, rather than sending data directly from eachExtract across the network to the target.

    (1)    A primary Extract writes to alocal trail on each source.

    (2)    A data-pump Extract on eachsource reads the local trail and sends it across TCP/IP to a dedicated Replicatgroup.

    1.1.3 Filtering and conversion

    If not all ofthe data from a source system will be sent to the data warehouse, you can use thedata pump to perform the filtering. This removes that processing overhead fromthe primary Extract group, and it reduces the amount of data that is sentacross the network.

    1.1.3.1 To filter data,you can use:

    (1)    A FILTER or WHERE clause in a TABLEstatement (Extract) or in a MAP statement (Replicat).

    (2)    A SQL query or procedure

    (3)    User exits

    1.1.3.2 To transformdata, you can use:

    (1)    Native Oracle GoldenGateconversion functions

    (2)    A user exit from the Extract orReplicat process that applies rules from an external

    (3)    transformation solution, thenreturns the manipulated data to Oracle GoldenGate.

    (4)    Replicat to deliver datadirectly to an ETL solution or other transformation engine.

     

    1.1.4 Data volume

    The standard configuration is sufficientif:

    (1)    The transaction load isconsistent and of moderate volume that is spread out more or less evenly amongall of the objects to be replicated.

    (2)    There are none of thefollowing: tables that are subject to long-running transactions, tables thathave a very large number of columns that change, or tables that contain columnsfor which Oracle GoldenGate must fetch from the database (generally columnswith LOBs, columns that are affected by SQL procedures executed by Oracle GoldenGate,and columns that are not logged to the transaction log).

    If yourenvironment does not satisfy those conditions, consider adding one or more setsof parallel processes.

    1.2 Creating a data warehousing configuration

    Figure 13 Configuration for datawarehousing

     

    1.2.1 Source systems

    --To configure theManager process

    1. On each source, configure the Managerprocess.

    2. In each Manager parameter file, use the PURGEOLDEXTRACTSparameter to control the purging of files from the trail on the local system.

     

    --To configure theprimary Extract groups

    3. On each source, use the ADD EXTRACT commandto create a primary Extract group. For documentation purposes, these groups arecalled ext_1 and ext_2.

    Extract_1

    ADD EXTRACT<ext_1>, TRANLOG, BEGIN <time> [, THREADS <n>]

    Extract_2

    ADD EXTRACT<ext_2>, TRANLOG, BEGIN <time> [, THREADS <n>]

    注意: Use TRANLOG as the data source option. For DB2 on Z/OS, specify thebootstrap data set (BSDS) name following TRANLOG.

    4. On each source, use the ADD EXTTRAIL commandto create a local trail.

    Extract_1

    ADD EXTTRAIL<local_trail_1>, EXTRACT <ext_1>

    Extract_2

    ADD EXTTRAIL<local_trail_2>, EXTRACT <ext_2>

    注意: Use the EXTRACT argument to link each Extract group to the localtrail on the same system. The primary Extract writes to this trail, and thedata-pump reads it.

    5. On each source, use the EDIT PARAMS commandto create a parameter file for the primary Extract. Include the followingparameters plus any others that apply to your database environment.

    Extract_1

    -- Identify the Extract group:

    EXTRACT <ext_1>

    -- Specify database login information asneeded for the database:

    [SOURCEDB <dsn_1>,][USERID<user>[, PASSWORD <pw>]]

    -- Specify the local trail that thisExtract writes to:

    EXTTRAIL <local_trail_1>

    -- Specify tables to be captured:

    TABLE <owner>.<table>;

    Extract_2

    -- Identify the Extract group:

    EXTRACT <ext_2>

    -- Specify database login information asneeded for the database:

    [SOURCEDB <dsn_2>,][USERID<user>[, PASSWORD <pw>]]

    -- Specify the local trail that thisExtract writes to:

    EXTTRAIL <local_trail_2>

    -- Specify tables to be captured:

    TABLE <owner>.<table>;

     

    --To configure the datapumps

    6. On each source, use the ADD EXTRACT commandto create a data pump Extract group. For documentation purposes, these pumpsare called pump_1 and pump_2.

    Data pump_1

    ADD EXTRACT <pump_1>, EXTTRAILSOURCE<local_trail_1>, BEGIN <time>

    Data pump_2

    ADD EXTRACT <pump_2>, EXTTRAILSOURCE<local_trail_2>, BEGIN <time>

    注意:Use EXTTRAILSOURCE as the data source option, and specify the nameof the trail on the local system.

    7. On each source, use the ADD RMTTRAIL commandto create a remote trail on the target.

    Source_1

    ADD RMTTRAIL<remote_trail_1>, EXTRACT <pump_1>

    Source_2

    ADD RMTTRAIL<remote_trail_2>, EXTRACT <pump_2>

    注意:Use the EXTRACT argumentto link each remote trail to a different data pump. The data pump writes tothis trail over TCP/IP, and a Replicat reads from it.

    8. On each source, use the EDIT PARAMS commandto create a parameter file for the data pump group. Include the followingparameters plus any others that apply to your

    database environment.

    Data pump_1

    -- Identify the data pump group:

    EXTRACT <pump_1>

    -- Specify database login information asneeded for the database:

    [SOURCEDB <dsn_1>,][USERID<user>[, PASSWORD <pw>]]

    -- Specify the name or IP address of thetarget system:

    RMTHOST <target>, MGRPORT<portnumber>

    -- Specify the remote trail on the target system:

    RMTTRAIL <remote_trail_1>

    -- Allow mapping, filtering, conversion orpass data through as-is:

    [PASSTHRU | NOPASSTHRU]

    -- Specify tables to be captured:

    TABLE <owner>.<table>;

    Data pump_2

    -- Identify the data pump group:

    EXTRACT <pump_2>

    -- Specify database login information asneeded for the database:

    [SOURCEDB <dsn_2>,][USERID<user>[, PASSWORD <pw>]]

    -- Specify the name or IP address of thetarget system:

    RMTHOST <target>, MGRPORT<portnumber>

    -- Specify the remote trail on the targetsystem:

    RMTTRAIL <remote_trail_2>

    -- Allow mapping, filtering, conversion orpass data through as-is:

    [PASSTHRU | NOPASSTHRU]

    -- Specify tables to be captured:

    TABLE <owner>.<table>;

    注意:Use NOPASSTHRU if the data pump will be filtering or convertingdata, and also use the SOURCEDB and USERID parameters as appropriate for thedatabase, to enable definitions lookups. If the data pump will not be filteringor converting data, use PASSTHRU to bypass the lookups.

    NOTE:

    To use PASSTHRU mode,the names of the source and target objects must be identical. No column mapping, filtering, SQLEXECfunctions, transformation, or other functions that require data manipulationcan be specified in the parameter file. You can combine normal processing withpass-through processing by pairing PASSTHRU and NOPASSTHRU with different TABLEstatements.

    1.2.2 Target system

    --To configure theManager process

    9. Configure the Manager process.

    10. In the Manager parameter file, use the PURGEOLDEXTRACTSparameter to control the purging of files from the trail.

     

    --To configure the Replicat groups

    11. On the target, use the ADD REPLICAT commandto create a Replicat group for each remote trail that you created. Fordocumentation purposes, these groups are called rep_1 and rep_2.

    Replicat_1

    ADD REPLICAT <rep_1>, EXTTRAIL<remote_trail_1>, BEGIN <time>

    Replicat_2

    ADD REPLICAT <rep_2>, EXTTRAIL<remote_trail_2>, BEGIN <time>

    注意: Use the EXTTRAIL argumentto link the Replicat group to the trail.

                  每个Data Pump 对应一个Replicat进程。   

    12. On the target, use the EDIT PARAMS commandto create a parameter file for each Replicat group. Include the followingparameters plus any others that apply to your database environment.

    Replicat_1

    -- Identify the Replicat group:

    REPLICAT <rep_1>

    -- State whether or not source and targetdefinitions are identical:

    SOURCEDEFS <full_pathname> |ASSUMETARGETDEFS

    -- Specify database login information asneeded for the database:

    [TARGETDB <dsn_3>,] [USERID <userid>[, PASSWORD <pw>]]

    -- Specify error handling rules:

    REPERROR (<error>, <response>)

    -- Specify tables for delivery:

    MAP <owner>.<table>, TARGET<owner>.<table>[, DEF <template name>];

    Replicat_2

    -- Identify the Replicat group:

    REPLICAT <rep_2>

    -- State whether or not source and targetdefinitions are identical:

    SOURCEDEFS <full_pathname> |ASSUMETARGETDEFS

    -- Specify database login information asneeded for the database:

    [TARGETDB <dsn_3>,] [USERID <userid>[, PASSWORD <pw>]]

    -- Specify error handling rules:

    REPERROR (<error>, <response>)

    -- Specify tables for delivery:

    MAP <owner>.<table>, TARGET<owner>.<table>[, DEF <template name>];

    注意:You can use anynumber of MAP statements for any given Replicat group. All MAP statements for agiven Replicat group must specify the same objects that are contained in thetrail that is linked to the group.

    二.示例

    这里还是3个节点的GG,我们这里从节点2和节点3发送到节点1.

    DB: oracle 11.2.0.3

    OS:  redhat 5.4

    2.1 在所有节点上设置Manager

    GGSCI (gg1) 2> edit params mgr

    PORT 7809

    2.2 分别在GG2和GG3上添加Extract进程

    --GG2

    GGSCI (gg2) 4> add extract ext1,tranlog,begin now

    GGSCI (gg2) 5> add exttrail/u01/ggate/dirdat/lt, extract ext1

    GGSCI (gg2) 7> view params ext1

    extract ext1

    userid ggate@gg2,password ggate

    exttrail /u01/ggate/dirdat/lt

    ddl include all objname dave.pdba;

    table dave.pdba;

    --GG3

    GGSCI (gg3) 2> add extract ext1,tranlog,begin now

    GGSCI (gg3) 4> add exttrail/u01/ggate/dirdat/lt, extract ext1

    GGSCI (gg3) 6> view params ext1

    extract ext1

    userid ggate@gg3,password ggate

    exttrail /u01/ggate/dirdat/lt

    ddl include all objname dave.pdba;

    table dave.pdba;

    2.3 分别在GG2和GG3上添加DataPump 进程

    --GG2

    GGSCI (gg2) 8> add extractdpump,exttrailsource /u01/ggate/dirdat/lt

    GGSCI (gg2) 9> add rmttrail/u01/ggate/dirdat/d1, extract dpump

    --注意这里指定的Target端的位置,我们以d1 开头。

    GGSCI (gg2) 11> view params dpump

    extract dpump

    userid ggate@gg2, password ggate

    rmthost gg1, mgrport 7809,compress,compressthreshold0

    rmttrail /u01/ggate/dirdat/d1

    passthru

    table dave.pdba;

    --GG3

    GGSCI (gg3) 7> add extractdpump,exttrailsource /u01/ggate/dirdat/lt

    GGSCI (gg3) 8> add rmttrail/u01/ggate/dirdat/d2, extract dpump

    GGSCI (gg3) 10> view params dpump

    extract dpump

    userid ggate@gg3, password ggate

    rmthost gg1, mgrport 7809,compress,compressthreshold0

    rmttrail /u01/ggate/dirdat/d2

    passthru

    table dave.pdba;

    2.4 在GG1上配置2个Replicat进程,分别对应Data Pump进程

    --配置checkpoint

    GGSCI (gg1) 3> view params ./GLOBALS

    GGSCHEMA ggate

    CHECKPOINTTABLE ggate.checkpoint

    GGSCI (gg1) 5> dblogin useridggate@gg1,password ggate

    Successfully logged into database.

    GGSCI (gg1) 7> add checkpointtableggate.checkpoint

    Successfully created checkpoint tableGGATE.CHECKPOINT.

    这部分,参考:

    Oracle Golden Gate 系列十三 --配置GG进程检查点(checkpoint) 说明

    http://blog.csdn.net/tianlesoftware/article/details/6983928

    --配置2个Replicat:

    --rep1:

    GGSCI (gg1) 8> add replicatrep1,exttrail /u01/ggate/dirdat/d1, checkpointtable ggate.checkpoint

    REPLICAT added.

    GGSCI (gg1) 11> view params rep1

    replicat rep1

    ASSUMETARGETDEFS

    userid ggate@gg1,password ggate

    discardfile/u01/ggate/dirdat/rep1_discard.txt, append, megabytes 10

    --HANDLECOLLISIONS

    ddl include all

    ddlerror default ignore retryop

    map dave.pdba, target dave.pdba;

    --rep2:

    GGSCI (gg1) 9> add replicatrep2,exttrail /u01/ggate/dirdat/d2, checkpointtable ggate.checkpoint

    REPLICAT added.

    GGSCI (gg1) 13> view params rep2

    replicat rep2

    ASSUMETARGETDEFS

    userid ggate@gg1,password ggate

    discardfile/u01/ggate/dirdat/rep2_discard.txt, append, megabytes 10

    --HANDLECOLLISIONS

    ddl include all

    ddlerror default ignore retryop

    map dave.pdba, target dave.pdba;

    数据初始化的问题,具体参考:

    Oracle Golden Gate 系列九--GG 数据初始化装载 说明 与 示例

    http://blog.csdn.net/tianlesoftware/article/details/6976551

    OracleGoldenGate 系列十二--GG 数据初始化装载二 基于SCN 的初始化 说明 与 示例

    http://blog.csdn.net/tianlesoftware/article/details/6982908

    2.5 启动相关进程进行测试

    --GG2:

    GGSCI (gg2) 20> start ext1

    确定ext1的trails 文件:

    GGSCI (gg2) 25> view report ext1

    2011-11-20 11:31:43  INFO   OGG-01056  Recovery initializationcompleted for target file /u01/ggate

    /dirdat/lt000018, at RBA 132350, CSN1525929.

    2011-11-20 11:31:43  INFO   OGG-01478  Output file/u01/ggate/dirdat/lt is using format RELEASE 10.

    4/11.1.

    2011-11-20 11:31:43  WARNING OGG-01438  Checkpoint marked as from graceful shutdown,but records fou

    nd after checkpoint in trail/u01/ggate/dirdat/lt.  Expected EOF Seqno0, RBA 0.  Found Seqno 18, RB

    A 132350.

    2011-11-20 11:31:43  INFO   OGG-01026  Rolling over remotefile /u01/ggate/dirdat/lt000018.

    GGSCI (gg2) 24> info dpump

    EXTRACT   DPUMP     Initialized   2011-11-20 11:11   Status STOPPED

    Checkpoint Lag       00:00:00 (updated 00:20:16 ago)

    Log Read Checkpoint  File /u01/ggate/dirdat/lt000000

                         First Record  RBA 0

    GGSCI (gg2) 26> alter extractdpump,extseqno 18,extrba 0

    EXTRACT altered.

    GGSCI (gg2) 27> start dpump

    Sending START request to MANAGER ...

    EXTRACT DPUMP starting

    GGSCI (gg2) 40> info all

    Program    Status      Group       Lag           Time Since Chkpt

    MANAGER    RUNNING                                          

    EXTRACT    RUNNING     DPUMP       00:00:00      00:02:05   

    EXTRACT    RUNNING     EXT1        00:00:00      00:00:02   

    REPLICAT   ABENDED     REP1        00:00:00      12:24:42  

    GGSCI (gg2) 41> info dpump

    EXTRACT   DPUMP     Last Started 2011-11-2011:37   Status RUNNING

    Checkpoint Lag       00:00:00 (updated 00:02:24 ago)

    Log Read Checkpoint  File /u01/ggate/dirdat/lt000018

                         First Record  RBA 0

    这里注意Data Pump 默认是从lt000000 开始读取trail 文件,如果是新搭建的GG 同步,那么都是从lt000000开始处理,所以没有问题,如果是后来修改称data pump,就需要根据extract 进程的错误提示信息,使用如下命令:

          GGSCI(gg1) 82> alter extract dpump,extseqno 18,extrba 0

    具体参考:

    OracleGoldenGate 系列十 -- 配置DataPump process 说明 与 示例

    http://blog.csdn.net/tianlesoftware/article/details/6978501

    --GG3:

    GGSCI (gg3) 14> start ext1

    Sending START request to MANAGER ...

    EXTRACT EXT1 starting

    GGSCI (gg3) 18> view report ext1

    2011-11-20 11:39:50  INFO   OGG-01056  Recovery initializationcompleted for target file /u01/ggate

    /dirdat/lt000016, at RBA 132672, CSN1525929.

    2011-11-20 11:39:50  INFO   OGG-01478  Output file/u01/ggate/dirdat/lt is using format RELEASE 10.

    4/11.1.

    2011-11-20 11:39:50  WARNING OGG-01438  Checkpoint marked as from graceful shutdown,but records fou

    nd after checkpoint in trail/u01/ggate/dirdat/lt.  Expected EOF Seqno0, RBA 0.  Found Seqno 16, RB

    A 132672.

    2011-11-20 11:39:50  INFO   OGG-01026  Rolling over remotefile /u01/ggate/dirdat/lt000016.

    GGSCI (gg3) 19> alter extractdpump,extseqno 16,extrba 0

    EXTRACT altered.

    GGSCI (gg3) 20> info dpump

    EXTRACT   DPUMP     Initialized   2011-11-20 11:41   Status STOPPED

    Checkpoint Lag       00:00:00 (updated 00:00:06 ago)

    Log Read Checkpoint  File /u01/ggate/dirdat/lt000016

                         First Record  RBA 0

    GGSCI (gg3) 21> start dpump

    Sending START request to MANAGER ...

    EXTRACT DPUMP starting

    GGSCI (gg3) 22> info all

    Program    Status      Group       Lag           Time Since Chkpt

    MANAGER    RUNNING                                          

    EXTRACT    RUNNING     DPUMP       00:00:00      00:00:13   

    EXTRACT    RUNNING     EXT1        00:00:00      00:00:05   

    REPLICAT   ABENDED     REP1        00:00:00      12:28:42   

    --GG1:

    GGSCI (gg1) 18> start rep1

    Sending START request to MANAGER ...

    REPLICAT REP1 starting

    GGSCI (gg1) 21> start rep2

    Sending START request to MANAGER ...

    REPLICAT REP2 starting

    GGSCI (gg1) 22> info all

    Program    Status      Group       Lag           Time Since Chkpt

    MANAGER    RUNNING                                          

    EXTRACT    ABENDED     DPUMP       00:00:00      12:34:55   

    EXTRACT    ABENDED     DPUMP2      00:00:00      12:34:49   

    EXTRACT    ABENDED     EXT1        00:00:00      12:34:51   

    REPLICAT   RUNNING     REP1        00:00:00      00:17:37   

    REPLICAT   RUNNING     REP2        00:00:00      00:17:15   

    2.6 验证同步

    --GG2的pdba 表insert 一条记录:

    SQL> conn dave/dave;

    Connected.

    SQL> insert into pdba values(2222,sysdate);

    1 row created.

    SQL> commit;

    Commit complete.

    --GG3的pdba 表insert 一条记录:

    SQL> conn dave/dave;

    Connected.

    SQL> insert into pdba values(3333,sysdate);

    1 row created.

    SQL> commit;

    Commit complete.

    --在GG1 的pdba 表查询这条记录:

    SQL> select * from pdba where idin(2222,3333);

    no rows selected

    没有同步过来。

    GGSCI (gg1) 86> info rep1

    REPLICAT  REP1      Last Started 2011-11-2012:22   Status RUNNING

    Checkpoint Lag       00:00:00 (updated 00:00:01 ago)

    Log Read Checkpoint  File /u01/ggate/dirdat/d1000000

                         First Record  RBA 0

    GGSCI (gg1) 87> stop rep1

    Sending STOP request to REPLICAT REP1 ...

    Request processed.

    GGSCI (gg1) 88> stop rep2

    Sending STOP request to REPLICAT REP2 ...

    Request processed.

    GGSCI (gg1) 89> alter replicat rep1extseqno 1 extrba 0

    REPLICAT altered.

    GGSCI (gg1) 91> alter replicat rep2 extseqno1 extrba 0

    REPLICAT altered.

    GGSCI (gg1) 93> start rep1

    Sending START request to MANAGER ...

    REPLICAT REP1 starting

    GGSCI (gg1) 94> start rep2

    Sending START request to MANAGER ...

    REPLICAT REP2 starting

    --最后确认一下:

    SQL> select * from pdba where id<4444;

           ID TIME

    ---------- ------------

         2222 20-NOV-11

         3333 20-NOV-11

    多对一的测试同步成功。

    注意这里有2个小技巧:

    1.     如果同步有问题,但进程report又没有提供更多信息,可以使用Logdump工具查看trails里的内容,从而确认是哪一步出现了问题。关于logdump ,参考:

    Oracle Golden Gate 系列十五 --GG Trails 说明

    http://blog.csdn.net/tianlesoftware/article/details/6990611

    2.     如果tail 搞不定,一个简单的解决方法:

    把所有进程delete 掉,所有的tails 全部delete 掉,重新来过,这种方法比查看trail 要节省很多时间。

    -------------------------------------------------------------------------------------------------------

    版权所有,文章允许转载,但必须以链接方式注明源地址,否则追究法律责任!

    Blog:     http://blog.csdn.net/tianlesoftware

    Weibo: http://weibo.com/tianlesoftware

    Email:   tianlesoftware@gmail.com

    Skype: tianlesoftware

    -------加群需要在备注说明Oracle表空间和数据文件的关系,否则拒绝申请----

    DBA1 群:62697716(满);   DBA2 群:62697977(满)  DBA3 群:62697850(满)  

    DBA 超级群:63306533(满);  DBA4 群:83829929(满) DBA5群: 142216823(满) 

    DBA6 群:158654907(满)   DBA7 群:69087192(满)  DBA8 群:172855474

    DBA 超级群2:151508914  DBA9群:102954821     聊天 群:40132017(满)

  • 相关阅读:
    文件中的类都不能进行设计,因此未能为该文件显示设计器。设计器检查出文件中有以下类: FormMain --- 未能加载基类“WinForm.Win.FormsBase.FormMainBase”。请确保已引用该程序集并已生成所有项目
    Xcopy命令参数使用介绍
    c#生成方案里预生成拷贝文件
    visual studio 2010 出现问题,不能设置断点调试了,一运行就未响应,然后程序退出
    开启Visual Studio 2013时,出现Microsoft.VisualStudio.Web.PasteJson.JsonPackage无法载入的可能解決方案
    Java开发知识之XML文档使用,解析
    JAVA开发知识之Java的线程
    git版本控制工具的使用
    Java开发知识之Java的正则表达式
    PC逆向之代码还原技术,第一讲基本数据类型在内存中的表现形式.浮点,指针寻址公式
  • 原文地址:https://www.cnblogs.com/tianlesoftware/p/3609478.html
Copyright © 2011-2022 走看看