Oracle Golden Gate 系列十八 GG 多对一 realtime data warehousing 说明与示例

zoukankan html css js c++ java

Oracle Golden Gate 系列十八 GG 多对一 realtime data warehousing 说明与示例

一.官网说明
A datawarehousing configuration is a many-to-one configuration. Multiple sourcedatabases send data to one target warehouse database. Oracle GoldenGatesupports like-to-like or heterogeneous transfer of data, with capabilities forfiltering and conversion on any system in the configuration (support varies bydatabase platform).
       --多个Source DB发送到一个Target warehouse DB。


1.1 Considerations for a data warehousing configuration
1.1.1 Isolation of data records
Thisconfiguration assumes that each source database contributes different recordsto the target system. If the same record exists in the same table on two ormore source systems and can be changed on any of those systems, conflictresolution routines are needed to resolve conflicts when changes to that recordare made on both sources at the same time and replicated to the target table.
这里假设每个source DB 发送给TargetDB的都是不同的record，如果是相同的record，可能就会在TargetDB 上产生冲突。

1.1.2 Data storage
You can dividethe data storage between the source systems and the target system to reduce theneed for massive amounts of disk space on the target system. This is accomplishedby using a data pump on each source, rather than sending data directly from eachExtract across the network to the target.
(1)    A primary Extract writes to alocal trail on each source.
(2)    A data-pump Extract on eachsource reads the local trail and sends it across TCP/IP to a dedicated Replicatgroup.
1.1.3 Filtering and conversion
If not all ofthe data from a source system will be sent to the data warehouse, you can use thedata pump to perform the filtering. This removes that processing overhead fromthe primary Extract group, and it reduces the amount of data that is sentacross the network.
1.1.3.1 To filter data,you can use:
(1)    A FILTER or WHERE clause in a TABLEstatement (Extract) or in a MAP statement (Replicat).
(2)    A SQL query or procedure
(3)    User exits
1.1.3.2 To transformdata, you can use:
(1)    Native Oracle GoldenGateconversion functions
(2)    A user exit from the Extract orReplicat process that applies rules from an external
(3)    transformation solution, thenreturns the manipulated data to Oracle GoldenGate.
(4)    Replicat to deliver datadirectly to an ETL solution or other transformation engine.

1.1.4 Data volume
The standard configuration is sufficientif:
(1)    The transaction load isconsistent and of moderate volume that is spread out more or less evenly amongall of the objects to be replicated.
(2)    There are none of thefollowing: tables that are subject to long-running transactions, tables thathave a very large number of columns that change, or tables that contain columnsfor which Oracle GoldenGate must fetch from the database (generally columnswith LOBs, columns that are affected by SQL procedures executed by Oracle GoldenGate,and columns that are not logged to the transaction log).
If yourenvironment does not satisfy those conditions, consider adding one or more setsof parallel processes.
1.2 Creating a data warehousing configuration
Figure 13 Configuration for datawarehousing

1.2.1 Source systems
--To configure theManager process
1. On each source, configure the Managerprocess.
2. In each Manager parameter file, use the PURGEOLDEXTRACTSparameter to control the purging of files from the trail on the local system.

--To configure theprimary Extract groups
3. On each source, use the ADD EXTRACT commandto create a primary Extract group. For documentation purposes, these groups arecalled ext_1 and ext_2.
Extract_1
ADD EXTRACT<ext_1>, TRANLOG, BEGIN <time> [, THREADS <n>]
Extract_2
ADD EXTRACT<ext_2>, TRANLOG, BEGIN <time> [, THREADS <n>]
注意： Use TRANLOG as the data source option. For DB2 on Z/OS, specify thebootstrap data set (BSDS) name following TRANLOG.
4. On each source, use the ADD EXTTRAIL commandto create a local trail.
Extract_1
ADD EXTTRAIL<local_trail_1>, EXTRACT <ext_1>
Extract_2
ADD EXTTRAIL<local_trail_2>, EXTRACT <ext_2>
注意： Use the EXTRACT argument to link each Extract group to the localtrail on the same system. The primary Extract writes to this trail, and thedata-pump reads it.
5. On each source, use the EDIT PARAMS commandto create a parameter file for the primary Extract. Include the followingparameters plus any others that apply to your database environment.
Extract_1
-- Identify the Extract group:
EXTRACT <ext_1>
-- Specify database login information asneeded for the database:
[SOURCEDB <dsn_1>,][USERID<user>[, PASSWORD <pw>]]
-- Specify the local trail that thisExtract writes to:
EXTTRAIL <local_trail_1>
-- Specify tables to be captured:
TABLE <owner>.<table>;
Extract_2
-- Identify the Extract group:
EXTRACT <ext_2>
-- Specify database login information asneeded for the database:
[SOURCEDB <dsn_2>,][USERID<user>[, PASSWORD <pw>]]
-- Specify the local trail that thisExtract writes to:
EXTTRAIL <local_trail_2>
-- Specify tables to be captured:
TABLE <owner>.<table>;

--To configure the datapumps
6. On each source, use the ADD EXTRACT commandto create a data pump Extract group. For documentation purposes, these pumpsare called pump_1 and pump_2.
Data pump_1
ADD EXTRACT <pump_1>, EXTTRAILSOURCE<local_trail_1>, BEGIN <time>
Data pump_2
ADD EXTRACT <pump_2>, EXTTRAILSOURCE<local_trail_2>, BEGIN <time>
注意：Use EXTTRAILSOURCE as the data source option, and specify the nameof the trail on the local system.
7. On each source, use the ADD RMTTRAIL commandto create a remote trail on the target.
Source_1
ADD RMTTRAIL<remote_trail_1>, EXTRACT <pump_1>
Source_2
ADD RMTTRAIL<remote_trail_2>, EXTRACT <pump_2>
注意：Use the EXTRACT argumentto link each remote trail to a different data pump. The data pump writes tothis trail over TCP/IP, and a Replicat reads from it.
8. On each source, use the EDIT PARAMS commandto create a parameter file for the data pump group. Include the followingparameters plus any others that apply to your
database environment.
Data pump_1
-- Identify the data pump group:
EXTRACT <pump_1>
-- Specify database login information asneeded for the database:
[SOURCEDB <dsn_1>,][USERID<user>[, PASSWORD <pw>]]
-- Specify the name or IP address of thetarget system:
RMTHOST <target>, MGRPORT<portnumber>
-- Specify the remote trail on the target system:
RMTTRAIL <remote_trail_1>
-- Allow mapping, filtering, conversion orpass data through as-is:
[PASSTHRU | NOPASSTHRU]
-- Specify tables to be captured:
TABLE <owner>.<table>;
Data pump_2
-- Identify the data pump group:
EXTRACT <pump_2>
-- Specify database login information asneeded for the database:
[SOURCEDB <dsn_2>,][USERID<user>[, PASSWORD <pw>]]
-- Specify the name or IP address of thetarget system:
RMTHOST <target>, MGRPORT<portnumber>
-- Specify the remote trail on the targetsystem:
RMTTRAIL <remote_trail_2>
-- Allow mapping, filtering, conversion orpass data through as-is:
[PASSTHRU | NOPASSTHRU]
-- Specify tables to be captured:
TABLE <owner>.<table>;
注意：Use NOPASSTHRU if the data pump will be filtering or convertingdata, and also use the SOURCEDB and USERID parameters as appropriate for thedatabase, to enable definitions lookups. If the data pump will not be filteringor converting data, use PASSTHRU to bypass the lookups.
NOTE：
To use PASSTHRU mode,the names of the source and target objects must be identical. No column mapping, filtering, SQLEXECfunctions, transformation, or other functions that require data manipulationcan be specified in the parameter file. You can combine normal processing withpass-through processing by pairing PASSTHRU and NOPASSTHRU with different TABLEstatements.
1.2.2 Target system
--To configure theManager process
9. Configure the Manager process.
10. In the Manager parameter file, use the PURGEOLDEXTRACTSparameter to control the purging of files from the trail.

--To configure the Replicat groups
11. On the target, use the ADD REPLICAT commandto create a Replicat group for each remote trail that you created. Fordocumentation purposes, these groups are called rep_1 and rep_2.
Replicat_1
ADD REPLICAT <rep_1>, EXTTRAIL<remote_trail_1>, BEGIN <time>
Replicat_2
ADD REPLICAT <rep_2>, EXTTRAIL<remote_trail_2>, BEGIN <time>
注意： Use the EXTTRAIL argumentto link the Replicat group to the trail.
              每个Data Pump 对应一个Replicat进程。
12. On the target, use the EDIT PARAMS commandto create a parameter file for each Replicat group. Include the followingparameters plus any others that apply to your database environment.
Replicat_1
-- Identify the Replicat group:
REPLICAT <rep_1>
-- State whether or not source and targetdefinitions are identical:
SOURCEDEFS <full_pathname> |ASSUMETARGETDEFS
-- Specify database login information asneeded for the database:
[TARGETDB <dsn_3>,] [USERID <userid>[, PASSWORD <pw>]]
-- Specify error handling rules:
REPERROR (<error>, <response>)
-- Specify tables for delivery:
MAP <owner>.<table>, TARGET<owner>.<table>[, DEF <template name>];
Replicat_2
-- Identify the Replicat group:
REPLICAT <rep_2>
-- State whether or not source and targetdefinitions are identical:
SOURCEDEFS <full_pathname> |ASSUMETARGETDEFS
-- Specify database login information asneeded for the database:
[TARGETDB <dsn_3>,] [USERID <userid>[, PASSWORD <pw>]]
-- Specify error handling rules:
REPERROR (<error>, <response>)
-- Specify tables for delivery:
MAP <owner>.<table>, TARGET<owner>.<table>[, DEF <template name>];
注意：You can use anynumber of MAP statements for any given Replicat group. All MAP statements for agiven Replicat group must specify the same objects that are contained in thetrail that is linked to the group.
二.示例
这里还是3个节点的GG，我们这里从节点2和节点3发送到节点1.
DB： oracle 11.2.0.3
OS:  redhat 5.4
2.1 在所有节点上设置Manager
GGSCI (gg1) 2> edit params mgr
PORT 7809
2.2 分别在GG2和GG3上添加Extract进程
--GG2
GGSCI (gg2) 4> add extract ext1,tranlog,begin now
GGSCI (gg2) 5> add exttrail/u01/ggate/dirdat/lt, extract ext1
GGSCI (gg2) 7> view params ext1
extract ext1
userid ggate@gg2,password ggate
exttrail /u01/ggate/dirdat/lt
ddl include all objname dave.pdba;
table dave.pdba;
--GG3
GGSCI (gg3) 2> add extract ext1,tranlog,begin now
GGSCI (gg3) 4> add exttrail/u01/ggate/dirdat/lt, extract ext1
GGSCI (gg3) 6> view params ext1
extract ext1
userid ggate@gg3,password ggate
exttrail /u01/ggate/dirdat/lt
ddl include all objname dave.pdba;
table dave.pdba;
2.3 分别在GG2和GG3上添加DataPump 进程
--GG2
GGSCI (gg2) 8> add extractdpump,exttrailsource /u01/ggate/dirdat/lt
GGSCI (gg2) 9> add rmttrail/u01/ggate/dirdat/d1, extract dpump
--注意这里指定的Target端的位置，我们以d1 开头。
GGSCI (gg2) 11> view params dpump
extract dpump
userid ggate@gg2, password ggate
rmthost gg1, mgrport 7809,compress,compressthreshold0
rmttrail /u01/ggate/dirdat/d1
passthru
table dave.pdba;
--GG3
GGSCI (gg3) 7> add extractdpump,exttrailsource /u01/ggate/dirdat/lt
GGSCI (gg3) 8> add rmttrail/u01/ggate/dirdat/d2, extract dpump
GGSCI (gg3) 10> view params dpump
extract dpump
userid ggate@gg3, password ggate
rmthost gg1, mgrport 7809,compress,compressthreshold0
rmttrail /u01/ggate/dirdat/d2
passthru
table dave.pdba;
2.4 在GG1上配置2个Replicat进程，分别对应Data Pump进程
--配置checkpoint
GGSCI (gg1) 3> view params ./GLOBALS
GGSCHEMA ggate
CHECKPOINTTABLE ggate.checkpoint
GGSCI (gg1) 5> dblogin useridggate@gg1,password ggate
Successfully logged into database.
GGSCI (gg1) 7> add checkpointtableggate.checkpoint
Successfully created checkpoint tableGGATE.CHECKPOINT.
这部分，参考：
Oracle Golden Gate 系列十三 --配置GG进程检查点（checkpoint）说明
http://blog.csdn.net/tianlesoftware/article/details/6983928
--配置2个Replicat：
--rep1:
GGSCI (gg1) 8> add replicatrep1,exttrail /u01/ggate/dirdat/d1, checkpointtable ggate.checkpoint
REPLICAT added.
GGSCI (gg1) 11> view params rep1
replicat rep1
ASSUMETARGETDEFS
userid ggate@gg1,password ggate
discardfile/u01/ggate/dirdat/rep1_discard.txt, append, megabytes 10
--HANDLECOLLISIONS
ddl include all
ddlerror default ignore retryop
map dave.pdba, target dave.pdba;
--rep2:
GGSCI (gg1) 9> add replicatrep2,exttrail /u01/ggate/dirdat/d2, checkpointtable ggate.checkpoint
REPLICAT added.
GGSCI (gg1) 13> view params rep2
replicat rep2
ASSUMETARGETDEFS
userid ggate@gg1,password ggate
discardfile/u01/ggate/dirdat/rep2_discard.txt, append, megabytes 10
--HANDLECOLLISIONS
ddl include all
ddlerror default ignore retryop
map dave.pdba, target dave.pdba;
数据初始化的问题，具体参考：
Oracle Golden Gate 系列九--GG 数据初始化装载说明与示例
http://blog.csdn.net/tianlesoftware/article/details/6976551
OracleGoldenGate 系列十二--GG 数据初始化装载二基于SCN 的初始化说明与示例
http://blog.csdn.net/tianlesoftware/article/details/6982908
2.5 启动相关进程进行测试
--GG2：
GGSCI (gg2) 20> start ext1
确定ext1的trails 文件：
GGSCI (gg2) 25> view report ext1
…
2011-11-20 11:31:43 INFO   OGG-01056 Recovery initializationcompleted for target file /u01/ggate
/dirdat/lt000018, at RBA 132350, CSN1525929.
2011-11-20 11:31:43 INFO   OGG-01478 Output file/u01/ggate/dirdat/lt is using format RELEASE 10.
4/11.1.
2011-11-20 11:31:43 WARNING OGG-01438 Checkpoint marked as from graceful shutdown,but records fou
nd after checkpoint in trail/u01/ggate/dirdat/lt. Expected EOF Seqno0, RBA 0. Found Seqno 18, RB
A 132350.
2011-11-20 11:31:43 INFO   OGG-01026 Rolling over remotefile /u01/ggate/dirdat/lt000018.
…
GGSCI (gg2) 24> info dpump
EXTRACT   DPUMP     Initialized   2011-11-20 11:11   Status STOPPED
Checkpoint Lag       00:00:00 (updated 00:20:16 ago)
Log Read Checkpoint File /u01/ggate/dirdat/lt000000
                     First Record RBA 0
GGSCI (gg2) 26> alter extractdpump,extseqno 18,extrba 0
EXTRACT altered.
GGSCI (gg2) 27> start dpump
Sending START request to MANAGER ...
EXTRACT DPUMP starting
GGSCI (gg2) 40> info all
Program    Status      Group       Lag           Time Since Chkpt
MANAGER    RUNNING
EXTRACT    RUNNING     DPUMP       00:00:00      00:02:05
EXTRACT    RUNNING     EXT1        00:00:00      00:00:02
REPLICAT   ABENDED     REP1        00:00:00      12:24:42
GGSCI (gg2) 41> info dpump
EXTRACT   DPUMP     Last Started 2011-11-2011:37   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:02:24 ago)
Log Read Checkpoint File /u01/ggate/dirdat/lt000018
                     First Record RBA 0
这里注意Data Pump 默认是从lt000000 开始读取trail 文件，如果是新搭建的GG 同步，那么都是从lt000000开始处理，所以没有问题，如果是后来修改称data pump，就需要根据extract 进程的错误提示信息，使用如下命令：
      GGSCI(gg1) 82> alter extract dpump,extseqno 18,extrba 0
具体参考：
OracleGoldenGate 系列十 -- 配置DataPump process 说明与示例
http://blog.csdn.net/tianlesoftware/article/details/6978501
--GG3:
GGSCI (gg3) 14> start ext1
Sending START request to MANAGER ...
EXTRACT EXT1 starting
GGSCI (gg3) 18> view report ext1
…
2011-11-20 11:39:50 INFO   OGG-01056 Recovery initializationcompleted for target file /u01/ggate
/dirdat/lt000016, at RBA 132672, CSN1525929.
2011-11-20 11:39:50 INFO   OGG-01478 Output file/u01/ggate/dirdat/lt is using format RELEASE 10.
4/11.1.
2011-11-20 11:39:50 WARNING OGG-01438 Checkpoint marked as from graceful shutdown,but records fou
nd after checkpoint in trail/u01/ggate/dirdat/lt. Expected EOF Seqno0, RBA 0. Found Seqno 16, RB
A 132672.
2011-11-20 11:39:50 INFO   OGG-01026 Rolling over remotefile /u01/ggate/dirdat/lt000016.
…
GGSCI (gg3) 19> alter extractdpump,extseqno 16,extrba 0
EXTRACT altered.
GGSCI (gg3) 20> info dpump
EXTRACT   DPUMP     Initialized   2011-11-20 11:41   Status STOPPED
Checkpoint Lag       00:00:00 (updated 00:00:06 ago)
Log Read Checkpoint File /u01/ggate/dirdat/lt000016
                     First Record RBA 0
GGSCI (gg3) 21> start dpump
Sending START request to MANAGER ...
EXTRACT DPUMP starting
GGSCI (gg3) 22> info all
Program    Status      Group       Lag           Time Since Chkpt
MANAGER    RUNNING
EXTRACT    RUNNING     DPUMP       00:00:00      00:00:13
EXTRACT    RUNNING     EXT1        00:00:00      00:00:05
REPLICAT   ABENDED     REP1        00:00:00      12:28:42
--GG1:
GGSCI (gg1) 18> start rep1
Sending START request to MANAGER ...
REPLICAT REP1 starting
GGSCI (gg1) 21> start rep2
Sending START request to MANAGER ...
REPLICAT REP2 starting
GGSCI (gg1) 22> info all
Program    Status      Group       Lag           Time Since Chkpt
MANAGER    RUNNING
EXTRACT    ABENDED     DPUMP       00:00:00      12:34:55
EXTRACT    ABENDED     DPUMP2      00:00:00      12:34:49
EXTRACT    ABENDED     EXT1        00:00:00      12:34:51
REPLICAT   RUNNING     REP1        00:00:00      00:17:37
REPLICAT   RUNNING     REP2        00:00:00      00:17:15
2.6 验证同步
--GG2的pdba 表insert 一条记录：
SQL> conn dave/dave;
Connected.
SQL> insert into pdba values(2222,sysdate);
1 row created.
SQL> commit;
Commit complete.
--GG3的pdba 表insert 一条记录：
SQL> conn dave/dave;
Connected.
SQL> insert into pdba values(3333,sysdate);
1 row created.
SQL> commit;
Commit complete.
--在GG1 的pdba 表查询这条记录：
SQL> select * from pdba where idin(2222,3333);
no rows selected
没有同步过来。
GGSCI (gg1) 86> info rep1
REPLICAT  REP1      Last Started 2011-11-2012:22   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:01 ago)
Log Read Checkpoint File /u01/ggate/dirdat/d1000000
                     First Record RBA 0
GGSCI (gg1) 87> stop rep1
Sending STOP request to REPLICAT REP1 ...
Request processed.
GGSCI (gg1) 88> stop rep2
Sending STOP request to REPLICAT REP2 ...
Request processed.
GGSCI (gg1) 89> alter replicat rep1extseqno 1 extrba 0
REPLICAT altered.
GGSCI (gg1) 91> alter replicat rep2 extseqno1 extrba 0
REPLICAT altered.
GGSCI (gg1) 93> start rep1
Sending START request to MANAGER ...
REPLICAT REP1 starting
GGSCI (gg1) 94> start rep2
Sending START request to MANAGER ...
REPLICAT REP2 starting
--最后确认一下：
SQL> select * from pdba where id<4444;
       ID TIME
---------- ------------
     2222 20-NOV-11
     3333 20-NOV-11
多对一的测试同步成功。
注意这里有2个小技巧：
1.     如果同步有问题，但进程report又没有提供更多信息，可以使用Logdump工具查看trails里的内容，从而确认是哪一步出现了问题。关于logdump ，参考：
Oracle Golden Gate 系列十五 --GG Trails 说明
http://blog.csdn.net/tianlesoftware/article/details/6990611
2.     如果tail 搞不定，一个简单的解决方法：
把所有进程delete 掉，所有的tails 全部delete 掉，重新来过，这种方法比查看trail 要节省很多时间。
-------------------------------------------------------------------------------------------------------
版权所有，文章允许转载，但必须以链接方式注明源地址，否则追究法律责任!
Blog:     http://blog.csdn.net/tianlesoftware
Weibo: http://weibo.com/tianlesoftware
Email: tianlesoftware@gmail.com
Skype: tianlesoftware
-------加群需要在备注说明Oracle表空间和数据文件的关系，否则拒绝申请----
DBA1 群：62697716(满);   DBA2 群：62697977(满)  DBA3 群：62697850(满)
DBA 超级群：63306533(满); DBA4 群：83829929(满) DBA5群： 142216823(满)
DBA6 群：158654907(满)   DBA7 群：69087192(满)  DBA8 群：172855474
DBA 超级群2：151508914 DBA9群：102954821     聊天群：40132017(满)

道森Oracle，国内最早、最大的网络语音培训机构，我们提供专业、优质的Oracle技术培训和服务！我们的官方网站：http://www.daosenoracle.com 官方淘宝店：http://daosenpx.taobao.com/

查看全文

相关阅读:
python之redis
redis 双写一致性问题
 MySQL主从复制与读写分离
 高并发下Redis如何保持数据一致性（避免读后写）
redis的no-appendfsync-on-rewrite参数
 Redis基础、高级特性与性能调优
 python连接redis sentinel集群
 ListView阻尼效果
 零基础学python-5.6 数字位操作与其它工具
 iOS 实现QQ界面

原文地址：https://www.cnblogs.com/tianlesoftware/p/3609478.html

Oracle Golden Gate 系列十八 GG 多对一 realtime data warehousing 说明 与 示例

一.官网说明

1.1 Considerations for a data warehousing configuration

1.2 Creating a data warehousing configuration

1.2.1 Source systems

1.2.2 Target system

二.示例

2.1 在所有节点上设置Manager

2.2 分别在GG2和GG3上添加Extract进程

2.3 分别在GG2和GG3上添加DataPump 进程

2.4 在GG1上配置2个Replicat进程，分别对应Data Pump进程

2.5 启动相关进程进行测试

2.6 验证同步

Oracle Golden Gate 系列十八 GG 多对一 realtime data warehousing 说明与示例