RAC → Single-Instance Replication Configuration
1.1 Environment Overview
Role | IP | OS | Oracle version
Source | 10.123.112.201/10.123.112.202 | Linux RHEL 5, 64-bit | 10.2.0.1
Target | 10.123.112.235 | Linux RHEL 5, 32-bit | 10.2.0.1
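1.2 Installing the OCFS2 Cluster File System on the Source
For high availability in a RAC environment, OGG should be installed on a cluster file system so that it can be reached from every node of the RAC cluster. This test uses the OCFS2 file system.
Download the OCFS2 RPM packages that match the Linux kernel from http://oss.oracle.com.
Run uname -r on Linux to check the kernel version, e.g.: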
[oracle@node2 ocfs]$ uname -r
2.6.18-92.el5
Install the OCFS2 RPM packages as root:
[root@node1 ocfs]# rpm -ivh ocfs2-tools-1.2.7-1.el5.x86_64.rpm \
ocfs2console-1.2.7-1.el5.x86_64.rpm \
ocfs2-2.6.18-92.el5-1.2.9-1.el5.x86_64.rpm
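The kernel-module package has to match the running kernel release. The following is a minimal sketch of that check, assuming the package file name embeds the kernel release the way the example above does; the function name rpm_matches_kernel is hypothetical and not part of OCFS2.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an OCFS2 kernel-module RPM file name
# embeds the given kernel release (as printed by `uname -r`).
rpm_matches_kernel() {
    rpm_file=$1
    kernel=$2
    case "$rpm_file" in
        ocfs2-"$kernel"-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: compare the downloaded package against the running kernel.
# rpm_matches_kernel ocfs2-2.6.18-92.el5-1.2.9-1.el5.x86_64.rpm "$(uname -r)" \
#     && echo "package matches kernel"
```

Only the kernel-module package carries the kernel release in its name; ocfs2-tools and ocfs2console do not and are not covered by this check.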
Open the OCFS2 console:
[root@node1 ~]# ocfs2console
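In the window that opens, choose [Cluster]-[Configure Nodes]. In the "Node Configuration" dialog, enter the node name, IP address, and port number for the two private-interconnect nodes, then choose [Cluster]-[Propagate Cluster Configuration] and wait for the "Finished" message.
The configured nodes are then listed in the console.
Run the following commands as root on every node in the cluster: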
export PATH=$PATH:/sbin:/usr/sbin
/etc/init.d/o2cb enable
Create the OCFS2 file system. The -N option sets the maximum number of nodes that may use the file system at the same time:
# mkfs -t ocfs2 -N 2 /dev/sdh1
Mount the partition:
# mount /dev/sdh1 /ggate
Configure automatic loading at boot (all nodes):
export PATH=$PATH:/sbin:/usr/sbin
chkconfig --add o2cb
/etc/init.d/o2cb configure
Add the following to /etc/rc.local:
chown -R oracle:dba /ggate
chmod -R 775 /ggate
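Before GoldenGate is installed, it can help to confirm on each node that /ggate really is an OCFS2 mount. The sketch below reads a mounts table in the /proc/mounts layout; the function name is_ocfs2_mounted is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if MOUNTPOINT is listed as an ocfs2 file
# system in a mounts table (defaults to /proc/mounts).
is_ocfs2_mounted() {
    mountpoint=$1
    mounts_file=${2:-/proc/mounts}
    # Fields: device mountpoint fstype options dump pass
    awk -v mp="$mountpoint" \
        '$2 == mp && $3 == "ocfs2" { found = 1 } END { exit !found }' \
        "$mounts_file"
}

# Usage on each cluster node:
# is_ocfs2_mounted /ggate || echo "/ggate is not an OCFS2 mount on this node"
```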
1.3 Installing GoldenGate on the Source
Unpack the installation files in the GoldenGate installation directory (the OCFS2 directory /ggate):
unzip ogg112101_fbo_ggs_Linux_x64_ora10g_64bit.zip
tar -xvf fbo_ggs_Linux_x64_ora10g_64bit.tar
Set the environment variables
Add the following to the user's profile:
export GGATE_HOME=/ggate
export LD_LIBRARY_PATH=$GGATE_HOME:$ORACLE_HOME/lib
Note: reload the profile afterwards so that the settings take effect.
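A quick way to confirm that the reloaded profile took effect is to test whether the GoldenGate home appears in LD_LIBRARY_PATH. This is a minimal sketch; the function name path_contains is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if DIR is one of the colon-separated
# entries of PATHLIST.
path_contains() {
    dir=$1
    pathlist=$2
    case ":$pathlist:" in
        *":$dir:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage after reloading the profile:
# path_contains "$GGATE_HOME" "$LD_LIBRARY_PATH" \
#     || echo "GGATE_HOME is missing from LD_LIBRARY_PATH"
```

Wrapping both strings in colons makes the match exact per entry, so /ggate does not match /ggate_old.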
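Install GoldenGate
Create the OGG working directories from the OGG console.
Run ./ggsci in the installation directory to enter the OGG console.
Run create subdirs to create the working directories. The output looks like this: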
GGSCI (node1) 1> create subdirs
Creating subdirectories under current directory /ggate
Parameter files /ggate/dirprm: already exists
Report files /ggate/dirrpt: created
Checkpoint files /ggate/dirchk: created
Process status files /ggate/dirpcs: created
SQL script files /ggate/dirsql: created
Database definitions files /ggate/dirdef: created
Extract data files /ggate/dirdat: created
Temporary files /ggate/dirtmp: created
Stdout files /ggate/dirout: created
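1.4 Installing GoldenGate on the Target
The environment is the same and the procedure follows section 1.3; only the installation location differs, so the steps are omitted here. Make sure the installation package matches the target platform.
1.5 Configuring the Source Database
Database mode configuration
The source database must run in archivelog mode: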
Alter database archivelog;
Enable minimal supplemental logging:
Alter database add supplemental log data;
To check whether minimal supplemental logging is enabled, run:
SELECT SUPPLEMENTAL_LOG_DATA_MIN FROM V$DATABASE;
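The check can be scripted by piping the query output into a small filter. This is a sketch under the assumption that the query prints a single value on its own line; the function name min_supplemental_log_enabled is hypothetical. YES and IMPLICIT both indicate that minimal supplemental logging is in effect.

```shell
#!/bin/sh
# Hypothetical helper: read query output on stdin and succeed if the
# value of SUPPLEMENTAL_LOG_DATA_MIN is YES or IMPLICIT.
min_supplemental_log_enabled() {
    # Drop blanks and carriage returns, then look for an exact match.
    tr -d ' \t\r' | grep -Eq '^(YES|IMPLICIT)$'
}

# Usage (assumes OS authentication as a DBA user on the source host):
# printf 'set heading off feedback off\nSELECT SUPPLEMENTAL_LOG_DATA_MIN FROM V$DATABASE;\n' |
#     sqlplus -s "/ as sysdba" | min_supplemental_log_enabled \
#     || echo "minimal supplemental logging is NOT enabled"
```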
Create the GoldenGate database user on the source database and grant it privileges (ogg is used here as an example; any other name works):
create user ogg identified by oracle default tablespace DATA_OL;
grant connect,resource,unlimited tablespace to ogg;
grant execute on utl_file to ogg;
grant select any dictionary,select any table to ogg;
grant alter any table to ogg;
grant flashback any table to ogg;
grant execute on DBMS_FLASHBACK to ogg;
Add table-level trandata:
GGSCI (node1) 1> dblogin userid ogg,password oracle
Successfully logged into database.
GGSCI (node1) 2> add trandata SCOTT.DEPT
Logging of supplemental redo data enabled for table SCOTT.DEPT.
GGSCI (node1) 3> add trandata SCOTT.EMP
Logging of supplemental redo data enabled for table SCOTT.EMP.
1.6 Configuring the Source Process Groups
Configure the Manager process (mgr):
GGSCI (node1) 1> edit param mgr
(Paste the following configuration)
PORT 7839
DYNAMICPORTLIST 7840-7939
--AUTOSTART ER *
AUTORESTART EXTRACT *,RETRIES 5,WAITMINUTES 3
PURGEOLDEXTRACTS ./dirdat/*,usecheckpoints, minkeepdays 3
LAGREPORTHOURS 1
LAGINFOMINUTES 30
LAGCRITICALMINUTES 45
The parameters have the same meaning as in the single-instance configuration; see section 3.5.
Start the Manager process:
GGSCI (node1) 2> start mgr
Manager started.
GGSCI (node1) 3> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
Configure the Extract process:
GGSCI (node1) 6> add extract extnd,tranlog,begin now,threads 2
EXTRACT added.
GGSCI (node1) 7> add exttrail ./dirdat/nd,extract extnd,megabytes 100
EXTTRAIL added.
GGSCI (node1) 8> edit params extnd
(Paste the following configuration)
EXTRACT extnd
SETENV (NLS_LANG = "AMERICAN_AMERICA.UTF8")
SETENV (ORACLE_HOME = "/u01/app/oracle/product/10.2.0/db_1")
USERID ogg@RAC, PASSWORD oracle
--GETTRUNCATES
REPORTCOUNT EVERY 1 MINUTES, RATE
DISCARDFILE ./dirrpt/extnd.dsc,APPEND,MEGABYTES 1024
--THREADOPTIONS MAXCOMMITPROPAGATIONDELAY 60000 IOLATENS 60000
DBOPTIONS ALLOWUNUSEDCOLUMN
WARNLONGTRANS 2h,CHECKINTERVAL 3m
EXTTRAIL ./dirdat/nd
--TRANLOGOPTIONS EXCLUDEUSER USERNAME
FETCHOPTIONS NOUSESNAPSHOT
TRANLOGOPTIONS CONVERTUCS2CLOBS
TABLE scott.dept;
TABLE scott.emp;
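Note: set threads to the number of RAC nodes. On RAC, ORACLE_SID is no longer set; use USERID ogg@RAC instead, and make sure the database can be reached through this alias from both nodes.
Add the data pump process and configure its parameters: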
GGSCI (node1) 2> add extract dpend,exttrailsource ./dirdat/nd
EXTRACT added.
GGSCI (node1) 3> add rmttrail /uo1/app/ogg/dirdat/nd, EXTRACT DPEND
RMTTRAIL added.
GGSCI (node1) 4> edit params dpend
(Paste the following configuration)
EXTRACT dpend
SETENV (NLS_LANG = AMERICAN_AMERICA.UTF8)
USERID ogg@RAC, PASSWORD oracle
PASSTHRU
RMTHOST 10.123.112.235, MGRPORT 7839, compress
RMTTRAIL /uo1/app/ogg/dirdat/nd
TABLE scott.dept;
TABLE scott.emp;
1.7 Configuring the Target Database
Create the GoldenGate database user on the target database and grant it privileges:
create user ogg identified by oracle default tablespace USERS;
grant connect,resource,unlimited tablespace to ogg;
grant execute on utl_file to ogg;
grant select any dictionary,select any table to ogg;
grant alter any table to ogg;
grant flashback any table to ogg;
grant execute on DBMS_FLASHBACK to ogg;
grant insert any table to ogg;
grant delete any table to ogg;
grant update any table to ogg;
Add the checkpoint table:
GGSCI (sun.linux) 2> edit params GLOBALS
Then enter the following in the parameter file:
GGSCHEMA ogg
CHECKPOINTTABLE ogg.checkpoint
GGSCI (sun.linux) 4> dblogin userid ogg,password oracle
Successfully logged into database.
GGSCI (sun.linux) 5> add checkpointtable ogg.checkpoint
Successfully created checkpoint table ogg.checkpoint.
1.8 Configuring the Target Process Group
Configure the MGR parameters:
GGSCI (sun.linux) 6> edit params mgr
(Paste the following configuration)
PORT 7839
DYNAMICPORTLIST 7840-7939
--AUTOSTART ER *
AUTORESTART EXTRACT *,RETRIES 5,WAITMINUTES 3
PURGEOLDEXTRACTS ./dirdat/*,usecheckpoints, minkeepdays 3
LAGREPORTHOURS 1
LAGINFOMINUTES 30
LAGCRITICALMINUTES 45
Configure the Replicat process:
GGSCI (sun.linux) 8> add replicat repnd,exttrail /uo1/app/ogg/dirdat/nd,checkpointtable ogg.checkpoint
REPLICAT added.
GGSCI (sun.linux) 10> edit params repnd
(Paste the following configuration)
REPLICAT repnd
SETENV (NLS_LANG = AMERICAN_AMERICA.UTF8)
USERID ogg, PASSWORD oracle
ASSUMETARGETDEFS
REPERROR default,discard
discardfile ./dirrpt/repnd.dsc,append,megabytes 50
map scott.*,target pmsbi.*;
1.9 Starting the Processes and Synchronizing Data
Start the source process group
Start the Extract and data pump processes:
start extnd
start dpend
After starting them, check the process status with info all. The status should be RUNNING, as shown here:
GGSCI (node1) 19> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING DPEND 00:00:00 00:00:09
EXTRACT RUNNING EXTND 00:00:00 00:00:04
Start the target process:
start repnd
The output looks like this:
GGSCI (sun.linux) 2> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT RUNNING REPND 00:00:00 00:00:03
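The status check can be automated by filtering the info all output for any process that is not RUNNING. This is a minimal sketch that relies only on the column layout shown above; the function name all_running is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `info all` output on stdin, print every
# MANAGER/EXTRACT/REPLICAT line whose status is not RUNNING, and
# return non-zero if at least one such line was found.
all_running() {
    awk '$1 == "MANAGER" || $1 == "EXTRACT" || $1 == "REPLICAT" {
             if ($2 != "RUNNING") { print; bad = 1 }
         }
         END { exit bad + 0 }'
}

# Usage from the GoldenGate installation directory:
# echo "info all" | ./ggsci | all_running || echo "some processes are down"
```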
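This completes the installation and configuration of OGG from RAC to a single instance; data synchronization can now be tested.
2 HA Configuration for RAC → Single Instance
The RAC → single-instance configuration above only provides data replication and includes no high-availability setup: if the node running GoldenGate fails, replication stops. There are two ways to keep replication available:
2.1 Handling Node Failure Manually
Because GoldenGate is installed in a shared directory, the GoldenGate console can be started from any node that mounts that directory. If one node fails and the GoldenGate processes stop, simply start the process groups manually on another node.
2.2 GoldenGate HA Configuration
CRS can manage GoldenGate as a resource group, with a VIP used to connect to GoldenGate. If a database node goes down, Oracle Clusterware automatically switches the resources to another available node.
Add an application VIP resource
Create a profile for the GoldenGate VIP resource: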
[oracle@node1 ggate]$ cd $ORA_CRS_HOME/bin
[oracle@node1 bin]$ pwd
/u01/app/oracle/product/10.2.0/crs_1/bin
[oracle@node1 bin]$ crs_profile -create ggvip -t application \
-a /u01/app/oracle/product/10.2.0/crs_1/bin/usrvip \
-o oi=eth0,ov=192.168.73.203,on=255.255.255.0
Here ggvip is the name of the application VIP being created.
Register the resource with CRS:
[oracle@node1 bin]$ crs_register ggvip
Give ownership of the VIP to root. Run as root:
[root@node1 bin]# ./crs_setperm ggvip -o root
Grant the oracle user permission to start the resource:
[root@node1 bin]# ./crs_setperm ggvip -u user:oracle:r-x
Start the resource as the oracle user:
[oracle@node1 bin]$ crs_start ggvip
Attempting to start `ggvip` on member `node1`
Start of `ggvip` on member `node1` succeeded.
The resource status looks like this:
[oracle@node1 bin]$ crs_stat ggvip -t
Name Type Target State Host
------------------------------------------------------------
ggvip application ONLINE ONLINE node1
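Create an action program
Here the action program is placed on the shared disk. It must accept at least three arguments: start, stop, and check.
start and stop: return 0 on success, 1 on failure;
check: return 0 if GoldenGate is running, 1 if it is not.
The sample program gg_action.scr is shown below: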
#!/bin/sh
#set the Oracle Goldengate installation directory
export GGS_HOME=/ggate
#set the oracle home to the database to ensure GoldenGate will get the
#right environment settings to be able to connect to the database
export ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1
#specify delay after start before checking for successful start
start_delay_secs=5
#Include the GoldenGate home in the library path to start GGSCI
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:${GGS_HOME}:${LD_LIBRARY_PATH}
#check_process validates that a manager process is running at the PID
#that GoldenGate specifies.
check_process () {
if ( [ -f "${GGS_HOME}/dirpcs/MGR.pcm" ] )
then
pid=`cut -f8 "${GGS_HOME}/dirpcs/MGR.pcm"`
if [ ${pid} = `ps -e |grep ${pid} |grep mgr |cut -d " " -f2` ]
then
#manager process is running on the PID exit success
exit 0
else
if [ ${pid} = `ps -e |grep ${pid} |grep mgr |cut -d " " -f1` ]
then
#manager process is running on the PID exit success
exit 0
else
#manager process is not running on the PID
exit 1
fi
fi
else
#manager is not running because there is no PID file
exit 1
fi
}
#call_ggsci is a generic routine that executes a ggsci command
call_ggsci () {
ggsci_command=$1
ggsci_output=`${GGS_HOME}/ggsci << EOF
${ggsci_command}
exit
EOF`
}
case $1 in
'start')
#start manager
call_ggsci 'start manager'
#there is a small delay between issuing the start manager command
#and the process being spawned on the OS. wait before checking
sleep ${start_delay_secs}
#check whether manager is running and exit accordingly
check_process
;;
'stop')
#attempt a clean stop for all non-manager processes
#call_ggsci 'stop er *'
#ensure everything is stopped
call_ggsci 'stop er *!'
#call_ggsci 'kill er *'
#stop manager without (y/n) confirmation
call_ggsci 'stop manager!'
#exit success
exit 0
;;
'check')
check_process
;;
'clean')
#attempt a clean stop for all non-manager processes
#call_ggsci 'stop er *'
#ensure everything is stopped
#call_ggsci 'stop er *!'
#in case there are lingering processes
call_ggsci 'kill er *'
#stop manager without (y/n) confirmation
call_ggsci 'stop manager!'
#exit success
exit 0
;;
'abort')
#ensure everything is stopped
call_ggsci 'stop er *!'
#in case there are lingering processes
call_ggsci 'kill er *'
#stop manager without (y/n) confirmation
call_ggsci 'stop manager!'
#exit success
exit 0
;;
esac
Add an application profile
[oracle@node1 ggate]$ cd $ORA_CRS_HOME/bin
[oracle@node1 bin]$ pwd
/u01/app/oracle/product/10.2.0/crs_1/bin
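[oracle@node1 bin]$ crs_profile -create GG_app -t application \
-r ggvip -a /ggate/gg_action.scr -o ci=10
Where: -r ggvip means that ggvip must be running before GoldenGate starts;
-a /ggate/gg_action.scr gives the location of the action script, which must be available on every node;
-o ci=10 sets the check interval to 10 seconds.
Register the resource with CRS: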
[oracle@node1 bin]$ crs_register GG_app
Change the ownership of GG_app. Run as root:
[root@node1 bin]# ./crs_setperm GG_app -o oracle
Grant the oracle user permission to start the resource:
[root@node1 bin]# ./crs_setperm GG_app -u user:oracle:r-x
Start the resource as the oracle user:
[oracle@node1 bin]$ crs_start GG_app
Attempting to start `GG_app` on member `node1`
Start of `GG_app` on member `node1` succeeded.
The resource status looks like this:
[oracle@node1 bin]$ crs_stat GG_app -t
Name Type Target State Host
------------------------------------------------------------
GG_app application ONLINE ONLINE node1
Test node relocation
In a test environment, crs_relocate -f GG_app forces the resource to move to another node. The process looks like this:
[oracle@node1 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
GG_app application ONLINE ONLINE node1
ggvip application ONLINE ONLINE node1
ora....AC1.srv application ONLINE ONLINE node1
ora....AC2.srv application ONLINE ONLINE node2
ora.RAC.RAC.cs application ONLINE ONLINE node2
ora....C1.inst application ONLINE ONLINE node1
ora....C2.inst application ONLINE ONLINE node2
ora.RAC.db application ONLINE ONLINE node1
ora....E1.lsnr application ONLINE ONLINE node1
ora.node1.gsd application ONLINE ONLINE node1
ora.node1.ons application ONLINE ONLINE node1
ora.node1.vip application ONLINE ONLINE node1
ora....E2.lsnr application ONLINE ONLINE node2
ora.node2.gsd application ONLINE ONLINE node2
ora.node2.ons application ONLINE ONLINE node2
ora.node2.vip application ONLINE ONLINE node2
[oracle@node1 ~]$ crs_relocate -f GG_app
Attempting to stop `GG_app` on member `node1`
Stop of `GG_app` on member `node1` succeeded.
Attempting to stop `ggvip` on member `node1`
Stop of `ggvip` on member `node1` succeeded.
Attempting to start `ggvip` on member `node2`
Start of `ggvip` on member `node2` succeeded.
Attempting to start `GG_app` on member `node2`
Start of `GG_app` on member `node2` succeeded.
[oracle@node1 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
GG_app application ONLINE ONLINE node2
ggvip application ONLINE ONLINE node2
ora....AC1.srv application ONLINE ONLINE node1
ora....AC2.srv application ONLINE ONLINE node2
ora.RAC.RAC.cs application ONLINE ONLINE node2
ora....C1.inst application ONLINE ONLINE node1
ora....C2.inst application ONLINE ONLINE node2
ora.RAC.db application ONLINE ONLINE node1
ora....E1.lsnr application ONLINE ONLINE node1
ora.node1.gsd application ONLINE ONLINE node1
ora.node1.ons application ONLINE ONLINE node1
ora.node1.vip application ONLINE ONLINE node1
ora....E2.lsnr application ONLINE ONLINE node2
ora.node2.gsd application ONLINE ONLINE node2
ora.node2.ons application ONLINE ONLINE node2
ora.node2.vip application ONLINE ONLINE node2
GoldenGate has been relocated to node 2 and is running there.
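To verify a relocation from a script rather than by eye, the host column of the crs_stat -t output can be extracted for a given resource. This is a minimal sketch based on the five-column layout shown above; the function name resource_host is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `crs_stat -t` output on stdin and print the
# host (5th column) on which the named resource is running.
resource_host() {
    awk -v name="$1" '$1 == name { print $5 }'
}

# Usage:
# before=$(crs_stat -t | resource_host GG_app)
# crs_relocate -f GG_app
# after=$(crs_stat -t | resource_host GG_app)
# [ "$before" != "$after" ] && echo "GG_app moved from $before to $after"
```

crs_stat -t abbreviates long resource names (for example ora....AC1.srv), so this lookup is only reliable for short names such as GG_app and ggvip.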
3 Common Errors and Solutions
3.1 OGG-00446
When the source Extract process extnd is started, ggserr.log shows the following error:
2012-08-17 11:11:38 ERROR OGG-00446 Oracle GoldenGate Capture for Oracle, extnd.prm: Could not find archived log for sequence 45835 thread 1 under default destinations SQL <SELECT name FROM v$archived_log WHERE sequence# = :ora_seq_no AND thread# = :ora_thread AND resetlogs_id = :ora_resetlog_id AND archived = 'YES' AND deleted = 'NO' AND name not like '+%' AND standby_dest = 'NO' >, error retrieving redo file name for sequence 45835, archived = 1, use_alternate = 0Not able to establish initial position for begin time 2012-08-15 17:28:28.
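Cause: older archived logs were deleted or moved to backup, so the archived log file cannot be found.
Solution:
Restore the backed-up archived logs to the archive log directory; this resolves the error.
On a test database, the Extract process can instead be told to start reading from a given point in time, skipping the deleted archived logs, with the following command: alter extract extnd,begin 2012-8-16 16:38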
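When restoring from backup, it helps to know exactly which archived log is missing. The sketch below pulls the thread and sequence number out of OGG-00446 messages, assuming the message wording shown in the log excerpt above; the function name missing_archivelog is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read ggserr.log lines on stdin and print the
# thread and sequence number of each archived log reported missing.
missing_archivelog() {
    sed -n 's/.*Could not find archived log for sequence \([0-9][0-9]*\) thread \([0-9][0-9]*\).*/thread=\2 sequence=\1/p'
}

# Usage from the GoldenGate installation directory:
# grep OGG-00446 ggserr.log | missing_archivelog | sort -u
```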
3.2 OGG-01223
When the source data pump process DPEND is started, ggserr.log shows the following error:
2012-08-17 11:43:50 WARNING OGG-01223 Oracle GoldenGate Capture for Oracle, dpend.prm: TCP/IP error 79 (Connection refused).
2012-08-17 11:45:01 WARNING OGG-01223 Oracle GoldenGate Capture for Oracle, dpend.prm: TCP/IP error 79 (Connection refused).
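Cause: the MGR process on the target host (110) was not running.
Solution:
Run start mgr on the target, then start the source data pump process DPEND again. The error disappears and the trail files are transferred.
A normal log looks like this: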
2012-08-17 14:31:51 INFO OGG-00993 Oracle GoldenGate Capture for Oracle, dpend.prm: EXTRACT DPEND started.
2012-08-17 14:33:13 INFO OGG-01226 Oracle GoldenGate Capture for Oracle, dpend.prm: Socket buffer size set to 27985 (flush size 27985).
2012-08-17 14:33:26 INFO OGG-01052 Oracle GoldenGate Capture for Oracle, dpend.prm: No recovery is required for target file F:\ogg\dirdat\nd000000, at RBA 0 (file not opened).
2012-08-17 14:33:26 INFO OGG-01478 Oracle GoldenGate Capture for Oracle, dpend.prm: Output file F:\ogg\dirdat\nd is using format RELEASE 11.2.
3.3 OGG-01224
When the source data pump process DPEND is started, ggserr.log shows the following error:
2012-08-22 05:33:10 ERROR OGG-01224 Oracle GoldenGate Capture for Oracle, dpend.prm: TCP/IP error 113 (No route to host).
2012-08-22 05:33:10 ERROR OGG-01668 Oracle GoldenGate Capture for Oracle, dpend.prm: PROCESS ABENDING.
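Cause: the firewall on the target host (235) was not turned off.
Solution:
Turn off the firewall on the target machine, then start the source data pump process DPEND again. The error disappears and the trail files are transferred.
3.4 OGG-01031
When the source data pump process DPEND is started, ggserr.log shows the following error: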
2012-08-28 15:09:39 ERROR OGG-01031 Oracle GoldenGate Capture for Oracle, dpend.prm: There is a problem in network communication, a remote file problem, encryption keys for target and source do not match (if using ENCRYPT) or an unknown error. (Reply received is Unable to open file "/uo1/app/ogg/dirdat/nd000004" (error 2, No such file or directory)).
2012-08-28 15:09:41 ERROR OGG-01668 Oracle GoldenGate Capture for Oracle, dpend.prm: PROCESS ABENDING.
On the target, ggserr.log shows the following:
2012-08-28 15:06:30 WARNING OGG-01223 Oracle GoldenGate Collector for Oracle: Unable to lock file "/uo1/app/ogg/dirdat/nd000004" (error 11, Resource temporarily unavailable). Lock currently held by process id (PID) 13854.
2012-08-28 15:06:30 WARNING OGG-01223 Oracle GoldenGate Collector for Oracle: Unable to open file "/uo1/app/ogg/dirdat/nd000004" (error 2, No such file or directory).
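Cause: the network may have failed at some point, so the source data pump lost contact with the target while the server process that the target mgr had started for it kept running. When the data pump is restarted, the target mgr tries to spawn another server process, and the two processes compete for the same trail file.
Solution:
1. Stop all data pumps on the source, then use ps -ef|grep server (or grep for the OGG installation directory) to see whether an OGG server process is still running. If so, kill it (make sure all source data pumps are stopped and that you kill only the server process, not extract/replicat/mgr or any other process), then restart the source data pump.
2. The trail file on the target may be damaged; roll over to generate a new trail file: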
SEND EXTRACT xxx ETROLLOVER
or: alter extract xxx etrollover
where xxx is the name of the data pump.
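For step 1, the lingering server processes can be picked out of the ps -ef output more safely than with a plain grep. This is a sketch only: it assumes that the process's command name is server and that its command line mentions the OGG installation directory (for example through a log file path), which should be confirmed on the actual system before killing anything. The function name ogg_server_pids is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print the PIDs
# of processes whose command basename is "server" and whose command
# line mentions the given OGG installation directory.
ogg_server_pids() {
    awk -v home="$1" '
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD [args...]
        {
            n = split($8, parts, "/")
            if (parts[n] == "server" && index($0, home) > 0) print $2
        }'
}

# Usage on the target, after all source data pumps are stopped:
# ps -ef | ogg_server_pids /uo1/app/ogg
# Review the listed PIDs by hand before running kill on any of them.
```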
3.5 OGG-01154
Error message: 2011-03-29 15:53:57 WARNING OGG-01154 Oracle GoldenGate Delivery for Oracle, repya.prm: SQL error 14402 mapping EPMA.D_METER to EPMA.D_METER OCI Error ORA-14402: updating partition key column would cause a partition change (status = 14402), SQL <UPDATE "EPMA"."D_METER" SET "PR_ORG" = :a1,"BELONG_DEPT" = :a2 WHERE "METER_ID" = :b0>.
Cause: the source updated a partition key column, but row movement is not enabled on the target table, so the update fails.
Solution: SQLPLUS>alter table SCHEMA.TABLENAME enable row movement;