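In "Switching Between Primary and Standby with GoldenGate Live Standby (1)" we covered how to perform a planned switchover in a GoldenGate live standby environment. Planned switchovers aside, most role changes in real production happen because the host has failed or the primary database is unavailable, and in that situation we usually can no longer stop the application or the extract on the primary. Once we have failed over to the standby under those conditions, we cannot simply switch back even after the primary database has been recovered, much as in a Data Guard environment. Instead, we have to reconfigure OGG on the primary and restore the standby's data to it as an initial load; only after the data has been resynchronized can we switch back to the primary. Below we walk through an unplanned failover from the primary to the standby, and then attempt to switch back: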
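1. Use the lag replicat command to check how far behind the replicat on the standby is. If it returns "At EOF, no more records to process.", the replicat has applied all of the data in the trail to the standby database.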
GGSCI (rh3.oracle.com) 1> info all
Program     Status      Group       Lag           Time Since Chkpt
MANAGER     RUNNING
EXTRACT     STOPPED     EXTSTD2     00:00:00      23:42:47
EXTRACT     STOPPED     PUMPSTD2    00:00:00      23:41:29
REPLICAT    RUNNING     REPSTD1     00:00:00      00:00:00
GGSCI (rh3.oracle.com) 5> lag replicat repstd1
Sending GETLAG request to REPLICAT REPSTD1 ...
Last record lag: 5 seconds.
At EOF, no more records to process.
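If this check needs to be scripted instead of read by eye, the following is a minimal Python sketch that inspects captured lag replicat output. The function names are our own (hypothetical), and the parsing assumes only the output format shown in the transcript above; other GoldenGate versions may word these lines differently.

```python
import re


def replicat_caught_up(lag_output: str) -> bool:
    """Return True if the captured output contains a line starting with 'At EOF'."""
    return any(line.strip().startswith("At EOF") for line in lag_output.splitlines())


def last_record_lag_seconds(lag_output: str):
    """Return the 'Last record lag' value in seconds, or None if the line is absent.

    Assumption: the lag is reported as a plain integer followed by 'seconds',
    as in the transcript above.
    """
    match = re.search(r"Last record lag:\s*(\d+)\s+seconds", lag_output)
    return int(match.group(1)) if match else None


# Output copied from the GGSCI session above.
sample = """Sending GETLAG request to REPLICAT REPSTD1 ...
Last record lag: 5 seconds.
At EOF, no more records to process."""

print(replicat_caught_up(sample))       # True
print(last_record_lag_seconds(sample))  # 5
```

Only the "At EOF" line tells us the trail has been fully applied; the lag figure by itself does not, which is why the two values are returned separately.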
2. Stop the replicat on the standby.
GGSCI (rh3.oracle.com) 6> stop replicat repstd1
Sending STOP request to REPLICAT REPSTD1 ...
Request processed.
3. On the standby, run the scripts that grant the necessary DML privileges and enable the triggers and cascade delete constraints.
4. Start the extract on the standby. Before doing so, confirm that the data pump group on the standby is not started, so that the trail files accumulate on the standby.
GGSCI (rh3.oracle.com) 15> info all
Program     Status      Group       Lag           Time Since Chkpt
MANAGER     RUNNING
EXTRACT     STOPPED     EXTSTD2     00:00:00      24:04:16
EXTRACT     STOPPED     PUMPSTD2    00:00:00      24:02:57
REPLICAT    STOPPED     REPSTD1     00:00:00      00:00:06
GGSCI (rh3.oracle.com) 16> start extstd2
Sending START request to MANAGER ...
EXTRACT EXTSTD2 starting
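The precheck in step 4 (data pump stopped, replicat already stopped) can also be done against captured info all output. Below is a small Python sketch under the assumption that each group row consists of whitespace-separated fields in the order program, status, group name, as in the listing above. The helper names are hypothetical and are not part of GoldenGate.

```python
def parse_info_all(output: str) -> dict:
    """Map each group name to its (program, status) pair from INFO ALL output."""
    groups = {}
    for line in output.splitlines():
        fields = line.split()
        # Group rows carry at least: program type, status, group name.
        # The header row and the MANAGER row are skipped by this filter.
        if len(fields) >= 3 and fields[0] in ("EXTRACT", "REPLICAT"):
            groups[fields[2]] = (fields[0], fields[1])
    return groups


def safe_to_start_extract(output: str, pump: str, replicat: str) -> bool:
    """True only if both the data pump and the replicat are reported as STOPPED."""
    groups = parse_info_all(output)
    pump_status = groups.get(pump, (None, None))[1]
    replicat_status = groups.get(replicat, (None, None))[1]
    return pump_status == "STOPPED" and replicat_status == "STOPPED"


# Output copied from the GGSCI session in step 4.
sample = """Program Status Group Lag Time Since Chkpt
MANAGER RUNNING
EXTRACT STOPPED EXTSTD2 00:00:00 24:04:16
EXTRACT STOPPED PUMPSTD2 00:00:00 24:02:57
REPLICAT STOPPED REPSTD1 00:00:00 00:00:06"""

print(safe_to_start_extract(sample, "PUMPSTD2", "REPSTD1"))  # True
```

A group that is missing from the output is treated as not safe, so a typo in a group name fails the check instead of passing it silently.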
5. The application can now be switched over to the standby.
==============================================================================
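The steps above complete the failover to the standby. Next we try to move the application back to the primary.
1. If the original primary host has been destroyed, reinstall the Oracle software and rebuild the GoldenGate software directories on the primary system.
2. Start GGSCI on the primary.
3. Delete the relevant extract and EXTTRAIL on the primary, then re-create them.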
GGSCI (rh2.oracle.com) 6> delete extract extstd1
Deleted EXTRACT EXTSTD1.
GGSCI (rh2.oracle.com) 7> delete exttrail /d01/ext/cl
GGSCI (rh2.oracle.com) 14> add extract extstd1,tranlog,begin now
EXTRACT added.
GGSCI (rh2.oracle.com) 15> add exttrail /d01/ext/cl,megabytes 100,extract extstd1
EXTTRAIL added.
4. Start the Manager on the primary.
GGSCI (rh2.oracle.com) 18> start Manager
Manager started.
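5. Next, run the scripts on the primary that disable the triggers and cascade delete constraints.
6. Take a hot backup of the standby database (either logical or physical) and record the time at which the backup finishes.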
7. Use the hot backup taken on the standby to complete the initial load on the primary, then start the replicat on the primary with the HANDLECOLLISIONS option.
GGSCI (rh2.oracle.com) 22> view params repstd2
-- Identify the Replicat group:
REPLICAT repstd2
-- State that source and target definitions are identical:
ASSUMETARGETDEFS
-- Specify database login information as needed for the database:
userid maclean, password maclean
HANDLECOLLISIONS
-- Specify tables for delivery:
MAP clinic.*, TARGET clinic.*;
-- Exclude specific tables from delivery if needed:
-- MAPEXCLUDE
GGSCI (rh2.oracle.com) 23> start replicat repstd2
Sending START request to MANAGER ...
REPLICAT REPSTD2 starting
8. Start the data pump group on the standby to ship the accumulated trail files to the primary.
GGSCI (rh3.oracle.com) 19> start pumpstd2
Sending START request to MANAGER ...
EXTRACT PUMPSTD2 starting
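9. Use info replicat to watch the replicat on the primary and check whether its progress has passed the time at which the initial export (the hot backup from step 6) finished.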
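The comparison in step 9 is a plain timestamp check. Here is a minimal Python sketch, assuming the replicat checkpoint time has already been read off the info replicat output by hand and that both times are written as YYYY-MM-DD HH:MM:SS. The function name and the sample timestamps are hypothetical and are not taken from the sessions in this article.

```python
from datetime import datetime

# Assumed timestamp layout for both values.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def replicat_past_backup(checkpoint_time: str, backup_end_time: str) -> bool:
    """True if the replicat checkpoint is strictly later than the backup end time."""
    checkpoint = datetime.strptime(checkpoint_time, TIME_FORMAT)
    backup_end = datetime.strptime(backup_end_time, TIME_FORMAT)
    return checkpoint > backup_end


# Hypothetical values, for illustration only.
print(replicat_past_backup("2010-12-08 10:15:42", "2010-12-08 09:30:00"))  # True
```

Until this returns True, the replicat may still be applying changes that the hot backup already contains, so HANDLECOLLISIONS should stay on.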
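10. Turn off the HANDLECOLLISIONS option currently in use on the primary. The SEND command below only affects the running process, so also remove HANDLECOLLISIONS from the repstd2 parameter file to keep it from being enabled again the next time the replicat starts.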
GGSCI (rh2.oracle.com) 26> send replicat repstd2,NOHANDLECOLLISIONS
11. Shut down all applications that were switched over to the standby.
12. If data validation is required, stop the extract and pump on the standby and the replicat on the primary:
GGSCI (rh2.oracle.com) 31> lag replicat repstd2
Sending GETLAG request to REPLICAT REPSTD2 ...
Last record lag: 3 seconds.
At EOF, no more records to process.
GGSCI (rh3.oracle.com) 28> stop extstd2
Sending STOP request to EXTRACT EXTSTD2 ...
Request processed.
GGSCI (rh3.oracle.com) 26> stop pumpstd2
Sending STOP request to EXTRACT PUMPSTD2 ...
Request processed.
GGSCI (rh2.oracle.com) 34> stop replicat repstd2
Sending STOP request to REPLICAT REPSTD2 ...
Request processed.
/* Use a tool such as Oracle GoldenGate Veridata to verify data consistency,
and repair any inconsistencies
*/
On the standby database:
SQL> select sum(t2) from tv;
SUM(T2)
----------
5355944997
On the primary database:
SQL> select sum(t2) from tv;
SUM(T2)
----------
5355944997
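13. On the primary system, grant the application its DML privileges and enable the triggers and cascade delete constraints.
14. Change the begin time of the extract group on the primary to the current time, then start the primary-to-standby extract, pump and replicat.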
GGSCI (rh2.oracle.com) 36> alter extstd1 ,begin now
EXTRACT altered.
GGSCI (rh2.oracle.com) 52> start extract extstd1
Sending START request to MANAGER ...
EXTRACT EXTSTD1 starting
GGSCI (rh2.oracle.com) 53> start extract pumpstd1
Sending START request to MANAGER ...
EXTRACT PUMPSTD1 starting
GGSCI (rh3.oracle.com) 3> start repstd1
Sending START request to MANAGER ...
REPLICAT REPSTD1 starting
At this point the system is back in its original primary -> standby state.
That's great!