  • Automatic GoldenGate Failover in a RAC Environment Using XAG

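    Overview

    When OGG is configured in a RAC environment, two conditions must be met for it to fail over automatically to a surviving node when a RAC node fails:

    1. The OGG checkpoint, trail, and BR files are placed on a shared cluster file system that every RAC node can access.

    2. Cluster software monitors the OGG processes and, when a failure occurs, automatically restarts OGG on a surviving node (failover).

    Oracle Grid Infrastructure Standalone Agents (XAG), combined with an Oracle-supported cluster file system, can provide automatic OGG failover. This article describes the configuration steps.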
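The first requirement can be sanity-checked before going further. The sketch below is only a string comparison of paths, not a real mount check, and it assumes the default subdirectory names dirchk, dirdat, and BR under the OGG home; the helper names `is_under_mount` and `check_ogg_dirs` are made up for this illustration.

```shell
#!/bin/sh
# Return 0 if path $1 lies inside (or equals) mount point $2, else 1.
is_under_mount() {
    case "$1/" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Print every OGG state directory under OGG home $1 that is NOT below the
# shared mount point $2. Prints nothing when all directories are shared.
# Assumption: checkpoint, trail and BR files use the default directories.
check_ogg_dirs() {
    ogg_home=$1
    shared_mount=$2
    for sub in dirchk dirdat BR; do
        is_under_mount "$ogg_home/$sub" "$shared_mount" ||
            echo "not on shared storage: $ogg_home/$sub"
    done
}
```

For example, `check_ogg_dirs /u01/app/grid/acfsmounts/data_vol1/ogg12 /u01/app/grid/acfsmounts/data_vol1` prints nothing, while an OGG home on a local disk would produce one warning line per directory.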

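    Components and Version Requirements

    To use XAG for automatic failover, the software versions involved must meet the requirements shown in the following table:

    [Image: version requirements table]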

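    As for the cluster file system, Oracle's documentation recommends ACFS, DBFS, or OCFS. I believe other cluster file systems, such as the Veritas cluster file system, should also work.

    This article uses ACFS.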

    Test Environment Software Versions

    Source database: 11.2.0.4 RAC (ASM)

    Target database: 12.1.0.2 RAC (ASM)

    GoldenGate: 12.2.0.1.1

    Operating system: Oracle Enterprise Linux 6.5 (64-bit) on both source and target

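    Configuration Steps

    Installing GI XAG

    XAG must be downloaded separately from the Oracle website and installed. The download location is: http://www.oracle.com/technetwork/database/database-technologies/clusterware/downloads/index.html

    At the time of writing, the current version is 7 and the file is xagpack_7b.zip.

    Unzip the file, then run xagsetup.sh as the GI installation user (usually "grid") to install it: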

    [grid@rac1 xag]$ ./xagsetup.sh --install --directory /u01/app/grid/xaghome --all_nodes

    Installing Oracle Grid Infrastructure Agents on: rac1

    Installing Oracle Grid Infrastructure Agents on: rac2

    Done.

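    Install XAG on the target side in the same way.

    Creating ACFS on the Source (11.2)

    To use ACFS with 11.2.0.4 on OEL, the PSU patch level must be 11.2.0.4.4 or later. The patching procedure is omitted here.

    The COMPATIBLE.ASM and COMPATIBLE.ADVM attributes of the disk group that hosts the ACFS must be set to at least 11.2.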

    Create the ACFS volume with ASMCMD or ASMCA.

    Then create a general-purpose ACFS file system on the volume.

    At this point the ACFS is not yet managed by CRS. You can view its details with the ASMCMD volinfo command or with /sbin/acfsutil registry:

    ASMCMD> volinfo -a

    Diskgroup Name: DATA

    Volume Name: VOLOGG1

    Volume Device: /dev/asm/vologg1-426

    State: ENABLED

    Size (MB): 3072

    Resize Unit (MB): 32

    Redundancy: UNPROT

    Stripe Columns: 4

    Stripe Width (K): 128

    Usage: ACFS

    Mountpath: /u01/app/grid/acfsmounts/data_vol1

    [root@rac1 ~]# /sbin/acfsutil registry

    Mount Object:

    Device: /dev/asm/vologg1-426

    Mount Point: /u01/app/grid/acfsmounts/data_vol1

    Disk Group: DATA

    Volume: VOLOGG1

    Options: none

    Nodes: all

    Registering the ACFS with CRS on the Source (11.2)

    First, remove the entry for the ACFS we just created from the general-purpose ACFS registry:

    [root@rac1 ~]# /sbin/acfsutil registry -d /u01/app/grid/acfsmounts/data_vol1

    acfsutil registry: successfully removed ACFS mount point /u01/app/grid/acfsmounts/data_vol1 from Oracle Registry

    Then register it as a CRS resource with the SRVCTL tool:

    [root@rac1 ~]# /u01/app/11.2.0/grid/bin/srvctl add filesystem -d /dev/asm/vologg1-426 -v VOLOGG1 -g DATA -m /u01/app/grid/acfsmounts/data_vol1 -u grid

    [root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl status resource -t

    --------------------------------------------------------------------------------

    NAME TARGET STATE SERVER STATE_DETAILS

    --------------------------------------------------------------------------------

    Local Resources

    --------------------------------------------------------------------------------

    ora.DATA.dg

    ONLINE ONLINE rac1

    ONLINE ONLINE rac2

    ora.LISTENER.lsnr

    ONLINE ONLINE rac1

    ONLINE ONLINE rac2

    ora.asm

    ONLINE ONLINE rac1 Started

    ONLINE ONLINE rac2 Started

    ora.data.vologg1.acfs

    OFFLINE OFFLINE rac1

    OFFLINE OFFLINE rac2

    ora.gsd

    OFFLINE OFFLINE rac1

    OFFLINE OFFLINE rac2

    ora.net1.network

    ONLINE ONLINE rac1

    ONLINE ONLINE rac2

    ora.ons

    ONLINE ONLINE rac1

    ONLINE ONLINE rac2

    --------------------------------------------------------------------------------

    Start the resource manually (this mounts the ACFS):

    [root@rac1 ~]# /u01/app/11.2.0/grid/bin/srvctl start filesystem -d /dev/asm/vologg1-426

    [root@rac1 ~]#

    [root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl status resource -t

    --------------------------------------------------------------------------------

    NAME TARGET STATE SERVER STATE_DETAILS

    --------------------------------------------------------------------------------

    Local Resources

    --------------------------------------------------------------------------------

    ora.DATA.dg

    ONLINE ONLINE rac1

    ONLINE ONLINE rac2

    ora.LISTENER.lsnr

    ONLINE ONLINE rac1

    ONLINE ONLINE rac2

    ora.asm

    ONLINE ONLINE rac1 Started

    ONLINE ONLINE rac2 Started

    ora.data.vologg1.acfs

    ONLINE ONLINE rac1 mounted on /u01/app/grid/acfsmounts/data_vol1

    ONLINE ONLINE rac2 mounted on /u01/app/grid/acfsmounts/data_vol1

    [root@rac1 ~]# df -h

    Filesystem Size Used Avail Use% Mounted on

    /dev/mapper/vg_rac1-lv_root 45G 32G 12G 74% /

    tmpfs 2.0G 437M 1.6G 23% /dev/shm

    /dev/sda1 477M 55M 397M 13% /boot

    /dev/asm/vologg1-426 3.0G 83M 3.0G 3% /u01/app/grid/acfsmounts/data_vol1

    [root@rac2 ~]# df -h

    Filesystem Size Used Avail Use% Mounted on

    /dev/mapper/vg_rac1-lv_root 45G 25G 19G 58% /

    tmpfs 2.0G 440M 1.6G 23% /dev/shm

    /dev/sda1 477M 55M 397M 13% /boot

    /dev/asm/vologg1-426 3.0G 83M 3.0G 3% /u01/app/grid/acfsmounts/data_vol1
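The same check can be scripted on each node. This is a minimal sketch that reads `df -P` style output on standard input, so it can be fed either live output or a saved transcript; the function name `acfs_mounted` is hypothetical, and it assumes mount points contain no spaces.

```shell
#!/bin/sh
# Read `df -P`-style output on stdin; succeed if device $1 is mounted on $2.
# Assumption: the device is the first field and the mount point the last.
acfs_mounted() {
    awk -v dev="$1" -v mnt="$2" '
        $1 == dev && $NF == mnt { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

Typical use on a node would be `df -P | acfs_mounted /dev/asm/vologg1-426 /u01/app/grid/acfsmounts/data_vol1 && echo mounted`.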

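    Creating and Registering ACFS on the Target (12.1)

    The main difference between creating ACFS in 12c and in 11g is that the choice between a general-purpose file system and a database home file system is gone, and that a script for registering the file system with CRS is generated after creation.

    Run the generated script to complete the registration and mount the file system: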

    [root@oel65vm11 scripts]# ./acfs_script.sh

    ACFS file system /u01/app/grid/acfsmounts/ogg_vol1 is mounted on nodes oel65vm11,oel65vm12

    View the resource information:

    [root@oel65vm11 bin]# ./crsctl status resource -t

    --------------------------------------------------------------------------------

    Name Target State Server State details

    --------------------------------------------------------------------------------

    Local Resources

    --------------------------------------------------------------------------------

    ora.DATA.VOLOGG2.advm

    ONLINE ONLINE oel65vm11 STABLE

    ONLINE ONLINE oel65vm12 STABLE

    ora.DATA.dg

    ONLINE ONLINE oel65vm11 STABLE

    ONLINE ONLINE oel65vm12 STABLE

    ora.LISTENER.lsnr

    ONLINE ONLINE oel65vm11 STABLE

    ONLINE ONLINE oel65vm12 STABLE

    ora.asm

    ONLINE ONLINE oel65vm11 Started,STABLE

    ONLINE ONLINE oel65vm12 Started,STABLE

    ora.data.vologg2.acfs

    ONLINE ONLINE oel65vm11 mounted on /u01/app/grid/acfsmounts/ogg_vol1,STABLE

    ONLINE ONLINE oel65vm12 mounted on /u01/app/grid/acfsmounts/ogg_vol1,STABLE

    ora.net1.network

    ONLINE ONLINE oel65vm11 STABLE

    ONLINE ONLINE oel65vm12 STABLE

    ora.ons

    ONLINE ONLINE oel65vm11 STABLE

    ONLINE ONLINE oel65vm12 STABLE

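    Note: SELinux must be disabled on all nodes; otherwise writes to the ACFS fail with permission errors.

    Installing Oracle GoldenGate

    This OGG release supports both 11g and 12c databases. In the graphical installer, select the OGG build that matches your database version.

    Install OGG on the ACFS created earlier.

    Source installation location: /u01/app/grid/acfsmounts/data_vol1/ogg12

    Target installation location: /u01/app/grid/acfsmounts/ogg_vol1/ogg12

    Select the option to start the Manager process automatically.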

     

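    Database Preparation

    • Switch the source database to ARCHIVELOG mode (procedure omitted).

    • Enable the required logging on the source database and set the replication parameter: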

    SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

    Database altered.

    SQL> ALTER DATABASE FORCE LOGGING;

    Database altered.

    SQL> SELECT supplemental_log_data_min, force_logging FROM v$database;

    SUPPLEME FORCE_LOGGING

    -------- ---------------------------------------

    YES YES

    SQL> ALTER SYSTEM SWITCH LOGFILE;

    System altered.

    SQL> alter system set ENABLE_GOLDENGATE_REPLICATION=true;

    System altered.

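    • Create the OGG database user on both the source and the target and grant its privileges. In my example the user is GGADM.

    The privileges the OGG user needs are described in section 4.1.4.1, "Oracle 11.2.0.4 or Later Database Privileges", of the online document Installing and Configuring Oracle GoldenGate for Oracle Database 12c (12.2.0.1). For convenience, this test grants the user the DBA role and then grants the GoldenGate administration privilege through the dedicated system package:

    SQL> BEGIN

    dbms_goldengate_auth.grant_admin_privilege(

    grantee => 'GGADM',

    privilege_type => 'CAPTURE',

    grant_select_privileges => TRUE

    );

    END;

    /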

    PL/SQL procedure successfully completed.

    Source OGG Configuration

    • Log in to the database:

    GGSCI (rac1.hthorizontest.com) 1> dblogin userid ggadm password ggadm

    Successfully logged into database.

    • Register the integrated Extract:

    GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 2> register extract ext1 database;

    2016-04-07 23:44:38 INFO OGG-02003 Extract EXT1 successfully registered with database at SCN 1291634.

    • Add the Extract process:

    GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 3> ADD EXTRACT ext1 INTEGRATED TRANLOG, BEGIN NOW

    EXTRACT (Integrated) added.

    GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 4> ADD EXTTRAIL /u01/app/grid/acfsmounts/data_vol1/ogg12/dirdat/et, EXTRACT ext1

    EXTTRAIL added.

    • Add the data pump process:

    GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 5> ADD EXTRACT pump1 EXTTRAILSOURCE /u01/app/grid/acfsmounts/data_vol1/ogg12/dirdat/et

    EXTRACT added.

    GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 6>EDIT PARAMS EXT1

    Add the following:

    EXTRACT ext1

    USERID ggadm, PASSWORD ggadm

    TRANLOGOPTIONS INTEGRATED PARAMS (MAX_SGA_SIZE 100)

    EXTTRAIL /u01/app/grid/acfsmounts/data_vol1/ogg12/dirdat/et

    TABLE test.*;

    GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 7>EDIT PARAMS PUMP1

    Add the following:

    EXTRACT pump1

    USERID ggadm, PASSWORD ggadm

    RMTHOST 192.168.0.11, MGRPORT 7809

    RMTTRAIL /u01/app/grid/acfsmounts/ogg_vol1/ogg12/dirdat/rt

    TABLE TEST.*;

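    Then start all processes.

    On 11.2.0.4 with integrated capture, starting the Extract may report that patch 17030189 is required, mainly because integrated capture needs changes to data dictionary tables.

    After a PSU has been applied, however, this patch sometimes conflicts with other patches. In that case you can run prvtlmpg.plb manually to resolve the problem.

    (See EXTRACT Abending With OGG-02912 (Doc ID 2091679.1).)

    Target OGG Configuration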

    GGSCI (oel65vm11.hthorizon.com) 8> dblogin userid ggadm password ggadm

    Successfully logged into database.

    GGSCI (oel65vm11.hthorizon.com as ggadm@racdb1) 9>ADD CHECKPOINTTABLE ggadm.checkpointtab

    Successfully created checkpoint table ggadm.checkpointtab

    GGSCI (oel65vm11.hthorizon.com as ggadm@racdb1) 10> ADD REPLICAT rep1, EXTTRAIL /u01/app/grid/acfsmounts/ogg_vol1/ogg12/dirdat/rt checkpointtable ggadm.checkpointtab

    REPLICAT added.

    GGSCI (oel65vm11.hthorizon.com as ggadm@racdb1) 11>EDIT PARAMS REP1

    Add the following:

    REPLICAT rep1

    USERID ggadm, PASSWORD ggadm

    ASSUMETARGETDEFS

    DISCARDFILE /u01/app/grid/acfsmounts/ogg_vol1/ogg12/dirdat/rt, PURGE

    MAP TEST.* TARGET TEST.*;

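    Then start the process and verify that OGG replication works.

    Modifying the OGG MGR Parameters

    So that the OGG Manager process starts and restarts the replication processes automatically, add the following to the Manager parameter file: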

    AUTORESTART ER *, RETRIES 5, WAITMINUTES 1, RESETMINUTES 60

    AUTOSTART ER *

    Restart the Manager process for the change to take effect.

    Make this change on both the source and the target.
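Because the same two lines have to be added on both sides, it can help to script the edit so that running it twice does no harm. The sketch below appends each parameter only if an identical line is not already present. The function name `ensure_mgr_params` is hypothetical, and the parameter file path in the usage note assumes the default dirprm/mgr.prm location under the OGG home.

```shell
#!/bin/sh
# Append the two Manager parameters from this article to file $1, skipping
# any line that is already present verbatim (safe to run repeatedly).
ensure_mgr_params() {
    prm=$1
    touch "$prm"
    for line in \
        'AUTORESTART ER *, RETRIES 5, WAITMINUTES 1, RESETMINUTES 60' \
        'AUTOSTART ER *'
    do
        # -x: whole-line match, -F: fixed string, -q: quiet
        grep -qxF -- "$line" "$prm" || printf '%s\n' "$line" >> "$prm"
    done
}
```

On the source this would be called as `ensure_mgr_params /u01/app/grid/acfsmounts/data_vol1/ogg12/dirprm/mgr.prm`, followed by the Manager restart described above.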

    Configuring XAG on the Source

    • Add the application VIP (as root):

    [root@rac1 ~]# /u01/app/11.2.0/grid/bin/appvipcfg create -network=1 -ip=192.168.0.36 -vipname=xag.gg_1-vip.vip -user=oracle

    • Allow the grid user to start the resource (as root):

    [root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl setperm resource xag.gg_1-vip.vip -u user:grid:r-x

    • Start the VIP (as grid):

    [root@rac1 ~]# su - grid

    [grid@rac1 ~]$ /u01/app/11.2.0/grid/bin/crsctl start resource xag.gg_1-vip.vip

    CRS-2672: Attempting to start 'xag.gg_1-vip.vip' on 'rac1'

    CRS-2676: Start of 'xag.gg_1-vip.vip' on 'rac1' succeeded

    • Check the status:

    [grid@rac1 ~]$ crsctl status resource xag.gg_1-vip.vip

    NAME=xag.gg_1-vip.vip

    TYPE=app.appvip_net1.type

    TARGET=ONLINE

    STATE=ONLINE on rac1
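When this check is scripted, only the hosting node usually matters. A minimal sketch, assuming the KEY=VALUE output format shown above and using a made-up function name `online_node`:

```shell
#!/bin/sh
# Read `crsctl status resource <name>` output (KEY=VALUE lines) on stdin and
# print the node from a "STATE=ONLINE on <node>" line.
# Returns non-zero and prints nothing if the resource is not ONLINE.
online_node() {
    sed -n 's/^STATE=ONLINE on \([^ ,]*\).*$/\1/p' | grep .
}
```

For example, `crsctl status resource xag.gg_1-vip.vip | online_node` would print the name of the node currently hosting the VIP.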

    • Create the CRS resource for OGG (as root):

    [root@rac1 bin]# /u01/app/grid/xaghome/bin/agctl add goldengate gg_1 --gg_home /u01/app/grid/acfsmounts/data_vol1/ogg12 --instance_type source --nodes rac1,rac2 --vip_name xag.gg_1-vip.vip --filesystems ora.data.vologg1.acfs --databases ora.tdb.db --oracle_home /u01/app/oracle/product/11.2.0/dbhome_1 --monitor_extracts ext1,pump1

    [root@rac1 ~]# cd /u01/app/grid/xaghome/bin

    [root@rac1 bin]# ./agctl status goldengate gg_1

    Goldengate instance 'gg_1' is not running

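    • Grant grid permission to start the resource:

    The agctl add command above automatically creates a CRS resource for OGG; the grid user must be granted permission to manage it:

    [root@rac1 bin]# /u01/app/11.2.0/grid/bin/crsctl setperm resource xag.gg_1.goldengate -u user:grid:r-x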

    Configuring XAG on the Target

    The procedure is similar to the one on the source.

    • Create the VIP resource:

    [root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/appvipcfg create -network=1 -ip=192.168.0.26 -vipname=xag.gg_1-vip.vip -user=oracle

    [root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/crsctl setperm resource xag.gg_1-vip.vip -u user:grid:r-x

    [root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/crsctl start resource xag.gg_1-vip.vip

    CRS-2672: Attempting to start 'xag.gg_1-vip.vip' on 'oel65vm12'

    CRS-2676: Start of 'xag.gg_1-vip.vip' on 'oel65vm12' succeeded

    [root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/crsctl relocate resource xag.gg_1-vip.vip -n oel65vm11

    CRS-2673: Attempting to stop 'xag.gg_1-vip.vip' on 'oel65vm12'

    CRS-2677: Stop of 'xag.gg_1-vip.vip' on 'oel65vm12' succeeded

    CRS-2672: Attempting to start 'xag.gg_1-vip.vip' on 'oel65vm11'

    CRS-2676: Start of 'xag.gg_1-vip.vip' on 'oel65vm11' succeeded

    • Create the CRS resource for OGG:

    [root@oel65vm11 bin]# /u01/app/grid/xaghome/bin/agctl add goldengate gg_2 --gg_home /u01/app/grid/acfsmounts/ogg_vol1/ogg12 --instance_type target --nodes oel65vm11,oel65vm12 --vip_name xag.gg_1-vip.vip --filesystems ora.data.vologg2.acfs --databases ora.racdb.db --oracle_home /u01/app/oracle/product/12.1.0/dbhome_1 --monitor_replicats rep1

    • Grant permission:

    [root@oel65vm11 bin]# /u01/app/12.1.0/grid/bin/crsctl setperm resource xag.gg_2.goldengate -u user:grid:r-x
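The source and target registrations differ only in a few values and in which monitor option is used. The sketch below is a hypothetical dry-run helper (`agctl_add_cmd` is not part of XAG) that prints the command line for review instead of running it. It uses only the options that appear in the commands in this article, and it hard-codes the VIP name xag.gg_1-vip.vip because both clusters here use that name.

```shell
#!/bin/sh
# Print (do not run) the agctl command that registers one GoldenGate instance.
# $1 instance name   $2 OGG home        $3 source|target   $4 node list
# $5 ACFS resource   $6 DB resource     $7 Oracle home     $8 process list
agctl_add_cmd() {
    case "$3" in
        source) monitor_flag=--monitor_extracts ;;
        target) monitor_flag=--monitor_replicats ;;
        *) echo "instance type must be source or target" >&2; return 1 ;;
    esac
    echo "agctl add goldengate $1 --gg_home $2 --instance_type $3" \
         "--nodes $4 --vip_name xag.gg_1-vip.vip --filesystems $5" \
         "--databases $6 --oracle_home $7 $monitor_flag $8"
}
```

Printing the command first makes it easy to compare the two sides before running anything as root.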

    Modifying the PUMP Process

    Change the remote host address (RMTHOST) in the PUMP parameter file to the target-side VIP we just created:

    RMTHOST 192.168.0.26, MGRPORT 7809

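    Restart the PUMP process.

    Starting the CRS OGG Resources

    In the GGSCI command line, stop all processes on both the source and the target.

    • Start the target resource: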

    [grid@oel65vm11 ~]$ cd $ORACLE_BASE

    [grid@oel65vm11 grid]$ cd xaghome/bin

    [grid@oel65vm11 bin]$ ./agctl start goldengate gg_2 --node oel65vm11

    [grid@oel65vm11 bin]$ crsctl status resource xag.gg_2.goldengate

    NAME=xag.gg_2.goldengate

    TYPE=xag.goldengate.type

    TARGET=ONLINE

    STATE=ONLINE on oel65vm11

    • Start the source resource:

    [grid@rac1 bin]$ cd $ORACLE_BASE

    [grid@rac1 grid]$ cd xaghome/bin

    [grid@rac1 bin]$ ./agctl start goldengate gg_1 --node rac1

    [grid@rac1 bin]$ crsctl status resource xag.gg_1.goldengate

    NAME=xag.gg_1.goldengate

    TYPE=xag.goldengate.type

    TARGET=ONLINE

    STATE=ONLINE on rac1

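    After startup, enter the GGSCI command line and check the process status. If all processes have started automatically, the configuration is working.

    Failover Tests

    Test relocation on the source with the following command: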

    [grid@rac1 bin]$ ./agctl relocate goldengate gg_1 --node rac2

    [grid@rac1 bin]$ crsctl status resource -t

    ......

    --------------------------------------------------------------------------------

    Cluster Resources

    --------------------------------------------------------------------------------

    ......

    xag.gg_1-vip.vip

    1 ONLINE ONLINE rac2

    xag.gg_1.goldengate

    1 ONLINE ONLINE rac2
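A relocation takes a moment, so a scripted test usually polls rather than checking once. The following is a minimal sketch, not part of XAG: `wait_online_on` and the `POLL_SECONDS` variable are names invented here, and the status command is passed in as an argument so the loop can be tried with a stub before pointing it at a real cluster.

```shell
#!/bin/sh
# Poll until the command in $1 prints a line "STATE=ONLINE on <node $2>",
# trying at most $3 times and sleeping POLL_SECONDS (default 5) in between.
# $1 is expected to print `crsctl status resource <name>` style output.
wait_online_on() {
    status_cmd=$1
    node=$2
    tries=$3
    while [ "$tries" -gt 0 ]; do
        # $status_cmd is deliberately unquoted so it splits into words
        if $status_cmd | grep -q "^STATE=ONLINE on $node\$"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "${POLL_SECONDS:-5}"
    done
    return 1
}
```

After the relocate command above, `wait_online_on "crsctl status resource xag.gg_1.goldengate" rac2 12` would wait up to about a minute for the resource to come online on rac2.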

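    Next, a power-failure test: we shut down the target host oel65vm11 by cutting its power.

    On host oel65vm12, you can see that the RAC VIP has failed over to this node, and that the OGG VIP and the gg_2 resource have also failed over to it automatically:

    [grid@oel65vm12 ~]$ crsctl status resource -t

    ......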

    --------------------------------------------------------------------------------

    Cluster Resources

    --------------------------------------------------------------------------------

    ......

    ora.oel65vm11.vip

    1 ONLINE INTERMEDIATE oel65vm12 FAILED OVER,STABLE

    ora.oel65vm12.vip

    1 ONLINE ONLINE oel65vm12 STABLE

    ora.racdb.db

    1 ONLINE OFFLINE STABLE

    2 ONLINE ONLINE oel65vm12 Open,STABLE

    ora.scan1.vip

    1 ONLINE ONLINE oel65vm12 STABLE

    xag.gg_1-vip.vip

    1 ONLINE ONLINE oel65vm12 STABLE

    xag.gg_2.goldengate

    1 ONLINE ONLINE oel65vm12 STABLE

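    This is only the simplest possible example and ignores many complications, such as a monitoring agent (jagent) deployed alongside OGG, or downstream capture. Real production environments are often far more complex than this.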

  • Original article: https://www.cnblogs.com/raobing/p/6174543.html