zoukankan      html  css  js  c++  java
  • Oracle DG测试failover和后续恢复报告

    Oracle DG测试failover和后续恢复报告

    一、概述

    二、验证过程:

    三、结论

    一、概述

    本文是针对在DG灾备环境进行failover操作以及后续恢复的报告。 **我这里的测试环境是:** 数据库版本:Oracle 11.2.0.4 Site A:主库 db_unique_name=jyzhao Site B:备库(实时应用)db_unique_name=mynas Site C:备库(延迟1小时应用)db_unique_name=jyzhao_s

    以下章节涉及到的简称注释:
    A库 => Site A:主库
    B库 => Site B:备库(实时应用)
    C库 => Site C:备库(延迟1小时应用)

    验证:
    当A库crash后,在B库进行failover将B切换为新的主库,确认failover之后,A库和C库应该如何处理才可以成为新的备库继续使用?是否需要重建?重建的话,是否需要重新备份来恢复,以前的备份是否可以用来创建备库?

    二、验证过程:

    2.1 A库异常关闭

    A库:
    SQL> shutdown abort
    

    2.2 B库进行failover切换为新主库

    failover 标准步骤如下:
    #取消DG应用
    ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
    #重启下数据库(建议)
    shutdown immediate;
    startup
    #操作不可逆,确定实际情况需要failover
    ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH force;
    SELECT OPEN_MODE, DATABASE_ROLE, SWITCHOVER_STATUS, FORCE_LOGGING, DATAGUARD_BROKER, GUARD_STATUS FROM V$DATABASE; 
    #尝试常规切换为主库
    ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN;
    如果这一步的常规切换失败,提示需要介质恢复,那么:
     1)恢复备库 recover standby database until cancel;
     2)激活备库 alter database activate standby database;
    #最后重新启动数据库
    shutdown immediate;
    startup
    

    查看此时B库的信息:

    SQL> select name, database_role, open_mode from gv$database;
    
    NAME      DATABASE_ROLE    OPEN_MODE
    --------- ---------------- --------------------
    JYZHAO    PRIMARY          READ WRITE
    SQL> select * from v$log;
    
        GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIME   NEXT_CHANGE# NEXT_TIME
    ---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ------------ ------------ ------------
             1          1          3   52428800        512          2 NO  CURRENT               17882720 03-SEP-17      2.8147E+14
             2          1          2   52428800        512          2 YES ACTIVE                17882488 03-SEP-17        17882720 03-SEP-17
             3          2          0   52428800        512          2 YES UNUSED                       0                         0
             4          2          0   52428800        512          2 YES UNUSED                       0                         0
    

    可以看到,目前B库已成为新的主库,redo日志的sequence重新开始。

    2.3 要求C库成为新主库的备库

    现在要求C库成为新主库的备库。是否需要重建C库呢?答案是不需要。下面具体来看下验证过程。 **C库的alert日志:**
    Sun Sep 03 14:09:58 2017
    Archived Log entry 9 added for thread 1 sequence 1227 ID 0x97764e10 dest 1:
    ARC2: Archive log thread 1 sequence 1227 available in 60 minute(s)
    Sun Sep 03 14:09:58 2017
    RFS[3]: Selected log 11 for thread 1 sequence 1228 dbid -1785877518 branch 919999037
    Sun Sep 03 14:19:24 2017
    RFS[3]: Possible network disconnect with primary database
    Sun Sep 03 14:19:24 2017
    RFS[2]: Possible network disconnect with primary database
    Sun Sep 03 14:19:24 2017
    RFS[4]: Assigned to RFS process 10190
    RFS[4]: Possible network disconnect with primary database
    

    可以看到,在A库crash之后,C库收到网络无法连接到A库的告警,说明C库目前没有新的操作。
    接下来想要C库成为B库(新主库)的备库,就需要尝试在B库上配置DG参数,使得B库的归档可以传输到C库。

    show parameter log_archive_config
    show parameter log_archive_dest_3
    alter system set log_archive_config = 'DG_CONFIG=(jyzhao,mynas,jyzhao_s)';
    alter system set log_archive_dest_3 = 'SERVICE=jyzhao_s LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) db_unique_name=jyzhao_s';
    

    同时在B库的tnsnames.ora文件中增加到C库的连接:

    #Standby Single Instance
    JYZHAO_S =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.111)(PORT = 1521))
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = jyzhao_s)
        )
      )
    

    在B库设置完成后,观察B库的告警:

    Sun Sep 03 14:37:41 2017
    ALTER SYSTEM SET log_archive_dest_3='SERVICE=jyzhao_s LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) db_unique_name=jyzhao_s' SCOPE=BOTH;
    ARC3: Standby redo logfile selected for thread 1 sequence 3 for destination LOG_ARCHIVE_DEST_3
    Thread 1 cannot allocate new log, sequence 5
    Checkpoint not complete
      Current log# 2 seq# 4 mem# 0: +DATA/mynas/onlinelog/group_2.278.953484865
      Current log# 2 seq# 4 mem# 1: +FRA/mynas/onlinelog/group_2.714.953484869
    Thread 1 advanced to log sequence 5 (LGWR switch)
      Current log# 1 seq# 5 mem# 0: +DATA/mynas/onlinelog/group_1.277.953484853
      Current log# 1 seq# 5 mem# 1: +FRA/mynas/onlinelog/group_1.720.953484859
    ******************************************************************
    LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_3
    ******************************************************************
    LNS: Standby redo logfile selected for thread 1 sequence 5 for destination LOG_ARCHIVE_DEST_3
    Archived Log entry 382 added for thread 1 sequence 4 ID 0x978ff56d dest 1:
    Sun Sep 03 14:37:55 2017
    ARC3: Standby redo logfile selected for thread 1 sequence 4 for destination LOG_ARCHIVE_DEST_3
    Sun Sep 03 14:39:49 2017
    Thread 1 advanced to log sequence 6 (LGWR switch)
      Current log# 2 seq# 6 mem# 0: +DATA/mynas/onlinelog/group_2.278.953484865
      Current log# 2 seq# 6 mem# 1: +FRA/mynas/onlinelog/group_2.714.953484869
    Sun Sep 03 14:39:49 2017
    LNS: Standby redo logfile selected for thread 1 sequence 6 for destination LOG_ARCHIVE_DEST_3
    Sun Sep 03 14:39:49 2017
    Archived Log entry 388 added for thread 1 sequence 5 ID 0x978ff56d dest 1:
    

    然后返回C库操作,将C库开启实时日志应用:

    SQL> alter database recover managed standby database cancel;
    Database altered.
    
    SQL> alter database recover managed standby database using current logfile disconnect from session;
    Database altered.
    

    此时再观察C库的告警日志:

    Sun Sep 03 14:38:10 2017
    Clearing online log 11 of thread 1 sequence number 0
    Sun Sep 03 14:38:19 2017
    MRP0: Incarnation has changed! Retry recovery...
    RFS[7]: Selected log 11 for thread 1 sequence 4 dbid -1785877518 branch 953735009
    Errors in file /u01/app/oracle/diag/rdbms/jyzhao_s/jyzhao/trace/jyzhao_pr00_10161.trc:
    ORA-19906: recovery target incarnation changed during recovery
    Recovery interrupted!
    Sun Sep 03 14:38:19 2017
    Archived Log entry 11 added for thread 1 sequence 4 ID 0x978ff56d dest 1:
    Sun Sep 03 14:38:20 2017
     started logmerger process
    Sun Sep 03 14:38:20 2017
    Managed Standby Recovery not using Real Time Apply
    Parallel Media Recovery started with 2 slaves
    Media Recovery start incarnation depth : 1, target inc# : 3, irscn : 17882484
    Waiting for all non-current ORLs to be archived...
    All non-current ORLs have been archived.
    Media Recovery Delayed for 60 minute(s) (thread 1 sequence 1226)
    Sun Sep 03 14:38:31 2017
    RFS[8]: Assigned to RFS process 10704
    RFS[8]: Opened log for thread 1 sequence 1228 dbid -1785877518 branch 919999037
    Archived Log entry 12 added for thread 1 sequence 1228 rlc 919999037 ID 0x97764e10 dest 3:
    RFS[8]: Opened log for thread 1 sequence 1 dbid -1785877518 branch 953735009
    Sun Sep 03 14:38:36 2017
    RFS[7]: Opened log for thread 1 sequence 2 dbid -1785877518 branch 953735009
    Archived Log entry 13 added for thread 1 sequence 1 rlc 953735009 ID 0x978ff56d dest 3:
    Archived Log entry 14 added for thread 1 sequence 2 rlc 953735009 ID 0x978ff56d dest 3:
    Sun Sep 03 14:40:13 2017
    Archived Log entry 15 added for thread 1 sequence 5 ID 0x978ff56d dest 1:
    Sun Sep 03 14:40:13 2017
    RFS[6]: Selected log 11 for thread 1 sequence 6 dbid -1785877518 branch 953735009
    Sun Sep 03 14:40:37 2017
    alter database recover managed standby database cancel
    Sun Sep 03 14:40:38 2017
    MRP0: Background Media Recovery cancelled with status 16037
    Errors in file /u01/app/oracle/diag/rdbms/jyzhao_s/jyzhao/trace/jyzhao_pr00_10688.trc:
    ORA-16037: user requested cancel of managed recovery operation
    Recovery interrupted!
    Sun Sep 03 14:40:38 2017
    MRP0: Background Media Recovery process shutdown (jyzhao)
    Managed Standby Recovery Canceled (jyzhao)
    Completed: alter database recover managed standby database cancel
    Sun Sep 03 14:40:53 2017
    alter database recover managed standby database using current logfile disconnect from session
    Attempt to start background Managed Standby Recovery process (jyzhao)
    Sun Sep 03 14:40:53 2017
    MRP0 started with pid=20, OS id=10747 
    MRP0: Background Managed Standby Recovery process started (jyzhao)
     started logmerger process
    Sun Sep 03 14:40:58 2017
    Managed Standby Recovery starting Real Time Apply
    Parallel Media Recovery started with 2 slaves
    Media Recovery start incarnation depth : 1, target inc# : 3, irscn : 17882484
    Waiting for all non-current ORLs to be archived...
    All non-current ORLs have been archived.
    Managed Standby Recovery started with USING CURRENT LOGFILE
    Ignoring previously specified DELAY 60 minutes for thread 1 sequence 1226
    Media Recovery Log /u01/oradata/JYZHAO_S/archivelog/2017_09_03/o1_mf_1_1226_dtq74r91_.arc
    Managed Standby Recovery started with USING CURRENT LOGFILE
    Ignoring previously specified DELAY 60 minutes for thread 1 sequence 1227
    Media Recovery Log /u01/oradata/JYZHAO_S/archivelog/2017_09_03/o1_mf_1_1227_dtq75p6j_.arc
    Media Recovery Log /u01/oradata/JYZHAO_S/archivelog/2017_09_03/o1_mf_1_1228_dtq8v7c7_.arc
    Identified End-Of-Redo (failover) for thread 1 sequence 1228 at SCN 0x0.110dd74
    Completed: alter database recover managed standby database using current logfile disconnect from session
    Resetting standby activation ID 2541112848 (0x97764e10)
    Media Recovery End-Of-Redo indicator encountered
    Media Recovery Continuing
    Media Recovery Log /u01/oradata/JYZHAO_S/archivelog/2017_09_03/o1_mf_1_1_dtq8vdpr_.arc
    Media Recovery Log /u01/oradata/JYZHAO_S/archivelog/2017_09_03/o1_mf_1_2_dtq8vdpy_.arc
    Media Recovery Log /u01/oradata/JYZHAO_S/archivelog/2017_09_03/o1_mf_1_3_dtq8tgvl_.arc
    Sun Sep 03 14:41:11 2017
    Media Recovery Log /u01/oradata/JYZHAO_S/archivelog/2017_09_03/o1_mf_1_4_dtq8tvy5_.arc
    Media Recovery Log /u01/oradata/JYZHAO_S/archivelog/2017_09_03/o1_mf_1_5_dtq8yfr5_.arc
    Media Recovery Waiting for thread 1 sequence 6 (in transit)
    Recovery of Online Redo Log: Thread 1 Group 11 Seq 6 Reading mem 0
      Mem# 0: /u01/oradata/standbylog/standby_group_11.log
    

    实际看到,C库已经可以正常应用日志。说明C库不需要重建即可通过简单配置成为新主库B库的新备库。

    2.4 要求A库成为新主库的备库

    此时A库启动的话,是一个独立运行的数据库,如果想将A库也设置为主库的话,那么,通过新主库的最新备份肯定是可行的,但是如果数据量很大,之前A库自己本身有历史的备份,能否不再耗时备份新主库,直接通过历史的备份恢复呢?其实这个从上面的C库不再需要重建直接成为新主库的备库,也可以推断出,是可以的。只需要确认这个备份是在failover之前完成的。下面我们来具体实验验证下可行性。 在B库创建新的备库控制文件,并传输到A库相同路径下:
    backup current controlfile for standby format '/tmp/std_control02.ctl';
    

    在A库启动到nomount,恢复新的备库控制文件

    restore standby controlfile from '/tmp/std_control02.ctl';
    

    在A库查看数据文件头的检查点,确认是在failover之前:

    SYS@jyzhao1 >select name from v$datafile;
    
    NAME
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    +DATA1/jyzhao/datafile/system.258.951608183
    +DATA1/jyzhao/datafile/sysaux.257.951608183
    +DATA1/jyzhao/datafile/undotbs1.259.951608185
    +DATA1/jyzhao/datafile/users.265.951608205
    +DATA1/jyzhao/datafile/undotbs2.261.951608185
    +DATA1/jyzhao/datafile/dbs_d_jingyu.262.951608185
    +DATA1/jyzhao/datafile/dbs_i_jingyu.263.951608185
    +DATA1/jyzhao/datafile/test.264.951608185
    +DATA1/jyzhao/datafile/test2.260.951608185
    +DATA1/jyzhao/datafile/dbadata.276.952933931
    
    10 rows selected.
    
    SYS@jyzhao1 >select checkpoint_change# from v$database;
    
    CHECKPOINT_CHANGE#
    ------------------
              17892035
    
    SYS@jyzhao1 >select checkpoint_change# from v$datafile;
    
    CHECKPOINT_CHANGE#
    ------------------
              17892035
              17892035
              17892035
              17892035
              17892035
              17892035
              17892035
              17892035
              17892035
              17892035
    
    10 rows selected.
    
    SYS@jyzhao1 >select checkpoint_change# from v$datafile_header;
    
    CHECKPOINT_CHANGE#
    ------------------
                     0
                     0
                     0
                     0
                     0
                     0
                     0
                     0
                     0
                     0
    
    10 rows selected.
    

    上面这个数据文件头的检查点是0,说明数据文件没有正确获取到,实际上是由于OMF的名字有变化,直接将数据文件路径catalog到备份集中,再switch即可。

    catalog start with '+DATA1/jyzhao/datafile/';
    switch database to copy;
    

    再次查询:

    SYS@jyzhao1 >select current_scn||'' from v$database;
    
    CURRENT_SCN||''
    ----------------------------------------
    17892467
    
    SYS@jyzhao1 >select checkpoint_change# from v$database;
    
    CHECKPOINT_CHANGE#
    ------------------
              17892035
    
    SYS@jyzhao1 >select checkpoint_change# from v$datafile;
    
    CHECKPOINT_CHANGE#
    ------------------
              17892035
              17892035
              17892035
              17892035
              17892035
              17892035
              17892035
              17892035
              17892035
              17892035
    
    10 rows selected.
    
    SYS@jyzhao1 >select checkpoint_change# from v$datafile_header;
    
    CHECKPOINT_CHANGE#
    ------------------
              17882484
              17882484
              17882484
              17882484
              17882484
              17882484
              17882484
              17882484
              17882484
              17882484
    
    10 rows selected.
    

    此时在mount状态下开启日志应用:

    alter database recover managed standby database disconnect from session;
    

    从告警日志观察,确认应用到最新时,取消日志应用:

    alter database recover managed standby database cancel;
    

    打开数据库,开启实时应用:

    alter database recover managed standby database USING CURRENT LOGFILE disconnect from session;
    

    最终查询可以正常实时应用。

    三、结论

    一般来说,在A库crash之后,B库failover成为新的主库,那么原来设置为延迟1小时应用的C 库是可以直接配置成为新主库的备库。A库修复后,也可以通过failover之前的现有备份集来恢复到failover之前的状态,而不需要在新主库重新去备份。
  • 相关阅读:
    spirngmvc整合mybatis
    C#微信支付
    centos mysql数据库主从同步
    centos 搭建ftp
    修改 Docker 默认网桥地址
    安装docker
    脚本自动化装centos6.5 python2.6升级2.7
    centos6.5 python2.6升级2.7
    weblogic 安装及发布web应用
    centos6.5安装pip方法
  • 原文地址:https://www.cnblogs.com/jyzhao/p/7472090.html
Copyright © 2011-2022 走看看