zoukankan      html  css  js  c++  java
  • Oracle 11g Data Guard Error 16143 Heartbeat failed to connect to standby 处理方法


    一.背景说明

    前段时间一朋友在生产库上误操作,本来他是打算重启一下DG环境,结果在备库命令执行错误。

    本应该执行

    SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL; 

    结果朋友执行成了如下命令:

    SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH; 

    中断了DG主备库的通信环境,这个finish是用来做Failover时用的。 当时让朋友在主库重新生成了一份standby controlfiles,然后copy到备库,在按正常模式启动就可以了。

       

        因为数据库识别主备库就是通过控制文件来的,所以理论上,只需要重新生成一份standby 控制文件就可以了。  后来朋友测试了一下,正常的拉起来了。

    今天看到了当时的记录,就顺便模拟一下整个操作,顺便练练手。

    二. 演示过程

    2.1 DG 环境说明

    OS: Oracle Linux6.3

    DB: 11.2.0.3

    SQL> select * from v$version;

    BANNER

    --------------------------------------------------------------------------------

    Oracle Database 11g Enterprise EditionRelease 11.2.0.3.0 - 64bit Production

    PL/SQL Release 11.2.0.3.0 - Production

    CORE   11.2.0.3.0      Production

    TNS for Linux: Version 11.2.0.3.0 -Production

    NLSRTL Version 11.2.0.3.0 - Production

    主库:

    SQL> select open_mode from v$database;

    OPEN_MODE

    --------------------

    READ WRITE

    SQL>

    SQL> set pagesize 200

    SQL> select sequence#,applied fromv$archived_log order by sequence# desc;

     SEQUENCE# APPLIED

    ---------- ---------

           14 YES

           14 NO

           13 YES

           13 NO

           12 NO

           12 YES

           11 YES

           11 NO

           10 NO

           10 YES

            9 YES

            9 NO

            8 NO

            8 YES

            7 YES

            7 NO

            6 YES

            6 NO

            5 NO

            4 NO

    20 rows selected.

    备库:

    SQL> select open_mode from v$database;

    OPEN_MODE

    --------------------

    MOUNTED

    SQL>

    SQL> select sequence#,applied fromv$archived_log order by sequence# desc;

     SEQUENCE# APPLIED

    ---------- ---------

           14 YES

           13 YES

           12 YES

           11 YES

           10 YES

            9 YES

             8 YES

            7 YES

            6 YES

    9 rows selected.

    2.2 模拟故障

    在备库执行如下命令:

    SQL> ALTER DATABASE RECOVER MANAGEDSTANDBY DATABASE FINISH;

    Database altered.

    2.3 查看主库 alert log

    [oracle@dg1 trace]$ pwd

    /u01/app/oracle/diag/rdbms/dave_pd/dave/trace

    [oracle@dg1 trace]$ tail -30 alert_dave.log

    Thread 1 advanced to log sequence 14 (LGWRswitch)

     Current log# 2 seq# 14 mem# 0: /u01/app/oracle/oradata/dave/redo02.log

    Fri Mar 29 03:30:12 2013

    Archived Log entry 17 added for thread 1sequence 13 ID 0x3312f7c4 dest 1:

    Fri Mar 29 03:30:13 2013

    LNS: Standby redo logfile selected forthread 1 sequence 14 for destination LOG_ARCHIVE_DEST_2

    Fri Mar 29 03:43:10 2013

    Time drift detected. Please check VKTMtrace file for more details.

    Fri Mar 29 04:45:31 2013

    Time drift detected. Please check VKTMtrace file for more details.

    Fri Mar 29 06:28:35 2013

    Time drift detected. Please check VKTMtrace file for more details.

    Fri Mar 29 07:08:14 2013

    Thread 1 advanced to log sequence 15 (LGWRswitch)

     Current log# 3 seq# 15 mem# 0: /u01/app/oracle/oradata/dave/redo03.log

    Fri Mar 29 07:08:16 2013

    Archived Log entry 20 added for thread 1sequence 14 ID 0x3312f7c4 dest 1:

    Fri Mar 29 07:08:17 2013

    LNS: Standby redo logfile selected forthread 1 sequence 15 for destination LOG_ARCHIVE_DEST_2

    Fri Mar 29 07:34:48 2013

    Time drift detected. Please check VKTMtrace file for more details.

    Fri Mar 29 07:48:55 2013

    LNS: Attempting destinationLOG_ARCHIVE_DEST_2 network reconnect (3135)

    LNS: Destination LOG_ARCHIVE_DEST_2 networkreconnect abandoned

    Error 3135 for archive log file 3 to'dave_st'

    Errors in file/u01/app/oracle/diag/rdbms/dave_pd/dave/trace/dave_nsa2_3181.trc:

    ORA-03135: connection lost contact

    LNS: Failed to archive log 3 thread 1sequence 15 (3135)

    Fri Mar 29 07:51:45 2013

    PING[ARC1]: Heartbeatfailed to connect to standby 'dave_st'. Error is 16143.

    因为我们在备库执行的Finish命令,导致心跳中断了。

    2.4 查看备库alert log

    [oracle@dg2 trace]$ pwd

    /u01/app/oracle/diag/rdbms/dave_st/dave/trace

    [oracle@dg2 trace]$  tail -20 alert_dave.log

    Terminal Recovery: thread 1 seq# 15 redorequired

    Terminal Recovery:

    Recovery of Online Redo Log: Thread 1 Group5 Seq 15 Reading mem 0

     Mem# 0: /u01/app/oracle/oradata/dave/stdbyredo02.log

    Identified End-Of-Redo (failover) forthread 1 sequence 15 at SCN 0xffff.ffffffff

    Incomplete Recovery applied until change1082890 time 03/29/2013 07:48:53

    MRP0: Media Recovery Complete (dave)

    Terminal Recovery: successful completion

    Fri Mar 29 07:48:49 2013

    ARCH: Archival stopped, error occurred.Will continue retrying

    ORACLE Instance dave - Archival Error

    Forcing ARSCN to IRSCN for TR 0:1082890

    Attempt to set limbo arscn 0:1082890 irscn0:1082890

    Resetting standby activation ID 856881092(0x3312f7c4)

    ORA-16014: log 5 sequence# 15 not archived,no available destinations

    ORA-00312: online log 5 thread 1:'/u01/app/oracle/oradata/dave/stdbyredo02.log'

    MRP0: Background Media Recovery processshutdown (dave)

    Fri Mar 29 07:48:50 2013

    Terminal Recovery: completion detected(dave)

    Completed: ALTER DATABASE RECOVER MANAGEDSTANDBY DATABASE FINISH

    [oracle@dg2 trace]$

    2.5 在主库切换归档

    SQL> alter system switch logfile;

    System altered.

    SQL> alter system switch logfile;

    System altered.

    SQL> select sequence#,applied fromv$archived_log order by sequence# desc;

     SEQUENCE# APPLIED

    ---------- ---------

           16 NO

           15 NO

           14 NO

           14 YES

           13 YES

           13 NO

           12 YES

           12 NO

           11 NO

           11 YES

           10 NO

           10 YES

            9 YES

            9 NO

            8 YES

            8 NO

             7 YES

            7 NO

            6 NO

            6 YES

            5 NO

            4 NO

    22 rows selected.

    SQL>

    2.6 再次查看主备库日志

    主库日志:

    Fri Mar 29 07:52:46 2013

    PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 16143.

    Fri Mar 29 07:53:47 2013

    PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 16143.

    Fri Mar 29 07:53:49 2013

    Thread 1 advanced to log sequence 16 (LGWRswitch)

     Current log# 1 seq# 16 mem# 0: /u01/app/oracle/oradata/dave/redo01.log

    Fri Mar 29 07:53:49 2013

    Archived Log entry 21 added for thread 1sequence 15 ID 0x3312f7c4 dest 1:

    Fri Mar 29 07:53:50 2013

    FAL[server, ARC2]: Error 16143 creatingremote archivelog file 'dave_st'

    FAL[server, ARC2]: FAL archive failed, seetrace file.

    ARCH: FAL archive failed. Archivercontinuing

    ORACLE Instance dave - Archival Error.Archiver continuing.

    Thread 1 advanced to log sequence 17 (LGWRswitch)

     Current log# 2 seq# 17 mem# 0: /u01/app/oracle/oradata/dave/redo02.log

    Fri Mar 29 07:53:57 2013

    Archived Log entry 22 added for thread 1sequence 16 ID 0x3312f7c4 dest 1:

    备库日志:

    Fri Mar 29 07:48:50 2013

    Terminal Recovery: completion detected(dave)

    Completed: ALTER DATABASE RECOVER MANAGEDSTANDBY DATABASE FINISH

    Fri Mar 29 07:51:34 2013

    RFS[5]: Assigned to RFS process 9336

    RFS[5]: No connections allowed during/afterterminal recovery.

    Fri Mar 29 07:52:35 2013

    RFS[6]: Assigned to RFS process 9340

    RFS[6]: No connections allowed during/afterterminal recovery.

    Fri Mar 29 07:53:36 2013

    RFS[7]: Assigned to RFS process 9343

    RFS[7]: No connections allowed during/afterterminal recovery.

    Fri Mar 29 07:53:39 2013

    RFS[8]: Assigned to RFS process 9345

    RFS[8]: No connectionsallowed during/after terminal recovery.

    2.7 在主库重建standby control file

    先在备库查看一下控制文件名称,等会创建完后直接覆盖过去:

    SQL> show parameter control

    NAME                                 TYPE        VALUE

    ----------------------------------------------- ------------------------------

    control_file_record_keep_time        integer     7

    control_files                        string      /u01/app/oracle/oradata/dave/c

                                                    ontrol01.ctl, /u01/app/oracle/

                                                    fast_recovery_area/dave/contro

                                                     l02.ctl

    control_management_pack_access       string      DIAGNOSTIC+TUNING

    主库创建standby controlfile

    SQL> alter database create standbycontrolfile as '/u01/control01.ctl';

    Database altered.

    copy到备库的目录,在覆盖原来的控制文件:

    --先关闭备库:

    SQL> shutdown immediate

    ORA-01109: database not open

    Database dismounted.

    ORACLE instance shut down.

    SQL>

    --copy并覆盖:

    [oracle@dg2 trace]$ cd/u01/app/oracle/oradata/dave/

    [oracle@dg2 dave]$ ls

    control01.ctl    stdbyredo02.log  stdbyredo04.log  system01.dbf undotbs01.dbf

    stdbyredo01.log  stdbyredo03.log  sysaux01.dbf     temp01.dbf    users01.dbf

    [oracle@dg2 dave]$ mv control01.ctlcontrol01.ctl.bak

    [oracle@dg2 dave]$ ls

    control01.ctl.bak  stdbyredo02.log  stdbyredo04.log  system01.dbf undotbs01.dbf

    stdbyredo01.log    stdbyredo03.log  sysaux01.dbf     temp01.dbf    users01.dbf

    [oracle@dg2 dave]$ scp192.168.1.20:/u01/control01.ctl 192.168.1.30:/u01/app/oracle/oradata/dave/

    The authenticity of host '192.168.1.20(192.168.1.20)' can't be established.

    RSA key fingerprint is0d:6a:5f:78:53:a0:bf:54:a8:e3:7e:67:81:06:8d:75.

    Are you sure you want to continueconnecting (yes/no)? yes

    Warning: Permanently added '192.168.1.20'(RSA) to the list of known hosts.

    oracle@192.168.1.20's password:

    oracle@192.168.1.30's password:

    control01.ctl                                                                        100% 9520KB 865.5KB/s   00:11   

    Connection to 192.168.1.20 closed.

    [oracle@dg2 dave]$ ls

    control01.ctl      stdbyredo01.log  stdbyredo03.log  sysaux01.dbf temp01.dbf     users01.dbf

    control01.ctl.bak  stdbyredo02.log  stdbyredo04.log  system01.dbf undotbs01.dbf

    [oracle@dg2 dave]$

    [oracle@dg2 archivelog]$ cd/u01/app/oracle/fast_recovery_area/dave/

    [oracle@dg2 dave]$ ls

    control02.ctl

    [oracle@dg2 dave]$ mv control02.ctlcontrol02.ctl.bak

    [oracle@dg2 dave]$ ls

    control02.ctl.bak

    [oracle@dg2 dave]$

    [oracle@dg2 dave]$ cp control01.ctl/u01/app/oracle/fast_recovery_area/dave/control02.ctl

    2.8 在正常拉起备库

    SQL> startup nomount;

    ORACLE instance started.

    Total System Global Area  814227456 bytes

    Fixed Size                  2232760 bytes

    Variable Size             478154312 bytes

    Database Buffers          331350016 bytes

    Redo Buffers                2490368 bytes

    SQL> alter database mount standby database;

    Database altered.

    SQL> alter database recover managedstandby database disconnect from session;

    Database altered.

    SQL>

    2.9 查看主备库日志

    主库日志:

    Fri Mar 29 08:00:51 2013

    PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 16143.

    Fri Mar 29 08:01:52 2013

    Error 1034 received logging on to thestandby

    PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.

    Fri Mar 29 08:02:56 2013

    Error 1034 received logging on to thestandby

    PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.

    Fri Mar 29 08:03:57 2013

    Error 1034 received logging on to thestandby

    PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.

    Fri Mar 29 08:04:59 2013

    Error 1034 received logging on to the standby

    PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.

    Fri Mar 29 08:06:02 2013

    Error 1034 received logging on to thestandby

    PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.

    Fri Mar 29 08:07:05 2013

    Error 1034 received logging on to thestandby

    PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.

    Fri Mar 29 08:08:08 2013

    PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 16058.

    Fri Mar 29 08:08:34 2013

    ALTER SYSTEM SETlog_archive_dest_state_2='ENABLE' SCOPE=MEMORY SID='*';

    Fri Mar 29 08:08:35 2013

    Thread 1 advanced to log sequence 18 (LGWRswitch)

     Current log# 3 seq# 18 mem# 0: /u01/app/oracle/oradata/dave/redo03.log

    Fri Mar 29 08:08:36 2013

    ******************************************************************

    LGWR: Setting 'active'archival for destination LOG_ARCHIVE_DEST_2

    ******************************************************************

    Fri Mar 29 08:08:36 2013

    Archived Log entry 23added for thread 1 sequence 17 ID 0x3312f7c4 dest 1:

    备库日志:

    [oracle@dg2 trace]$ tail -20 alert_dave.log

    ORA-27037: unable to obtain file status

    Linux-x86_64 Error: 2: No such file ordirectory

    Additional information: 3

    Clearing online redo logfile 3 complete

    Media Recovery Waiting for thread 1sequence 15

    Fetching gap sequence in thread 1, gapsequence 15-16

    Fri Mar 29 08:08:48 2013

    RFS[3]: Assigned to RFS process 9707

    RFS[3]: Opened log for thread 1 sequence 16dbid 856896964 branch 794014730

    Fri Mar 29 08:08:49 2013

    RFS[4]: Assigned to RFS process 9705

    RFS[4]: Opened log for thread 1 sequence 15dbid 856896964 branch 794014730

    Archived Log entry 2 added for thread 1sequence 16 rlc 794014730 ID 0x3312f7c4 dest 2:

    Archived Log entry 3 added for thread 1sequence 15 rlc 794014730 ID 0x3312f7c4 dest 2:

    Fri Mar 29 08:08:55 2013

    Media Recovery Log/u01/archivelog/1_15_794014730.dbf

    Media Recovery Log/u01/archivelog/1_16_794014730.dbf

    Media Recovery Log/u01/archivelog/1_17_794014730.dbf

    Fri Mar 29 08:09:11 2013

    Media Recovery Waitingfor thread 1 sequence 18 (in transit)

    注意这里:

        我们把备库拉起来之后,就自动开始同步了。

    2.10 切换归档测试

    主库:

    SQL> alter system switch logfile;

    System altered.

    SQL> select sequence#,applied from v$archived_log order by sequence# desc;

     SEQUENCE# APPLIED

    ---------- ---------

           18 NO

           18 NO

           17 NO

           17 YES

           16 YES

           16 NO

           15 NO

           15 YES

           14 NO

           14 YES

           13 YES

           13 NO

           12 NO

           12 YES

           11 NO

           11 YES

           10 YES

           10 NO

            9 NO

            9 YES

            8 NO

            8 YES

            7 NO

            7 YES

            6 NO

            6 YES

            5 NO

            4 NO

    28 rows selected.

    备库:

    SQL> select sequence#,applied from v$archived_log order by sequence# desc;

     SEQUENCE# APPLIED

    ---------- ---------

           18 YES

           17 YES

           16 YES

           15 YES

    注意这里,备库已经完全同步了。 之前在我们主库看,18的日志还没有应用,因为我们刚启动备库,应用需要一定的时间。 并且在我们重新配置之后,这里的数字就从15开始了。 是我们中断DG后的数字。 但我们主库还是从4开始计算的。

       

    小结:

        对于DG通信异常中断的处理,我们仅仅需要重新创建一份standby 的control file就可以了。

    ---------------------------------------------------------------------------------------

    版权所有,文章允许转载,但必须以链接方式注明源地址,否则追究法律责任!

    Skype:    tianlesoftware

    QQ:       tianlesoftware@gmail.com

    Email:    tianlesoftware@gmail.com

    Blog:     http://blog.csdn.net/tianlesoftware

    Weibo:    http://weibo.com/tianlesoftware

    Twitter:  http://twitter.com/tianlesoftware

    Facebook: http://www.facebook.com/tianlesoftware

    Linkedin: http://cn.linkedin.com/in/tianlesoftware

    道森Oracle,国内最早、最大的网络语音培训机构,我们提供专业、优质的Oracle技术培训和服务! 我们的官方网站:http://www.daosenoracle.com 官方淘宝店:http://daosenpx.taobao.com/
  • 相关阅读:
    C语言编程的两个工具:valgrind和core
    C语言动态库和静态库的使用及实践
    编译安装pgbouncer-checking for OpenSSL... configure: error: not found
    automake使用
    make笔记
    GCC命令
    gcc中关于静态库和动态库使用(转)
    zookeeper的c API 单线程与多线程问题 cli_st和cli_mt
    zookeeper数据一致性与paxos算法
    Deepgreen DB简介(转)
  • 原文地址:https://www.cnblogs.com/tianlesoftware/p/3609080.html
Copyright © 2011-2022 走看看