一.背景说明
前段时间一朋友在生产库上误操作,本来他是打算重启一下DG环境,结果在备库命令执行错误。
本应该执行
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
结果朋友执行成了如下命令:
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH;
中断了DG主备库的通信环境,这个finish是用来做Failover时用的。 当时让朋友在主库重新生成了一份standby controlfiles,然后copy到备库,在按正常模式启动就可以了。
因为数据库识别主备库就是通过控制文件来的,所以理论上,只需要重新生成一份standby 控制文件就可以了。 后来朋友测试了一下,正常的拉起来了。
今天看到了当时的记录,就顺便模拟一下整个操作,顺便练练手。
二. 演示过程
2.1 DG 环境说明
OS: Oracle Linux6.3
DB: 11.2.0.3
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise EditionRelease 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for Linux: Version 11.2.0.3.0 -Production
NLSRTL Version 11.2.0.3.0 - Production
主库:
SQL> select open_mode from v$database;
OPEN_MODE
--------------------
READ WRITE
SQL>
SQL> set pagesize 200
SQL> select sequence#,applied fromv$archived_log order by sequence# desc;
SEQUENCE# APPLIED
---------- ---------
14 YES
14 NO
13 YES
13 NO
12 NO
12 YES
11 YES
11 NO
10 NO
10 YES
9 YES
9 NO
8 NO
8 YES
7 YES
7 NO
6 YES
6 NO
5 NO
4 NO
20 rows selected.
备库:
SQL> select open_mode from v$database;
OPEN_MODE
--------------------
MOUNTED
SQL>
SQL> select sequence#,applied fromv$archived_log order by sequence# desc;
SEQUENCE# APPLIED
---------- ---------
14 YES
13 YES
12 YES
11 YES
10 YES
9 YES
8 YES
7 YES
6 YES
9 rows selected.
2.2 模拟故障
在备库执行如下命令:
SQL> ALTER DATABASE RECOVER MANAGEDSTANDBY DATABASE FINISH;
Database altered.
2.3 查看主库 alert log
[oracle@dg1 trace]$ pwd
/u01/app/oracle/diag/rdbms/dave_pd/dave/trace
[oracle@dg1 trace]$ tail -30 alert_dave.log
Thread 1 advanced to log sequence 14 (LGWRswitch)
Current log# 2 seq# 14 mem# 0: /u01/app/oracle/oradata/dave/redo02.log
Fri Mar 29 03:30:12 2013
Archived Log entry 17 added for thread 1sequence 13 ID 0x3312f7c4 dest 1:
Fri Mar 29 03:30:13 2013
LNS: Standby redo logfile selected forthread 1 sequence 14 for destination LOG_ARCHIVE_DEST_2
Fri Mar 29 03:43:10 2013
Time drift detected. Please check VKTMtrace file for more details.
Fri Mar 29 04:45:31 2013
Time drift detected. Please check VKTMtrace file for more details.
Fri Mar 29 06:28:35 2013
Time drift detected. Please check VKTMtrace file for more details.
Fri Mar 29 07:08:14 2013
Thread 1 advanced to log sequence 15 (LGWRswitch)
Current log# 3 seq# 15 mem# 0: /u01/app/oracle/oradata/dave/redo03.log
Fri Mar 29 07:08:16 2013
Archived Log entry 20 added for thread 1sequence 14 ID 0x3312f7c4 dest 1:
Fri Mar 29 07:08:17 2013
LNS: Standby redo logfile selected forthread 1 sequence 15 for destination LOG_ARCHIVE_DEST_2
Fri Mar 29 07:34:48 2013
Time drift detected. Please check VKTMtrace file for more details.
Fri Mar 29 07:48:55 2013
LNS: Attempting destinationLOG_ARCHIVE_DEST_2 network reconnect (3135)
LNS: Destination LOG_ARCHIVE_DEST_2 networkreconnect abandoned
Error 3135 for archive log file 3 to'dave_st'
Errors in file/u01/app/oracle/diag/rdbms/dave_pd/dave/trace/dave_nsa2_3181.trc:
ORA-03135: connection lost contact
LNS: Failed to archive log 3 thread 1sequence 15 (3135)
Fri Mar 29 07:51:45 2013
PING[ARC1]: Heartbeatfailed to connect to standby 'dave_st'. Error is 16143.
因为我们在备库执行的Finish命令,导致心跳中断了。
2.4 查看备库alert log
[oracle@dg2 trace]$ pwd
/u01/app/oracle/diag/rdbms/dave_st/dave/trace
[oracle@dg2 trace]$ tail -20 alert_dave.log
Terminal Recovery: thread 1 seq# 15 redorequired
Terminal Recovery:
Recovery of Online Redo Log: Thread 1 Group5 Seq 15 Reading mem 0
Mem# 0: /u01/app/oracle/oradata/dave/stdbyredo02.log
Identified End-Of-Redo (failover) forthread 1 sequence 15 at SCN 0xffff.ffffffff
Incomplete Recovery applied until change1082890 time 03/29/2013 07:48:53
MRP0: Media Recovery Complete (dave)
Terminal Recovery: successful completion
Fri Mar 29 07:48:49 2013
ARCH: Archival stopped, error occurred.Will continue retrying
ORACLE Instance dave - Archival Error
Forcing ARSCN to IRSCN for TR 0:1082890
Attempt to set limbo arscn 0:1082890 irscn0:1082890
Resetting standby activation ID 856881092(0x3312f7c4)
ORA-16014: log 5 sequence# 15 not archived,no available destinations
ORA-00312: online log 5 thread 1:'/u01/app/oracle/oradata/dave/stdbyredo02.log'
MRP0: Background Media Recovery processshutdown (dave)
Fri Mar 29 07:48:50 2013
Terminal Recovery: completion detected(dave)
Completed: ALTER DATABASE RECOVER MANAGEDSTANDBY DATABASE FINISH
[oracle@dg2 trace]$
2.5 在主库切换归档
SQL> alter system switch logfile;
System altered.
SQL> alter system switch logfile;
System altered.
SQL> select sequence#,applied fromv$archived_log order by sequence# desc;
SEQUENCE# APPLIED
---------- ---------
16 NO
15 NO
14 NO
14 YES
13 YES
13 NO
12 YES
12 NO
11 NO
11 YES
10 NO
10 YES
9 YES
9 NO
8 YES
8 NO
7 YES
7 NO
6 NO
6 YES
5 NO
4 NO
22 rows selected.
SQL>
2.6 再次查看主备库日志
主库日志:
Fri Mar 29 07:52:46 2013
PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 16143.
Fri Mar 29 07:53:47 2013
PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 16143.
Fri Mar 29 07:53:49 2013
Thread 1 advanced to log sequence 16 (LGWRswitch)
Current log# 1 seq# 16 mem# 0: /u01/app/oracle/oradata/dave/redo01.log
Fri Mar 29 07:53:49 2013
Archived Log entry 21 added for thread 1sequence 15 ID 0x3312f7c4 dest 1:
Fri Mar 29 07:53:50 2013
FAL[server, ARC2]: Error 16143 creatingremote archivelog file 'dave_st'
FAL[server, ARC2]: FAL archive failed, seetrace file.
ARCH: FAL archive failed. Archivercontinuing
ORACLE Instance dave - Archival Error.Archiver continuing.
Thread 1 advanced to log sequence 17 (LGWRswitch)
Current log# 2 seq# 17 mem# 0: /u01/app/oracle/oradata/dave/redo02.log
Fri Mar 29 07:53:57 2013
Archived Log entry 22 added for thread 1sequence 16 ID 0x3312f7c4 dest 1:
备库日志:
Fri Mar 29 07:48:50 2013
Terminal Recovery: completion detected(dave)
Completed: ALTER DATABASE RECOVER MANAGEDSTANDBY DATABASE FINISH
Fri Mar 29 07:51:34 2013
RFS[5]: Assigned to RFS process 9336
RFS[5]: No connections allowed during/afterterminal recovery.
Fri Mar 29 07:52:35 2013
RFS[6]: Assigned to RFS process 9340
RFS[6]: No connections allowed during/afterterminal recovery.
Fri Mar 29 07:53:36 2013
RFS[7]: Assigned to RFS process 9343
RFS[7]: No connections allowed during/afterterminal recovery.
Fri Mar 29 07:53:39 2013
RFS[8]: Assigned to RFS process 9345
RFS[8]: No connectionsallowed during/after terminal recovery.
2.7 在主库重建standby control file
先在备库查看一下控制文件名称,等会创建完后直接覆盖过去:
SQL> show parameter control
NAME TYPE VALUE
----------------------------------------------- ------------------------------
control_file_record_keep_time integer 7
control_files string /u01/app/oracle/oradata/dave/c
ontrol01.ctl, /u01/app/oracle/
fast_recovery_area/dave/contro
l02.ctl
control_management_pack_access string DIAGNOSTIC+TUNING
主库创建standby controlfile:
SQL> alter database create standbycontrolfile as '/u01/control01.ctl';
Database altered.
copy到备库的目录,在覆盖原来的控制文件:
--先关闭备库:
SQL> shutdown immediate
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
SQL>
--copy并覆盖:
[oracle@dg2 trace]$ cd/u01/app/oracle/oradata/dave/
[oracle@dg2 dave]$ ls
control01.ctl stdbyredo02.log stdbyredo04.log system01.dbf undotbs01.dbf
stdbyredo01.log stdbyredo03.log sysaux01.dbf temp01.dbf users01.dbf
[oracle@dg2 dave]$ mv control01.ctlcontrol01.ctl.bak
[oracle@dg2 dave]$ ls
control01.ctl.bak stdbyredo02.log stdbyredo04.log system01.dbf undotbs01.dbf
stdbyredo01.log stdbyredo03.log sysaux01.dbf temp01.dbf users01.dbf
[oracle@dg2 dave]$ scp192.168.1.20:/u01/control01.ctl 192.168.1.30:/u01/app/oracle/oradata/dave/
The authenticity of host '192.168.1.20(192.168.1.20)' can't be established.
RSA key fingerprint is0d:6a:5f:78:53:a0:bf:54:a8:e3:7e:67:81:06:8d:75.
Are you sure you want to continueconnecting (yes/no)? yes
Warning: Permanently added '192.168.1.20'(RSA) to the list of known hosts.
oracle@192.168.1.20's password:
oracle@192.168.1.30's password:
control01.ctl 100% 9520KB 865.5KB/s 00:11
Connection to 192.168.1.20 closed.
[oracle@dg2 dave]$ ls
control01.ctl stdbyredo01.log stdbyredo03.log sysaux01.dbf temp01.dbf users01.dbf
control01.ctl.bak stdbyredo02.log stdbyredo04.log system01.dbf undotbs01.dbf
[oracle@dg2 dave]$
[oracle@dg2 archivelog]$ cd/u01/app/oracle/fast_recovery_area/dave/
[oracle@dg2 dave]$ ls
control02.ctl
[oracle@dg2 dave]$ mv control02.ctlcontrol02.ctl.bak
[oracle@dg2 dave]$ ls
control02.ctl.bak
[oracle@dg2 dave]$
[oracle@dg2 dave]$ cp control01.ctl/u01/app/oracle/fast_recovery_area/dave/control02.ctl
2.8 在正常拉起备库
SQL> startup nomount;
ORACLE instance started.
Total System Global Area 814227456 bytes
Fixed Size 2232760 bytes
Variable Size 478154312 bytes
Database Buffers 331350016 bytes
Redo Buffers 2490368 bytes
SQL> alter database mount standby database;
Database altered.
SQL> alter database recover managedstandby database disconnect from session;
Database altered.
SQL>
2.9 查看主备库日志
主库日志:
Fri Mar 29 08:00:51 2013
PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 16143.
Fri Mar 29 08:01:52 2013
Error 1034 received logging on to thestandby
PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.
Fri Mar 29 08:02:56 2013
Error 1034 received logging on to thestandby
PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.
Fri Mar 29 08:03:57 2013
Error 1034 received logging on to thestandby
PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.
Fri Mar 29 08:04:59 2013
Error 1034 received logging on to the standby
PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.
Fri Mar 29 08:06:02 2013
Error 1034 received logging on to thestandby
PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.
Fri Mar 29 08:07:05 2013
Error 1034 received logging on to thestandby
PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.
Fri Mar 29 08:08:08 2013
PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 16058.
Fri Mar 29 08:08:34 2013
ALTER SYSTEM SETlog_archive_dest_state_2='ENABLE' SCOPE=MEMORY SID='*';
Fri Mar 29 08:08:35 2013
Thread 1 advanced to log sequence 18 (LGWRswitch)
Current log# 3 seq# 18 mem# 0: /u01/app/oracle/oradata/dave/redo03.log
Fri Mar 29 08:08:36 2013
******************************************************************
LGWR: Setting 'active'archival for destination LOG_ARCHIVE_DEST_2
******************************************************************
Fri Mar 29 08:08:36 2013
Archived Log entry 23added for thread 1 sequence 17 ID 0x3312f7c4 dest 1:
备库日志:
[oracle@dg2 trace]$ tail -20 alert_dave.log
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file ordirectory
Additional information: 3
Clearing online redo logfile 3 complete
Media Recovery Waiting for thread 1sequence 15
Fetching gap sequence in thread 1, gapsequence 15-16
Fri Mar 29 08:08:48 2013
RFS[3]: Assigned to RFS process 9707
RFS[3]: Opened log for thread 1 sequence 16dbid 856896964 branch 794014730
Fri Mar 29 08:08:49 2013
RFS[4]: Assigned to RFS process 9705
RFS[4]: Opened log for thread 1 sequence 15dbid 856896964 branch 794014730
Archived Log entry 2 added for thread 1sequence 16 rlc 794014730 ID 0x3312f7c4 dest 2:
Archived Log entry 3 added for thread 1sequence 15 rlc 794014730 ID 0x3312f7c4 dest 2:
Fri Mar 29 08:08:55 2013
Media Recovery Log/u01/archivelog/1_15_794014730.dbf
Media Recovery Log/u01/archivelog/1_16_794014730.dbf
Media Recovery Log/u01/archivelog/1_17_794014730.dbf
Fri Mar 29 08:09:11 2013
Media Recovery Waitingfor thread 1 sequence 18 (in transit)
注意这里:
我们把备库拉起来之后,就自动开始同步了。
2.10 切换归档测试
主库:
SQL> alter system switch logfile;
System altered.
SQL> select sequence#,applied from v$archived_log order by sequence# desc;
SEQUENCE# APPLIED
---------- ---------
18 NO
18 NO
17 NO
17 YES
16 YES
16 NO
15 NO
15 YES
14 NO
14 YES
13 YES
13 NO
12 NO
12 YES
11 NO
11 YES
10 YES
10 NO
9 NO
9 YES
8 NO
8 YES
7 NO
7 YES
6 NO
6 YES
5 NO
4 NO
28 rows selected.
备库:
SQL> select sequence#,applied from v$archived_log order by sequence# desc;
SEQUENCE# APPLIED
---------- ---------
18 YES
17 YES
16 YES
15 YES
注意这里,备库已经完全同步了。 之前在我们主库看,18的日志还没有应用,因为我们刚启动备库,应用需要一定的时间。 并且在我们重新配置之后,这里的数字就从15开始了。 是我们中断DG后的数字。 但我们主库还是从4开始计算的。
小结:
对于DG通信异常中断的处理,我们仅仅需要重新创建一份standby 的control file就可以了。
---------------------------------------------------------------------------------------
版权所有,文章允许转载,但必须以链接方式注明源地址,否则追究法律责任!
Skype: tianlesoftware
QQ: tianlesoftware@gmail.com
Email: tianlesoftware@gmail.com
Blog: http://blog.csdn.net/tianlesoftware
Weibo: http://weibo.com/tianlesoftware
Twitter: http://twitter.com/tianlesoftware
Facebook: http://www.facebook.com/tianlesoftware
Linkedin: http://cn.linkedin.com/in/tianlesoftware