  • Handling an abnormal termination of the first RAC node's instance caused by an ASM instance failure

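     I ran into a case where the database instance on the first node of a RAC cluster stopped abnormally because of a failure in the ASM instance. These are my notes.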

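    1. Symptoms
    On a two-node RAC, the instance on the first node had stopped. On inspection, the ASM instance on that node had also terminated abnormally.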

    2. Analysis
    Check the alert logs of the database instance and the ASM instance for clues.
    1) Database alert log contents
    Sun May  8 06:59:06 2011
    Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_asmb_21478.trc:
    ORA-15064: communication failure with ASM instance
    ORA-03113: end-of-file on communication channel
    Sun May  8 06:59:06 2011
    ASMB: terminating instance due to error 15064
    Sun May  8 06:59:06 2011
    Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lms1_21275.trc:
    ORA-15064: communication failure with ASM instance
    Sun May  8 06:59:06 2011
    Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lgwr_21283.trc:
    ORA-15064: communication failure with ASM instance
    Sun May  8 06:59:06 2011
    Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lms0_21271.trc:
    ORA-15064: communication failure with ASM instance
    Sun May  8 06:59:06 2011
    Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lmon_21267.trc:
    ORA-15064: communication failure with ASM instance
    Sun May  8 06:59:06 2011
    Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lmd0_21269.trc:
    ORA-15064: communication failure with ASM instance
    Sun May  8 06:59:06 2011
    System state dump is made for local instance
    System State dumped to trace file /oracle/app/oracle/admin/racdb/bdump/racdb1_diag_21263.trc
    Sun May  8 06:59:06 2011
    Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_mman_21279.trc:
    ORA-15064: communication failure with ASM instance
    Sun May  8 06:59:07 2011
    Shutting down instance (abort)
    License high water mark = 7
    Sun May  8 06:59:07 2011
    Trace dumping is performing id=[cdmp_20110508065906]
    Sun May  8 06:59:11 2011
    Instance terminated by ASMB, pid = 21478
    Sun May  8 06:59:12 2011
    Instance terminated by USER, pid = 4110
    Mon May  9 13:44:05 2011
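
    When an alert log is this repetitive, it can help to reduce it to the distinct ORA- codes and the process that terminated the instance. The following is a minimal sketch, not a tool used in the original incident: the function name `summarize_alert` and the abbreviated excerpt are illustrative assumptions.

```python
import re
from collections import Counter

# A few representative lines copied from the alert log above (abbreviated).
ALERT_EXCERPT = """\
Sun May  8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_asmb_21478.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Sun May  8 06:59:06 2011
ASMB: terminating instance due to error 15064
Sun May  8 06:59:06 2011
Errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_lgwr_21283.trc:
ORA-15064: communication failure with ASM instance
Sun May  8 06:59:11 2011
Instance terminated by ASMB, pid = 21478
"""

# An ORA- code at the start of a line, allowing for leading indentation.
ORA_RE = re.compile(r"^[ \t]*(ORA-\d{5}):", re.MULTILINE)
# The line that names the process that brought the instance down.
TERMINATED_RE = re.compile(
    r"^[ \t]*Instance terminated by (\w+), pid = (\d+)", re.MULTILINE
)


def summarize_alert(text):
    """Return (count per ORA- code, list of (terminator, pid) tuples)."""
    errors = Counter(ORA_RE.findall(text))
    terminators = TERMINATED_RE.findall(text)
    return errors, terminators


errors, terminators = summarize_alert(ALERT_EXCERPT)
for code, count in errors.most_common():
    print(code, count)
for who, pid in terminators:
    print("terminated by", who, "pid", pid)
```

    On this excerpt the summary points at ORA-15064 and at ASMB as the terminating process, which matches a manual reading of the log.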

    2) The trace file contains the following failure information
    kjctseventdump-end tail 14 heads 0 @ 0 14 @ -1115894656
     DEFER MSG QUEUE ON LMS1 IS EMPTY
     SEQUENCES:
      0:0.0  1:2933.0
    error 15064 detected in background process
    ORA-15064: communication failure with ASM instance

    3) The ASM alert log records the following
    Thu Feb 10 19:17:58 2011
    NOTE: cache recovered group 1 to fcn 0.20162635
    Thu Feb 10 19:17:58 2011
    NOTE: opening chunk 1 at fcn 0.20162635 ABA
    NOTE: seq=79 blk=1597
    Thu Feb 10 19:17:58 2011
    NOTE: cache mounting group 1/0xBA97DAE1 (ORADATA) succeeded
    SUCCESS: diskgroup ORADATA was mounted
    Thu Feb 10 19:18:01 2011
    NOTE: recovering COD for group 1/0xba97dae1 (ORADATA)
    SUCCESS: completed COD recovery for group 1/0xba97dae1 (ORADATA)
    Thu Feb 10 19:18:01 2011
    Starting background process ASMB
    ASMB started with pid=17, OS id=7767
    Thu Feb 10 19:21:06 2011
    NOTE: ASMB process exiting due to lack of ASM file activity
    Sun May  8 06:48:33 2011
    Shutting down instance (abort)
    License high water mark = 6
    Instance terminated by USER, pid = 20819

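    The preliminary conclusion is that an ASM failure caused this incident. It is unrelated, however, to the message "NOTE: ASMB process exiting due to lack of ASM file activity" shown here. That message is purely informational and appears many times elsewhere in the ASM log.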
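
    The order of events supports this reading: the ASM log shows "Shutting down instance (abort)" at 06:48:33, and the database alert log shows ASMB terminating the instance at 06:59:06. A minimal sketch of the comparison follows. The helper name `parse_log_ts` is an assumption, and the format string assumes an English (C) locale for the day and month names.

```python
from datetime import datetime

# Alert log timestamp layout, e.g. "Sun May  8 06:59:06 2011".
# strptime treats a space in the format as "one or more whitespace characters",
# so the double space before a single-digit day parses fine.
LOG_TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_log_ts(line):
    """Parse one alert log timestamp line into a datetime."""
    return datetime.strptime(line.strip(), LOG_TS_FORMAT)


# Timestamps taken from the two logs quoted above.
asm_abort = parse_log_ts("Sun May  8 06:48:33 2011")     # ASM: Shutting down instance (abort)
db_terminate = parse_log_ts("Sun May  8 06:59:06 2011")  # DB: ASMB terminating instance

gap = db_terminate - asm_abort
print("ASM went down", gap, "before the database instance was terminated")
```

    The ASM instance went down roughly ten and a half minutes before the database instance, so the database failure is a consequence of the ASM failure and not the other way around.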

    3. First recovery attempt
    1) An initial attempt to start ASM had no effect.

    2) Starting the ASM instance manually succeeded
    racdb1@racdb1 /home/oracle$ export ORACLE_SID=+ASM1
    +ASM1@racdb1 /home/oracle$ sqlplus / as sysdba

    SQL*Plus: Release 10.2.0.3.0 - Production on Sun May 8 13:43:06 2011

    Copyright (c) 1982, 2006, Oracle.  All Rights Reserved.


    Connected to:
    Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
    With the Partitioning, Real Application Clusters and Data Mining options

    NotConnected@> shutdown immediate;
    ASM diskgroups dismounted
    ASM instance shutdown
    NotConnected@> startup;
    ASM instance started

    Total System Global Area  130023424 bytes
    Fixed Size                  2071000 bytes
    Variable Size             102786600 bytes
    ASM Cache                  25165824 bytes

    3) However, starting the database instance raised ORA-01105 and ORA-38767.
    racdb1@racdb1 /home/oracle$ sqlplus / as sysdba

    SQL*Plus: Release 10.2.0.3.0 - Production on Sun May 8 13:43:53 2011

    Copyright (c) 1982, 2006, Oracle.  All Rights Reserved.

    Connected to an idle instance.

    NotConnected@> startup;
    ORACLE instance started.

    Total System Global Area 8388608000 bytes
    Fixed Size                  2086096 bytes
    Variable Size            1644170032 bytes
    Database Buffers         6727663616 bytes
    Redo Buffers               14688256 bytes
    ORA-01105: mount is incompatible with mounts by other instances
    ORA-38767: flashback retention target parameter mismatch
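
    ORA-38767 reports that the flashback retention target differs from the value used by an instance that already has the database mounted. The original notes do not record whether the values actually differed. One way to check, sketched below under assumptions, is to read `db_flashback_retention_target` from `v$parameter` on each instance (the view is readable once an instance is started, even before mount) and compare the results. The instance name `racdb2`, the helper `find_mismatch`, and the sample values are all hypothetical.

```python
# Query to run in SQL*Plus on each instance; v$parameter exposes the
# current value of every initialization parameter.
CHECK_SQL = (
    "SELECT value FROM v$parameter "
    "WHERE name = 'db_flashback_retention_target'"
)


def find_mismatch(values_by_instance):
    """Return the per-instance values if they disagree, else an empty dict."""
    distinct = set(values_by_instance.values())
    return dict(values_by_instance) if len(distinct) > 1 else {}


# Hypothetical values, only to show the comparison; not taken from the incident.
sample = {"racdb1": "1440", "racdb2": "2880"}

mismatch = find_mismatch(sample)
if mismatch:
    print("db_flashback_retention_target differs:", mismatch)
else:
    print("db_flashback_retention_target is consistent")
```

    If the values do differ, aligning them across instances is the usual direction to investigate. In this incident the fix that worked was the CRS restart described later.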

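    4. Second recovery attempt
    I restarted the CRS resources other than the VIP. After that, the ASM instance and the database instance still could not be started.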

    5. Final fix
    Finally, I restarted all CRS resources on the first node, and every resource on the first RAC node came up successfully.
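
    The original notes do not record the exact commands used for this restart. As a sketch only, the following builds one plausible sequence using 10g-style `srvctl` syntax: stop the instance, ASM, and node applications, then start them in reverse order. The database name `racdb`, instance name `racdb1`, and node name `racdb1` are inferred from the paths and prompts above and are assumptions. The helper `restart_commands` is hypothetical and only builds the commands without running them.

```python
def restart_commands(db, instance, node):
    """Build an ordered stop/start sequence for one node's CRS resources.

    Returns a list of argv lists; nothing is executed here.
    """
    stop = [
        ["srvctl", "stop", "instance", "-d", db, "-i", instance],
        ["srvctl", "stop", "asm", "-n", node],
        ["srvctl", "stop", "nodeapps", "-n", node],  # VIP, listener, GSD, ONS
    ]
    start = [
        ["srvctl", "start", "nodeapps", "-n", node],
        ["srvctl", "start", "asm", "-n", node],
        ["srvctl", "start", "instance", "-d", db, "-i", instance],
    ]
    # Dependencies run nodeapps -> ASM -> instance, so stop in reverse order.
    return stop + start


# Names below are assumptions inferred from the logs, not confirmed by the post.
commands = restart_commands(db="racdb", instance="racdb1", node="racdb1")
for argv in commands:
    print(" ".join(argv))
```

    Printing the commands first, and running them by hand, keeps the operator in control of each step. `crs_stat -t` can be used between steps to confirm the resource states.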

    6. Summary
    After a series of recovery attempts, the RAC database was finally brought back into service.

    Good luck.

    secooler
    11.05.08

    -- The End --

  • Original post: https://www.cnblogs.com/einyboy/p/2651960.html