zoukankan      html  css  js  c++  java
  • 从巡检备份失败排查解决数据库故障

    最近某业务备份报错:
    Starting Control File and SPFILE Autobackup at 09-MAY-16
    piece handle=c-335040995-20160509-00 comment=API Version 2.0,MMS Version 5.0.0.0
    Finished Control File and SPFILE Autobackup at 09-MAY-16

    sql statement: alter system archive log current

    released channel: ch00

    released channel: ch01

    allocated channel: ch00
    channel ch00: sid=491 instance=dlsc1 devtype=SBT_TAPE
    channel ch00: Veritas NetBackup for Oracle - Release 7.0 (2010010501)

    released channel: ch00
    RMAN-00571: ===========================================================
    RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
    RMAN-00571: ===========================================================
    RMAN-12001: could not open channel ch01
    RMAN-10008: could not create channel context
    RMAN-10003: unable to connect to target database
    ORA-12170: TNS:Connect timeout occurred
    RMAN> RMAN>
    Recovery Manager complete.
    Script /oracle/nbu_scripts/hot_database_backup.sh
    ==== ended in error on Mon May 9 09:26:39 BEIST 2016 ====
    从备份信息上看数据库在备份完Control File and SPFILE切换归档日志后备份归档出现问题。
    RMAN-10003: unable to connect to target database提示不能连接到目标库。
    检查节点1,发现节点1正常。节点2无法进入sqlplus和rman下。
    oracle@xxxxdb2:/oracle$ sqlplus / as sysdba

    SQL*Plus: Release 10.2.0.3.0 - Production on Mon May 9 14:46:01 2016

    Copyright (c) 1982, 2006, Oracle. All Rights Reserved.

    Connected to an idle instance.

    SQL> startup
    ORA-01078: failure in processing system parameters
    ORA-01565: error in identifying file '/dev/rspfile'
    ORA-27041: unable to open file
    IBM AIX RISC System/6000 Error: 6: No such device or address
    Additional information: 11
    SQL>
    SQL> exit
    Disconnected
    oracle@xxxxdb2:/oracle$ rman target /

    Recovery Manager: Release 10.2.0.3.0 - Production on Mon May 9 14:46:22 2016

    Copyright (c) 1982, 2005, Oracle. All rights reserved.

    connected to target database (not started)

    RMAN> exit


    Recovery Manager complete.
    oracle@xxxxdb2:/oracle$
    查看crs状态:
    Recovery Manager complete.
    oracle@xxxxdb2:/oracle$ crs_stat -t
    CRS-0184: Cannot communicate with the CRS daemon.

    oracle@xxxxdb2:/oracle$
    哎,难道是crs没起来?
    尝试启动crs
    root@xxxxdb2:/oracle/product/10.2.0/crs/bin# ./crsctl start crs
    Attempting to start CRS stack
    The CRS stack will be started shortly
    等了一会以为crs能顺利启动结果还是起不来如下:
    root@xxxxdb2:/oracle/product/10.2.0/crs/bin# ./crsctl check crs
    Failure 1 contacting CSS daemon
    Cannot communicate with CRS
    Cannot communicate with EVM
    root@xxxxdb2:/oracle/product/10.2.0/crs/bin# crs_stat -t
    CRS-0184: Cannot communicate with the CRS daemon.
    什么情况,执行ocr命令看看ocr的情况结果:
    root@xxxxdb2:/oracle/product/10.2.0/crs/bin# ./ocrcheck
    PROT-602: Failed to retrieve data from the cluster registry
    到这里已经知道啥问题了,存储肯定没认到。
    查看数据所在的vg
    root@xxxxdb2:/oracle/product/10.2.0/crs/bin# lsvg
    rootvg
    datavg
    archvg
    root@xxxxdb2:/oracle/product/10.2.0/crs/bin# lsvg datavg
    0516-010 : Volume group must be varied on; use varyonvg command.果然datavg没有激活。到这里别急着激活datavg
    因为datavg没有激活很可能是hacmp没有起来,查看hacmp状态,确实没有起来。
    root@xxxxdb2:/# /usr/es/sbin/cluster/utilities/clshowsrv -v
    Status of the RSCT subsystems used by HACMP:
    Subsystem Group PID Status
    topsvcs topsvcs inoperative
    grpsvcs grpsvcs inoperative
    grpglsm grpsvcs inoperative
    emsvcs emsvcs inoperative
    emaixos emsvcs inoperative
    ctrmc rsct 262394 active
    Status of the HACMP subsystems:
    Subsystem Group PID Status
    clcomdES clcomdES 200944 active
    clstrmgrES cluster 311646 active
    Status of the optional HACMP subsystems:
    Subsystem Group PID Status
    clinfoES cluster inoperative


    Obtaining information via SNMP from Node: xxxxdb1...

    _____________________________________________________________________________
    Cluster Name: xxxxdb
    Cluster State: UP
    Cluster Substate: STABLE
    _____________________________________________________________________________


    Node Name: xxxxdb1 State: UP

    Network Name: net_ether_02 State: UP

    Address: 192.168.77.194 Label: xxxxdb1_priv State: UP


    Node Name: xxxxdb2 State: DOWN

    Network Name: net_ether_02 State:
    Address: xxxx Label: xxxxdb2 State: DOWN
    Address: 192.168.77.195 Label: xxxxdb2_priv State: DOWN

    Cluster Name: xxxxdb

    Resource Group Name: orarg
    Startup Policy: Online On All Available Nodes
    Fallover Policy: Bring Offline (On Error Node Only)
    Fallback Policy: Never Fallback
    Site Policy: ignore
    Priority Override Information:
    Primary Instance POL:
    [MORE...5]
    问题找到了,启动hacmp吧
    smitty clstart
    Start Cluster Services on these nodes
    启动完毕:
    root@xxxxdb2:/oracle/product/10.2.0/crs/bin# lsvg datavg
    VOLUME GROUP: datavg VG IDENTIFIER: 00c63cf200004c000000011d0937bc9f
    VG STATE: active PP SIZE: 64 megabyte(s)
    VG PERMISSION: read/write TOTAL PPs: 7990 (511360 megabytes)
    MAX LVs: 256 FREE PPs: 661 (42304 megabytes)
    LVs: 44 USED PPs: 7329 (469056 megabytes)
    OPEN LVs: 5 QUORUM: 3
    TOTAL PVs: 5 VG DESCRIPTORS: 5
    STALE PVs: 0 STALE PPs: 0
    ACTIVE PVs: 5 AUTO ON: no
    Concurrent: Enhanced-Capable Auto-Concurrent: Disabled
    VG Mode: Concurrent
    Node ID: 2 Active Nodes: 1
    MAX PPs per VG: 32768 MAX PVs: 1024
    LTG size (Dynamic): 256 kilobyte(s) AUTO SYNC: no
    HOT SPARE: no BB POLICY: relocatable
    root@xxxxdb2:/oracle/product/10.2.0/crs/bin# ./ocrcheck
    Status of Oracle Cluster Registry is as follows :
    Version : 2
    Total space (kbytes) : 130852
    Used space (kbytes) : 3300
    Available space (kbytes) : 127552
    ID : 222055846
    Device/File Name : /dev/rocr
    Device/File integrity check succeeded

    Device/File not configured

    Cluster registry integrity check succeeded

    看来datavg已挂载
    接下来启动crs
    root@xxxxdb2:/oracle/product/10.2.0/crs/bin# ./crsctl start crs
    Attempting to start CRS stack
    The CRS stack will be started shortly
    root@xxxxdb2:/oracle/product/10.2.0/crs/bin# ./crs_stat -t
    Name Type Target State Host
    ------------------------------------------------------------
    ora.xxxx.db application ONLINE ONLINE xxxxdb1
    ora....c1.inst application ONLINE ONLINE xxxxdb1
    ora....c2.inst application ONLINE ONLINE xxxxdb2
    ora....B1.lsnr application ONLINE ONLINE xxxxdb1
    ora....db1.gsd application ONLINE ONLINE xxxxdb1
    ora....db1.ons application ONLINE ONLINE xxxxdb1
    ora....db1.vip application ONLINE ONLINE xxxxdb1
    ora....B2.lsnr application ONLINE ONLINE xxxxdb2
    ora....db2.gsd application ONLINE ONLINE xxxxdb2
    ora....db2.ons application ONLINE ONLINE xxxxdb2
    ora....db2.vip application ONLINE ONLINE xxxxdb2

    ok到这里问题解决。分析下故障思路
    先是从备份信息得到NBU无法连接到节点2目标库-->sqlplus和rman均失败-->crs启动失败-->ocrcheck失败-->datavg没有激活-->hacmp没有启动

  • 相关阅读:
    如何避免延迟发货和虚假发货
    买家一个单下了两件,我想发两个包裹怎么办
    我有多个店铺,怎么方便的打单?
    一个人在我店里下了几个单,可以合起来发一个包裹吗
    疫情风险高的地方快递都不发,我怎么把这些地区的订单排除掉?
    拼多多商家发货后,可以实时提醒哪些物流快超时了吗
    史上最通俗易懂的手写人工神经网络——(一)(转载自https://blog.csdn.net/xipengbozai/article/details/118115444)
    炸天的3D引擎OpenCASCADE的用法及案例(https://blog.csdn.net/xipengbozai/article/details/117044032?spm=1001.2014.3001.5502)
    屌炸天的3D引擎OpenCASCADE的用法及案例(转载之处:)
    JS实现环绕地球飞行的3D飞行线动画效果(JS+HTML)
  • 原文地址:https://www.cnblogs.com/datalife/p/5474159.html
Copyright © 2011-2022 走看看