  • [Case Study] Oracle ORA-29740

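    Oracle RAC information

    Database version: Oracle 11.2.0.4
    Nodes: 2-node RAC
    Operating system: Red Hat Enterprise Linux Server release 6.9 (Santiago)

    Symptom:

    The instance on node 2 crashed, and its VIP failed over to node 1.

    The alert log and cssd log entries from the failure window are shown below.

    Alert log: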

    IPC Send timeout: Terminating pid 34 osid 52694
    Thu Jul 02 12:07:39 2020
    Communications reconfiguration: instance_number 1
    Detected an inconsistent instance membership by instance 1
    Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_lmon_52640.trc  (incident=304089):
    ORA-29740: evicted by instance number 1, group incarnation 6
    Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl2/incident/incdir_304089/orcl2_lmon_52640_i304089.trc
    Thu Jul 02 12:07:41 2020
    IPC Send timeout detected. Sender: ospid 52682 [oracle@ze02 (LGWR)]
    Receiver: inst 1 binc 460990968 ospid 36977
    IPC Send timeout to 1.4 inc 4 for msg type 73 from opid 28
    Use ADRCI or Support Workbench to package the incident.
    See Note 411.1 at My Oracle Support for error and packaging details.
    Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_lmon_52640.trc:
    ORA-29740: evicted by instance number 1, group incarnation 6
    LMON (ospid: 52640): terminating the instance due to error 29740
    Thu Jul 02 12:07:42 2020
    ORA-1092 : opitsk aborting process
    Thu Jul 02 12:07:46 2020
    System state dump requested by (instance=2, osid=52640 (LMON)), summary=[abnormal instance termination].
    System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_diag_52630_20200702120746.trc
    alert_orcl.log
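
    In a long alert log, eviction messages like the one above are easy to miss. The following is a minimal Python sketch for pulling them out. The function name `find_evictions` and the regular expression are illustrative assumptions derived only from the message format shown in this log; this is not an Oracle-provided tool.

```python
import re

# Pattern derived from the ORA-29740 line shown in the alert log above
# (assumed format; other releases may word the message differently).
EVICTION_RE = re.compile(
    r"ORA-29740: evicted by instance number (\d+), group incarnation (\d+)"
)

def find_evictions(alert_text):
    """Return sorted unique (evicting_instance, group_incarnation) pairs."""
    pairs = {
        (int(m.group(1)), int(m.group(2)))
        for m in EVICTION_RE.finditer(alert_text)
    }
    return sorted(pairs)

# Sample lines copied from the alert log above.
sample = """\
Detected an inconsistent instance membership by instance 1
ORA-29740: evicted by instance number 1, group incarnation 6
LMON (ospid: 52640): terminating the instance due to error 29740
"""

print(find_evictions(sample))
```

    The result shows which instance initiated the eviction and under which group incarnation, which helps when matching entries against the logs of the other node.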

    Error messages in cssd.log:

    2020-07-07 17:54:58.329: [    CSSD][2708354816]clssgmDiscEndpcl: gipcDestroy 0x1383
    2020-07-07 17:54:58.329: [    CSSD][2708354816]clssgmDeadProc: proc 0x7f9198099540
    2020-07-07 17:54:58.329: [    CSSD][2708354816]clssgmDestroyProc: cleaning up proc(0x7f9198099540) con(0x1354) skgpid  ospid 10669 with 0 clients, refcount 0
    2020-07-07 17:54:58.329: [    CSSD][2708354816]clssgmDiscEndpcl: gipcDestroy 0x1354
    2020-07-07 17:54:58.618: [    CSSD][2694764288]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
    2020-07-07 17:54:59.122: [    CSSD][2691610368]clssnmSendingThread: sending join msg to all nodes
    2020-07-07 17:54:59.122: [    CSSD][2691610368]clssnmSendingThread: sent 5 join msgs to all nodes
    2020-07-07 17:54:59.349: [    CSSD][2699503360]clssnmvDHBValidateNcopy: node 1, ze02, has a disk HB, but no network HB, DHB has rcfg 483520902, wrtcnt, 6414293, LATS 4294746660, lastSeqNo 6414292, uniqueness 1588026083, timestamp 1594115698/1792664848
    2020-07-07 17:54:59.618: [    CSSD][2694764288]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
    2020-07-07 17:55:00.355: [    CSSD][2699503360]clssnmvDHBValidateNcopy: node 1, ze02, has a disk HB, but no network HB, DHB has rcfg 483520902, wrtcnt, 6414294, LATS 4294747670, lastSeqNo 6414293, uniqueness 1588026083, timestamp 1594115699/1792665848
    2020-07-07 17:55:00.619: [    CSSD][2694764288]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
    2020-07-07 17:55:01.355: [    CSSD][2699503360]clssnmvDHBValidateNcopy: node 1, ze02, has a disk HB, but no network HB, DHB has rcfg 483520902, wrtcnt, 6414295, LATS 4294748670, lastSeqNo 6414294, uniqueness 1588026083, timestamp 1594115700/1792666848
    2020-07-07 17:55:01.619: [    CSSD][2694764288]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
    2020-07-07 17:55:02.383: [    CSSD][2699503360]clssnmvDHBValidateNcopy: node 1, ze02, has a disk HB, but no network HB, DHB has rcfg 483520902, wrtcnt, 6414296, LATS 4294749700, lastSeqNo 6414295, uniqueness 1588026083, timestamp 1594115701/1792667848
    2020-07-07 17:55:02.619: [    CSSD][2694764288]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
    2020-07-07 17:55:03.108: [    CSSD][2708354816]clssgmDeadProc: proc 0x7f9198039460
    2020-07-07 17:55:03.108: [    CSSD][2708354816]clssgmDestroyProc: cleaning up proc(0x7f9198039460) con(0x12f3) skgpid  ospid 10932 with 0 clients, refcount 0
    2020-07-07 17:55:03.108: [    CSSD][2708354816]clssgmDiscEndpcl: gipcDestroy 0x12f3
    2020-07-07 17:55:03.110: [    CSSD][2708354816]clssscSelect: cookie accept request 0xc85280
    2020-07-07 17:55:03.110: [    CSSD][2708354816]clssgmAllocProc: (0x7f919807dbb0) allocated
    2020-07-07 17:55:03.110: [    CSSD][2708354816]clssgmClientConnectMsg: properties of cmProc 0x7f919807dbb0 - 1,2,3,4,5
    2020-07-07 17:55:03.110: [    CSSD][2708354816]clssgmClientConnectMsg: Connect from con(0x13e3) proc(0x7f919807dbb0) pid(10932) version 11:2:1:4, properties: 1,2,3,4,5
    cssd log
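
    The key line in the cssd log is the repeated "has a disk HB, but no network HB" message: the peer node is still writing its disk heartbeat, but its network heartbeat is not arriving. The following is a minimal Python sketch for counting these messages per node. The function name `count_missing_network_hb` and the regular expression are assumptions based only on the log lines shown above.

```python
import re
from collections import Counter

# Matches the clssnmvDHBValidateNcopy message shown in the cssd log above.
NO_NET_HB_RE = re.compile(
    r"clssnmvDHBValidateNcopy: node (\d+), ([^,\s]+), "
    r"has a disk HB, but no network HB"
)

def count_missing_network_hb(cssd_text):
    """Count 'disk HB, but no network HB' messages per (node_number, node_name)."""
    return Counter(
        (int(number), name) for number, name in NO_NET_HB_RE.findall(cssd_text)
    )

# Two shortened sample lines in the same format as the cssd log above.
sample = (
    "2020-07-07 17:54:59.349: [    CSSD][2699503360]clssnmvDHBValidateNcopy: "
    "node 1, ze02, has a disk HB, but no network HB, DHB has rcfg 483520902\n"
    "2020-07-07 17:55:00.355: [    CSSD][2699503360]clssnmvDHBValidateNcopy: "
    "node 1, ze02, has a disk HB, but no network HB, DHB has rcfg 483520902\n"
)

for (number, name), hits in count_missing_network_hb(sample).items():
    print(f"node {number} ({name}): {hits} missing network heartbeats")
```

    A steadily growing count for one node while its disk heartbeat continues suggests checking the private interconnect path (including firewalls) before suspecting storage.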

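    Diagnosis:

    The RAC cluster suffered a split-brain: the firewall on the node servers did not accept traffic from the private network (interconnect) addresses or the HAIP addresses.

    After the firewall rules were corrected (disabling the firewall also works), CRS automatically restarted the instance on node 2.

    Firewall rules: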

    # iptables -L
    Chain INPUT (policy ACCEPT)
    target  prot  opt  source          destination
    ACCEPT  all   --   anywhere        anywhere     state RELATED,ESTABLISHED
    ACCEPT  icmp  --   anywhere        anywhere     icmp echo-request
    ACCEPT  all   --   anywhere        anywhere
    ACCEPT  tcp   --   100.82.16.8     anywhere     state NEW tcp dpt:ssh    # Oracle backup server address
    ACCEPT  tcp   --   100.82.16.9     anywhere     state NEW tcp dpt:ssh    # Oracle backup server address
    ACCEPT  tcp   --   100.82.16.10    anywhere     state NEW tcp dpt:ssh    # Oracle backup server address
    ACCEPT  tcp   --   100.82.10.11    anywhere     state NEW tcp dpt:ssh    # RAC public network address
    ACCEPT  tcp   --   100.82.10.12    anywhere     state NEW tcp dpt:ssh    # RAC public network address
    ACCEPT  tcp   --   anywhere        anywhere     state NEW tcp dpt:ncube-lm
    ACCEPT  all   --   anywhere        anywhere     source IP range 100.82.16.152-100.82.16.153
    ACCEPT  all   --   ze01-priv       anywhere     # RAC private network address
    ACCEPT  all   --   ze02-priv       anywhere     # RAC private network address
    ACCEPT  all   --   anywhere        anywhere     source IP range 100.82.11.11-100.82.11.16
    ACCEPT  all   --   169.254.85.175  anywhere     # HAIP address
    ACCEPT  all   --   169.254.180.52  anywhere     # HAIP address
    ACCEPT  tcp   --   anywhere        anywhere     source IP range 100.82.16.26-100.82.16.27 state NEW tcp multiport dports zabbix-agent,zabbix-trapper
    REJECT  all   --   anywhere        anywhere     reject-with icmp-host-prohibited
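
    To illustrate why the HAIP entries matter, here is a minimal Python sketch of first-match rule evaluation. It is a deliberately simplified model, not a reimplementation of iptables: it looks only at the source address and ignores protocol, connection state, and interface matching. The function name `first_match` and both rule lists are hypothetical; the two HAIP addresses are taken from the listing above.

```python
import ipaddress

def first_match(rules, source_ip, policy="ACCEPT"):
    """Return the action of the first rule whose source network contains source_ip.

    If no rule matches, fall back to the chain policy (ACCEPT in the listing above).
    """
    ip = ipaddress.ip_address(source_ip)
    for action, source in rules:
        if ip in ipaddress.ip_network(source):
            return action
    return policy

# Hypothetical rule set without HAIP entries: the catch-all REJECT decides.
rules_without_haip = [
    ("REJECT", "0.0.0.0/0"),
]

# Hypothetical rule set with the two HAIP addresses placed before the REJECT.
rules_with_haip = [
    ("ACCEPT", "169.254.85.175/32"),
    ("ACCEPT", "169.254.180.52/32"),
    ("REJECT", "0.0.0.0/0"),
]

for haip in ("169.254.85.175", "169.254.180.52"):
    before = first_match(rules_without_haip, haip)
    after = first_match(rules_with_haip, haip)
    print(f"{haip}: without HAIP rules -> {before}, with HAIP rules -> {after}")
```

    Because rules are evaluated in order, the ACCEPT entries for the private and HAIP addresses have to appear before the final REJECT rule, as they do in the listing.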

  • Original article: https://www.cnblogs.com/elontian/p/13324109.html