  • [Case Study] Oracle ORA-29740

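    Oracle RAC environment

    Database version: Oracle 11.2.0.4
    Nodes: 2-node RAC
    Operating system: Red Hat Enterprise Linux Server release 6.9 (Santiago)

    Symptom:

    The instance on node 2 crashed, and its VIP failed over to node 1.

    Below are the alert log and cssd log entries from the time of the failure.

    Alert log: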

    IPC Send timeout: Terminating pid 34 osid 52694
    Thu Jul 02 12:07:39 2020
    Communications reconfiguration: instance_number 1
    Detected an inconsistent instance membership by instance 1
    Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_lmon_52640.trc  (incident=304089):
    ORA-29740: evicted by instance number 1, group incarnation 6
    Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl2/incident/incdir_304089/orcl2_lmon_52640_i304089.trc
    Thu Jul 02 12:07:41 2020
    IPC Send timeout detected. Sender: ospid 52682 [oracle@ze02 (LGWR)]
    Receiver: inst 1 binc 460990968 ospid 36977
    IPC Send timeout to 1.4 inc 4 for msg type 73 from opid 28
    Use ADRCI or Support Workbench to package the incident.
    See Note 411.1 at My Oracle Support for error and packaging details.
    Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_lmon_52640.trc:
    ORA-29740: evicted by instance number 1, group incarnation 6
    LMON (ospid: 52640): terminating the instance due to error 29740
    Thu Jul 02 12:07:42 2020
    ORA-1092 : opitsk aborting process
    Thu Jul 02 12:07:46 2020
    System state dump requested by (instance=2, osid=52640 (LMON)), summary=[abnormal instance termination].
    System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_diag_52630_20200702120746.trc
    alert_orcl.log
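The eviction sequence in the alert log above (IPC send timeouts, then ORA-29740, then LMON terminating the instance) can be picked out mechanically. The following is a minimal sketch, not an Oracle tool: the function name `summarize_alert_log` and the pattern labels are hypothetical, and the sample lines are copied from the excerpt above.

```python
import re

# Message fragments taken from the alert log excerpt in this article.
# The dictionary keys are labels invented for this sketch.
EVICTION_PATTERNS = {
    "ipc_send_timeout": re.compile(r"IPC Send timeout"),
    "ora_29740": re.compile(r"ORA-29740: evicted by instance number (\d+)"),
    "instance_terminated": re.compile(r"terminating the instance due to error (\d+)"),
}


def summarize_alert_log(lines):
    """Count eviction-related messages and collect the evicting instance numbers."""
    counts = {name: 0 for name in EVICTION_PATTERNS}
    evicting_instances = set()
    for line in lines:
        for name, pattern in EVICTION_PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[name] += 1
                if name == "ora_29740":
                    evicting_instances.add(int(match.group(1)))
    return counts, evicting_instances


sample = [
    "IPC Send timeout: Terminating pid 34 osid 52694",
    "ORA-29740: evicted by instance number 1, group incarnation 6",
    "IPC Send timeout detected. Sender: ospid 52682 [oracle@ze02 (LGWR)]",
    "IPC Send timeout to 1.4 inc 4 for msg type 73 from opid 28",
    "ORA-29740: evicted by instance number 1, group incarnation 6",
    "LMON (ospid: 52640): terminating the instance due to error 29740",
]

counts, evictors = summarize_alert_log(sample)
print(counts)
print(evictors)
```

In practice the input would be the lines of the real alert log file; the sketch only shows that node 2 was evicted by instance 1 after repeated IPC send timeouts.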

    Error messages in cssd.log:

    2020-07-07 17:54:58.329: [    CSSD][2708354816]clssgmDiscEndpcl: gipcDestroy 0x1383
    2020-07-07 17:54:58.329: [    CSSD][2708354816]clssgmDeadProc: proc 0x7f9198099540
    2020-07-07 17:54:58.329: [    CSSD][2708354816]clssgmDestroyProc: cleaning up proc(0x7f9198099540) con(0x1354) skgpid  ospid 10669 with 0 clients, refcount 0
    2020-07-07 17:54:58.329: [    CSSD][2708354816]clssgmDiscEndpcl: gipcDestroy 0x1354
    2020-07-07 17:54:58.618: [    CSSD][2694764288]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
    2020-07-07 17:54:59.122: [    CSSD][2691610368]clssnmSendingThread: sending join msg to all nodes
    2020-07-07 17:54:59.122: [    CSSD][2691610368]clssnmSendingThread: sent 5 join msgs to all nodes
    2020-07-07 17:54:59.349: [    CSSD][2699503360]clssnmvDHBValidateNcopy: node 1, ze02, has a disk HB, but no network HB, DHB has rcfg 483520902, wrtcnt, 6414293, LATS 4294746660, lastSeqNo 6414292, uniqueness 1588026083, timestamp 1594115698/1792664848
    2020-07-07 17:54:59.618: [    CSSD][2694764288]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
    2020-07-07 17:55:00.355: [    CSSD][2699503360]clssnmvDHBValidateNcopy: node 1, ze02, has a disk HB, but no network HB, DHB has rcfg 483520902, wrtcnt, 6414294, LATS 4294747670, lastSeqNo 6414293, uniqueness 1588026083, timestamp 1594115699/1792665848
    2020-07-07 17:55:00.619: [    CSSD][2694764288]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
    2020-07-07 17:55:01.355: [    CSSD][2699503360]clssnmvDHBValidateNcopy: node 1, ze02, has a disk HB, but no network HB, DHB has rcfg 483520902, wrtcnt, 6414295, LATS 4294748670, lastSeqNo 6414294, uniqueness 1588026083, timestamp 1594115700/1792666848
    2020-07-07 17:55:01.619: [    CSSD][2694764288]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
    2020-07-07 17:55:02.383: [    CSSD][2699503360]clssnmvDHBValidateNcopy: node 1, ze02, has a disk HB, but no network HB, DHB has rcfg 483520902, wrtcnt, 6414296, LATS 4294749700, lastSeqNo 6414295, uniqueness 1588026083, timestamp 1594115701/1792667848
    2020-07-07 17:55:02.619: [    CSSD][2694764288]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
    2020-07-07 17:55:03.108: [    CSSD][2708354816]clssgmDeadProc: proc 0x7f9198039460
    2020-07-07 17:55:03.108: [    CSSD][2708354816]clssgmDestroyProc: cleaning up proc(0x7f9198039460) con(0x12f3) skgpid  ospid 10932 with 0 clients, refcount 0
    2020-07-07 17:55:03.108: [    CSSD][2708354816]clssgmDiscEndpcl: gipcDestroy 0x12f3
    2020-07-07 17:55:03.110: [    CSSD][2708354816]clssscSelect: cookie accept request 0xc85280
    2020-07-07 17:55:03.110: [    CSSD][2708354816]clssgmAllocProc: (0x7f919807dbb0) allocated
    2020-07-07 17:55:03.110: [    CSSD][2708354816]clssgmClientConnectMsg: properties of cmProc 0x7f919807dbb0 - 1,2,3,4,5
    2020-07-07 17:55:03.110: [    CSSD][2708354816]clssgmClientConnectMsg: Connect from con(0x13e3) proc(0x7f919807dbb0) pid(10932) version 11:2:1:4, properties: 1,2,3,4,5
    cssd log
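The key line in the cssd log is the repeated `clssnmvDHBValidateNcopy` message: the peer still writes its disk heartbeat, but its network heartbeat is not arriving, which points at the interconnect rather than at storage. A minimal sketch of counting those messages per node follows. The function name `nodes_missing_network_hb` is hypothetical, and the regular expression is written against the message text in the excerpt above, not against any documented log format.

```python
import re
from collections import Counter

# Matches the clssnmvDHBValidateNcopy message as it appears in this article's
# cssd.log excerpt. Capture groups: node number and node name.
NO_NETWORK_HB = re.compile(
    r"clssnmvDHBValidateNcopy: node (\d+), ([^,\s]+), has a disk HB, but no network HB"
)


def nodes_missing_network_hb(lines):
    """Count 'disk HB but no network HB' messages per (node number, node name)."""
    counter = Counter()
    for line in lines:
        match = NO_NETWORK_HB.search(line)
        if match:
            counter[(int(match.group(1)), match.group(2))] += 1
    return counter


sample = [
    "2020-07-07 17:54:59.349: [    CSSD][2699503360]clssnmvDHBValidateNcopy: "
    "node 1, ze02, has a disk HB, but no network HB, DHB has rcfg 483520902",
    "2020-07-07 17:54:59.618: [    CSSD][2694764288]clssgmWaitOnEventValue: "
    "after CmInfo State  val 3, eval 1 waited 0",
    "2020-07-07 17:55:00.355: [    CSSD][2699503360]clssnmvDHBValidateNcopy: "
    "node 1, ze02, has a disk HB, but no network HB, DHB has rcfg 483520902",
]

missing_hb = nodes_missing_network_hb(sample)
print(dict(missing_hb))
```

A steadily growing count for one peer, while its disk heartbeat keeps advancing, is the pattern that suggests checking the private network path (cabling, switch, or, as in this case, the firewall).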

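    Diagnosis:

    The RAC cluster suffered a split-brain: the firewall on the node servers did not accept traffic from the private interconnect addresses or the HAIP addresses.

    After the firewall rules were modified (or the firewall was disabled), CRS automatically restarted the instance on node 2.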

    Firewall rules:

    # iptables -L
    Chain INPUT (policy ACCEPT)
    target  prot  opt  source          destination
    ACCEPT  all   --   anywhere        anywhere     state RELATED,ESTABLISHED
    ACCEPT  icmp  --   anywhere        anywhere     icmp echo-request
    ACCEPT  all   --   anywhere        anywhere
    ACCEPT  tcp   --   100.82.16.8     anywhere     state NEW tcp dpt:ssh  # Oracle backup server address
    ACCEPT  tcp   --   100.82.16.9     anywhere     state NEW tcp dpt:ssh  # Oracle backup server address
    ACCEPT  tcp   --   100.82.16.10    anywhere     state NEW tcp dpt:ssh  # Oracle backup server address
    ACCEPT  tcp   --   100.82.10.11    anywhere     state NEW tcp dpt:ssh  # RAC public address
    ACCEPT  tcp   --   100.82.10.12    anywhere     state NEW tcp dpt:ssh  # RAC public address
    ACCEPT  tcp   --   anywhere        anywhere     state NEW tcp dpt:ncube-lm
    ACCEPT  all   --   anywhere        anywhere     source IP range 100.82.16.152-100.82.16.153
    ACCEPT  all   --   ze01-priv       anywhere     # RAC private address
    ACCEPT  all   --   ze02-priv       anywhere     # RAC private address
    ACCEPT  all   --   anywhere        anywhere     source IP range 100.82.11.11-100.82.11.16
    ACCEPT  all   --   169.254.85.175  anywhere     # HAIP address
    ACCEPT  all   --   169.254.180.52  anywhere     # HAIP address
    ACCEPT  tcp   --   anywhere        anywhere     source IP range 100.82.16.26-100.82.16.27 state NEW tcp multiport dports zabbix-agent,zabbix-trapper
    REJECT  all   --   anywhere        anywhere     reject-with icmp-host-prohibited
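Because the rule set ends with a catch-all REJECT, any interconnect address that has no explicit ACCEPT rule is silently cut off. A quick way to reason about this is to check each required address against the accepted sources. The sketch below uses Python's standard `ipaddress` module; the helper name `uncovered_addresses` and the "before" rule list are assumptions for illustration (the original rule set before the fix is not shown in this article), and only the two HAIP addresses from the listing above are checked.

```python
import ipaddress


def uncovered_addresses(required, accepted_sources):
    """Return the required addresses that no accepted source network contains."""
    # A bare address such as "100.82.16.8" becomes a /32 network.
    networks = [ipaddress.ip_network(src, strict=False) for src in accepted_sources]
    missing = []
    for addr in required:
        ip = ipaddress.ip_address(addr)
        if not any(ip in net for net in networks):
            missing.append(addr)
    return missing


# HAIP addresses as they appear in the rule listing in this article.
haip = ["169.254.85.175", "169.254.180.52"]

# Hypothetical "before" state: only backup-server and public addresses accepted.
before = ["100.82.16.8", "100.82.16.9", "100.82.16.10", "100.82.10.11", "100.82.10.12"]

# "After" state: the two HAIP addresses are accepted as well.
after = before + haip

print(uncovered_addresses(haip, before))
print(uncovered_addresses(haip, after))
```

The listing in this article accepts the two HAIP addresses individually. Accepting the whole 169.254.0.0/16 link-local range on the private interface is another option, offered here as an assumption rather than as what the author did; it would keep working if the HAIP addresses were ever reassigned.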

  • Original article: https://www.cnblogs.com/elontian/p/13324109.html