  • [Case Study] Oracle ORA-29740

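    Oracle RAC environment

    Database version: Oracle 11.2.0.4
    Nodes: 2-node RAC
    Operating system: Red Hat Enterprise Linux Server release 6.9 (Santiago)

    Symptoms:

    The instance on node 2 crashed, and its VIP failed over to node 1.

    The alert log and cssd log entries from the failure window follow.

    Alert log: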

    IPC Send timeout: Terminating pid 34 osid 52694
    Thu Jul 02 12:07:39 2020
    Communications reconfiguration: instance_number 1
    Detected an inconsistent instance membership by instance 1
    Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_lmon_52640.trc  (incident=304089):
    ORA-29740: evicted by instance number 1, group incarnation 6
    Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl2/incident/incdir_304089/orcl2_lmon_52640_i304089.trc
    Thu Jul 02 12:07:41 2020
    IPC Send timeout detected. Sender: ospid 52682 [oracle@ze02 (LGWR)]
    Receiver: inst 1 binc 460990968 ospid 36977
    IPC Send timeout to 1.4 inc 4 for msg type 73 from opid 28
    Use ADRCI or Support Workbench to package the incident.
    See Note 411.1 at My Oracle Support for error and packaging details.
    Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_lmon_52640.trc:
    ORA-29740: evicted by instance number 1, group incarnation 6
    LMON (ospid: 52640): terminating the instance due to error 29740
    Thu Jul 02 12:07:42 2020
    ORA-1092 : opitsk aborting process
    Thu Jul 02 12:07:46 2020
    System state dump requested by (instance=2, osid=52640 (LMON)), summary=[abnormal instance termination].
    System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_diag_52630_20200702120746.trc
    alert_orcl.log
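
To pull the eviction events out of a long alert log, a small filter along the following lines could be used. This is only a sketch: the function name `summarize_evictions` is hypothetical, and it assumes the ORA-29740 message keeps the wording seen in the excerpt above.

```shell
# Summarize ORA-29740 evictions found in an alert log.
# $1: path to the alert log (the caller supplies it; no default is assumed).
summarize_evictions() {
    # Keep only the "evicted by instance number N" part of each matching line,
    # take N (the last field), then count how often each evicting instance appears.
    grep -o 'ORA-29740: evicted by instance number [0-9]*' "$1" \
        | awk '{print $NF}' \
        | sort -n | uniq -c \
        | awk '{print "evicted by instance " $2 ": " $1 " message(s)"}'
}
```

Called as `summarize_evictions "$ALERT_LOG"`, where `ALERT_LOG` is a placeholder for the real alert log path, it prints one line per evicting instance.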

    Errors in cssd.log:

    2020-07-07 17:54:58.329: [    CSSD][2708354816]clssgmDiscEndpcl: gipcDestroy 0x1383
    2020-07-07 17:54:58.329: [    CSSD][2708354816]clssgmDeadProc: proc 0x7f9198099540
    2020-07-07 17:54:58.329: [    CSSD][2708354816]clssgmDestroyProc: cleaning up proc(0x7f9198099540) con(0x1354) skgpid  ospid 10669 with 0 clients, refcount 0
    2020-07-07 17:54:58.329: [    CSSD][2708354816]clssgmDiscEndpcl: gipcDestroy 0x1354
    2020-07-07 17:54:58.618: [    CSSD][2694764288]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
    2020-07-07 17:54:59.122: [    CSSD][2691610368]clssnmSendingThread: sending join msg to all nodes
    2020-07-07 17:54:59.122: [    CSSD][2691610368]clssnmSendingThread: sent 5 join msgs to all nodes
    2020-07-07 17:54:59.349: [    CSSD][2699503360]clssnmvDHBValidateNcopy: node 1, ze02, has a disk HB, but no network HB, DHB has rcfg 483520902, wrtcnt, 6414293, LATS 4294746660, lastSeqNo 6414292, uniqueness 1588026083, timestamp 1594115698/1792664848
    2020-07-07 17:54:59.618: [    CSSD][2694764288]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
    2020-07-07 17:55:00.355: [    CSSD][2699503360]clssnmvDHBValidateNcopy: node 1, ze02, has a disk HB, but no network HB, DHB has rcfg 483520902, wrtcnt, 6414294, LATS 4294747670, lastSeqNo 6414293, uniqueness 1588026083, timestamp 1594115699/1792665848
    2020-07-07 17:55:00.619: [    CSSD][2694764288]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
    2020-07-07 17:55:01.355: [    CSSD][2699503360]clssnmvDHBValidateNcopy: node 1, ze02, has a disk HB, but no network HB, DHB has rcfg 483520902, wrtcnt, 6414295, LATS 4294748670, lastSeqNo 6414294, uniqueness 1588026083, timestamp 1594115700/1792666848
    2020-07-07 17:55:01.619: [    CSSD][2694764288]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
    2020-07-07 17:55:02.383: [    CSSD][2699503360]clssnmvDHBValidateNcopy: node 1, ze02, has a disk HB, but no network HB, DHB has rcfg 483520902, wrtcnt, 6414296, LATS 4294749700, lastSeqNo 6414295, uniqueness 1588026083, timestamp 1594115701/1792667848
    2020-07-07 17:55:02.619: [    CSSD][2694764288]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
    2020-07-07 17:55:03.108: [    CSSD][2708354816]clssgmDeadProc: proc 0x7f9198039460
    2020-07-07 17:55:03.108: [    CSSD][2708354816]clssgmDestroyProc: cleaning up proc(0x7f9198039460) con(0x12f3) skgpid  ospid 10932 with 0 clients, refcount 0
    2020-07-07 17:55:03.108: [    CSSD][2708354816]clssgmDiscEndpcl: gipcDestroy 0x12f3
    2020-07-07 17:55:03.110: [    CSSD][2708354816]clssscSelect: cookie accept request 0xc85280
    2020-07-07 17:55:03.110: [    CSSD][2708354816]clssgmAllocProc: (0x7f919807dbb0) allocated
    2020-07-07 17:55:03.110: [    CSSD][2708354816]clssgmClientConnectMsg: properties of cmProc 0x7f919807dbb0 - 1,2,3,4,5
    2020-07-07 17:55:03.110: [    CSSD][2708354816]clssgmClientConnectMsg: Connect from con(0x13e3) proc(0x7f919807dbb0) pid(10932) version 11:2:1:4, properties: 1,2,3,4,5
    cssd log
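
The repeated `clssnmvDHBValidateNcopy` lines are the telling part of this log: the node is still writing its disk heartbeat, but its network heartbeat is not arriving. A rough way to count those messages per node is sketched below. The function name `count_missing_network_hb` is hypothetical, and the pattern assumes the message text shown in the excerpt.

```shell
# Count "has a disk HB, but no network HB" messages per node in a CSSD log.
# $1: path to the cssd log (supplied by the caller).
count_missing_network_hb() {
    # Extract "node N, <hostname>" from each matching line, then tally per node.
    grep -o 'node [0-9]*, [^,]*, has a disk HB, but no network HB' "$1" \
        | sed 's/, has a disk HB.*//' \
        | sort | uniq -c \
        | awk '{count = $1; $1 = ""; sub(/^ /, ""); print $0 ": " count}'
}
```

A steadily growing count for one node while its disk heartbeat keeps advancing points at the interconnect path (network or firewall) rather than at shared storage.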

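    Diagnosis:

    The RAC cluster suffered a split-brain because the firewall on the node servers did not accept traffic from the private interconnect addresses or the HAIP addresses. With that traffic blocked, the network heartbeat is lost while the disk heartbeat continues, which matches the "has a disk HB, but no network HB" messages in the cssd log.

    After the firewall rules were corrected (or the firewall was turned off), CRS brought the node 2 instance back up automatically.

    Firewall rules: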

    # iptables -L
    Chain INPUT (policy ACCEPT)
    target  prot  opt  source           destination
    ACCEPT  all   --   anywhere         anywhere     state RELATED,ESTABLISHED
    ACCEPT  icmp  --   anywhere         anywhere     icmp echo-request
    ACCEPT  all   --   anywhere         anywhere
    ACCEPT  tcp   --   100.82.16.8      anywhere     state NEW tcp dpt:ssh  # Oracle backup server address
    ACCEPT  tcp   --   100.82.16.9      anywhere     state NEW tcp dpt:ssh  # Oracle backup server address
    ACCEPT  tcp   --   100.82.16.10     anywhere     state NEW tcp dpt:ssh  # Oracle backup server address
    ACCEPT  tcp   --   100.82.10.11     anywhere     state NEW tcp dpt:ssh  # RAC public network address
    ACCEPT  tcp   --   100.82.10.12     anywhere     state NEW tcp dpt:ssh  # RAC public network address
    ACCEPT  tcp   --   anywhere         anywhere     state NEW tcp dpt:ncube-lm
    ACCEPT  all   --   anywhere         anywhere     source IP range 100.82.16.152-100.82.16.153
    ACCEPT  all   --   ze01-priv        anywhere     # RAC private network address
    ACCEPT  all   --   ze02-priv        anywhere     # RAC private network address
    ACCEPT  all   --   anywhere         anywhere     source IP range 100.82.11.11-100.82.11.16
    ACCEPT  all   --   169.254.85.175   anywhere     # HAIP address
    ACCEPT  all   --   169.254.180.52   anywhere     # HAIP address
    ACCEPT  tcp   --   anywhere         anywhere     source IP range 100.82.16.26-100.82.16.27 state NEW tcp multiport dports zabbix-agent,zabbix-trapper
    REJECT  all   --   anywhere         anywhere     reject-with icmp-host-prohibited
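
The listing above accepts the private addresses and each HAIP address with one rule apiece. As a rough sketch of how equivalent rules might be generated, the function below only prints `iptables` commands so they can be reviewed before anything is changed. The function name `print_interconnect_rules`, the default interface name `eth1`, and the choice to accept the whole 169.254.0.0/16 link-local range (the range HAIP addresses are drawn from) are assumptions for illustration, not the rules applied in this incident.

```shell
# Print (do not execute) iptables commands that accept interconnect traffic.
# $1: private interconnect interface; "eth1" is a hypothetical default.
print_interconnect_rules() {
    priv_if="${1:-eth1}"
    # Insert at position 1 so both rules are evaluated before the final REJECT.
    echo "iptables -I INPUT 1 -i ${priv_if} -j ACCEPT"
    # HAIP addresses live in the link-local range 169.254.0.0/16.
    echo "iptables -I INPUT 1 -s 169.254.0.0/16 -j ACCEPT"
}
```

Running `print_interconnect_rules eth1` and reading the output before executing it as root keeps the change auditable. On RHEL 6 the resulting rules can typically be persisted with `service iptables save`.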

  • Original article: https://www.cnblogs.com/elontian/p/13324109.html