  • 11gR2 RAC: Resolving a node crash caused by enabling iptables

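    Normally, database installation guides require SELinux and iptables to be disabled before the install. On telecom carrier systems, however, security requirements often mean iptables has to be enabled on the production database hosts.
    Enabling iptables takes some care. Suppose, for example, the hosts file of a RAC cluster is configured as follows: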
    192.168.142.115       subsdb1         
    192.168.142.117       subsdb1-vip   
    10.0.0.115            subsdb1-priv
    192.168.142.116       subsdb2      
    192.168.142.118       subsdb2-vip   
    10.0.0.116            subsdb2-priv
    192.168.142.32        db-scan

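    Naturally, all of the IPs above have to be allowed through the firewall. In practice, though, even after those IPs had been allowed, one of the database instances went down.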
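
    To make "allowing the IPs above" concrete, here is a minimal Python sketch that turns the hosts entries into iptables commands. The article does not show the rules that were actually applied, so the rule shape (`-A INPUT -s <ip> -j ACCEPT`) and the helper names `parse_hosts` and `accept_rules` are assumptions for illustration only. The script only prints the commands; it does not run them.

```python
# Hosts entries copied from the article.
HOSTS = """\
192.168.142.115       subsdb1
192.168.142.117       subsdb1-vip
10.0.0.115            subsdb1-priv
192.168.142.116       subsdb2
192.168.142.118       subsdb2-vip
10.0.0.116            subsdb2-priv
192.168.142.32        db-scan
"""


def parse_hosts(text):
    """Return (ip, name) pairs from /etc/hosts-style text (hypothetical helper)."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        entries.append((fields[0], fields[1]))
    return entries


def accept_rules(entries):
    """Build one ACCEPT rule per source IP. The rule shape is an assumption."""
    return [f"iptables -A INPUT -s {ip} -j ACCEPT" for ip, _name in entries]


RULES = accept_rules(parse_hosts(HOSTS))

if __name__ == "__main__":
    for rule in RULES:
        print(rule)
```

    A rule set built this way covers only the addresses that appear in the hosts file, which is the situation the rest of this post starts from.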

    Here is the database alert log:

    Tue Aug 20 00:29:40 2013
    IPC Send timeout detected. Sender: ospid 8284 [oracle@subsdb2 (LMD0)]
    Receiver: inst 1 binc 1740332689 ospid 15851
    IPC Send timeout to 1.0 inc 10 for msg type 65521 from opid 12
    Tue Aug 20 00:29:48 2013
    IPC Send timeout detected. Sender: ospid 8276 [oracle@subsdb2 (PING)]
    Receiver: inst 2 binc 1801834534 ospid 8276
    Tue Aug 20 00:29:52 2013
    Detected an inconsistent instance membership by instance 2
    Errors in file /oracle/app/oracle/diag/rdbms/gdordb/GDORDB2/trace/GDORDB2_lmon_8282.trc  (incident=784092):
    ORA-29740: evicted by instance number 2, group incarnation 12
    Incident details in: /oracle/app/oracle/diag/rdbms/gdordb/GDORDB2/incident/incdir_784092/GDORDB2_lmon_8282_i784092.trc
    Use ADRCI or Support Workbench to package the incident.
    See Note 411.1 at My Oracle Support for error and packaging details.
    Errors in file /oracle/app/oracle/diag/rdbms/gdordb/GDORDB2/trace/GDORDB2_lmon_8282.trc:
    ORA-29740: evicted by instance number 2, group incarnation 12
    LMON (ospid: 8282): terminating the instance due to error 29740
    Tue Aug 20 00:29:54 2013
    ORA-1092 : opitsk aborting process
    Tue Aug 20 00:29:54 2013
    License high water mark = 29
    Tue Aug 20 00:29:57 2013
    System state dump requested by (instance=2, osid=8282 (LMON)), summary=[abnormal instance termination].
    System State dumped to trace file /oracle/app/oracle/diag/rdbms/gdordb/GDORDB2/trace/GDORDB2_diag_8272.trc
    Instance terminated by LMON, pid = 8282
    USER (ospid: 31106): terminating the instance
    Instance terminated by USER, pid = 31106


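    From the log above alone, we can tentatively conclude that the problem lies in the cluster's internal (interconnect) communication. But how do we fix it?
    Looking further, both the database alert log and the ASM instance alert log contain the following information: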
    Private Interface 'bond2:1' configured from GPnP for use as a private interconnect.
      [name='bond2:1', type=1, ip=169.254.148.209, mac=00-25-b5-00-00-67, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
    Public Interface 'bond0' configured from GPnP for use as a public interface.
      [name='bond0', type=1, ip=192.168.142.116, mac=00-25-b5-00-01-cb, net=192.168.142.0/24, mask=255.255.255.0, use=public/1]
    Picked latch-free SCN scheme 3


    This shows that RAC internal communication also uses IPs in net=169.254.0.0/16, and MOS Doc ID 1383737.1 says the same. Finally, ifconfig showed that the 169.254 addresses in use on the two RAC nodes were:
    169.254.122.59
    169.254.148.209 
    After these two IPs were allowed in iptables, the cluster returned to normal.
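
    The check described above can be sketched in Python: parse the bracketed GPnP interface line from the alert log, read its `net=` field, and confirm that the addresses found with ifconfig fall inside that network before generating rules for them. The helper name `parse_gpnp` and the rule shape are assumptions, not something taken from Oracle documentation or from the original fix. The script only prints commands.

```python
import ipaddress
import re

# Interface line copied from the alert log excerpt in the article.
ALERT_LINE = (
    "[name='bond2:1', type=1, ip=169.254.148.209, mac=00-25-b5-00-00-67, "
    "net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]"
)

# Addresses the article reports finding with ifconfig on the two nodes.
HAIP_ADDRESSES = ["169.254.122.59", "169.254.148.209"]


def parse_gpnp(line):
    """Split a bracketed GPnP interface line into a dict (hypothetical helper)."""
    pairs = re.findall(r"(\w+)=([^,\]]+)", line)
    return {key: value.strip("'") for key, value in pairs}


FIELDS = parse_gpnp(ALERT_LINE)
NETWORK = ipaddress.ip_network(FIELDS["net"])

# Keep only addresses that really belong to the interconnect network.
RULES = [
    f"iptables -A INPUT -s {ip} -j ACCEPT"  # rule shape is an assumption
    for ip in HAIP_ADDRESSES
    if ipaddress.ip_address(ip) in NETWORK
]

if __name__ == "__main__":
    print("interconnect network:", NETWORK)
    for rule in RULES:
        print(rule)
```

    Allowing the whole `net=` range instead of individual addresses would be another option; the fix described here allowed only the two addresses.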


  • Original article: https://www.cnblogs.com/pangblog/p/3271113.html