  • 11gR2 RAC: handling a node crash caused by enabling iptables

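    When installing a database, the usual requirement is to disable SELinux and iptables before installation. On telecom carriers' systems, however, security requirements often mean that iptables must be enabled on the production database hosts.
    Care is needed when enabling iptables. For example, suppose the hosts file of a RAC is configured as follows: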
    192.168.142.115       subsdb1         
    192.168.142.117       subsdb1-vip   
    10.0.0.115            subsdb1-priv
    192.168.142.116       subsdb2      
    192.168.142.118       subsdb2-vip   
    10.0.0.116            subsdb2-priv
    192.168.142.32        db-scan

    Naturally, all of the IPs above have to be permitted. In practice, though, after these IPs had been permitted, one of the database instances went down.
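
    As a rough illustration of what permitting the hosts-file IPs might look like, the sketch below turns the entries above into iptables commands. The article does not show the rules that were actually used, so the rule form (appending `-s <ip> -j ACCEPT` to the INPUT chain) and the helper names `parse_hosts` and `accept_rules` are assumptions made for this example.

```python
# Hosts entries copied from the article.
HOSTS = """\
192.168.142.115       subsdb1
192.168.142.117       subsdb1-vip
10.0.0.115            subsdb1-priv
192.168.142.116       subsdb2
192.168.142.118       subsdb2-vip
10.0.0.116            subsdb2-priv
192.168.142.32        db-scan
"""


def parse_hosts(text):
    """Return (ip, name) pairs from hosts-file style text."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        entries.append((fields[0], fields[1]))
    return entries


def accept_rules(entries, chain="INPUT"):
    """Build one ACCEPT rule per source IP (assumed rule form)."""
    return [f"iptables -A {chain} -s {ip} -j ACCEPT" for ip, _name in entries]


RULES = accept_rules(parse_hosts(HOSTS))

if __name__ == "__main__":
    for rule in RULES:
        print(rule)
```

    Note that this rule set covers only the addresses listed in the hosts file; nothing in 169.254.0.0/16 is included.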

    Here is the database alert log:

    Tue Aug 20 00:29:40 2013
    IPC Send timeout detected. Sender: ospid 8284 [oracle@subsdb2 (LMD0)]
    Receiver: inst 1 binc 1740332689 ospid 15851
    IPC Send timeout to 1.0 inc 10 for msg type 65521 from opid 12
    Tue Aug 20 00:29:48 2013
    IPC Send timeout detected. Sender: ospid 8276 [oracle@subsdb2 (PING)]
    Receiver: inst 2 binc 1801834534 ospid 8276
    Tue Aug 20 00:29:52 2013
    Detected an inconsistent instance membership by instance 2
    Errors in file /oracle/app/oracle/diag/rdbms/gdordb/GDORDB2/trace/GDORDB2_lmon_8282.trc  (incident=784092):
    ORA-29740: evicted by instance number 2, group incarnation 12
    Incident details in: /oracle/app/oracle/diag/rdbms/gdordb/GDORDB2/incident/incdir_784092/GDORDB2_lmon_8282_i784092.trc
    Use ADRCI or Support Workbench to package the incident.
    See Note 411.1 at My Oracle Support for error and packaging details.
    Errors in file /oracle/app/oracle/diag/rdbms/gdordb/GDORDB2/trace/GDORDB2_lmon_8282.trc:
    ORA-29740: evicted by instance number 2, group incarnation 12
    LMON (ospid: 8282): terminating the instance due to error 29740
    Tue Aug 20 00:29:54 2013
    ORA-1092 : opitsk aborting process
    Tue Aug 20 00:29:54 2013
    License high water mark = 29
    Tue Aug 20 00:29:57 2013
    System state dump requested by (instance=2, osid=8282 (LMON)), summary=[abnormal instance termination].
    System State dumped to trace file /oracle/app/oracle/diag/rdbms/gdordb/GDORDB2/trace/GDORDB2_diag_8272.trc
    Instance terminated by LMON, pid = 8282
    USER (ospid: 31106): terminating the instance
    Instance terminated by USER, pid = 31106


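    From this alone, one can tentatively conclude that the problem lies in the cluster's internal communication (the interconnect), but how can it be fixed?
    Looking further, both the database alert log and the ASM instance's alert log contain the following information: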
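
    The alert-log entries that point to an interconnect problem can also be picked out mechanically. The following is a minimal sketch, not an Oracle tool: the three patterns are simply the message fragments seen in the excerpt above, and the names `SYMPTOMS` and `count_symptoms` are made up for this example.

```python
import re

# A few lines copied from the alert log excerpt in the article.
ALERT_EXCERPT = """\
IPC Send timeout detected. Sender: ospid 8284 [oracle@subsdb2 (LMD0)]
IPC Send timeout to 1.0 inc 10 for msg type 65521 from opid 12
IPC Send timeout detected. Sender: ospid 8276 [oracle@subsdb2 (PING)]
Detected an inconsistent instance membership by instance 2
ORA-29740: evicted by instance number 2, group incarnation 12
"""

# Message fragments treated as interconnect symptoms (an assumption
# based only on the excerpt, not an exhaustive list).
SYMPTOMS = {
    "ipc_send_timeout": re.compile(r"IPC Send timeout detected"),
    "membership": re.compile(r"inconsistent instance membership"),
    "eviction": re.compile(r"ORA-29740"),
}


def count_symptoms(text):
    """Count how many lines match each symptom pattern."""
    counts = {name: 0 for name in SYMPTOMS}
    for line in text.splitlines():
        for name, pattern in SYMPTOMS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


COUNTS = count_symptoms(ALERT_EXCERPT)

if __name__ == "__main__":
    for name, count in COUNTS.items():
        print(f"{name}: {count}")
```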
    Private Interface 'bond2:1' configured from GPnP for use as a private interconnect.
      [name='bond2:1', type=1, ip=169.254.148.209, mac=00-25-b5-00-00-67, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
    Public Interface 'bond0' configured from GPnP for use as a public interface.
      [name='bond0', type=1, ip=192.168.142.116, mac=00-25-b5-00-01-cb, net=192.168.142.0/24, mask=255.255.255.0, use=public/1]
    Picked latch-free SCN scheme 3


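    This shows that RAC's internal communication also uses IPs in net=169.254.0.0/16 (note use=haip:cluster_interconnect on bond2:1), and MOS Doc ID 1383737.1 says the same. Finally, ifconfig showed that the IPs from the 169.254 network used on the two RAC nodes were: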
    169.254.122.59
    169.254.148.209 
    After these two IPs were permitted in iptables, the cluster returned to normal.
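
    The last step can be sketched as follows: read the interface attributes from the GPnP lines in the alert log, pick out the one marked use=haip, and build ACCEPT rules for the two 169.254 addresses. The rule form (`-A INPUT -s <ip> -j ACCEPT`) is an assumption, since the article does not show the exact commands, and `parse_interfaces` and `accept_rules` are hypothetical helper names.

```python
import ipaddress
import re

# Attribute lines copied from the alert log excerpt in the article.
GPNP_LINES = """\
[name='bond2:1', type=1, ip=169.254.148.209, mac=00-25-b5-00-00-67, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
[name='bond0', type=1, ip=192.168.142.116, mac=00-25-b5-00-01-cb, net=192.168.142.0/24, mask=255.255.255.0, use=public/1]
"""

FIELD = re.compile(r"(\w+)=([^,\]]+)")
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def parse_interfaces(text):
    """Turn each bracketed attribute line into a dict of its fields."""
    interfaces = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("["):
            interfaces.append({k: v.strip("'") for k, v in FIELD.findall(line)})
    return interfaces


def accept_rules(ips, chain="INPUT"):
    """Build ACCEPT rules, refusing anything outside 169.254.0.0/16."""
    rules = []
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        if addr not in LINK_LOCAL:
            raise ValueError(f"{ip} is not in {LINK_LOCAL}")
        rules.append(f"iptables -A {chain} -s {addr} -j ACCEPT")
    return rules


# HAIP address visible in this node's alert log.
PARSED_HAIP = [i["ip"] for i in parse_interfaces(GPNP_LINES)
               if i.get("use", "").startswith("haip")]

# The two addresses the article found with ifconfig on the two nodes.
RULES = accept_rules(["169.254.122.59", "169.254.148.209"])

if __name__ == "__main__":
    print(PARSED_HAIP)
    for rule in RULES:
        print(rule)
```

    The article permitted the two individual addresses. Permitting the whole 169.254.0.0/16 range on the private interface would be an alternative, but that is a design choice the article does not discuss.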


  • Original article: https://www.cnblogs.com/pangblog/p/3271113.html