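  • RAC node eviction analysis

    This test simulates the private (interconnect) NIC going down and analyzes the resulting RAC node eviction.

    For background, see the My Oracle Support note on the five main causes of instance eviction (Doc ID 1526186.1).
    Check the cluster resources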

    [qdtais1]@ht01[/home/oracle]$crsctl status res -t
    --------------------------------------------------------------------------------
    NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
    --------------------------------------------------------------------------------
    Local Resources
    --------------------------------------------------------------------------------
    ora.DATA.dg
                   ONLINE  ONLINE       ht01                                         
                   ONLINE  ONLINE       ht02                                         
    ora.LISTENER.lsnr
                   ONLINE  ONLINE       ht01                                         
                   ONLINE  ONLINE       ht02                                         
    ora.OCR.dg
                   ONLINE  ONLINE       ht01                                         
                   ONLINE  ONLINE       ht02                                         
    ora.asm
                   ONLINE  ONLINE       ht01                     Started             
                   ONLINE  ONLINE       ht02                     Started             
    ora.gsd
                   OFFLINE OFFLINE      ht01                                         
                   OFFLINE OFFLINE      ht02                                         
    ora.net1.network
                   ONLINE  ONLINE       ht01                                         
                   ONLINE  ONLINE       ht02                                         
    ora.ons
                   ONLINE  ONLINE       ht01                                         
                   ONLINE  ONLINE       ht02                                         
    --------------------------------------------------------------------------------
    Cluster Resources
    --------------------------------------------------------------------------------
    ora.LISTENER_SCAN1.lsnr
          1        ONLINE  ONLINE       ht01                                         
    ora.cvu
          1        ONLINE  ONLINE       ht01                                         
    ora.ht01.vip
          1        ONLINE  ONLINE       ht01                                         
    ora.ht02.vip
          1        ONLINE  ONLINE       ht02                                         
    ora.oc4j
          1        ONLINE  ONLINE       ht01                                         
    ora.qdtais.db
          1        ONLINE  ONLINE       ht01                     Open                
          2        ONLINE  ONLINE       ht02                     Open                
    ora.scan1.vip
          1        ONLINE  ONLINE       ht01                                         
    ora.yz.db
          1        OFFLINE OFFLINE                               Instance Shutdown 
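
A minimal Python sketch for picking out resources whose STATE is not ONLINE in a listing like the one above. The parsing rules (resource names start with `ora.`, state rows carry TARGET and STATE columns) are assumptions read off this particular output, and `offline_resources` is a hypothetical helper, not part of any Oracle tool.

```python
import re

# A few rows copied from the crsctl listing above.
SAMPLE = """\
ora.gsd
               OFFLINE OFFLINE      ht01
               OFFLINE OFFLINE      ht02
ora.qdtais.db
      1        ONLINE  ONLINE       ht01                     Open
      2        ONLINE  ONLINE       ht02                     Open
ora.yz.db
      1        OFFLINE OFFLINE                               Instance Shutdown
"""

# Optional instance number, then TARGET, STATE, and whatever follows.
STATE_ROW = re.compile(r"^\s+(?:\d+\s+)?(ONLINE|OFFLINE)\s+(\S+)\s*(.*)$")


def offline_resources(text):
    """Return (resource, state, rest_of_row) for every row whose STATE is not ONLINE."""
    current = None
    found = []
    for line in text.splitlines():
        if line.strip().startswith("ora."):
            current = line.strip()
            continue
        m = STATE_ROW.match(line)
        if m and current and m.group(2) != "ONLINE":
            found.append((current, m.group(2), m.group(3).strip()))
    return found


if __name__ == "__main__":
    for name, state, rest in offline_resources(SAMPLE):
        print(name, state, rest)
```

In this listing only `ora.gsd` and `ora.yz.db` are offline before the test, so neither is a consequence of the NIC failure.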
    

    Check the NIC information and the hosts file

    [qdtais1]@ht01[/home/oracle]$ifconfig
    eth0      Link encap:Ethernet  HWaddr 08:00:27:D0:2C:DC  
              inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0
              inet6 addr: fe80::a00:27ff:fed0:2cdc/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:60 errors:0 dropped:0 overruns:0 frame:0
              TX packets:154 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:7317 (7.1 KiB)  TX bytes:20671 (20.1 KiB)
    
    eth1      Link encap:Ethernet  HWaddr 08:00:27:D7:4E:75  
              inet addr:192.168.20.200  Bcast:192.168.20.255  Mask:255.255.255.0
              inet6 addr: fe80::a00:27ff:fed7:4e75/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:7909 errors:0 dropped:0 overruns:0 frame:0
              TX packets:6555 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:912161 (890.7 KiB)  TX bytes:712119 (695.4 KiB)
    
    eth1:1    Link encap:Ethernet  HWaddr 08:00:27:D7:4E:75  
              inet addr:192.168.20.204  Bcast:192.168.20.255  Mask:255.255.255.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
    
    eth1:3    Link encap:Ethernet  HWaddr 08:00:27:D7:4E:75  
              inet addr:192.168.20.202  Bcast:192.168.20.255  Mask:255.255.255.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
    
    eth2      Link encap:Ethernet  HWaddr 08:00:27:BB:03:40  
              inet addr:192.168.0.10  Bcast:192.168.0.255  Mask:255.255.255.0
              inet6 addr: fe80::a00:27ff:febb:340/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:1407822 errors:0 dropped:0 overruns:0 frame:0
              TX packets:1092372 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:1046688365 (998.1 MiB)  TX bytes:606254225 (578.1 MiB)
    
    eth2:1    Link encap:Ethernet  HWaddr 08:00:27:BB:03:40  
              inet addr:169.254.67.75  Bcast:169.254.255.255  Mask:255.255.0.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
    
    lo        Link encap:Local Loopback  
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:65536  Metric:1
              RX packets:265652 errors:0 dropped:0 overruns:0 frame:0
              TX packets:265652 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0 
              RX bytes:143272867 (136.6 MiB)  TX bytes:143272867 (136.6 MiB)
    
    [qdtais1]@ht01[/home/oracle]$cat /etc/hosts
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.20.200 ht01
    192.168.20.201 ht02
    192.168.0.10 ht01-priv1
    192.168.0.20 ht02-priv1
    192.168.20.202 ht01-vip
    192.168.20.203 ht02-vip
    192.168.20.204 ht-scanip
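
The `eth2:1` address 169.254.67.75 has no entry in `/etc/hosts`. It falls in the link-local range, which is consistent with a HAIP address that Grid Infrastructure places on the private NIC (an inference; the post does not state this). A small Python sketch, with the addresses copied from the output above and `describe` as a hypothetical helper:

```python
import ipaddress

# Entries copied from /etc/hosts above.
HOSTS = {
    "192.168.20.200": "ht01",
    "192.168.20.201": "ht02",
    "192.168.0.10": "ht01-priv1",
    "192.168.0.20": "ht02-priv1",
    "192.168.20.202": "ht01-vip",
    "192.168.20.203": "ht02-vip",
    "192.168.20.204": "ht-scanip",
}

# Interface addresses copied from the first ifconfig output on ht01.
INTERFACES = {
    "eth0": "10.0.2.15",
    "eth1": "192.168.20.200",
    "eth1:1": "192.168.20.204",
    "eth1:3": "192.168.20.202",
    "eth2": "192.168.0.10",
    "eth2:1": "169.254.67.75",
}


def describe(addr):
    """Label an address by its /etc/hosts name, or by range if it has none."""
    if addr in HOSTS:
        return HOSTS[addr]
    if ipaddress.ip_address(addr).is_link_local:
        return "link-local (169.254.0.0/16)"
    return "not in /etc/hosts"


if __name__ == "__main__":
    for nic, addr in INTERFACES.items():
        print(f"{nic:8} {addr:16} {describe(addr)}")
```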
    

    Bring down eth2, the private heartbeat NIC, on node 1

    [root@ht01 ~]# ifconfig  eth2 down
    

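    Check the NIC information again (eth2 and eth2:1 no longer show UP/RUNNING; 192.168.20.203, the ht02-vip address, now appears on ht01 as eth1:2)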

    [root@ht01 ~]# ifconfig -a
    eth0      Link encap:Ethernet  HWaddr 08:00:27:D0:2C:DC  
              inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0
              inet6 addr: fe80::a00:27ff:fed0:2cdc/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:884 errors:0 dropped:0 overruns:0 frame:0
              TX packets:1410 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:65160 (63.6 KiB)  TX bytes:869140 (848.7 KiB)
    
    eth1      Link encap:Ethernet  HWaddr 08:00:27:D7:4E:75  
              inet addr:192.168.20.200  Bcast:192.168.20.255  Mask:255.255.255.0
              inet6 addr: fe80::a00:27ff:fed7:4e75/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:8729 errors:0 dropped:0 overruns:0 frame:0
              TX packets:7292 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:983762 (960.7 KiB)  TX bytes:817872 (798.7 KiB)
    
    eth1:1    Link encap:Ethernet  HWaddr 08:00:27:D7:4E:75  
              inet addr:192.168.20.204  Bcast:192.168.20.255  Mask:255.255.255.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
    
    eth1:2    Link encap:Ethernet  HWaddr 08:00:27:D7:4E:75  
              inet addr:192.168.20.203  Bcast:192.168.20.255  Mask:255.255.255.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
    
    eth1:3    Link encap:Ethernet  HWaddr 08:00:27:D7:4E:75  
              inet addr:192.168.20.202  Bcast:192.168.20.255  Mask:255.255.255.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
    
    eth2      Link encap:Ethernet  HWaddr 08:00:27:BB:03:40  
              BROADCAST MULTICAST  MTU:1500  Metric:1
              RX packets:1414086 errors:0 dropped:0 overruns:0 frame:0
              TX packets:1097177 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:1051368691 (1002.6 MiB)  TX bytes:608947879 (580.7 MiB)
    
    eth2:1    Link encap:Ethernet  HWaddr 08:00:27:BB:03:40  
              inet addr:169.254.67.75  Bcast:169.254.255.255  Mask:255.255.0.0
              BROADCAST MULTICAST  MTU:1500  Metric:1
    
    lo        Link encap:Local Loopback  
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:65536  Metric:1
              RX packets:267864 errors:0 dropped:0 overruns:0 frame:0
              TX packets:267864 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0 
              RX bytes:144385365 (137.6 MiB)  TX bytes:144385365 (137.6 MiB)
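
A short Python sketch for listing interfaces that lack the `UP` flag in net-tools style `ifconfig -a` output such as the above. The sample is a trimmed copy of that output; `interface_flags` and `down_interfaces` are hypothetical helpers, and the parsing assumes the flags sit on the line that contains `MTU:`.

```python
# Trimmed copy of the ifconfig -a output taken after "ifconfig eth2 down".
SAMPLE = """\
eth1      Link encap:Ethernet  HWaddr 08:00:27:D7:4E:75
          inet addr:192.168.20.200  Bcast:192.168.20.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth2      Link encap:Ethernet  HWaddr 08:00:27:BB:03:40
          BROADCAST MULTICAST  MTU:1500  Metric:1

eth2:1    Link encap:Ethernet  HWaddr 08:00:27:BB:03:40
          inet addr:169.254.67.75  Bcast:169.254.255.255  Mask:255.255.0.0
          BROADCAST MULTICAST  MTU:1500  Metric:1
"""


def interface_flags(text):
    """Map each interface name to the set of flags printed before 'MTU:'."""
    flags = {}
    name = None
    for line in text.splitlines():
        if line and not line[0].isspace():
            # A line starting in column 0 opens a new interface stanza.
            name = line.split()[0]
            flags[name] = set()
        elif name and "MTU:" in line:
            flags[name] = set(line.split("MTU:")[0].split())
    return flags


def down_interfaces(text):
    """Return the interfaces whose flag set does not include UP."""
    return sorted(n for n, f in interface_flags(text).items() if "UP" not in f)


if __name__ == "__main__":
    print(down_interfaces(SAMPLE))
```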

    Log analysis

    Oracle alert log on node 1

    Thu Mar 26 10:55:29 2020
    SKGXP: ospid 4149: network interface with IP address 169.254.67.75 no longer running (check cable)  --- the private interconnect address is no longer running
    SKGXP: ospid 4149: network interface with IP address 169.254.67.75 is DOWN
    Thu Mar 26 10:55:47 2020
    Reconfiguration started (old inc 4, new inc 6)                    --- cluster reconfiguration starts (resources are remastered)
    List of instances:
     1 (myinst: 1) 
     Global Resource Directory frozen
     * dead instance detected - domain 0 invalid = TRUE 
     Communication channels reestablished
     Master broadcasted resource hash value bitmaps
     Non-local Process blocks cleaned out
    Thu Mar 26 10:55:47 2020
     LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
     Set master node info 
     Submitted all remote-enqueue requests
     Dwn-cvts replayed, VALBLKs dubious
     All grantable enqueues granted
     Post SMON to start 1st pass IR
    Thu Mar 26 10:55:47 2020
    minact-scn: Inst 1 is now the master inc#:6 mmon proc-id:4198 status:0x7   -- instance 1 is now the master
    minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.0014ed0a gcalc-scn:0x0000.0014ed15
    minact-scn: master found reconf/inst-rec before recscn scan old-inc#:6 new-inc#:6
    Thu Mar 26 10:55:47 2020
    Instance recovery: looking for dead threads
     Submitted all GCS remote-cache requests
     Post SMON to start 1st pass IR
     Fix write in gcs resources
    Reconfiguration complete
    Beginning instance recovery of 1 threads            -- the surviving instance starts recovering node 2's redo thread
    Started redo scan
    Completed redo scan
     read 0 KB redo, 0 data blocks need recovery
    Started redo application at
     Thread 2: logseq 13, block 47971, scn 1371433
    Recovery of Online Redo Log: Thread 2 Group 3 Seq 13 Reading mem 0
      Mem# 0: +DATA/qdtais/onlinelog/group_3.268.1023987437
      Mem# 1: +DATA/qdtais/onlinelog/group_3.269.1023987441
    Completed redo application of 0.00MB
    Completed instance recovery at                         -- redo recovery complete
     Thread 2: logseq 13, block 47971, scn 1391434
     0 data blocks read, 0 data blocks written, 0 redo k-bytes read
    Thread 2 advanced to log sequence 14 (thread recovery)
    minact-scn: master continuing after IR
    Thu Mar 26 10:56:47 2020
    Decreasing number of real time LMS from 1 to 0
    Thu Mar 26 11:01:51 2020
    db_recovery_file_dest_size of 4407 MB is 5.08% used. This is a
    user-specified limit on the amount of space that will be used by this
    database for recovery-related files, and does not reflect the amount of
    space available in the underlying filesystem or ASM diskgroup.
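
To see how long each stage took, the alert log timestamps can be converted to offsets from the first event. A minimal Python sketch, with the timestamps copied from the log above; the event descriptions are paraphrased and `offsets` is a hypothetical helper. It assumes an English locale, since `%a` and `%b` are locale-dependent.

```python
from datetime import datetime

FMT = "%a %b %d %H:%M:%S %Y"

# Timestamps copied from the node 1 alert log above.
EVENTS = [
    ("Thu Mar 26 10:55:29 2020", "SKGXP reports interface 169.254.67.75 down"),
    ("Thu Mar 26 10:55:47 2020", "Reconfiguration started (old inc 4, new inc 6)"),
    ("Thu Mar 26 10:56:47 2020", "Decreasing number of real time LMS from 1 to 0"),
]


def offsets(events):
    """Return (seconds since the first event, description) pairs."""
    start = datetime.strptime(events[0][0], FMT)
    return [
        (int((datetime.strptime(ts, FMT) - start).total_seconds()), what)
        for ts, what in events
    ]


if __name__ == "__main__":
    for seconds, what in offsets(EVENTS):
        print(f"+{seconds:3d}s  {what}")
```

The database-level reconfiguration starts 18 seconds after the instance first notices the interface is down.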
    

    Grid Infrastructure alert log on node 1

    2020-03-26 10:55:29.454: 
    [cssd(3278)]CRS-1612:Network communication with node ht02 (2) missing for 50% of timeout interval.  Removal of this node from cluster in 14.180 seconds   --- network heartbeat with node 2 is timing out
    2020-03-26 10:55:36.456: 
    [cssd(3278)]CRS-1611:Network communication with node ht02 (2) missing for 75% of timeout interval.  Removal of this node from cluster in 7.180 seconds
    2020-03-26 10:55:41.458: 
    [cssd(3278)]CRS-1610:Network communication with node ht02 (2) missing for 90% of timeout interval.  Removal of this node from cluster in 2.170 seconds
    2020-03-26 10:55:43.636: 
    [cssd(3278)]CRS-1607:Node ht02 is being evicted in cluster incarnation 480633263; details at (:CSSNM00007:) in /u01/app/grid/log/ht01/cssd/ocssd.log.          --- node 2 is evicted from the cluster
    2020-03-26 10:55:45.815: 
    [cssd(3278)]CRS-1625:Node ht02, number 2, was manually shut down      -- node 2's cluster resources have been shut down
    2020-03-26 10:55:45.821: 
    [cssd(3278)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ht01 . -- CSSD finishes reconfiguring cluster membership; ht01 is the only active node
    2020-03-26 10:55:45.834: 
    [ctssd(3421)]CRS-2407:The new Cluster Time Synchronization Service reference node is host ht01.
    2020-03-26 10:55:57.079: 
    [crsd(3564)]CRS-5504:Node down event reported for node 'ht02'.
    2020-03-26 10:56:00.027: 
    [crsd(3564)]CRS-2773:Server 'ht02' has been removed from pool 'Generic'.
    2020-03-26 10:56:00.033: 
    [crsd(3564)]CRS-2773:Server 'ht02' has been removed from pool 'ora.qdtais'.
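
Each CRS-1612/1611/1610 warning carries a countdown, so adding the remaining seconds to the message timestamp gives the projected removal time. A minimal Python sketch over the three warnings above (inline annotations stripped); `projected_removals` is a hypothetical helper. All three projections land at about 10:55:43.6, which matches the CRS-1607 eviction message at 10:55:43.636. A countdown of roughly 14 seconds at the 50% mark is also consistent with a CSS misscount of about 30 seconds, though that is an assumption here, since the post does not show the output of `crsctl get css misscount`.

```python
import re
from datetime import datetime, timedelta

# Warnings copied from the node 1 Grid Infrastructure log above.
LOG = """\
2020-03-26 10:55:29.454:
[cssd(3278)]CRS-1612:Network communication with node ht02 (2) missing for 50% of timeout interval.  Removal of this node from cluster in 14.180 seconds
2020-03-26 10:55:36.456:
[cssd(3278)]CRS-1611:Network communication with node ht02 (2) missing for 75% of timeout interval.  Removal of this node from cluster in 7.180 seconds
2020-03-26 10:55:41.458:
[cssd(3278)]CRS-1610:Network communication with node ht02 (2) missing for 90% of timeout interval.  Removal of this node from cluster in 2.170 seconds
"""

TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}):")
WARN = re.compile(
    r"(CRS-\d+):.*missing for (\d+)% of timeout interval\.\s+"
    r"Removal of this node from cluster in ([\d.]+) seconds"
)


def projected_removals(log):
    """Return (code, percent, projected removal time) for each countdown warning."""
    out = []
    stamp = None
    for line in log.splitlines():
        m = TS.match(line.strip())
        if m:
            stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            continue
        w = WARN.search(line)
        if w and stamp:
            deadline = stamp + timedelta(seconds=float(w.group(3)))
            # Trim microseconds to milliseconds to match the log's precision.
            out.append((w.group(1), int(w.group(2)),
                        deadline.strftime("%H:%M:%S.%f")[:-3]))
    return out


if __name__ == "__main__":
    for code, pct, when in projected_removals(LOG):
        print(code, f"{pct}%", when)
```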
    

    Grid Infrastructure alert log on node 2

    2020-03-26 10:55:28.379: 
    [cssd(3208)]CRS-1612:Network communication with node ht01 (1) missing for 50% of timeout interval.  Removal of this node from cluster in 14.800 seconds     --- network heartbeat with node 1 is timing out
    2020-03-26 10:55:36.384: 
    [cssd(3208)]CRS-1611:Network communication with node ht01 (1) missing for 75% of timeout interval.  Removal of this node from cluster in 6.790 seconds
    2020-03-26 10:55:40.385: 
    [cssd(3208)]CRS-1610:Network communication with node ht01 (1) missing for 90% of timeout interval.  Removal of this node from cluster in 2.790 seconds
    2020-03-26 10:55:43.180: 
    [cssd(3208)]CRS-1609:This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00008:) in /u01/app/grid/log/ht02/cssd/ocssd.log.
    2020-03-26 10:55:43.180: 
    [cssd(3208)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/grid/log/ht02/cssd/ocssd.log  -- the CSS daemon is forcibly terminated
    2020-03-26 10:55:43.222: 
    [cssd(3208)]CRS-1652:Starting clean up of CRSD resources.   -- cleanup of CRSD resources begins
    2020-03-26 10:55:44.259: 
    [cssd(3208)]CRS-1608:This node was evicted by node 1, ht01; details at (:CSSNM00005:) in /u01/app/grid/log/ht02/cssd/ocssd.log.
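
Node 2 is the one evicted even though the NIC was taken down on node 1. This is consistent with the usual Clusterware behavior in an even split, where the sub-cluster holding the lowest-numbered node survives (stated here as general background, not something the logs themselves prove). A small Python sketch that splits these messages into daemon, PID and CRS code and extracts the evicting node from CRS-1608; `parse` and `evictor` are hypothetical helpers, and the lines are copied from the log above with annotations stripped.

```python
import re

# Final four messages from the node 2 Grid Infrastructure log above.
LINES = [
    "[cssd(3208)]CRS-1609:This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00008:) in /u01/app/grid/log/ht02/cssd/ocssd.log.",
    "[cssd(3208)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/grid/log/ht02/cssd/ocssd.log",
    "[cssd(3208)]CRS-1652:Starting clean up of CRSD resources.",
    "[cssd(3208)]CRS-1608:This node was evicted by node 1, ht01; details at (:CSSNM00005:) in /u01/app/grid/log/ht02/cssd/ocssd.log.",
]

MSG = re.compile(r"^\[(\w+)\((\d+)\)\](CRS-\d+):(.*)$")
EVICTED_BY = re.compile(r"evicted by node (\d+), (\w+)")


def parse(line):
    """Split one log message into daemon, pid, CRS code and message text."""
    m = MSG.match(line)
    if not m:
        return None
    daemon, pid, code, text = m.groups()
    return {"daemon": daemon, "pid": int(pid), "code": code, "text": text.strip()}


def evictor(lines):
    """Return (node number, node name) from the first 'evicted by' message, if any."""
    for line in lines:
        m = EVICTED_BY.search(line)
        if m:
            return int(m.group(1)), m.group(2)
    return None


if __name__ == "__main__":
    print([parse(line)["code"] for line in LINES])
    print(evictor(LINES))
```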
    

    Oracle alert log on node 2

     

    Thu Mar 26 10:55:45 2020
    NOTE: ASMB terminating    -- the ASMB process terminates, which crashes the database instance
    Errors in file /u01/app/db/diag/rdbms/qdtais/qdtais2/trace/qdtais2_asmb_3974.trc:
    ORA-15064: communication failure with ASM instance
    ORA-03113: end-of-file on communication channel
    Process ID: 
    Session ID: 32 Serial number: 3
    Errors in file /u01/app/db/diag/rdbms/qdtais/qdtais2/trace/qdtais2_asmb_3974.trc:
    ORA-15064: communication failure with ASM instance
    ORA-03113: end-of-file on communication channel
    Process ID: 
    Session ID: 32 Serial number: 3
    ASMB (ospid: 3974): terminating the instance due to error 15064
    Instance terminated by ASMB, pid = 3974
    

      

     

      

     

      

     

  • Original article: https://www.cnblogs.com/omsql/p/12577374.html