  • Network cable replaced, then CRS failed to start: clssnmSendingThread: sending status msg to all nodes

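    Before a colleague swapped the network cable, I shut node 2 down cleanly. After the swap he told me that node 2 would not come back up no matter what. I went through the logs below and a number of posts without solving it. I tried rebooting, unplugging and re-plugging the cable, checking that the storage was healthy, and remounting the storage. One post suggested that the OCR information might have changed, but I had no earlier backup and did not dig further in that direction.
    In the end I still could not fix it, mainly because my skills are limited: without pinpointing the actual problem I did not dare change things at random.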
    20xx-12-16 19:01:05.792: [ CSSD][3786819328]clssnmSendingThread: sending join msg to all nodes
    20xx-12-16 19:01:05.792: [ CSSD][3786819328]clssnmSendingThread: sent 5 join msgs to all nodes
    20xx-12-16 19:01:06.295: [GIPCHALO][3811858176] gipchaLowerProcessNode: no valid interfaces found to node for 7286464 ms, node 0x7fecd0028450 { host 'myrac1', haName 'CSS_myrac-cluster', srcLuid fac66ea4-f1a960af, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [249 : 249], createTime 7037424, sentRegister 1, localMonitor 1, flags 0x4 }
    20xx-12-16 19:01:06.303: [ CSSD][3789973248]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
    20xx-12-16 19:01:06.420: [ CSSD][3799754496]clssnmvDHBValidateNcopy: node 1, myrac1, has a disk HB, but no network HB, DHB has rcfg 471981092, wrtcnt, 211618800, LATS 7286584, lastSeqNo 211618797, uniqueness 1576485880, timestamp 1576494065/8540734
    20xx-12-16 19:01:06.435: [ CSSD][3804591872]clssnmvDHBValidateNcopy: node 1, myrac1, has a disk HB, but no network HB, DHB has rcfg 471981092, wrtcnt, 211618802, LATS 7286594, lastSeqNo 211618799, uniqueness 1576485880, timestamp 1576494066/8541524
    20xx-12-16 19:01:07.304: [ CSSD][3789973248]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
    20xx-12-16 19:01:07.421: [ CSSD][3799754496]clssnmvDHBValidateNcopy: node 1, myrac1, has a disk HB, but no network HB, DHB has rcfg 471981092, wrtcnt, 211618803, LATS 7287584, lastSeqNo 211618800, uniqueness 1576485880, timestamp 1576494066/8541734
    20xx-12-16 19:01:07.435: [ CSSD][3804591872]clssnmvDHBValidateNcopy: node 1, myrac1, has a disk HB, but no network HB, DHB has rcfg 471981092, wrtcnt, 211618805, LATS 7287604, lastSeqNo 211618802, uniqueness 1576485880, timestamp 1576494067/8542524
    20xx-12-16 19:01:08.304: [ CSSD][3789973248]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
    20xx-12-16 19:01:08.422: [ CSSD][3799754496]clssnmvDHBValidateNcopy: node 1, myrac1, has a disk HB, but no network HB, DHB has rcfg 471981092, wrtcnt, 211618806, LATS 7288584, lastSeqNo 211618803, uniqueness 1576485880, timestamp 1576494067/8542734
    20xx-12-16 19:01:08.436: [ CSSD][3804591872]clssnmvDHBValidateNcopy: node 1, myrac1, has a disk HB, but no network HB, DHB has rcfg 471981092, wrtcnt, 211618808, LATS 7288604, lastSeqNo 211618805, uniqueness 1576485880, timestamp 1576494068/8543524
    20xx-12-16 19:01:09.304: [ CSSD][3789973248]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
    20xx-12-16 19:01:09.422: [ CSSD][3799754496]clssnmvDHBValidateNcopy: node 1, myrac1, has a disk HB, but no network HB, DHB has rcfg 471981092, wrtcnt, 211618809, LATS 7289584, lastSeqNo 211618806, uniqueness 1576485880, timestamp 1576494068/8543744
    20xx-12-16 19:01:09.437: [ CSSD][3804591872]clssnmvDHBValidateNcopy: node 1, myrac1, has a disk HB, but no network HB, DHB has rcfg 471981092, wrtcnt, 211618811, LATS 7289604, lastSeqNo 211618808, uniqueness 1576485880, timestamp 1576494069/8544524
    20xx-12-16 19:01:09.803: [ CSSD][3785242368]clssnmRcfgMgrThread: Local Join
    20xx-12-16 19:01:09.803: [ CSSD][3785242368]clssnmLocalJoinEvent: begin on node(2), waittime 193000
    20xx-12-16 19:01:09.803: [ CSSD][3785242368]clssnmLocalJoinEvent: set curtime (7289964) for my node
    20xx-12-16 19:01:09.803: [ CSSD][3785242368]clssnmLocalJoinEvent: scanning 32 nodes
    20xx-12-16 19:01:09.803: [ CSSD][3785242368]clssnmLocalJoinEvent: Node myrac1, number 1, is in an existing cluster with disk state 3
    20xx-12-16 19:01:09.803: [ CSSD][3785242368]clssnmLocalJoinEvent: takeover aborted due to cluster member node found on disk
    20xx-12-16 19:01:10.305: [ CSSD][3789973248]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
    20xx-12-16 19:01:10.423: [ CSSD][3799754496]clssnmvDHBValidateNcopy: node 1, myrac1, has a disk HB, but no network HB, DHB has rcfg 471981092, wrtcnt, 211618812, LATS 7290584, lastSeqNo 211618809, uniqueness 1576485880, timestamp 1576494069/8544744
    20xx-12-16 19:01:10.437: [ CSSD][3804591872]clssnmvDHBValidateNcopy: node 1, myrac1, has a disk HB, but no network HB, DHB has rcfg 471981092, wrtcnt, 211618814, LATS 7290604, lastSeqNo 211618811, uniqueness 1576485880, timestamp 1576494070/8545524
    20xx-12-16 19:01:10.794: [ CSSD][3786819328]clssnmSendingThread: sending join msg to all nodes
    20xx-12-16 19:01:10.794: [ CSSD][3786819328]clssnmSendingThread: sent 5 join msgs to all nodes


    20xx-12-16 20:36:02.919: [ CSSD][2756265728]clssgmUpdateGrpData: grock(CLSN.ONSNETPROC.MASTER), commissioner(-1/0)
    20xx-12-16 20:36:02.919: [ CSSD][2756265728]clssgmHandleGrockRcfgUpdate: grock(CLSN.ONSNETPROC.MASTER), updateseq(118), status(0), sendresp(1)
    20xx-12-16 20:36:02.920: [ CSSD][2756265728]clssgmTestSetLastGrockUpdate: grock(CLSN.ONSNETPROC.MASTER), updateseq(118) msgseq(119), lastupdt<0x7fbb58031e10>, ignoreseq(0)
    20xx-12-16 20:36:02.920: [ CSSD][2756265728]clssgmGrockOpTagProcess: Request to commission member(1) using key(1) for grock(CLSN.ONSNETPROC.MASTER)
    20xx-12-16 20:36:02.920: [ CSSD][2756265728]clssgmUpdateGrpData: grock(CLSN.ONSNETPROC.MASTER), commissioner(1/1)
    20xx-12-16 20:36:02.920: [ CSSD][2756265728]clssgmHandleGrockRcfgUpdate: grock(CLSN.ONSNETPROC.MASTER), updateseq(119), status(0), sendresp(1)
    20xx-12-16 20:36:02.921: [ CSSD][2756265728]clssgmTestSetLastGrockUpdate: grock(CLSN.ONSNETPROC.MASTER), updateseq(119) msgseq(120), lastupdt<0x7fbb5804d490>, ignoreseq(0)
    20xx-12-16 20:36:02.921: [ CSSD][2756265728]clssgmUpdateGrpData: grock(CLSN.ONSNETPROC.MASTER), private data(2052), incarn(40)
    20xx-12-16 20:36:02.921: [ CSSD][2756265728]clssgmHandleGrockRcfgUpdate: grock(CLSN.ONSNETPROC.MASTER), updateseq(120), status(0), sendresp(1)
    20xx-12-16 20:36:02.922: [ CSSD][2756265728]clssgmTestSetLastGrockUpdate: grock(CLSN.ONSNETPROC.MASTER), updateseq(120) msgseq(121), lastupdt<0x7fbb5803dee0>, ignoreseq(0)
    20xx-12-16 20:36:02.922: [ CSSD][2756265728]clssgmGrockOpTagProcess: Request to commission member(-1) using key(1) for grock(CLSN.ONSNETPROC.MASTER)
    20xx-12-16 20:36:02.922: [ CSSD][2756265728]clssgmUpdateGrpData: grock(CLSN.ONSNETPROC.MASTER), commissioner(-1/0)
    20xx-12-16 20:36:02.922: [ CSSD][2756265728]clssgmHandleGrockRcfgUpdate: grock(CLSN.ONSNETPROC.MASTER), updateseq(121), status(0), sendresp(1)
    20xx-12-16 20:36:05.064: [ CSSD][2753111808]clssnmSendingThread: sending status msg to all nodes
    20xx-12-16 20:36:05.064: [ CSSD][2753111808]clssnmSendingThread: sent 5 status msgs to all nodes
    20xx-12-16 20:36:09.065: [ CSSD][2753111808]clssnmSendingThread: sending status msg to all nodes
    20xx-12-16 20:36:09.065: [ CSSD][2753111808]clssnmSendingThread: sent 4 status msgs to all nodes
    20xx-12-16 20:36:14.066: [ CSSD][2753111808]clssnmSendingThread: sending status msg to all nodes
    ...

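    Can you tell from the logs that the bond configuration had changed? I did not notice it or work it out at the time. Eventually my colleague mentioned that he had changed the bond! Hadn't he said he was only swapping a cable and tidying up the wiring? I suggested changing it back, and sure enough, after a restart everything was back to normal.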
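
The CSSD excerpts above do carry a hint, even if they do not name the bond: node 1's disk heartbeat is visible while no network heartbeat arrives, and GIPC reports no valid interfaces to the node. That combination usually points at the private interconnect rather than at storage or the OCR. A minimal sketch for counting those two messages in a log, assuming lines in the format shown above; the function name and the symptom labels are made up for illustration and are not part of any Oracle tool.

```python
import re
from collections import Counter

# Patterns copied from the CSSD messages in the log excerpt above.
# The dictionary keys are illustrative labels, not Oracle terminology.
SYMPTOMS = {
    "disk_hb_without_network_hb": re.compile(
        r"clssnmvDHBValidateNcopy: node \d+, [^,\s]+, has a disk HB, but no network HB"
    ),
    "no_valid_interfaces": re.compile(
        r"gipchaLowerProcessNode: no valid interfaces found to node"
    ),
}


def count_interconnect_symptoms(lines):
    """Count how many log lines match each interconnect symptom."""
    counts = Counter()
    for line in lines:
        for name, pattern in SYMPTOMS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python scan_cssd.py < ocssd.log
    for name, n in count_interconnect_symptoms(sys.stdin).items():
        print(f"{name}: {n}")
```

A high count for both symptoms while the voting disks are still readable suggests checking the interconnect NICs and bonding on the affected node before touching the OCR.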

    I restarted things more or less blindly, and it did not come up...
    [root@myrac2 bin]# ./crsctl query crs activeversion
    Oracle Cluster Registry initialization failed accessing Oracle Cluster Registry device: PROC-26: Error while accessing the physical storage
    ORA-15077: could not locate ASM instance serving a required diskgroup

    [root@myrac2 bin]# ./ocrcheck
    PROT-602: Failed to retrieve data from the cluster registry
    PROC-26: Error while accessing the physical storage
    ORA-15077: could not locate ASM instance serving a required diskgroup

    [grid@myrac2 ~]$ cd /u01/app/11.2.0/grid/bin/
    [grid@myrac2 bin]$ srvctl start nodeapps -n myrac2
    PRCR-1070 : Failed to check if resource ora.gsd is registered
    Cannot communicate with crsd
    PRCR-1070 : Failed to check if resource ora.net1.network is registered
    Cannot communicate with crsd
    PRCR-1035 : Failed to look up CRS resource myrac2 for ora.cluster_vip.type
    PRCR-1068 : Failed to query resources
    Cannot communicate with crsd
    PRCR-1070 : Failed to check if resource ora.ons is registered
    Cannot communicate with crsd


    [grid@myrac2 bin]$ srvctl start asm -n myrac2
    PRCR-1070 : Failed to check if resource ora.asm is registered
    Cannot communicate with crsd


    [grid@myrac2 bin]$ srvctl start database -d testdb2
    PRCD-1027 : Failed to retrieve database testdb2
    PRCR-1115 : Failed to find entities of type resource that match filters ((NAME == ora.testdb2.db) && (TYPE == ora.database.type)) and contain attributes VERSION,ORACLE_HOME,DATABASE_TYPE
    Cannot communicate with crsd
    [grid@myrac2 bin]$

    The bond configuration on node 2 had been modified and clearly differs from node 1:
    [root@myrac2 11.2.0]# service network status
    Configured devices:
    lo bond0 bond1 em1 em2 em3 em4
    Currently active devices:
    lo em1 em2 em3 em4 bond0 bond1
    [root@myrac2 11.2.0]#

    Node 1:
    [root@myrac1 ~]# service network status
    Configured devices:
    lo bond0 em1 em2 em3 em4 idrac
    Currently active devices:
    lo em1 em2 em3 bond0
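
The comparison of the two `service network status` outputs above can also be done mechanically, which would have exposed the extra bond much earlier. A minimal sketch, assuming the two-header output format shown above; `parse_network_status` and `diff_nodes` are hypothetical helper names, and the sample strings are copied from the transcripts.

```python
def parse_network_status(text):
    """Parse `service network status` output into configured/active device sets."""
    devices = {"configured": set(), "active": set()}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Configured devices"):
            section = "configured"
        elif line.startswith("Currently active devices"):
            section = "active"
        elif section and line and not line.startswith("["):
            # The device names follow on the line after each header.
            devices[section].update(line.split())
            section = None
    return devices


def diff_nodes(a, b):
    """Return, per section, the devices present only on a and only on b."""
    return {
        key: {"only_a": sorted(a[key] - b[key]), "only_b": sorted(b[key] - a[key])}
        for key in ("configured", "active")
    }


NODE1 = """Configured devices:
lo bond0 em1 em2 em3 em4 idrac
Currently active devices:
lo em1 em2 em3 bond0
"""

NODE2 = """Configured devices:
lo bond0 bond1 em1 em2 em3 em4
Currently active devices:
lo em1 em2 em3 em4 bond0 bond1
"""

if __name__ == "__main__":
    print(diff_nodes(parse_network_status(NODE1), parse_network_status(NODE2)))
```

On these two outputs the diff reports `bond1` as configured and active on node 2 only, which matches the change the colleague later described.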

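    Technical skill aside, this incident shows that cooperation between colleagues is sometimes more important. With a moment's carelessness you dig a pit for someone else, or fall into one that someone else dug for you.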

  • Original article: https://www.cnblogs.com/ritchy/p/12056084.html