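Background:
Oracle 11g RAC was installed for the first time on a pair of test machines and worked normally after installation. Some time later, node 1 was rebooted to test whether the cluster stack would start automatically. It did not start automatically, and starting it manually also failed.
Process:
On node 1, run: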
# pwd
/u01/grid/bin
# ./crsctl start crs
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
Check the log on node 1:
# pwd
/u01/grid/log/nodea/client
# cat crsctl_grid.log
Oracle Database 11g Clusterware Release 11.2.0.3.0 - Production Copyright 1996, 2011 Oracle. All rights reserved.
[ CLWAL][1]clsw_Initialize: OCR initlevel [3]
[ CLWAL][1]clsw_Initialize: OCR initlevel [3]
[ CLWAL][1]clsw_Initialize: OCR initlevel [3]
[ CLWAL][1]clsw_Initialize: OCR initlevel [3]
[ CLWAL][1]clsw_Initialize: OCR initlevel [3]
[ CLWAL][1]clsw_Initialize: OCR initlevel [3]
[ CLWAL][1]clsw_Initialize: OCR initlevel [3]
[ CLWAL][1]clsw_Initialize: OCR initlevel [3]
2014-04-09 22:45:20.882: [ CRSCTL][1]File /u01/grid/oc4j/j2ee/home/OC4J_DBWLM_config/system-jazn-data.xml was not modified, OCR key was empty
[ CLWAL][1]clsw_Initialize: OLR initlevel [30000]
2014-04-17 07:27:27.517: [ CRSCTL][1]File /u01/grid/oc4j/j2ee/home/OC4J_DBWLM_config/system-jazn-data.xml was not modified, OCR key was empty
2014-04-19 02:24:13.609: [ CRSCTL][1]File /u01/grid/oc4j/j2ee/home/OC4J_DBWLM_config/system-jazn-data.xml was not modified, OCR key was empty
2014-04-30 02:19:51.492: [GIPCXCPT][1] gipcmodClsaAuthStart: failuring during clsaauthmsg ret clsaretOSD (8), endp 1110bdd70 [0000000000000018] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=32b4238c-0bc8efcf-12779694))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_nodea_)(GIPCID=0bc8efcf-32b4238c-7078108))', numPend 5, numReady 0, numDone 2, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 7078108, flags 0x2ca712, usrFlags 0x34000 }
2014-04-30 02:19:51.492: [GIPCXCPT][1] gipcmodClsaAuthStart: slos op : write
2014-04-30 02:19:51.492: [GIPCXCPT][1] gipcmodClsaAuthStart: slos dep : No space left on device (28)
2014-04-30 02:19:51.492: [GIPCXCPT][1] gipcmodClsaAuthStart: slos loc : authrespset5
2014-04-30 02:19:51.492: [GIPCXCPT][1] gipcmodClsaAuthStart: slos info: len -1 != expected 4
2014-04-30 02:19:51.493: [ CSSCLNT][1]clssscConnect: gipc request failed with 22 (12)
2014-04-30 02:19:51.493: [ CSSCLNT][1]clsssInitNative: connect to (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_nodea_)) failed, rc 22
The key line is:
2014-04-30 02:19:51.492: [GIPCXCPT][1] gipcmodClsaAuthStart: slos dep : No space left on device (28)
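In a long client log, the OS-level error is easy to miss among the repeated lines. The following is a minimal sketch for pulling out only the distinct "slos dep" messages, assuming the log format shown above. The function name os_errors is hypothetical and not part of any Oracle tool.

```shell
# Print the distinct OS-level error texts ("slos dep : ...") found in a
# Clusterware client log read from stdin.
# Assumption: the errors appear as "... slos dep : <message>", as in the log above.
os_errors() {
    awk -F'slos dep : ' 'NF > 1 { print $2 }' | sort -u
}

# Usage on the node (path taken from the session above):
#   os_errors < /u01/grid/log/nodea/client/crsctl_grid.log
```

For the log above this prints a single line, "No space left on device (28)", which points straight at the filesystem rather than at Clusterware itself.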
Checking the disk space on node 1 confirms that /u01 really is full:
# df -g
Filesystem      GB blocks   Free  %Used   Iused  %Iused  Mounted on
/dev/hd4             0.25   0.05    79%   10134     44%  /
/dev/hd2             2.06   0.13    94%   44051     57%  /usr
/dev/hd9var          0.44   0.15    67%    6196     15%  /var
/dev/hd3            10.00   2.08    80%    4367      1%  /tmp
/dev/hd1             0.06   0.00   100%      73     46%  /home
/dev/hd11admin       0.12   0.12     1%       5      1%  /admin
/proc                   -      -      -       -       -  /proc
/dev/hd10opt         0.38   0.18    51%    7044     14%  /opt
/dev/livedump        0.25   0.25     1%       4      1%  /var/adm/ras/livedump
/dev/fslv00         30.00   0.00   100%   54756     90%  /u01
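The same check can be scripted so that a full filesystem is noticed before Clusterware fails to start. This is a minimal sketch, assuming the AIX "df -g" column layout shown above (%Used in the fourth field, mount point in the seventh). The function name full_filesystems and the default threshold of 90 are hypothetical choices.

```shell
# Print "<mount point> <%Used>" for every filesystem at or above a threshold.
# Reads AIX `df -g` style output on stdin. Assumed columns:
# Filesystem, GB blocks, Free, %Used, Iused, %Iused, Mounted on.
full_filesystems() {
    threshold="${1:-90}"
    awk -v limit="$threshold" '
        # Skip the header and pseudo filesystems such as /proc ("-" in %Used).
        NR > 1 && $4 ~ /^[0-9]+%$/ {
            used = $4
            sub(/%/, "", used)
            if (used + 0 >= limit) print $7, $4
        }'
}

# Usage on the node:
#   df -g | full_filesystems 90
```

Against the output above with a threshold of 90 this would list /usr, /home and /u01.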
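My suspicion was that the database had been raising alerts continuously and the growing logs had filled the filesystem, so I went to the Oracle database alert log directory: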
$ pwd
/u01/base/diag/rdbms/test/test1/trace
$ du -sg /u01/base/diag/rdbms/test/test1/trace
-
/u01/base/diag/rdbms/test/test1/trace
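I deleted all of the alert logs. Because this is a test database, I did not investigate what was causing the constant alerts. The disk on the node 2 server was not full.
I then tried to start CRS again as root. The command reported that CRS was already active, yet crs_stat showed no processes. I will look into the reason later.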
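Rather than deleting everything in the trace directory, a more selective cleanup is possible. This is a sketch of an alternative, not what was done here: it lists only trace files older than a given number of days so they can be reviewed before removal. The function name list_old_traces, the *.trc / *.trm patterns and the 7-day cutoff are assumptions.

```shell
# List trace files (*.trc, *.trm) under a directory that were last modified
# more than N days ago. Nothing is deleted; this only prints candidates.
list_old_traces() {
    dir="$1"
    days="${2:-7}"
    find "$dir" -type f \( -name '*.trc' -o -name '*.trm' \) -mtime +"$days"
}

# Usage (path taken from the session above):
#   list_old_traces /u01/base/diag/rdbms/test/test1/trace 7
# After reviewing the list, the same find expression can be extended
# with: -exec rm -f {} +
```

Listing first and deleting second keeps the current alert log and recent traces available in case the root cause needs to be investigated later.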
# id
uid=0(root) gid=0(system) groups=2(bin),3(sys),7(security),8(cron),10(audit),11(lp)
# pwd
/u01/grid/bin
# ./crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.
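Update, 2014-05-28
Node 2 had no CRS processes, and I never found the cause. Since these are test machines that can be rebooted freely, I simply rebooted both servers. After the reboot, all CRS processes on both servers started.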
$ su - grid
grid's Password:
$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.DATA.dg    ora....up.type ONLINE    ONLINE    nodea
ora.DG01.dg    ora....up.type ONLINE    ONLINE    nodea
ora....ER.lsnr ora....er.type ONLINE    ONLINE    nodea
ora....N1.lsnr ora....er.type ONLINE    ONLINE    nodeb
ora.asm        ora.asm.type   ONLINE    ONLINE    nodea
ora.cvu        ora.cvu.type   ONLINE    ONLINE    nodeb
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE
ora....network ora....rk.type ONLINE    ONLINE    nodea
ora....SM1.asm application    ONLINE    ONLINE    nodea
ora....EA.lsnr application    ONLINE    ONLINE    nodea
ora.nodea.gsd  application    OFFLINE   OFFLINE
ora.nodea.ons  application    ONLINE    ONLINE    nodea
ora.nodea.vip  ora....t1.type ONLINE    ONLINE    nodea
ora....SM2.asm application    ONLINE    ONLINE    nodeb
ora....EB.lsnr application    ONLINE    ONLINE    nodeb
ora.nodeb.gsd  application    OFFLINE   OFFLINE
ora.nodeb.ons  application    ONLINE    ONLINE    nodeb
ora.nodeb.vip  ora....t1.type ONLINE    ONLINE    nodeb
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    nodeb
ora.ons        ora.ons.type   ONLINE    ONLINE    nodea
ora....ry.acfs ora....fs.type ONLINE    ONLINE    nodea
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    nodeb
ora.test.db    ora....se.type ONLINE    ONLINE    nodea
$