Today I restarted OBIEE 11g for maintenance, and a problem showed up: the ClusterController could not be reached. I think the case is instructive, so I would like to share it.
1. After the restart
Run {Oracle_bieehome}/user_projects/domains/bifoundation_domain/bin/startWebLogic.cmd to start the WebLogic application server.
Run {Oracle_bieehome}/user_projects/domains/bifoundation_domain/bin/startManagedWebLogic.cmd to start the WebLogic Managed Server.
Run {Oracle_bieehome}/instances/instance1/bin/opmnctl startproc ias-component=coreapplication_obiccs1
Run {Oracle_bieehome}/instances/instance1/bin/opmnctl startproc ias-component=coreapplication_obisch1
Run {Oracle_bieehome}/instances/instance1/bin/opmnctl startproc ias-component=coreapplication_obijh1
Run {Oracle_bieehome}/instances/instance1/bin/opmnctl startproc ias-component=coreapplication_obips1
Run {Oracle_bieehome}/instances/instance1/bin/opmnctl startproc ias-component=coreapplication_obis1
All the BI background processes started successfully except the ClusterController, which failed to start.
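The opmnctl part of the start-up sequence above can be sketched as a small helper that builds the commands in the same order. This is only an illustration: the function name `opmn_start_commands` is hypothetical, and the path to opmnctl is passed in by the caller rather than assumed.

```python
from typing import List

# Component names copied from the opmnctl commands above, in the order they were run.
COMPONENTS = [
    "coreapplication_obiccs1",  # Cluster Controller
    "coreapplication_obisch1",  # Scheduler
    "coreapplication_obijh1",   # JavaHost
    "coreapplication_obips1",   # Presentation Services
    "coreapplication_obis1",    # BI Server
]


def opmn_start_commands(opmnctl: str) -> List[List[str]]:
    """Build one 'opmnctl startproc' argument list per component, in order.

    `opmnctl` is the path to the opmnctl executable on this machine
    (supplied by the caller; nothing is assumed about the install layout).
    """
    return [[opmnctl, "startproc", f"ias-component={name}"] for name in COMPONENTS]
```

Each returned list could be handed to `subprocess.run(cmd)` one at a time, checking the return code before moving on to the next component.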
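2. Does restarting the ClusterController fix it?
To check the ClusterController's start-up status,
run {Oracle_bieehome}/instances/instance1/bin/opmnctl status
and also open http://biee_host:7001/em and look at the availability status under coreapplication; the ClusterController is shown as Failed.
Restarting the ClusterController process on its own still failed.
Restarting all the BIEE background processes (the five main processes, which depend on one another) still left the ClusterController failed.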
3. Check the ClusterController entries in the logs to pin down the problem
The opmn log file is under {Oracle_bieehome}/instances/instance1/diagnostics/logs/opmn/opmn and contains:
[2012-09-28T13:23:51+08:00] [opmn] [NOTIFICATION:1] [663] [pm-process] Stopping Process: coreapplication_obiccs1~OracleBIClusterControllerComponent~BIClusterController~1 (1200161224:2212)
[2012-09-28T13:23:51+08:00] [opmn] [ERROR:1] [] [libopmncustom] Forcefully Terminating Process: coreapplication_obiccs1~OracleBIClusterControllerComponent~BIClusterController~1 (1200161224:2212)
[2012-09-28T13:25:52+08:00] [opmn] [NOTIFICATION:1] [663] [pm-process] Stopping Process: coreapplication_obiccs1~OracleBIClusterControllerComponent~BIClusterController~1 (1200161224:2212)
[2012-09-28T13:25:52+08:00] [opmn] [ERROR:1] [] [libopmncustom] Forcefully Terminating Process: coreapplication_obiccs1~OracleBIClusterControllerComponent~BIClusterController~1 (1200161224:2212)
[2012-09-28T13:27:54+08:00] [opmn] [NOTIFICATION:1] [663] [pm-process] Stopping Process: coreapplication_obiccs1~OracleBIClusterControllerComponent~BIClusterController~1 (1200161224:2212)
[2012-09-28T13:27:54+08:00] [opmn] [ERROR:1] [] [libopmncustom] Forcefully Terminating Process: coreapplication_obiccs1~OracleBIClusterControllerComponent~BIClusterController~1 (1200161224:2212)
[2012-09-28T13:28:03+08:00] [opmn] [NOTIFICATION:1] [662] [pm-process] Starting Process: coreapplication_obiccs1~OracleBIClusterControllerComponent~BIClusterController~1 (1200161240:0)
[2012-09-28T13:28:05+08:00] [opmn] [NOTIFICATION:1] [665] [pm-process] Process Alive: coreapplication_obiccs1~OracleBIClusterControllerComponent~BIClusterController~1 (1200161240:4408)
[2012-09-28T13:22:22+08:00] [opmn] [ERROR:1] [] [libopmncustom] Process Ping Failed:
The Process Ping Failed entry in particular stood out, and I began to suspect that the process's port was already in use.
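Picking the ERROR entries out of a long opmn log by eye is tedious. The sketch below filters them with a regular expression. It assumes each entry sits on its own line in the bracketed layout shown in the excerpt above; the field names (`time`, `component`, `level`, `msg_id`, `module`, `message`) are labels chosen for this example, not official names.

```python
import re
from typing import Dict, List

# One log entry per line: [time] [component] [LEVEL:n] [id] [module] message
LINE_RE = re.compile(
    r"^\[(?P<time>[^\]]*)\] \[(?P<component>[^\]]*)\] \[(?P<level>[A-Z]+):\d+\] "
    r"\[(?P<msg_id>[^\]]*)\] \[(?P<module>[^\]]*)\] (?P<message>.*)$"
)


def error_entries(log_text: str) -> List[Dict[str, str]]:
    """Return the parsed fields of every entry whose level is ERROR."""
    entries = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("level") == "ERROR":
            entries.append(match.groupdict())
    return entries
```

Lines that do not match the layout (blank lines, wrapped continuations) are simply skipped, so the function can be fed the whole file.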
4. Check what is using the port.
The ClusterController uses port 9706.
Look up the processes on that port:
netstat -aon | findstr "9706"
TCP 10.11.1.48:1707 10.11.1.48:9706 SYN_SENT 4348
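When reading a netstat line it helps to separate which side of the connection the port is on. In the line above, 9706 is the remote port and the state is SYN_SENT, so PID 4348 is a client still waiting for 9706 to answer. A minimal sketch that makes this explicit, assuming the five-column TCP layout printed by `netstat -aon` (the names `Conn`, `parse_netstat_line` and `side_for_port` are made up for this example):

```python
from typing import NamedTuple, Optional


class Conn(NamedTuple):
    proto: str
    local: str
    remote: str
    state: str
    pid: int


def parse_netstat_line(line: str) -> Optional[Conn]:
    """Parse one TCP row of 'netstat -aon'; return None for anything else."""
    parts = line.split()
    if len(parts) != 5 or parts[0] != "TCP" or not parts[4].isdigit():
        return None
    proto, local, remote, state, pid = parts
    return Conn(proto, local, remote, state, int(pid))


def side_for_port(conn: Conn, port: int) -> str:
    """Say whether the process owns the port locally or connects to it remotely."""
    local_port = conn.local.rsplit(":", 1)[-1]
    remote_port = conn.remote.rsplit(":", 1)[-1]
    if local_port == str(port):
        return "owner"
    if remote_port == str(port):
        return "client"
    return "unrelated"
```

A row reported as "owner" in state LISTENING would be the process actually holding the port; a "client" row only shows who is trying to reach it.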
Look up PID 4348 in the Windows process list:
tasklist | findstr "4348"
sawserver.exe 4348 Console 0
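That looked normal to me: I took sawserver.exe to be the wrapper for all the opmn background processes. (Strictly speaking, sawserver.exe is the BI Presentation Services executable, and SYN_SENT means it is still waiting for port 9706 to answer.)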
After killing sawserver.exe, check the opmnctl status:
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status
---------------------------------+--------------------+---------+---------
coreapplication_obiccs1          | OracleBIClusterCo~ |    2212 | Stop
coreapplication_obisch1          | OracleBIScheduler~ |    2808 | Alive
coreapplication_obijh1           | OracleBIJavaHostC~ |    3728 | Alive
coreapplication_obips1           | OracleBIPresentat~ |     764 | Alive
coreapplication_obis1            | OracleBIServerCom~ |    3836 | Alive
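The status table is easy to check mechanically. The following sketch lists every component whose status is not Alive. It assumes the output is fed in one row per line, with columns separated by `|` as opmnctl prints them; `parse_opmn_status` and `not_alive` are hypothetical helper names.

```python
from typing import Dict, List, Optional, Union

Row = Dict[str, Union[str, Optional[int]]]


def parse_opmn_status(text: str) -> List[Row]:
    """Turn the data rows of an 'opmnctl status' table into dictionaries."""
    rows: List[Row] = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Separator lines contain no '|'; the header row starts with 'ias-component'.
        if len(cells) != 4 or cells[0] == "ias-component":
            continue
        component, process_type, pid, status = cells
        rows.append({
            "component": component,
            "process_type": process_type,
            "pid": int(pid) if pid.isdigit() else None,
            "status": status,
        })
    return rows


def not_alive(rows: List[Row]) -> List[str]:
    """Names of the components whose status is anything other than Alive."""
    return [str(row["component"]) for row in rows if row["status"] != "Alive"]
```

For the table above this would report only coreapplication_obiccs1.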
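opmn reports the pid of coreapplication_obiccs1 as 2212. I expected that pid to belong to sawserver.exe, but sawserver.exe had already been killed. If some other process was conflicting with it, it should be possible to find it.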
D:\0BIEE11G\BIEE\instances\instance1\bin>tasklist|findstr "2212"
wmiprvse.exe 2212 Console 0 3,900 K
D:\0BIEE11G\BIEE\instances\instance1\bin>tasklist|findstr "2808"
nqscheduler.exe 2808 Console 0 57,044 K
D:\0BIEE11G\BIEE\instances\instance1\bin>tasklist|findstr "3728"
java.exe 3728 Console 0 29,912 K
D:\0BIEE11G\BIEE\instances\instance1\bin>tasklist|findstr "764"
odbcad32.exe 5764 RDP-Tcp#4 1 2,196 K
sawserver.exe 764 Console 0 134,356 K
D:\0BIEE11G\BIEE\instances\instance1\bin>tasklist|findstr "3836"
nqsserver.exe 3836 Console 0 66,080 K
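As I had guessed, pid 2212 was held by wmiprvse.exe. A quick Baidu search showed that it is neither an OBIEE process nor a critical Windows process (it is the WMI Provider Host, which Windows starts again on demand), so I killed it.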
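One caveat about the lookups above: findstr matches substrings, which is why the search for 764 also returned PID 5764. A small sketch that matches the PID column exactly, assuming the default tasklist row layout (image name, PID, session name, session number, memory in K); `image_for_pid` is a name invented for this example.

```python
import re
from typing import Optional

# image name (may contain spaces), PID, session name, session number, memory usage
ROW_RE = re.compile(
    r"^(?P<image>.+?)\s+(?P<pid>\d+)\s+(?P<session>\S+)\s+(?P<session_num>\d+)\s+"
    r"(?P<mem>[\d,]+) K$"
)


def image_for_pid(tasklist_output: str, pid: int) -> Optional[str]:
    """Return the image name whose PID column equals `pid`, or None if absent."""
    for line in tasklist_output.splitlines():
        match = ROW_RE.match(line.strip())
        if match and int(match.group("pid")) == pid:
            return match.group("image")
    return None
```

On the command line, `tasklist /FI "PID eq 2212"` gives the same exact match without any post-processing.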
After killing it, check again:
D:\0BIEE11G\BIEE\instances\instance1\bin>tasklist|findstr "2212"
Nothing is returned.
Start the component through OPMN again:
D:\0BIEE11G\BIEE\instances\instance1\bin>opmnctl startproc ias-component=coreapplication_obiccs1
opmnctl startproc: starting opmn managed processes...
================================================================================
opmn id=bi-4lna2lrlna7w:9501
0 of 0 processes started.
Processes are already started: instance1~coreapplication_obiccs1~OracleBIClusterControllerComponent~BIClusterController
D:\0BIEE11G\BIEE\instances\instance1\bin>opmnctl status
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status
---------------------------------+--------------------+---------+---------
coreapplication_obiccs1          | OracleBIClusterCo~ |    4408 | Alive
coreapplication_obisch1          | OracleBIScheduler~ |    2808 | Alive
coreapplication_obijh1           | OracleBIJavaHostC~ |    3728 | Alive
coreapplication_obips1           | OracleBIPresentat~ |     764 | Alive
coreapplication_obis1            | OracleBIServerCom~ |    3836 | Alive
Thank goodness! Everything is back to normal.