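Today I set up an OGG (Oracle GoldenGate) environment. After I configured the Extract process on the source side, it failed to start: no error was reported, but the process status stayed STOPPED. This post briefly records how I tracked the problem down.
1. Start the Extract process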
GGSCI (qdrac1new) 19> start e_0a

Sending START request to MANAGER ...
EXTRACT E_0A starting

GGSCI (qdrac1new) 20> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     STOPPED     E_0A        01:06:21      00:1:26

GGSCI (qdrac1new) 21>
2. Check the ggserr.log file
2021-11-01 16:46:53 WARNING OGG-01423 Oracle GoldenGate Capture for Oracle, e_0a.prm: No valid default archive log destination directory found for thread 3.
2021-11-01 16:46:54 INFO OGG-00546 Oracle GoldenGate Capture for Oracle, e_0a.prm: Default thread stack size: 196608.
2021-11-01 16:46:54 INFO OGG-00547 Oracle GoldenGate Capture for Oracle, e_0a.prm: Increasing thread stack size from 196608 to 1048576.
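The log file contains no errors, and the process report file has none either. However, the WARNING in ggserr.log (OGG-01423) caught my attention: it says that no valid default archive log destination directory was found for thread 3.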
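When a process stops without an ERROR line, it can help to filter ggserr.log down to everything that is not INFO. The following is a minimal Python sketch of that idea, not an OGG tool. The regular expression is an assumption based only on the three log lines shown above, and the names `LogEntry`, `parse_line`, and `non_info_entries` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed line shape, inferred from the excerpt above:
#   <date> <time> <SEVERITY> <OGG-code> <message>
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<severity>INFO|WARNING|ERROR)\s+(?P<code>OGG-\d+)\s+(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    severity: str
    code: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None if it does not match the assumed shape."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return LogEntry(f"{m['date']} {m['time']}", m["severity"], m["code"], m["message"])


def non_info_entries(lines):
    """Keep only parsed entries whose severity is not INFO."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e.severity != "INFO"]


# The three lines from the excerpt above.
SAMPLE = [
    "2021-11-01 16:46:53 WARNING OGG-01423 Oracle GoldenGate Capture for Oracle, "
    "e_0a.prm: No valid default archive log destination directory found for thread 3.",
    "2021-11-01 16:46:54 INFO OGG-00546 Oracle GoldenGate Capture for Oracle, "
    "e_0a.prm: Default thread stack size: 196608.",
    "2021-11-01 16:46:54 INFO OGG-00547 Oracle GoldenGate Capture for Oracle, "
    "e_0a.prm: Increasing thread stack size from 196608 to 1048576.",
]
```

Calling `non_info_entries(SAMPLE)` leaves only the OGG-01423 warning about thread 3, which is the line that pointed to the cause here.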
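3. Before I configured OGG, the customer had told me that the source database is a 3-node RAC, but the third node has failed and only two nodes are currently serving the workload. For that reason, I specified THREADS 2 when configuring the OGG Extract process.
4. Log in to the database and check the redo-related information.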
SQL> select group#, thread#, status, first_time from v$log order by 2,1 ;

    GROUP#    THREAD# STATUS           FIRST_TIME
---------- ---------- ---------------- -----------
         1          1 ACTIVE           2021/11/1 1
         2          1 ACTIVE           2021/11/1 1
         3          1 ACTIVE           2021/11/1 1
         4          1 ACTIVE           2021/11/1 1
         5          1 ACTIVE           2021/11/1 1
        11          1 ACTIVE           2021/11/1 1
        12          1 ACTIVE           2021/11/1 1
        13          1 CURRENT          2021/11/1 1
        14          1 ACTIVE           2021/11/1 1
        15          1 ACTIVE           2021/11/1 1
         6          2 INACTIVE         2021/11/1 1
         7          2 INACTIVE         2021/11/1 1
         8          2 ACTIVE           2021/11/1 1
         9          2 INACTIVE         2021/11/1 1
        10          2 ACTIVE           2021/11/1 1
        16          2 ACTIVE           2021/11/1 1
        17          2 ACTIVE           2021/11/1 1
        18          2 CURRENT          2021/11/1 1
        19          2 INACTIVE         2021/11/1 1
        20          2 INACTIVE         2021/11/1 1
        21          3 INACTIVE         2021/7/7 3:
        22          3 INACTIVE         2021/7/7 3:
        23          3 INACTIVE         2021/7/7 3:
        24          3 INACTIVE         2021/7/7 3:
        25          3 INACTIVE         2021/7/7 3:
        26          3 INACTIVE         2021/7/7 3:
        27          3 INACTIVE         2021/7/7 3:
        28          3 INACTIVE         2021/7/7 3:
        29          3 INACTIVE         2021/7/7 3:
        30          3 INACTIVE         2021/7/7 3:

30 rows selected

SQL> select thread#, status, enabled from v$thread;

   THREAD# STATUS ENABLED
---------- ------ --------
         1 OPEN   PUBLIC
         2 OPEN   PUBLIC
         3 CLOSED PRIVATE
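As shown, all redo log groups of the third instance (thread 3) are INACTIVE and the thread itself is CLOSED, yet it is still enabled (ENABLED = PRIVATE). The database therefore still exposes three redo threads, while the Extract was configured with THREADS 2.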
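The check in this step can be sketched in Python as a comparison between the threads the database still has enabled and the thread count given to Extract. This is only an illustration under one assumption drawn from this case: that the configured THREADS value should equal the number of threads whose ENABLED value is not DISABLED. The function names and the hard-coded rows (copied from the v$thread output in this post) are hypothetical, and no database connection is made.

```python
def enabled_threads(v_thread_rows):
    """Return thread numbers whose ENABLED column is anything but DISABLED."""
    return [thread for thread, _status, enabled in v_thread_rows if enabled != "DISABLED"]


def check_thread_config(v_thread_rows, configured_threads):
    """Compare enabled redo threads against the Extract THREADS setting.

    Assumption (from this case only): a mismatch between the two counts is
    worth investigating before starting Extract.
    """
    enabled = enabled_threads(v_thread_rows)
    closed_but_enabled = [
        thread
        for thread, status, en in v_thread_rows
        if en != "DISABLED" and status == "CLOSED"
    ]
    return {
        "enabled": enabled,
        "closed_but_enabled": closed_but_enabled,
        "mismatch": len(enabled) != configured_threads,
    }


# Rows as (THREAD#, STATUS, ENABLED), taken from the v$thread output in this post.
V_THREAD_BEFORE = [(1, "OPEN", "PUBLIC"), (2, "OPEN", "PUBLIC"), (3, "CLOSED", "PRIVATE")]
V_THREAD_AFTER = [(1, "OPEN", "PUBLIC"), (2, "OPEN", "PUBLIC"), (3, "CLOSED", "DISABLED")]
```

With `configured_threads=2`, `check_thread_config(V_THREAD_BEFORE, 2)` flags thread 3 as closed but still enabled, while the same call on `V_THREAD_AFTER` reports no mismatch.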
5. I told the customer that I planned to disable thread 3 and drop all of its redo log groups. After getting the customer's approval, I ran the following.
SQL> ALTER DATABASE DISABLE THREAD 3;

Database altered.

SQL> select thread#, status, enabled from v$thread;

   THREAD# STATUS ENABLED
---------- ------ --------
         1 OPEN   PUBLIC
         2 OPEN   PUBLIC
         3 CLOSED DISABLED

SQL> ALTER DATABASE DROP LOGFILE GROUP 22;
ALTER DATABASE DROP LOGFILE GROUP 23;
ALTER DATABASE DROP LOGFILE GROUP 24;
ALTER DATABASE DROP LOGFILE GROUP 25;
ALTER DATABASE DROP LOGFILE GROUP 26;
ALTER DATABASE DROP LOGFILE GROUP 27;
ALTER DATABASE DROP LOGFILE GROUP 28;
ALTER DATABASE DROP LOGFILE GROUP 29;
ALTER DATABASE DROP LOGFILE GROUP 30;
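Typing one DROP statement per group is easy to get wrong, so the statements can also be generated from the v$log rows. The Python sketch below only builds the SQL text and never connects to a database. The helper name `cleanup_statements` is hypothetical, the sample rows are a small subset of the v$log listing in this post, and the refusal to proceed when a group is not INACTIVE is a precaution added for this sketch rather than something the original procedure performed.

```python
def cleanup_statements(v_log_rows, thread):
    """Build the DISABLE THREAD and DROP LOGFILE GROUP statements for one thread.

    v_log_rows holds (GROUP#, THREAD#, STATUS) tuples. A ValueError is raised
    if any group of the thread is not INACTIVE (a safety check assumed here).
    """
    groups = sorted(group for group, t, _status in v_log_rows if t == thread)
    not_inactive = [
        group for group, t, status in v_log_rows if t == thread and status != "INACTIVE"
    ]
    if not_inactive:
        raise ValueError(f"thread {thread} has non-INACTIVE groups: {not_inactive}")

    statements = [f"ALTER DATABASE DISABLE THREAD {thread};"]
    statements += [f"ALTER DATABASE DROP LOGFILE GROUP {group};" for group in groups]
    return statements


# A subset of the v$log rows shown in this post: (GROUP#, THREAD#, STATUS).
SAMPLE_ROWS = [
    (18, 2, "CURRENT"),
    (19, 2, "INACTIVE"),
    (22, 3, "INACTIVE"),
    (23, 3, "INACTIVE"),
]
```

For the sample rows, `cleanup_statements(SAMPLE_ROWS, 3)` yields the DISABLE statement followed by one DROP per group of thread 3, while asking for thread 2 raises an error because group 18 is CURRENT. The generated text should still be reviewed and approved before anyone runs it.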
6. Restart the Extract process. This time it starts successfully.
GGSCI (qdrac1new) 1> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     E_0A        00:00:01      00:00:01