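Contact: Mobile/WeChat (+86 13429648788) QQ (107644445)
Title: Handling an ORA-600 kfdpMetaBlk_pickle failure
Author: 惜分飞 © All rights reserved [No reproduction in any form without my consent; otherwise I reserve the right to pursue legal liability.]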
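A customer reported that the cluster's CRS could not start normally. Investigation showed that the gmon process was crashing the ASM instance, and testing confirmed that the problem is triggered when the data disk group is mounted.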
SQL> alter diskgroup data mount;
alter diskgroup data mount
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 7517
Session ID: 918 Serial number: 5
The corresponding alert log reports the error ORA-600 [kfdpMetaBlk_pickle:01], [4294967295]:
SQL> alter diskgroup data mount
NOTE: cache registered group DATA number=2 incarn=0x3078f05f
NOTE: cache began mount (first) of group DATA number=2 incarn=0x3078f05f
NOTE: Assigning number (2,1) to disk (/dev/rdisk/disk93)
NOTE: Assigning number (2,3) to disk (/dev/rdisk/disk96)
NOTE: Assigning number (2,2) to disk (/dev/rdisk/disk94)
NOTE: Assigning number (2,0) to disk (/dev/rdisk/disk92)
Sat Jul 17 05:21:01 2021
Errors in file /u01/app/crs_base/diag/asm/+asm/+ASM2/trace/+ASM2_gmon_7457.trc (incident=255833):
ORA-00600: internal error code, arguments: [kfdpMetaBlk_pickle:01], [4294967295], [0], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/crs_base/diag/asm/+asm/+ASM2/incident/incdir_255833/+ASM2_gmon_7457_i255833.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/crs_base/diag/asm/+asm/+ASM2/trace/+ASM2_gmon_7457.trc:
ORA-00600: internal error code, arguments: [kfdpMetaBlk_pickle:01], [4294967295], [0], [], [], [], [], [], [], [], [], []
GMON (ospid: 7457): terminating the instance due to error 493
Sat Jul 17 05:21:03 2021
System state dump requested by (instance=2, osid=7457 (GMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/crs_base/diag/asm/+asm/+ASM2/trace/+ASM2_diag_7429.trc
Instance terminated by GMON, pid = 7457
A search of MOS (My Oracle Support) for ORA-600 [kfdpMetaBlk_pickle:01], [4294967295] returned no useful information.
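A side note on the second argument: 4294967295 is 0xFFFFFFFF, the largest unsigned 32-bit value, which reads as -1 when the same bits are interpreted as a signed 32-bit integer. Whether Oracle uses it here as a "not set" or "invalid" marker is only an assumption, since nothing documented confirms it. The helper name `as_signed32` below is illustrative. A minimal sketch of the conversion:

```python
import struct


def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit integer as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


# Second argument of the ORA-600 reported in the alert log.
arg = 4294967295

print(hex(arg))          # 0xffffffff
print(as_signed32(arg))  # -1
```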
The corresponding trace file contains the following information:
2021-07-17 03:51:16.277603*:800002A2:KGF:kgfdputl.c@1411:kgfdpMetaSet_getMaxClique(): inc=2 ver=4294967295
2021-07-17 03:51:16.277619 :800002A3:KFDP:kfdp.c@9314:kfdpMetaSet_filterOld(): filtered old meta on disk 2
2021-07-17 03:51:16.277620 :800002A4:KFDP:kfdp.c@9314:kfdpMetaSet_filterOld(): filtered old meta on disk 2
2021-07-17 03:51:16.277992 :800002A5:KFDP:kfdp.c@9417:kfdpMetaSet_readDta():kfdpMetaSet_readDta unpickle upto 6 metablks
2021-07-17 03:51:16.277993 :800002A6:KFDP:kfdp.c@9425:kfdpMetaSet_readDta():kfdpMetaSet_readDta unpickle metablk for disk 3
2021-07-17 03:51:16.278154 :800002A7:KFDP:kfdp.c@9425:kfdpMetaSet_readDta():kfdpMetaSet_readDta unpickle metablk for disk 1
2021-07-17 03:51:16.278268 :800002A8:KFDP:kfdp.c@5851:kfdp_read(): kfdp_read end ok=1
2021-07-17 03:51:16.278277 :800002A9:KFDP:kfdp.c@7073:kfdp_doQuery(): kfdp_doQuery rewrite_kfdp=1
2021-07-17 03:51:16.278282 :800002AA:KFDP:kfdp.c@12511:kfdpLckValue_pickle(): kfdpLckValue_pickle size=0 endian=0xff ndisks=0 lckvalid=0
2021-07-17 03:51:16.278293 :800002AB:db_trace:kfdp.c@12803:kfdpLck_convPriv(): [10499:19:396] kfdpLck_conv: grp=1, type=0, mode=5, line=7155
2021-07-17 03:51:16.278294 :800002AC:KFDP:kfdp.c@12663:kfdpLckValue_unpickle(): kfdpLckValue_unpickle size=28 res=0 ok=0 ver=-1 dcnt=0 lckvalid=0 flags=0x2 inst=0 (I am 2) version=0
2021-07-17 03:51:16.278499*:800002AD:KGF:kgfdputl.c@485:kgfdpDta_getAllDsks(): kgfdpDta_getAllDsks using saved iterator 0x9ffffffffd571220 with 4 disks
2021-07-17 03:51:16.278688 :800002AE:KFDP:kfdp.c@5566:kfdp_write(): kfdp_write: pstDskCnt=3 grow=0 degenerate=0
2021-07-17 03:51:16.278688*:800002AF:KGF:kgfdputl.c@2619:kgfdpTraceSet(): writing pst to disks (n=3): 0 1 3
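From the information above, it is fairly clear that the PST (Partnership and Status Table) information is abnormal: the PST records only three disks (0, 1 and 3) and treats disk 2 as an old disk, while the disk group actually has four disks. This mismatch causes the gmon process to fail. After the problem was fixed at the low level, the database was recovered successfully.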
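The comparison behind this conclusion can be sketched as a small script: collect the disk numbers that the alert log assigns during the mount, collect the disk numbers that the trace reports in the PST write, and list the difference. This is only an illustration, not an Oracle tool. The function names and regular expressions are assumptions made for this sketch, and it assumes that both excerpts describe the same disk group.

```python
import re

# Excerpt from the alert log: disks discovered for group 2 (DATA).
ALERT_LOG = """\
NOTE: Assigning number (2,1) to disk (/dev/rdisk/disk93)
NOTE: Assigning number (2,3) to disk (/dev/rdisk/disk96)
NOTE: Assigning number (2,2) to disk (/dev/rdisk/disk94)
NOTE: Assigning number (2,0) to disk (/dev/rdisk/disk92)
"""

# Excerpt from the gmon trace: disks that take part in the PST write.
TRACE = "kgfdpTraceSet(): writing pst to disks (n=3): 0 1 3"


def discovered_disks(alert_text: str, group: int) -> dict:
    """Map disk number -> device path for one disk group (hypothetical helper)."""
    pattern = re.compile(r"Assigning number \((\d+),(\d+)\) to disk \(([^)]+)\)")
    return {
        int(disk): path
        for grp, disk, path in pattern.findall(alert_text)
        if int(grp) == group
    }


def pst_disks(trace_text: str) -> set:
    """Return the disk numbers listed in the 'writing pst' trace line."""
    match = re.search(r"writing pst to disks \(n=\d+\):((?: \d+)+)", trace_text)
    if match is None:
        return set()
    return {int(number) for number in match.group(1).split()}


found = discovered_disks(ALERT_LOG, group=2)
in_pst = pst_disks(TRACE)
missing = sorted(set(found) - in_pst)

for number in missing:
    print(f"disk {number} ({found[number]}) is discovered but absent from the PST set")
```

With the excerpts shown in this article, the script reports disk 2 (/dev/rdisk/disk94), which matches the "filtered old meta on disk 2" lines in the trace.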
SQL> recover database using backup controlfile;
ORA-00279: change 30075814973 generated at 07/17/2021 01:12:08 needed for thread 2
ORA-00289: suggestion : +FRA
ORA-00280: change 30075814973 for thread 2 is in sequence #120561
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/tmp/asm/group_16
ORA-00279: change 30075814973 generated at 07/17/2021 01:11:54 needed for thread 1
ORA-00289: suggestion : +FRA/xff/archivelog/2021_07_17/thread_1_seq_79949.1543.1078103529
ORA-00280: change 30075814973 for thread 1 is in sequence #79949
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/tmp/asm/group_13
ORA-00279: change 30075815013 generated at 07/17/2021 01:12:09 needed for thread 1
ORA-00289: suggestion : +FRA
ORA-00280: change 30075815013 for thread 1 is in sequence #79950
ORA-00278: log file '/tmp/asm/group_13' no longer needed for this recovery
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/tmp/asm/group_11
Log applied.
Media recovery complete.
SQL> alter database open resetlogs;
Database altered.
Luckily, this recovery was completed with zero data loss.