  • How can you diagnose RMAN problems with RMAN debug and a 10046 trace?

     

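         One of the biggest things I gained from my years in Support is a toolkit of troubleshooting methods and tools. For each class of problem you can usually identify a set of diagnostic approaches. However complex the problem, you peel it like an onion: strip away the irrelevant layers one at a time, keep what matters, and use diagnostic tools to dig deeper until you reach the core of the problem.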


         In this article I want to show how to debug RMAN problems, using the following scenario as an example.

     

    On 11.2.0.4, backing up archived logs on a physical standby produced the following error:

    [oracle@test1 ~]$ rman target /

    Recovery Manager: Release 11.2.0.4.0 - Production on Tue Mar 25 13:58:59 2014

    Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

    connected to target database: R11204 (DBID=2001766638, not open)

    RMAN> backup archivelog all;

    Starting backup at 25-MAR-14
    using target database control file instead of recovery catalog
    RMAN-06820: WARNING: failed to archive current log at primary database<============== error: could not connect to the primary
    Connect identifier for DB_UNIQUE_NAME R11204 not configured
    allocated channel: ORA_DISK_1
    channel ORA_DISK_1: SID=1 device type=DISK
    channel ORA_DISK_1: starting archived log backup set
    channel ORA_DISK_1: specifying archived log(s) in backup set
    input archived log thread=1 sequence=58 RECID=1 STAMP=842172838  <============== the archived log backup that follows still succeeds
    input archived log thread=1 sequence=59 RECID=2 STAMP=842172838
    input archived log thread=1 sequence=60 RECID=3 STAMP=842172928
    input archived log thread=1 sequence=61 RECID=4 STAMP=842173275
    input archived log thread=1 sequence=62 RECID=5 STAMP=842175748
    input archived log thread=1 sequence=63 RECID=6 STAMP=842182033
    input archived log thread=1 sequence=64 RECID=7 STAMP=842185463
    input archived log thread=1 sequence=65 RECID=9 STAMP=842432288
    input archived log thread=1 sequence=66 RECID=8 STAMP=842432286
    input archived log thread=1 sequence=67 RECID=10 STAMP=842432291
    input archived log thread=1 sequence=68 RECID=11 STAMP=842432322
    channel ORA_DISK_1: starting piece 1 at 25-MAR-14
    channel ORA_DISK_1: finished piece 1 at 25-MAR-14
    piece handle=/u01/app/oracle/fast_recovery_area/SDY/backupset/2014_03_25/o1_mf_annnn_TAG20140325T135903_9m2hlj0p_.bkp tag=TAG20140325T135903 comment=NONE
    channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
    Finished backup at 25-MAR-14

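    Note: 11.2.0.4 introduced a new behavior: when you back up archived logs on a standby, RMAN connects to the primary and has it perform a log switch. The error above therefore means the standby could not connect to the primary to do that log switch. The key message is "Connect identifier for DB_UNIQUE_NAME R11204 not configured", which suggests that the connect string the standby needs in order to reach the primary is not configured correctly. The question is: where is this connect string configured?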


    First, let's try RMAN debug:

    [oracle@test1 ~]$ rman target / debug trace=/tmp/rman_debug
    RMAN>  backup archivelog all;

    RMAN-03090: Starting backup at 25-MAR-14
    RMAN-06009: using target database control file instead of recovery catalog
    RMAN-06820: WARNING: failed to archive current log at primary database
    RMAN-06613: Connect identifier for DB_UNIQUE_NAME R11204 not configured
    ...

    In the generated /tmp/rman_debug file, we find that the connect string being used is empty ("lprimary_db_cs = NULL"):

    DBGSQL:       TARGET> begin   :lprimary_db_cs :=     sys.dbms_backup_restore.get_connect_identifier       (dbuname=> :primary_dbuname); end;
    DBGSQL:          sqlcode = 0
    DBGSQL:           B :lprimary_db_cs = NULL <========== this is the connect string for the primary
    DBGSQL:           B :primary_dbuname = R11204
      DBGRCVMAN: getConfig: configurations exists for this site
    RMAN-06820: WARNING: failed to archive current log at primary database
    DBGMISC:      ENTERED krmkursr [14:08:50.007]

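    In other words, RMAN calls the internal package function sys.dbms_backup_restore.get_connect_identifier to obtain the connect string the standby uses to reach the primary. Next we need to find out where this string is set and why it is empty.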
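    A debug trace can be large, so rather than reading it end to end it can help to pull out just the bind values. The following Python sketch is a minimal illustration that assumes every bind line has the `DBGSQL: B :name = value` shape seen in the excerpt above (other RMAN versions may format it differently); `extract_binds` and `null_binds` are hypothetical helper names, not part of any Oracle tool.

```python
import re
from typing import Dict, Iterable, List

# Assumed bind-line format, taken from the debug excerpt:
#   DBGSQL:           B :lprimary_db_cs = NULL
BIND_LINE = re.compile(r"^\s*DBGSQL:\s+B\s+:(\w+)\s*=\s*(.*?)\s*$")


def extract_binds(lines: Iterable[str]) -> Dict[str, str]:
    """Return the last value recorded for each bind variable in an RMAN debug trace."""
    binds: Dict[str, str] = {}
    for line in lines:
        match = BIND_LINE.match(line)
        if match:
            binds[match.group(1)] = match.group(2)
    return binds


def null_binds(lines: Iterable[str]) -> List[str]:
    """List the bind variables whose last recorded value is NULL."""
    return [name for name, value in extract_binds(lines).items() if value == "NULL"]
```

    For example, `null_binds(open("/tmp/rman_debug", errors="replace"))` should report `lprimary_db_cs` for a trace like the one above. Keeping only the last value per name is a simplification; a bind that is NULL in one call and set in a later one will not be reported.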

    Next, enable a 10046 trace for the RMAN session:

    [oracle@test1 ~]$ rman target / debug trace=/tmp/rman_debug

    Recovery Manager: Release 11.2.0.4.0 - Production on Mon Mar 17 09:00:00 2014

    Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

    RMAN-06568: connected to target database: R11204 (DBID=2001766638, not open)

    RMAN> sql "alter session set tracefile_identifier=''rman_10046''";

    RMAN-06009: using target database control file instead of recovery catalog
    RMAN-06162: sql statement: alter session set tracefile_identifier=''rman_10046''

    RMAN> sql "alter session set events ''10046 trace name context forever,level 12''";

    RMAN-06162: sql statement: alter session set events ''10046 trace name context forever,level 12''

    RMAN> backup archivelog all;
    RMAN-03090: Starting backup at 25-MAR-14
    RMAN-06820: WARNING: failed to archive current log at primary database
    RMAN-06613: Connect identifier for DB_UNIQUE_NAME R11204 not configured
    ...

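    Now look at the generated trace file. It is written to the user dump destination (udump; under the 11g ADR layout this is the trace directory below):
    $ cd /u01/app/diag/rdbms/sdy/SDY/trace
    $ ls -ltr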
    -rw-r----- 1 oracle oinstall 1037463 Mar 25 14:11 SDY_ora_3792_rman_10046.trc

    PARSING IN CURSOR #140366085001120 len=119 dep=0 uid=0 oct=47 lid=0 tim=1395736859520777 hv=3388798669 ad='7ec65738' sqlid='7pwt2c34ztxqd'
    begin   :lprimary_db_cs :=     sys.dbms_backup_restore.get_connect_identifier       (dbuname=> :primary_dbuname); end; 
    END OF STMT
    PARSE #140366085001120:c=0,e=285,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=1395736859520776
    BINDS #140366085001120:
     Bind#0
      oacdty=01 mxl=2000(1536) mxlc=00 mal=00 scl=00 pre=00
      oacflg=01 fl2=1000000 frm=01 csi=873 siz=2128 off=0
      kxsbbbfp=7fa986a27f08  bln=2000  avl=00  flg=05
     Bind#1
      oacdty=01 mxl=128(90) mxlc=00 mal=00 scl=00 pre=00
      oacflg=01 fl2=1000000 frm=01 csi=873 siz=0 off=2000
      kxsbbbfp=7fa986a286d8  bln=128  avl=06  flg=01
      value="R11204"
    *** ACTION NAME:(0000018 STARTED189) 2014-03-25 14:10:59.521

    WAIT #140366085001120: nam='control file sequential read' ela= 10 file#=0 block#=1 blocks=1 obj#=-1 tim=1395736859521532
    WAIT #140366085001120: nam='control file sequential read' ela= 4 file#=0 block#=16 blocks=1 obj#=-1 tim=1395736859521566
    WAIT #140366085001120: nam='control file sequential read' ela= 4 file#=0 block#=18 blocks=1 obj#=-1 tim=1395736859521580
    WAIT #140366085001120: nam='control file sequential read' ela= 4 file#=0 block#=281 blocks=1 obj#=-1 tim=1395736859521594
    WAIT #140366085001120: nam='control file sequential read' ela= 4 file#=0 block#=1 blocks=1 obj#=-1 tim=1395736859521614
    WAIT #140366085001120: nam='control file sequential read' ela= 3 file#=0 block#=16 blocks=1 obj#=-1 tim=1395736859521627
    WAIT #140366085001120: nam='control file sequential read' ela= 2 file#=0 block#=18 blocks=1 obj#=-1 tim=1395736859521638
    WAIT #140366085001120: nam='control file sequential read' ela= 3 file#=0 block#=281 blocks=1 obj#=-1 tim=1395736859521650
    krsd_get_primary_connect_string: found pcs '' by FAL_SERVER lookup <==================== connect string '' found via FAL_SERVER

    So the 10046 trace shows clearly that the connect string '' was taken from the FAL_SERVER parameter.
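    In a trace of about 1 MB, most lines are WAIT events, so the interesting line is easy to miss. This Python sketch scans a 10046 trace for it. It assumes the `krsd_get_primary_connect_string` line has exactly the format shown above; that line is internal, undocumented output, so the pattern is an assumption that may not hold on other versions. `primary_connect_lookups` is a hypothetical helper name.

```python
import re
from typing import Iterable, List, Tuple

# Pattern based only on the single trace line shown above:
#   krsd_get_primary_connect_string: found pcs '' by FAL_SERVER lookup
PCS_LINE = re.compile(
    r"krsd_get_primary_connect_string: found pcs '([^']*)' by (\w+) lookup"
)


def primary_connect_lookups(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line number, connect string, lookup source) for each matching line."""
    results = []
    for lineno, line in enumerate(lines, start=1):
        match = PCS_LINE.search(line)
        if match:
            results.append((lineno, match.group(1), match.group(2)))
    return results
```

    An empty connect string in the result (as in this case) points straight at the parameter named in the third field.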

    Connecting to the standby and checking FAL_SERVER confirms that the parameter is indeed empty:
    SQL> show parameter fal

    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    fal_client                           string
    fal_server                           string  

    At this point, RMAN debug and the 10046 trace have given us the information we were after.

    To summarize:

    If an RMAN command runs into a performance problem, or you need to trace an error in depth, consider using RMAN debug:

    $ rman target <connection> catalog <connection> debug trace=/tmp/rmanDebug.trc log=/tmp/rmanLog.txt
    run { 
    ...Run your backup commands here 
    }
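    If the debug run has to be repeated, for example from a scheduler, the command line can be assembled in code. The Python helper below is hypothetical (`rman_debug_argv` is not an Oracle API) and only builds the argument list used in the command above, assuming the same `target`, `catalog`, `debug`, `trace=` and `log=` keywords.

```python
from typing import List, Optional


def rman_debug_argv(target: str, trace: str, log: str,
                    catalog: Optional[str] = None) -> List[str]:
    """Build 'rman target <t> [catalog <c>] debug trace=<file> log=<file>' as a list."""
    argv = ["rman", "target", target]
    if catalog:
        # The catalog connect string is passed through unchanged.
        argv += ["catalog", catalog]
    argv += ["debug", f"trace={trace}", f"log={log}"]
    return argv
```

    The list could then be passed to `subprocess.run(argv, input=commands, text=True)`, feeding the RMAN commands on standard input the same way a shell here-document would. Building a list rather than a single string avoids shell quoting issues with the `/` target and file paths.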

    If you need to trace even further, add a 10046 trace as well:

    $ rman target <connection> catalog <connection> debug trace=/tmp/rmanDebug.trc log=/tmp/rmanLog.txt
    RMAN> sql "alter session set tracefile_identifier=''rman_10046''";
    RMAN> sql "alter session set events ''10046 trace name context forever,level 12''";
    RMAN> run-your-commands;
    RMAN> exit;
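    One detail that is easy to get wrong in these commands: inside RMAN's `sql "..."` command, each single quote in the SQL statement is written as two single quotes (hence `''rman_10046''`). The Python sketch below generates the 10046 command sequence with that escaping applied. `rman_sql` and `trace_10046_script` are hypothetical helper names, and the sketch simply rejects statements containing double quotes rather than guessing at an escaping rule for them.

```python
from typing import List


def rman_sql(statement: str) -> str:
    """Wrap a SQL statement in RMAN's sql "..." command, doubling single quotes."""
    if '"' in statement:
        # Assumption: embedded double quotes are out of scope for this sketch.
        raise ValueError("double quotes are not handled by this sketch")
    return 'sql "{}";'.format(statement.replace("'", "''"))


def trace_10046_script(identifier: str, commands: List[str]) -> str:
    """Build an RMAN script that tags the trace file, enables 10046 level 12,
    runs the given commands and exits."""
    lines = [
        rman_sql(f"alter session set tracefile_identifier='{identifier}'"),
        rman_sql("alter session set events '10046 trace name context forever,level 12'"),
        *commands,
        "exit;",
    ]
    return "\n".join(lines) + "\n"
```

    The generated text could be saved to a file and run with RMAN's `cmdfile=` command-line option, so the traced session always starts with the same two `sql` commands.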

    Be aware that these methods can generate large trace files, so take the pressure on disk space and the impact on RMAN performance into account.
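    To keep track of the space these traces consume, a small Python sketch can list the trace files produced under a given `tracefile_identifier` together with their sizes. It assumes the `<SID>_ora_<pid>_<identifier>.trc` naming seen in the `ls -ltr` output earlier and a trace directory you supply (for example `/u01/app/diag/rdbms/sdy/SDY/trace`); `trace_files` and `total_bytes` are hypothetical helpers.

```python
import glob
import os
from typing import Iterable, List, Tuple


def trace_files(trace_dir: str, identifier: str) -> List[Tuple[str, int]]:
    """Return (path, size in bytes) for '*_<identifier>.trc' files, newest first."""
    pattern = os.path.join(trace_dir, f"*_{identifier}.trc")
    paths = sorted(glob.glob(pattern), key=os.path.getmtime, reverse=True)
    return [(path, os.path.getsize(path)) for path in paths]


def total_bytes(files: Iterable[Tuple[str, int]]) -> int:
    """Sum the sizes returned by trace_files()."""
    return sum(size for _, size in files)
```

    Because the identifier is part of the file name, this only picks up traces from sessions that set it, which keeps the listing from mixing in unrelated trace files.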


    For more details, see the MOS note: RMAN: Quick Debugging Guide (Doc ID 1198753.1)

  • Original article: https://www.cnblogs.com/DataArt/p/10018940.html