How to Use RMAN Debug and 10046 Trace to Diagnose RMAN Problems

     

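         Over my years in Support, one of my biggest gains has been learning many methods and tools for troubleshooting. For each class of problem, a set of diagnostic approaches can be roughly summarized. However complex the problem, you peel it like an onion: strip away the irrelevant layers one by one, keep what matters, and use diagnostic tools to dig deeper until you reach the core of the problem.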


         In this article, I want to show how to debug RMAN problems. We will use the following scenario to illustrate the process.

     

    On 11.2.0.4, the following error appeared when backing up archived logs on a physical standby database:

    [oracle@test1 ~]$ rman target /

    Recovery Manager: Release 11.2.0.4.0 - Production on Tue Mar 25 13:58:59 2014

    Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

    connected to target database: R11204 (DBID=2001766638, not open)

    RMAN> backup archivelog all;

    Starting backup at 25-MAR-14
    using target database control file instead of recovery catalog
    RMAN-06820: WARNING: failed to archive current log at primary database  <==============error: cannot connect to the primary database
    Connect identifier for DB_UNIQUE_NAME R11204 not configured
    allocated channel: ORA_DISK_1
    channel ORA_DISK_1: SID=1 device type=DISK
    channel ORA_DISK_1: starting archived log backup set
    channel ORA_DISK_1: specifying archived log(s) in backup set
    input archived log thread=1 sequence=58 RECID=1 STAMP=842172838  <==============the archived log backup that follows still succeeds
    input archived log thread=1 sequence=59 RECID=2 STAMP=842172838
    input archived log thread=1 sequence=60 RECID=3 STAMP=842172928
    input archived log thread=1 sequence=61 RECID=4 STAMP=842173275
    input archived log thread=1 sequence=62 RECID=5 STAMP=842175748
    input archived log thread=1 sequence=63 RECID=6 STAMP=842182033
    input archived log thread=1 sequence=64 RECID=7 STAMP=842185463
    input archived log thread=1 sequence=65 RECID=9 STAMP=842432288
    input archived log thread=1 sequence=66 RECID=8 STAMP=842432286
    input archived log thread=1 sequence=67 RECID=10 STAMP=842432291
    input archived log thread=1 sequence=68 RECID=11 STAMP=842432322
    channel ORA_DISK_1: starting piece 1 at 25-MAR-14
    channel ORA_DISK_1: finished piece 1 at 25-MAR-14
    piece handle=/u01/app/oracle/fast_recovery_area/SDY/backupset/2014_03_25/o1_mf_annnn_TAG20140325T135903_9m2hlj0p_.bkp tag=TAG20140325T135903 comment=NONE
    channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
    Finished backup at 25-MAR-14

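    Note: 11.2.0.4 introduces a new feature: when archived logs are backed up on a standby, RMAN connects to the primary database and has it perform a log switch. The warning above therefore means that the standby could not connect to the primary to perform that log switch. The key message is "Connect identifier for DB_UNIQUE_NAME R11204 not configured", which suggests that the connect string the standby needs to reach the primary is not configured correctly. The question then is: where is this connect string set?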


    First, let's use RMAN debug:

    [oracle@test1 ~]$ rman target / debug trace=/tmp/rman_debug
    RMAN>  backup archivelog all;

    RMAN-03090: Starting backup at 25-MAR-14
    RMAN-06009: using target database control file instead of recovery catalog
    RMAN-06820: WARNING: failed to archive current log at primary database
    RMAN-06613: Connect identifier for DB_UNIQUE_NAME R11204 not configured
    ...

    In the generated /tmp/rman_debug file, we find that the connect string is empty: "lprimary_db_cs = NULL"

    DBGSQL:       TARGET> begin   :lprimary_db_cs :=     sys.dbms_backup_restore.get_connect_identifier       (dbuname=> :primary_dbuname); end; 
    DBGSQL:          sqlcode = 0
    DBGSQL:           B :lprimary_db_cs = NULL  <==========this is the connect string for the primary database
    DBGSQL:           B :primary_dbuname = R11204
      DBGRCVMAN: getConfig: configurations exists for this site
    RMAN-06820: WARNING: failed to archive current log at primary database
    DBGMISC:      ENTERED krmkursr [14:08:50.007]

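    In other words, RMAN calls an internal package function, sys.dbms_backup_restore.get_connect_identifier, to obtain the connect string the standby uses to reach the primary. We now need to find out where this string is set and why it is empty.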
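
    A debug trace can be very large, so it may help to search it for empty binds instead of reading it by eye. The following is a minimal Python sketch, not part of RMAN: it scans trace lines for DBGSQL bind entries and returns the names of binds whose value is NULL. The function name find_null_binds and the regular expression are assumptions based only on the excerpt above; other trace layouts may need a different pattern.

```python
import re

# Matches bind lines such as "DBGSQL:           B :lprimary_db_cs = NULL".
# The layout is inferred from the trace excerpt in this article (an assumption).
BIND_LINE = re.compile(r"^\s*DBGSQL:\s+B\s+:(\w+)\s*=\s*(.*)$")


def find_null_binds(lines):
    """Return the names of bind variables whose traced value is NULL."""
    null_binds = []
    for line in lines:
        match = BIND_LINE.match(line)
        if match and match.group(2).strip() == "NULL":
            null_binds.append(match.group(1))
    return null_binds
```

    The function accepts any iterable of lines, so an open file handle on the trace file (here /tmp/rman_debug) can be passed directly.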

    Next, enable a 10046 trace for the RMAN session:

    [oracle@test1 ~]$ rman target / debug trace=/tmp/rman_debug

    Recovery Manager: Release 11.2.0.4.0 - Production on Mon Mar 17 09:00:00 2014

    Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

    RMAN-06568: connected to target database: R11204 (DBID=2001766638, not open)

    RMAN> sql "alter session set tracefile_identifier=''rman_10046''";

    RMAN-06009: using target database control file instead of recovery catalog
    RMAN-06162: sql statement: alter session set tracefile_identifier=''rman_10046''

    RMAN> sql "alter session set events ''10046 trace name context forever,level 12''";

    RMAN-06162: sql statement: alter session set events ''10046 trace name context forever,level 12''

    RMAN> backup archivelog all;
    RMAN-03090: Starting backup at 25-MAR-14
    RMAN-06820: WARNING: failed to archive current log at primary database
    RMAN-06613: Connect identifier for DB_UNIQUE_NAME R11204 not configured
    ...

    Check the generated trace file, which is in the trace (udump) directory:
    $cd /u01/app/diag/rdbms/sdy/SDY/trace
    $ls -ltr
    -rw-r----- 1 oracle oinstall 1037463 Mar 25 14:11 SDY_ora_3792_rman_10046.trc

    PARSING IN CURSOR #140366085001120 len=119 dep=0 uid=0 oct=47 lid=0 tim=1395736859520777 hv=3388798669 ad='7ec65738' sqlid='7pwt2c34ztxqd'
    begin   :lprimary_db_cs :=     sys.dbms_backup_restore.get_connect_identifier       (dbuname=> :primary_dbuname); end; 
    END OF STMT
    PARSE #140366085001120:c=0,e=285,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=1395736859520776
    BINDS #140366085001120:
     Bind#0
      oacdty=01 mxl=2000(1536) mxlc=00 mal=00 scl=00 pre=00
      oacflg=01 fl2=1000000 frm=01 csi=873 siz=2128 off=0
      kxsbbbfp=7fa986a27f08  bln=2000  avl=00  flg=05
     Bind#1
      oacdty=01 mxl=128(90) mxlc=00 mal=00 scl=00 pre=00
      oacflg=01 fl2=1000000 frm=01 csi=873 siz=0 off=2000
      kxsbbbfp=7fa986a286d8  bln=128  avl=06  flg=01
      value="R11204"
    *** ACTION NAME:(0000018 STARTED189) 2014-03-25 14:10:59.521

    WAIT #140366085001120: nam='control file sequential read' ela= 10 file#=0 block#=1 blocks=1 obj#=-1 tim=1395736859521532
    WAIT #140366085001120: nam='control file sequential read' ela= 4 file#=0 block#=16 blocks=1 obj#=-1 tim=1395736859521566
    WAIT #140366085001120: nam='control file sequential read' ela= 4 file#=0 block#=18 blocks=1 obj#=-1 tim=1395736859521580
    WAIT #140366085001120: nam='control file sequential read' ela= 4 file#=0 block#=281 blocks=1 obj#=-1 tim=1395736859521594
    WAIT #140366085001120: nam='control file sequential read' ela= 4 file#=0 block#=1 blocks=1 obj#=-1 tim=1395736859521614
    WAIT #140366085001120: nam='control file sequential read' ela= 3 file#=0 block#=16 blocks=1 obj#=-1 tim=1395736859521627
    WAIT #140366085001120: nam='control file sequential read' ela= 2 file#=0 block#=18 blocks=1 obj#=-1 tim=1395736859521638
    WAIT #140366085001120: nam='control file sequential read' ela= 3 file#=0 block#=281 blocks=1 obj#=-1 tim=1395736859521650
    krsd_get_primary_connect_string: found pcs '' by FAL_SERVER lookup <====================the connect string '' was found through FAL_SERVER

    So the 10046 trace tells us clearly that the connect string '' was obtained from the FAL_SERVER parameter.
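
    The same kind of search works on the 10046 trace, where the interesting line is buried among many WAIT entries. The sketch below is a hedged illustration: the function name primary_connect_lookups is hypothetical, and the pattern is taken from the single krsd_get_primary_connect_string line shown above, so lookups logged in another format would not be matched.

```python
import re

# Pattern derived from the one trace line shown in this article;
# other lookup sources may be logged differently (an assumption).
PCS_LINE = re.compile(
    r"krsd_get_primary_connect_string: found pcs '([^']*)' by (\w+) lookup"
)


def primary_connect_lookups(lines):
    """Return (connect_string, source) pairs found in a 10046 trace."""
    found = []
    for line in lines:
        match = PCS_LINE.search(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found
```

    An empty first element in a returned pair corresponds to the empty connect string seen in this case.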

    Now connect to the standby and check the FAL_SERVER parameter. Its value is indeed empty:
    SQL> show parameter fal

    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    fal_client                           string
    fal_server                           string  

    At this point, RMAN debug and the 10046 trace have given us the information we were looking for.

    To summarize:

    If you run into a performance problem after running an RMAN command, or need to trace an error in depth, consider using RMAN debug:

    $ rman target <connection> catalog <connection> debug trace=/tmp/rmanDebug.trc log=/tmp/rmanLog.txt
    run { 
    ...Run your backup commands here 
    }

    If more detailed tracing is needed, add a 10046 trace:

    $ rman target <connection> catalog <connection> debug trace=/tmp/rmanDebug.trc log=/tmp/rmanLog.txt
    RMAN> sql "alter session set tracefile_identifier=''rman_10046''";
    RMAN> sql "alter session set events ''10046 trace name context forever,level 12''";
    RMAN> run-your-commands;
    RMAN> exit;
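
    When the same tracing steps are repeated often, the statements above could be generated into an RMAN command file rather than typed by hand. The following Python sketch is one possible way to do that, under stated assumptions: the helper names rman_sql and build_rman_cmdfile are hypothetical, and the final statement that switches event 10046 off is an addition that is not part of the steps above.

```python
def rman_sql(statement):
    """Wrap a SQL statement in RMAN's sql "..." command."""
    # Single quotes are doubled because the statement sits inside a quoted string.
    return 'sql "' + statement.replace("'", "''") + '";'


def build_rman_cmdfile(commands, identifier="rman_10046", level=12):
    """Return command-file text that runs `commands` under a 10046 trace."""
    lines = [
        rman_sql(f"alter session set tracefile_identifier='{identifier}'"),
        rman_sql(
            f"alter session set events '10046 trace name context forever,level {level}'"
        ),
    ]
    lines.extend(commands)
    # Assumption: turn the event off again so later commands are not traced.
    lines.append(rman_sql("alter session set events '10046 trace name context off'"))
    return "\n".join(lines) + "\n"
```

    The returned text can be written to a file and passed to RMAN with the cmdfile argument, for example by calling build_rman_cmdfile(["backup archivelog all;"]) and saving the result under a path of your choice.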

    Note that these methods can generate a large volume of trace files, so consider the pressure on disk space and the impact on RMAN performance.


    For reference, see the MOS document: RMAN: Quick Debugging Guide (Doc ID 1198753.1)

    Original article: https://www.cnblogs.com/DataArt/p/10018940.html