zoukankan      html  css  js  c++  java
  • 了解你所不知道的SMON功能(五):Recover Dead transaction

    SMON的作用还包括清理死事务:Recover Dead transaction。当服务进程在提交事务(commit)前就意外终止的话会形成死事务(dead transaction),PMON进程负责轮询Oracle进程,找出这类意外终止的死进程(dead process),通知SMON将与该dead process相关的dead transaction回滚清理,并且PMON还负责恢复dead process原本持有的锁和latch。 我们来具体了解dead transaction的恢复过程:
    SQL> select * from v$version;
    
    BANNER
    ----------------------------------------------------------------
    Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
    PL/SQL Release 10.2.0.4.0 - Production
    CORE    10.2.0.4.0      Production
    TNS for Linux: Version 10.2.0.4.0 - Production
    NLSRTL Version 10.2.0.4.0 - Production
    
    SQL> select  * from global_name;
    
    GLOBAL_NAME
    --------------------------------------------------------------------------------
    www.oracledatabase12g.com
    
    SQL>alter system set fast_start_parallel_rollback=false;
    System altered.
    
    设置10500,10046事件以跟踪SMON进程的行为
    
    SQL> alter system set events '10500 trace name context forever,level 8';
    System altered.
    
    SQL> oradebug setospid 4424
    Oracle pid: 8, Unix process pid: 4424, image: oracle@rh2.oracle.com (SMON)
    
    SQL> oradebug event 10046 trace name context forever,level 8;
    Statement processed.
    
    在一个新的terminal中执行大批量的删除语句,在执行一段时间后使用操作系统命令将执行该删除操作的
    服务进程kill掉,模拟一个大的dead transaction的场景
    
    SQL> delete large_rb;
    delete large_rb
    
    [oracle@rh2 bdump]$ kill -9 4535
    
    等待几秒后pmon进程会找出dead process:
    [claim lock for dead process][lp 0x7000003c70ceff0][p 0x7000003ca63dad8.1290666][hist x9a514951]
    
    在x$ktube内部视图中出现ktuxecfl(Transaction flags)标记为DEAD的记录:
    
    SQL> select sum(distinct(ktuxesiz)) from x$ktuxe where ktuxecfl = 'DEAD';
    
    SUM(DISTINCT(KTUXESIZ))
    -----------------------
                      29386
    
    SQL> /
    
    SUM(DISTINCT(KTUXESIZ))
    -----------------------
                      28816
    
    以上KTUXESIZ代表事务所使用的undo块总数(number of undo blocks used by the transaction)
    
    ==================smon trace content==================
    SMON: system monitor process posted
    WAIT #0: nam='log file switch completion' ela= 0 p1=0 p2=0 p3=0 obj#=1 tim=1278243332801935
    WAIT #0: nam='log file switch completion' ela= 0 p1=0 p2=0 p3=0 obj#=1 tim=1278243332815568
    WAIT #0: nam='latch: row cache objects' ela= 95 address=2979418792 number=200 tries=1 obj#=1 tim=1278243333332734
    WAIT #0: nam='latch: row cache objects' ela= 83 address=2979418792 number=200 tries=1 obj#=1 tim=1278243333356173
    WAIT #0: nam='latch: undo global data' ela= 104 address=3066991984 number=187 tries=1 obj#=1 tim=1278243347987705
    WAIT #0: nam='latch: object queue header operation' ela= 89 address=3094817048 number=131 tries=0 obj#=1 tim=1278243362468042
    WAIT #0: nam='log file switch (checkpoint incomplete)' ela= 0 p1=0 p2=0 p3=0 obj#=1 tim=1278243419588202
    Dead transaction 0x00c2.008.0000006d recovered by SMON
    
    =====================
    PARSING IN CURSOR #3 len=358 dep=1 uid=0 oct=3 lid=0 tim=1278243423594568 hv=3186851936 ad='ae82c1b8'
    select smontabv.cnt,
           smontab.time_mp,
           smontab.scn,
           smontab.num_mappings,
           smontab.tim_scn_map,
           smontab.orig_thread
      from smon_scn_time smontab,
           (select max(scn) scnmax,
                   count(*) + sum(NVL2(TIM_SCN_MAP, NUM_MAPPINGS, 0)) cnt
              from smon_scn_time
             where thread = 0) smontabv
     where smontab.scn = smontabv.scnmax
       and thread = 0
    
    END OF STMT
    PARSE #3:c=0,e=1354526,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1278243423594556
    EXEC #3:c=0,e=106,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1278243423603269
    FETCH #3:c=0,e=47065,p=0,cr=319,cu=0,mis=0,r=1,dep=1,og=4,tim=1278243423650375
    *** 2011-06-24 21:19:25.899
    WAIT #0: nam='smon timer' ela= 299999999 sleep time=300 failed=0 p3=0 obj#=1 tim=1278243716699171
    kglScanDependencyHandles4Unpin():
      cumscan=3 cumupin=4 time=776 upinned=0
    以上SMON回滚清理Dead transaction的过程从"system monitor process posted"开始到"Dead transaction 0x00c2.008.0000006d recovered by SMON"结束。另外可以看到在恢复过程中SMON先后请求了'latch: row cache objects'、'latch: undo global data'、'latch: object queue header operation'三种不同类型的latch。 现象 fast_start_parallel_rollback参数决定了SMON在回滚事务时使用的并行度,若将该参数设置为false那么并行回滚将被禁用,若设置为Low(默认值)那么会以2*CPU_COUNT数目的并行度回滚,当设置为High则4*CPU_COUNT数目的回滚进程将参与进来。当我们通过以下查询发现系统中存在大的dead tranacation需要回滚时我们可以通过设置fast_start_parallel_rollback为HIGH来加速恢复:
    select sum(distinct(ktuxesiz)) from x$ktuxe where ktuxecfl = 'DEAD';
    
    ==============parallel transaction recovery===============
    
    *** 2011-06-24 20:31:01.765
    SMON: system monitor process posted msgflag:0x0000 (-/-/-/-/-/-/-)
    
    *** 2011-06-24 20:31:01.765
    SMON: process sort segment requests begin
    
    *** 2011-06-24 20:31:01.765
    SMON: process sort segment requests end
    
    *** 2011-06-24 20:31:01.765
    SMON: parallel transaction recovery begin
    WAIT #0: nam='DFS lock handle' ela= 504 type|mode=1413545989 id1=3 id2=11 obj#=2 tim=1308918661765715
    WAIT #0: nam='DFS lock handle' ela= 346 type|mode=1413545989 id1=3 id2=12 obj#=2 tim=1308918661766135
    WAIT #0: nam='DFS lock handle' ela= 565 type|mode=1413545989 id1=3 id2=13 obj#=2 tim=1308918661766758
    WAIT #0: nam='DFS lock handle' ela= 409 type|mode=1413545989 id1=3 id2=14 obj#=2 tim=1308918661767221
    WAIT #0: nam='DFS lock handle' ela= 332 type|mode=1413545989 id1=3 id2=15 obj#=2 tim=1308918661767746
    WAIT #0: nam='DFS lock handle' ela= 316 type|mode=1413545989 id1=3 id2=16 obj#=2 tim=1308918661768146
    WAIT #0: nam='DFS lock handle' ela= 349 type|mode=1413545989 id1=3 id2=17 obj#=2 tim=1308918661768549
    WAIT #0: nam='DFS lock handle' ela= 258 type|mode=1413545989 id1=3 id2=18 obj#=2 tim=1308918661768858
    WAIT #0: nam='DFS lock handle' ela= 310 type|mode=1413545989 id1=3 id2=19 obj#=2 tim=1308918661769224
    WAIT #0: nam='DFS lock handle' ela= 281 type|mode=1413545989 id1=3 id2=20 obj#=2 tim=1308918661769555
    
    *** 2011-06-24 20:31:01.769
    SMON: parallel transaction recovery end
    但是在real world的实践中可以发现当fast_start_parallel_rollback= Low/High,即启用并行回滚时常有并行进程因为各种资源互相阻塞导致回滚工作停滞的例子,当遭遇到这种问题时将fast_start_parallel_rollback设置为FALSE一般可以保证恢复工作以串行形式在较长时间内完成。 如何禁止SMON Recover Dead transaction 可以设置10513事件来临时禁止SMON恢复死事务,这在我们做某些异常恢复的时候显得异常有效,当然不建议在一个正常的生产环境中设置这个事件:
    SQL> alter system set events '10513 trace name context forever, level 2';
    
    System altered.
    
    10531 -- event disables transaction recovery which was initiated by SMON
    
    SQL> select ktuxeusn,
      2         to_char(sysdate, 'DD-MON-YYYY HH24:MI:SS') "Time",
      3         ktuxesiz,
      4         ktuxesta
      5    from x$ktuxe
      6   where ktuxecfl = 'DEAD';
    
      KTUXEUSN Time                         KTUXESIZ KTUXESTA
    ---------- -------------------------- ---------- ----------------
            17 24-JUN-2011 22:03:10                0 INACTIVE
            66 24-JUN-2011 22:03:10                0 INACTIVE
           105 24-JUN-2011 22:03:10                0 INACTIVE
           193 24-JUN-2011 22:03:10            33361 ACTIVE
           194 24-JUN-2011 22:03:10                0 INACTIVE
           194 24-JUN-2011 22:03:10                0 INACTIVE
           197 24-JUN-2011 22:03:10            20171 ACTIVE
    
    7 rows selected.
    
    SQL> /
    
      KTUXEUSN Time                         KTUXESIZ KTUXESTA
    ---------- -------------------------- ---------- ----------------
            17 24-JUN-2011 22:03:10                0 INACTIVE
            66 24-JUN-2011 22:03:10                0 INACTIVE
           105 24-JUN-2011 22:03:10                0 INACTIVE
           193 24-JUN-2011 22:03:10            33361 ACTIVE
           194 24-JUN-2011 22:03:10                0 INACTIVE
           194 24-JUN-2011 22:03:10                0 INACTIVE
           197 24-JUN-2011 22:03:10            20171 ACTIVE
    
    7 rows selected.
    
    ================smon disabled trans recover trace==================
    
    SMON: system monitor process posted
    *** 2011-06-24 22:02:57.980
    SMON: Event 10513 is level 2, trans recovery disabled.
    
  • 相关阅读:
    hdu 2822 Dogs (BFS+优先队列)
    hdu 2757 Ocean Currents(BFS+DFS)
    hdu2844 Coins(普通的多重背包 + 二进制优化)
    hdu1495 && pku3414
    hdu1054 Strategic Game(树形DP)
    FckEditor V2.6 fckconfig.js中文注释
    数字文本控件
    统计在线用户列表 for .net WebForm
    智能客户端
    模拟Confirm的Web自定义控件
  • 原文地址:https://www.cnblogs.com/macleanoracle/p/2967810.html
Copyright © 2011-2022 走看看