  • Introduction to Oracle systemstate dump

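        When a database has a severe performance problem or hangs, a systemstate dump is what tells us what each process is doing, what it is waiting for, who holds a resource, and who is blocking whom. Collecting a systemstate dump promptly when such a problem occurs greatly helps root-cause analysis. Oracle Support engineers will usually also ask you to provide the trace file generated by the systemstate dump. There is no very detailed official documentation on systemstate dump; what exists is scattered across various notes.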

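    When the database has a severe performance problem or hangs, a sqlplus connection from the server is either very slow or fails altogether. Starting with Oracle 10g, sqlplus provides the -prelim option, which lets you log in to the database even when a normal sqlplus connection cannot be made. The following is a summary of these points.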

     

    There are two ways to connect to sqlplus using a preliminary connection.

    sqlplus -prelim / as sysdba
     
    sqlplus /nolog
    set _prelim on
    connect / as sysdba

     

    Log in to the database as sysdba:

    $sqlplus / as sysdba

    or

    $sqlplus -prelim / as sysdba <== when the database is already too slow or too hung to connect normally

     

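    Below is the guidance from Metalink on how to collect a Systemstate or Hanganalyze dump in a single-instance or RAC environment (for details, see the references below).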

     

    Collection commands for Hanganalyze and Systemstate: Non-RAC:
    Sometimes the database may just be very slow rather than actually hung. It is therefore recommended, where possible, to get 2 hanganalyze and 2 systemstate dumps in order to determine whether processes are moving at all or whether they are "frozen".

    Hanganalyze
    sqlplus '/ as sysdba'
    oradebug setmypid
    oradebug unlimit
    oradebug hanganalyze 3
    -- Wait one minute before getting the second hanganalyze
    oradebug hanganalyze 3
    oradebug tracefile_name
    exit

    Systemstate
    sqlplus '/ as sysdba'
    oradebug setmypid
    oradebug unlimit
    oradebug dump systemstate 266
    oradebug dump systemstate 266
    oradebug tracefile_name
    exit
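
    The systemstate sequence above can be scripted so that it is ready before an incident. The following is a minimal sketch, not part of the Oracle note: the function name systemstate_commands and its two parameters are hypothetical, and the pause between the two dumps via the SQL*Plus host command is an assumption borrowed from the "wait one minute" advice given for hanganalyze.

```shell
#!/bin/sh
# Print the oradebug commands for two systemstate dumps separated by a pause.
# systemstate_commands, level and wait_seconds are illustrative names only.
systemstate_commands() {
  level=${1:-266}          # dump level, 266 unless overridden
  wait_seconds=${2:-60}    # pause between the two dumps (assumption)
  printf '%s\n' \
    "oradebug setmypid" \
    "oradebug unlimit" \
    "oradebug dump systemstate $level" \
    "host sleep $wait_seconds" \
    "oradebug dump systemstate $level" \
    "oradebug tracefile_name" \
    "exit"
}

# Example use on a database server (left as a comment so that sourcing
# this file has no side effects):
#   systemstate_commands 266 60 | sqlplus -S '/ as sysdba'
```

    Generating the command text separately from running it makes it easy to review exactly what will be sent to sqlplus, and to switch to a lighter level such as 258 on a busy system.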

     

    Collection commands for Hanganalyze and Systemstate: RAC
    There are 2 bugs affecting RAC that, without the relevant patches applied on your system, make using level 266 or 267 very costly. Without these fixes in place, it is therefore highly inadvisable to use these levels.

    For information on these patches see:
    Document 11800959.8 Bug 11800959 - A SYSTEMSTATE dump with level >= 10 in RAC dumps huge BUSY GLOBAL CACHE ELEMENTS - can hang/crash instances
    Document 11827088.8 Bug 11827088 - Latch 'gc element' contention, LMHB terminates the instance
     
    Note:  both bugs are fixed in 11.2.0.3.
     
    Collection commands for Hanganalyze and Systemstate: RAC with fixes for bug 11800959 and bug 11827088
    For 11g:
    sqlplus '/ as sysdba'
    oradebug setorapname reco
    oradebug  unlimit
    oradebug -g all hanganalyze 3
    oradebug -g all hanganalyze 3
    oradebug -g all dump systemstate 266
    oradebug -g all dump systemstate 266
    exit
    Collection commands for Hanganalyze and Systemstate: RAC without fixes for Bug 11800959 and Bug 11827088
    sqlplus '/ as sysdba'
    oradebug setorapname reco
    oradebug unlimit
    oradebug -g all hanganalyze 3
    oradebug -g all hanganalyze 3
    oradebug -g all dump systemstate 258
    oradebug -g all dump systemstate 258
    exit
    For 10g, run oradebug setmypid instead of oradebug setorapname reco:
    sqlplus '/ as sysdba'
    oradebug setmypid
    oradebug unlimit
    oradebug -g all hanganalyze 3
    oradebug -g all hanganalyze 3
    oradebug -g all dump systemstate 258
    oradebug -g all dump systemstate 258
    exit
    In a RAC environment, a dump will be created for all RAC instances in the DIAG trace file for each instance.

     

    Now let's look at an example:

    [oracle@DB-Server ~]$ sqlplus -prelim / as sysdba
     
    SQL*Plus: Release 10.2.0.5.0 - Production on Wed Mar 2 16:31:03 2016
     
    Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.
     
    SQL> oradebug setmypid
    Statement processed.
    SQL> oradebug unlimit
    Statement processed.
    SQL> oradebug dump systemstate 266
    Statement processed.
    SQL> oradebug dump systemstate 266
    Statement processed.
    SQL> oradebug tracefile_name
    /u01/app/oracle/admin/SCM2/udump/scm2_ora_13598.trc
    SQL> exit
    Disconnected from ORACLE


    You will see messages like the following in the alert log:

    Wed Mar 02 16:32:08 CST 2016

    System State dumped to trace file

    Wed Mar 02 16:32:48 CST 2016

    System State dumped to trace file /u01/app/oracle/admin/xxx/udump/scm2_ora_13598.trc

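    Find the corresponding trc file under $ORACLE_BASE/admin/ORACLE_SID/udump/. It contains a large amount of state information for every process in the system. Each process corresponds to one section of the trace file that reflects the state of that process, including process information, session information, enqueue information (mainly lock information), and so on.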
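
    Because the trace file has one section per process, a quick sanity check is to count those sections before reading further. The sketch below assumes that each section starts with a header line of the form "PROCESS <n>:" at the beginning of the line; that header format and the function name count_process_sections are assumptions, so adjust the pattern to what your trace file actually contains.

```shell
#!/bin/sh
# Count per-process sections in a systemstate trace file.
# Assumes section headers look like "PROCESS 12:" at the start of a line.
count_process_sections() {
  trace_file=$1
  # grep -c prints the number of matching lines (0 when none match)
  grep -c '^PROCESS [0-9][0-9]*:' "$trace_file"
}

# Example use with the trace file name reported by oradebug tracefile_name:
#   count_process_sections /u01/app/oracle/admin/SCM2/udump/scm2_ora_13598.trc
```

    When two dumps were taken into the same trace file, the count covers both dumps together.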


     

    A systemstate dump has several levels:

    2: dump (without lock elements)

    10: dump

    11: dump + global cache of RAC

    256: short stack (function call stack)

    258: 256+2 --> short stack + dump (without lock elements)

    266: 256+10 --> short stack + dump

    267: 256+11 --> short stack + dump + global cache of RAC

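    Levels 11 and 267 dump the global cache and generate large trace files, so they are generally not recommended. If there are not too many processes, level 266 is the usual recommendation, because it dumps the function call stack of each process, which can be used to analyze what the process is doing. Generating short stacks is time-consuming, however: with a very large number of processes, for example 2000, it may take more than 30 minutes. In that case you can use level 10 or level 258 instead. Level 258 collects short stacks, which level 10 does not, but it collects less lock element data than level 10.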
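
    The level numbers above are sums: 256 adds the short stack, and the remainder (2, 10 or 11) selects the dump contents. The following sketch decodes a level on that basis. The function name describe_level is hypothetical, and it only knows the levels listed above; anything else is reported as unlisted.

```shell
#!/bin/sh
# Describe a systemstate dump level using the table above.
# describe_level is an illustrative helper, not an Oracle tool.
describe_level() {
  level=$1
  base=$((level % 256))    # part of the level that selects the dump contents
  stack=""
  if [ "$level" -ge 256 ]; then
    stack="short stack"    # 256 adds the function call stack
  fi
  case "$base" in
    0)  desc="" ;;
    2)  desc="dump (without lock elements)" ;;
    10) desc="dump" ;;
    11) desc="dump + global cache of RAC" ;;
    *)  desc="unlisted base level $base" ;;
  esac
  if [ -n "$stack" ] && [ -n "$desc" ]; then
    echo "$stack + $desc"
  else
    echo "$stack$desc"
  fi
}

describe_level 266   # prints: short stack + dump
describe_level 258   # prints: short stack + dump (without lock elements)
```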

     

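    The trace file generated by a systemstate dump can be very large, typically several hundred megabytes or more. Although a systemstate dump collects the relevant process information, interpreting that information effectively and diagnosing the problem is a considerable challenge.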

     

    References:

    https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=352993211736965&parent=DOCUMENT&sourceId=68738.1&id=452358.1&_afrWindowMode=0&_adf.ctrl-state=z7hwh19s9_319

    https://blogs.oracle.com/Database4CN/entry/systemstate_dump_%E4%BB%8B%E7%BB%8D

    http://tech.e2sn.com/oracle/troubleshooting/hang/how-to-log-on-even-when-sysdba-can-t-do-so

  • Original article: https://www.cnblogs.com/kerrycode/p/5236927.html